
AWS analytics tools help make sense of big data
-
Article
AWS big data analytics -- What does Amazon offer?
Become familiar with Amazon's big data analytics products to help find a fit for your enterprise.
-
Article
Use Amazon tools, staff appropriately for big data analytics
Amazon's tools are helpful, but enterprises must also seek out qualified candidates despite a skills gap.
-
Article
Google's search engine database advances big data
The release of Google Cloud Bigtable, a managed and scalable NoSQL database, added competition for AWS.
-
Article
Technological look into mining data
Take a thorough look at the different technologies related to big data analytics and how they can help business operations.
Editor's note
The rise of public cloud has legitimized the data analytics market -- making big data a bigger deal than ever before. Companies that have been collecting terabytes of data for years can now use public cloud as a cost-effective approach to mine and analyze that data. And successful big data analytics strategies often mean a competitive advantage for companies.
Amazon Web Services (AWS) offers a variety of data analytics tools. AWS customers can do everything from processing data in real time to implementing machine learning for applications. Currently, there are five primary AWS products for cloud-based analytics: Elastic MapReduce (EMR), Kinesis, Redshift, Data Pipeline and Machine Learning.
Third-party tools also exist to diversify and expand on the AWS analytics portfolio. While each service supports big data in its own way, it's key for administrators to understand each offering to ensure proper data integration.
1. Accessing, protecting and storing big data
Amazon Web Services makes managing big data easier and more cost-effective than ever, with a variety of options for storing petabytes of data. Amazon's standard storage products, including Simple Storage Service (S3) and Elastic Block Store (EBS), are well equipped for big data. But speed matters in data analytics: the faster an enterprise can access its data, the faster it can act on it. A secure NoSQL database backed by solid-state drives (SSDs) can speed up that access. DynamoDB is a great place to start, though third-party options are also available. Amazon Relational Database Service (RDS) plays a complementary role to a NoSQL database: it offers fast, consistent performance and is optimized for transactional workloads. Elastic File System (EFS) can be another useful tool in big data projects because it scales to handle large flows of data.
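The speed advantage of a key-value store such as DynamoDB comes from addressing items directly by key instead of scanning a whole data set. The sketch below is a toy, in-memory Python model of a table with a composite primary key (a partition key plus a sort key, a key scheme DynamoDB supports). The class `ToyKeyValueTable`, its methods and the sample keys are hypothetical; this is not the DynamoDB API or its internal design, only an illustration of key-based access.

```python
from collections import defaultdict


class ToyKeyValueTable:
    """Hypothetical in-memory model of a table with a partition key and a sort key.

    Not the DynamoDB API: it only illustrates key-based access.
    """

    def __init__(self):
        # partition key -> {sort key: item}
        self._partitions = defaultdict(dict)

    def put_item(self, partition_key, sort_key, item):
        # Writing to an existing (partition key, sort key) pair overwrites the item
        self._partitions[partition_key][sort_key] = item

    def get_item(self, partition_key, sort_key):
        # Direct lookup by the full key: no scan over other partitions
        return self._partitions.get(partition_key, {}).get(sort_key)

    def query(self, partition_key, sort_from, sort_to):
        # Read one partition only; return items whose sort key is in range, in order
        partition = self._partitions.get(partition_key, {})
        return [partition[k] for k in sorted(partition) if sort_from <= k <= sort_to]


# Usage with hypothetical clickstream events keyed by user and timestamp
table = ToyKeyValueTable()
table.put_item("user#1", 100, {"action": "login"})
table.put_item("user#1", 105, {"action": "view"})
table.put_item("user#2", 101, {"action": "login"})
print(table.query("user#1", 0, 200))
```

Choosing the partition key is the main design decision in this model: a query for one user touches only that user's partition, which is why key design, rather than hardware alone, drives access speed.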
-
Article
The big three AWS cloud storage options
Learn the differences between S3, EBS and Glacier, and when to use each data storage option.
-
Article
Secure big data within Amazon DynamoDB
Amazon DynamoDB and other popular big data storage options still need to secure the data they store.
-
Article
NoSQL database security drives big data
NoSQL databases handle massive amounts of unstructured data and are growing in demand because of authentication and access controls.
-
Article
Comparing EFS to other AWS storage options
Like any storage option, Elastic File System has its strengths and weaknesses. Learn when to use it for big data and other projects.
-
Article
Seeking a data warehouse in the cloud? Try Redshift
For years, data warehousing was out of reach for many businesses, with costs too high to justify. Amazon Redshift leads the next generation of data warehouse options.
-
Article
Take the Amazon Redshift quiz
Learn more about Amazon's data warehouse service by testing your knowledge with this 10-question quiz.
2. Process big data, and then visualize it
Once you're ready to mine and process data from your databases, there is no shortage of tools to help with that task. In some situations, enterprises need information instantly -- for example, monetary transactions, social media responses and clickstreams. Amazon Kinesis lets users build a dashboard or application that monitors information as soon as it arrives from the data stream. Kinesis dashboards are one way to visualize big data, but they might not suit the needs of every business; third-party options such as Tableau offer connectivity to EMR and other AWS products. Analyzing past data and using it to build predictive algorithms is another challenge, and creating mathematical algorithms to interpret future data can be difficult and time-consuming. Amazon Machine Learning provides visualization tools and helps create models that react to real-time data.
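Records in a Kinesis data stream are spread across shards by partition key. AWS documents that Kinesis Data Streams maps each partition key to a 128-bit integer with an MD5 hash and sends the record to the shard whose hash key range contains that value. The Python sketch below reproduces that idea under two assumptions: the shards split the hash space evenly, and the digest is read as a big-endian unsigned integer. The function names are hypothetical, and in practice shard boundaries should be read from the stream's shard descriptions rather than computed.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys map into [0, 2**128 - 1]


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a big-endian 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the hash space into contiguous, inclusive ranges (assumes an even split)."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the ranges cover the whole space
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def route_to_shard(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash key is outside every shard range")


ranges = even_shard_ranges(4)
for key in ["user-17", "user-42", "user-17"]:
    print(key, "-> shard", route_to_shard(key, ranges))
```

Because the same partition key always hashes to the same shard (as long as the shard layout does not change), records for one user or device stay in order within that shard, which is what a per-user real-time dashboard depends on.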
-
Article
Amazon Elastic MapReduce evolves, supports Apache Spark
Hadoop frameworks are a cornerstone of Elastic MapReduce, but the program is evolving to meet the needs of more customers.
-
Article
Amazon Kinesis allows for real-time data processing
Kinesis has its strengths and weaknesses. Learn how best to apply the real-time data processing tool to your analytics operation.
-
Article
When to use Amazon Kinesis -- and when not to use it
Kinesis is a flexible data processing service that can be up and running in seconds, but that doesn't necessarily make it the ideal tool for your enterprise.
-
Article
AWS Data Pipeline helps manage cloud workflows
Data doesn't serve just one purpose. AWS Data Pipeline moves and processes data between services to help you manage your cloud workflows.
-
Article
The benefits and drawbacks of a machine learning service
Machine learning is all about mathematics, and AWS Machine Learning puts the numbers to work for its users.
-
Podcast
Real-world uses for machine learning
A growing data ecosystem is leading to more businesses using higher levels of compute power. Here's how some of them are using machine learning.
3. Glossary
This glossary of common terms relating to big data analytics can help you get started.
-
Definition
Big data
Big data is an evolving term that describes a large volume of structured, semi-structured and unstructured data that has the potential to be mined for information and used in machine learning projects and other advanced analytics applications.
-
Definition
Big data analytics
Big data analytics is the often complex process of examining large and varied data sets -- or big data -- to uncover information including hidden patterns, unknown correlations, market trends and customer preferences that can help organizations make informed business decisions.
-
Definition
Data mining
Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis.
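As a small illustration of identifying patterns and relationships, the Python sketch below counts how often each pair of items appears together across a set of transactions and keeps the pairs whose support (the share of transactions containing the pair) meets a threshold. It is a brute-force version of the first step of association-rule mining, not a full algorithm such as Apriori; the function name and the shopping-basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("milk", "bread") and ("bread", "milk") count as one pair
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical shopping baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Counting every pair is fine for a handful of baskets; on big data, algorithms such as Apriori prune candidates whose individual items are already too rare, which avoids enumerating most pairs.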
-
Definition
Amazon Simple Storage Service (Amazon S3)
Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services.
-
Definition
Amazon Dynamo Database (DDB)
Amazon DynamoDB is a fully managed NoSQL database service offered by AWS, designed to provide low latency and high performance for applications.
-
Definition
NoSQL (Not Only SQL database)
NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. NoSQL, which stands for "not only SQL," is an alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully designed before the database is built.
-
Definition
Hadoop
Hadoop is an open source, Java-based framework. Hadoop supports the processing of large data sets (big data) in distributed computing environments. Hadoop is an integral part of the Apache project sponsored by the Apache Software Foundation.
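Hadoop's MapReduce processing model splits a job into a map step that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group. The Python sketch below runs the three phases on a single machine to show the data flow for the classic word-count example. It is not Hadoop code (Hadoop jobs are typically written against its Java API or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one line of input
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    # Group values by key, as the framework does between the map and reduce steps
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values for one key into a single result
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["Big data needs big tools", "data lake and data warehouse"]))
```

On a cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves intermediate data across the network; managing that distribution is the job Hadoop takes on.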
-
Definition
Data lake
A data lake is a large object-based storage repository that holds data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
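In a flat architecture, every object sits under a single key in one namespace, and anything that looks like a folder is only a shared key prefix. Amazon S3, a common data lake store, works this way and presents "folders" by listing keys with a prefix and a delimiter. The Python sketch below imitates that kind of listing over an in-memory list of keys; the function name and keys are hypothetical, and it is not the S3 API.

```python
def list_with_prefix(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into direct objects and folder-like common prefixes."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter in remainder:
            # Everything up to the next delimiter is reported as one "folder"
            common_prefixes.add(prefix + remainder.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical data lake keys: raw and curated zones in one flat namespace
lake_keys = [
    "raw/clicks/2024-01-01.json",
    "raw/clicks/2024-01-02.json",
    "raw/orders.csv",
    "curated/sales.parquet",
]
print(list_with_prefix(lake_keys))          # top-level "folders"
print(list_with_prefix(lake_keys, "raw/"))  # one object plus the raw/clicks/ prefix
```

Because the hierarchy is only a naming convention, data can be reorganized or given new "zones" without restructuring storage, which suits keeping data in its native format until it is needed.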
-
Definition
Data warehouse
A data warehouse is a federated repository for all the data collected by an enterprise's various operational systems, be they physical or logical.
-
Definition
Amazon Elastic MapReduce
Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big data processing and analysis.
-
Definition
Predictive analytics
Predictive analytics is a form of advanced analytics that uses both new and historical data to forecast activity, behavior and trends.
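One of the simplest ways to use historical data to forecast activity is to fit a straight-line trend by ordinary least squares and extend it forward. The Python sketch below does that for an evenly spaced series. The function names and sample figures are hypothetical, and real predictive analytics typically uses richer models that account for seasonality, external variables and uncertainty.

```python
def fit_linear_trend(values):
    """Least-squares slope and intercept for values observed at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    times = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extend the fitted trend `steps_ahead` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    last_t = len(values) - 1
    return [intercept + slope * (last_t + k) for k in range(1, steps_ahead + 1)]


# Hypothetical monthly order counts
history = [120, 135, 149, 160, 178]
print(forecast(history, steps_ahead=3))
```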
-
Definition
Amazon Kinesis
Amazon Kinesis is the fully managed Amazon Web Services (AWS) offering for real-time processing of big data.
-
Definition
Machine learning
Machine learning (ML) is a category of algorithms that allows software applications to become more accurate at predicting outcomes without being explicitly programmed.
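To show what "without being explicitly programmed" means in practice, the Python sketch below trains a perceptron, one of the earliest learning algorithms. Nobody writes the rule for logical AND; the model adjusts its weights each time it misclassifies a labeled example, until it classifies every training example correctly or runs out of passes. This is a generic textbook illustration with hypothetical function names, not how Amazon Machine Learning or any other AWS service trains models.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights and a bias for a binary (0/1) linear classifier."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, target in zip(samples, labels):
            predicted = predict(weights, bias, features)
            update = learning_rate * (target - predicted)
            if update != 0:
                # Nudge the decision boundary toward the misclassified example
                weights = [w + update * x for w, x in zip(weights, features)]
                bias += update
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def predict(weights, bias, features):
    """Return 1 if the weighted sum plus bias is positive, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Training data for logical AND: the output is 1 only when both inputs are 1
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(inputs, targets)
print([predict(weights, bias, x) for x in inputs])
```

The same training code learns logical OR if it is given OR's labels instead; only the data changes, not the program, which is the core idea behind the definition above.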