Massive computing and analytical tools do much of the heavy lifting for big data projects, but big data also requires big storage. And engineers have multiple big data storage tools available to them.
A basic object store is akin to a Swiss Army knife; it's suitable for nearly all types of data -- log files, test and research data, images and streaming media, to name a few. When selecting a storage option for a big data project, developers should consider big data storage tools that are inexpensive and highly available, offer high performance and integrate well with other cloud services. Amazon Simple Storage Service (S3) is often considered the de facto storage service for big data environments hosted in AWS.
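As a minimal sketch of that S3 role -- with a hypothetical bucket, key prefix and file name, and the boto3 import deferred so the key-building helper runs without AWS credentials -- log files might land in S3 under date-partitioned keys:

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, filename: str, when: datetime) -> str:
    """Build a date-partitioned S3 key, e.g. logs/2016/05/09/app.log,
    so downstream analytics tools can prune objects by date."""
    return "{}/{:%Y/%m/%d}/{}".format(prefix, when, filename)


def upload_log(bucket: str, path: str) -> str:
    """Upload a local log file to the given bucket (names are examples)."""
    import boto3  # deferred: the helper above needs no AWS access
    key = partitioned_key("logs", path.rsplit("/", 1)[-1],
                          datetime.now(timezone.utc))
    boto3.client("s3").upload_file(path, bucket, key)
    return key
```

Partitioning keys by date is a common convention, not an S3 requirement; it lets later EMR or query jobs read only the days they need.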
But raw object storage is just one option. Big data can also use varied database services that organize data, speed analytics and support queries against large, often unrelated, data sets. If going this route, a developer will want big data storage tools that provide speed, easy management and administration, as well as scalability.
Amazon Relational Database Service (RDS) supports several database engines, including SQL Server, PostgreSQL, MySQL, MariaDB, Oracle and Amazon Aurora, and its databases can serve as sources for query engines such as Apache Presto. Alternatively, enterprises that rely on NoSQL databases can use Amazon DynamoDB for extremely low-latency, flexible data storage.
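A hedged sketch of the DynamoDB side, using boto3 -- the table name and attributes are hypothetical, and note that DynamoDB's API rejects Python floats, so numeric attributes are converted to Decimal:

```python
from decimal import Decimal


def to_item(user_id: str, score: float) -> dict:
    """Shape a record for DynamoDB; numbers must be Decimal, not float."""
    return {"user_id": user_id, "score": Decimal(str(score))}


def put_score(table_name: str, user_id: str, score: float) -> None:
    """Write one low-latency item (assumes a table keyed on user_id)."""
    import boto3  # deferred so to_item stays usable without AWS access
    boto3.resource("dynamodb").Table(table_name).put_item(
        Item=to_item(user_id, score))
```

The Decimal conversion via str() avoids float artifacts such as 0.1 becoming 0.1000000000000000055.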
Big data projects are diverse, so no individual data store will fit all needs. For example, a developer dealing with large amounts of unstructured data -- such as log file analysis or massive machine learning text searches -- may require a distributed, nonrelational database service. That developer can use a tool such as the open source Apache HBase, which runs in concert with Hadoop on the Hadoop Distributed File System and integrates with the Apache Hive database. This makes HBase an ideal complement to Hadoop-based fault-tolerant distributed computing clusters. HBase and Hadoop are readily supported within Amazon Elastic MapReduce (EMR).
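One way to stand up such a cluster is through the EMR API; the sketch below uses boto3's run_job_flow to request HBase and Hive alongside Hadoop. The cluster name, release label, instance sizing and role names are illustrative assumptions, not requirements:

```python
def emr_applications(*names):
    """Shape application names the way run_job_flow expects them."""
    return [{"Name": name} for name in names]


def launch_hbase_cluster():
    import boto3  # deferred so the helper above runs without AWS access
    emr = boto3.client("emr")
    return emr.run_job_flow(
        Name="hbase-analytics",                # hypothetical cluster name
        ReleaseLabel="emr-5.0.0",              # assumed EMR release
        Applications=emr_applications("Hadoop", "HBase", "Hive"),
        Instances={
            "MasterInstanceType": "m4.large",  # illustrative sizing
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 3,
        },
        JobFlowRole="EMR_EC2_DefaultRole",     # EMR's default roles
        ServiceRole="EMR_DefaultRole",
    )
```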
Using a data warehouse service, such as Amazon Redshift, provides another option among big data storage tools. These services act as central repositories for data collected and integrated from several sources. Data warehouse capacities reach into the petabytes, and enterprises can use business intelligence (BI) tools to routinely search and analyze the data held in these warehouses.
Data warehouses can retain data long term -- providing historical context for data analytics. Additionally, data warehouse options typically compress data to reduce storage costs and volumes.
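Loading those collected sources into Redshift typically goes through its COPY command, which bulk-loads from S3; below is a small sketch that builds such a statement, where the table name, S3 path and IAM role ARN are placeholders:

```python
def copy_from_s3(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement to bulk-load gzipped JSON from S3."""
    return ("COPY {} FROM '{}' IAM_ROLE '{}' JSON 'auto' GZIP"
            .format(table, s3_path, iam_role_arn))
```

The resulting SQL would then be run against the cluster with any PostgreSQL-compatible client; the GZIP option pairs with the compression point above by reducing transfer volume.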
Turning big data into intelligence and learning
Big data projects become even more valuable when enterprises turn that data into intelligence. Specialized BI management tools can help IT teams analyze, visualize and collaborate using big data.
When choosing a BI tool, look for those with fast responses, simple-to-use interfaces and interoperability with a variety of data sources, including internal data files and external third-party data sources like those from Salesforce. Amazon QuickSight is a BI management tool that can perform rapid calculations and visualizations for a wide range of data sources -- it's also intuitive enough for nontechnical employees to use. QuickSight integrates with other AWS utilities, such as EMR, S3 and RDS.
Amazon Machine Learning is an analytics tool that uses data to make predictions based on mathematical models. The idea is that the model can change dynamically -- learning from the behaviors and previous results obtained -- to constantly improve the value and accuracy of results. For example, machine learning can help identify attacks in network traffic patterns or personalize website content based on a user's activity.
Many machine learning tools rely on complex algorithms and manual tweaking, but technologies are evolving to help automate model creation and optimization. Amazon Machine Learning offers wizards and visualization capabilities that enable IT teams to construct and refine models to find patterns in data.
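Amazon Machine Learning's wizards hide the model-building step, but the learn-from-errors loop they automate can be illustrated with a toy stand-in -- a perceptron that nudges its weights whenever a prediction misses. This is a sketch of the general idea, not Amazon's algorithm:

```python
def predict(weights, bias, features):
    """Classify a feature vector as 1 or 0 with a linear threshold."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Learn weights by correcting each mistake -- the same
    improve-from-previous-results loop described above."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error:
                weights = [w + lr * error * x
                           for w, x in zip(weights, features)]
                bias += lr * error
    return weights, bias
```

Trained on a handful of labeled examples, the model's weights settle so that new inputs are classified correctly -- a miniature version of the pattern-finding the managed service performs at scale.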