Storage requirements can vary widely, even within a single organization, and if you are not using the right service for your specific needs, you might overpay for cloud storage. Amazon Web Services (AWS) offers several options, each with different performance levels and prices. The first step in optimizing your storage is to assess the benefits and drawbacks of the various AWS cloud storage options.
AWS has three storage services from which to choose: Amazon Simple Storage Service (S3), the Elastic Block Store (EBS) and Amazon Glacier. Depending on your particular requirements, you may find that the optimal storage solution is one of these services or some combination of all three.
Amazon Simple Storage Service is an object store designed to hold large volumes of data organized into "buckets." Buckets are loosely analogous to directories, and each object (file) stored in a bucket can be up to 5 terabytes (TB).
Amazon S3 works well when you need to store content or data that is accessed frequently and you can tolerate some variation in performance, as with document management and big data analysis. S3 replicates objects across multiple storage devices to improve durability: standard S3 is designed for 99.999999999% durability, while the lower-cost Reduced Redundancy Storage option is designed for 99.99%. S3 standard storage starts at 9.5 cents per GB per month; reduced redundancy starts at 7.6 cents per GB per month.
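To make the durability and price gap between standard S3 and reduced redundancy concrete, here is a small sketch using the figures quoted above. It assumes that durability is an annual, independent per-object survival probability (the way AWS frames its durability targets) and models only the first ("starts at") price tier; the 10-million-object and 1,000 GB scenarios are hypothetical.

```python
# Figures quoted in the article. Pricing has changed since publication, and only
# the first ("starts at") price tier is modeled here.
STANDARD_DURABILITY = 0.99999999999  # 11 nines
RRS_DURABILITY = 0.9999              # 4 nines
STANDARD_USD_PER_GB_MONTH = 0.095
RRS_USD_PER_GB_MONTH = 0.076


def expected_annual_losses(num_objects: int, durability: float) -> float:
    """Expected objects lost per year, assuming durability is an annual,
    independent per-object survival probability (an assumption)."""
    return num_objects * (1.0 - durability)


def monthly_cost(size_gb: float, usd_per_gb_month: float) -> float:
    """Flat monthly storage cost, ignoring tiered pricing and request fees."""
    return size_gb * usd_per_gb_month


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical object count
    print(f"Standard: {expected_annual_losses(objects, STANDARD_DURABILITY):.4f} objects/year")
    print(f"RRS:      {expected_annual_losses(objects, RRS_DURABILITY):.0f} objects/year")
    size_gb = 1_000  # hypothetical data volume
    saving = (monthly_cost(size_gb, STANDARD_USD_PER_GB_MONTH)
              - monthly_cost(size_gb, RRS_USD_PER_GB_MONTH))
    print(f"RRS saves ${saving:.2f}/month on {size_gb} GB")
```

Under these assumptions, 10 million objects would see about 0.0001 expected losses per year on standard S3 versus about 1,000 on reduced redundancy, in exchange for saving $19 a month per 1,000 GB. That trade-off is why reduced redundancy suits data you can regenerate.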
AWS Elastic Block Store
S3 works well for objects that are manipulated programmatically, but it is not suitable for applications that require guaranteed levels of performance and access to a file system, such as relational databases. For those use cases, EBS is a better option. EBS provides attached storage volumes of up to 1 TB. Unlike S3 objects, which are accessible from virtually any device, an EBS volume is attached to a single Elastic Compute Cloud (EC2) instance at a time. A key advantage of EBS volumes is the ability to provision a guaranteed level of input/output operations per second (IOPS). For example, an application may require database queries to return results within two seconds and, to achieve this, the storage system must perform 1,000 IOPS under expected loads.
For cases in which only access to a file system is needed, standard EBS volumes cost 10 cents per GB per month, plus 10 cents per 1 million I/O requests. If you provision an IOPS level, EBS volumes cost 12.5 cents per GB per month, plus 10 cents per provisioned IOPS per month. Because EBS volumes are limited to 1 TB, you may need to attach multiple volumes to your EC2 instances. Note that if you have multiple provisioned IOPS volumes, you pay the IOPS provisioning fee for each EBS volume.
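The pricing rules above can be turned into a rough monthly cost estimate. This is a minimal sketch using the article's prices, not current AWS pricing. It assumes the 1 TB cap means 1,024 GB and that every volume is provisioned at the same IOPS level; the function names are hypothetical.

```python
import math

# Prices and limits quoted in the article; illustrative inputs, not current pricing.
STANDARD_USD_PER_GB_MONTH = 0.10
STANDARD_USD_PER_MILLION_IO = 0.10
PIOPS_USD_PER_GB_MONTH = 0.125
PIOPS_USD_PER_IOPS_MONTH = 0.10
MAX_VOLUME_GB = 1024  # assumption: the 1 TB volume cap expressed as 1,024 GB


def volumes_needed(total_gb: float) -> int:
    """Volumes required to hold total_gb when each volume is capped at MAX_VOLUME_GB."""
    return math.ceil(total_gb / MAX_VOLUME_GB)


def standard_monthly_cost(total_gb: float, io_requests: int) -> float:
    """Standard EBS: capacity charge plus a charge per million I/O requests."""
    return (total_gb * STANDARD_USD_PER_GB_MONTH
            + io_requests / 1_000_000 * STANDARD_USD_PER_MILLION_IO)


def piops_monthly_cost(total_gb: float, iops_per_volume: int) -> float:
    """Provisioned IOPS EBS: capacity charge plus the IOPS fee, which is paid
    for every volume (assumes all volumes get the same IOPS level)."""
    n = volumes_needed(total_gb)
    return total_gb * PIOPS_USD_PER_GB_MONTH + n * iops_per_volume * PIOPS_USD_PER_IOPS_MONTH
```

For example, 1,500 GB needs two volumes; provisioning each at 1,000 IOPS costs 1,500 × $0.125 + 2 × 1,000 × $0.10 = $387.50 per month, whereas the same capacity on standard volumes serving 500 million I/O requests costs $150 + $50 = $200. Splitting data across volumes doubles the IOPS fee here, which is the effect the article warns about.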
The third option is Amazon Glacier, a low-cost archival storage service. Amazon Glacier is by far the least expensive storage option (1 cent per GB per month), but significant constraints come with that price.
The most important restriction is that data retrieval times are measured in hours, not fractions of a second. Because a typical retrieval operation takes three to five hours to complete, this service is appropriate only for long-term storage of infrequently accessed content. You might, for example, use Amazon Glacier to store emails and documents for compliance or e-discovery purposes. AWS also imposes an additional charge if data is deleted from Glacier within three months (90 days) of being written.
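The guidance so far can be summarized as a simple decision rule. The sketch below is illustrative only: the `Workload` fields and `choose_storage` function are hypothetical, and the five-hour threshold is simply the upper end of the retrieval range quoted above.

```python
from dataclasses import dataclass

# Upper end of the three-to-five-hour retrieval time quoted in the article.
GLACIER_RETRIEVAL_HOURS = 5.0


@dataclass
class Workload:
    """Hypothetical description of a storage workload (illustrative only)."""
    needs_guaranteed_iops_or_file_system: bool  # e.g. a relational database on EC2
    accessed_frequently: bool
    tolerable_retrieval_hours: float            # how long a read may take


def choose_storage(w: Workload) -> str:
    """Map a workload to EBS, Glacier or S3 following the guidance above."""
    if w.needs_guaranteed_iops_or_file_system:
        return "EBS"
    if not w.accessed_frequently and w.tolerable_retrieval_hours >= GLACIER_RETRIEVAL_HOURS:
        return "Glacier"
    return "S3"


if __name__ == "__main__":
    compliance_archive = Workload(False, False, 24.0)
    print(choose_storage(compliance_archive))  # Glacier
```

Real decisions also weigh durability, cost and the combination of services, so treat this as a starting checklist rather than a complete policy.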
Optimizing your AWS cloud storage is a matter of balancing cost against features, particularly performance, durability and access time. EBS provides guaranteed performance and file system storage, but at a higher cost than the other AWS storage services. EBS is best suited to I/O-intensive applications that can't tolerate variation in response times. Provisioning IOPS requires that you use EBS-optimized EC2 instances, and instances are rated for particular levels of IOPS performance; for example, an M1 Large instance is rated for up to 500 IOPS, while an M1 Extra Large instance is rated for up to 1,000 IOPS. As a result, provisioning IOPS may require larger instance types and therefore additional cost.
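One way to check whether provisioned IOPS will force a larger instance is to compare the IOPS you plan to provision against instance ratings. This sketch uses only the two ratings quoted above, which are not a complete or current list, and it assumes that the provisioned IOPS of all volumes attached to one instance count against that instance's single rating. The function name is hypothetical.

```python
from typing import List, Optional

# IOPS ratings quoted in the article for two EBS-optimized instance types.
# Illustrative only: not a complete or current list of EC2 instance types.
INSTANCE_IOPS_RATING = {
    "m1.large": 500,
    "m1.xlarge": 1000,
}


def smallest_instance_for(volume_iops: List[int]) -> Optional[str]:
    """Return the lowest-rated instance type whose rating covers the combined
    provisioned IOPS of all attached volumes, or None if none is large enough.

    Assumes every attached volume's IOPS counts against one instance-level rating.
    """
    required = sum(volume_iops)
    for instance_type, rating in sorted(INSTANCE_IOPS_RATING.items(), key=lambda kv: kv[1]):
        if rating >= required:
            return instance_type
    return None
```

Two volumes at 500 IOPS each already exceed the M1 Large rating, so this check would point to an M1 Extra Large instance, illustrating how provisioning IOPS can raise compute costs as well as storage costs.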
Evaluate how frequently you access objects in S3. If you store backups for extended periods but rarely access the older ones, those older backups are good candidates for migrating to Glacier. It is still reasonable to keep recent backups in S3 for some time, because you may need them to restore data or applications. Amazon S3 supports object lifecycle management policies, including policies that automatically migrate data from S3 to Glacier. Administrators can define these migration policies in the AWS Management Console or programmatically; see the Amazon S3 documentation for details.
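As one example of defining a migration policy programmatically, the sketch below builds a lifecycle rule that transitions objects to the GLACIER storage class and applies it with the AWS SDK for Python (boto3) call `put_bucket_lifecycle_configuration`. The bucket name, the `backups/` prefix, the 30-day threshold and the helper function names are hypothetical; choose values that match how long you need backups to stay quickly restorable.

```python
from typing import Any, Dict


def glacier_transition_rule(prefix: str, days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that moves objects under `prefix`
    to the GLACIER storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_rule(bucket: str, prefix: str = "backups/", days: int = 30) -> None:
    """Apply the rule to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported here so building the config needs no AWS SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_transition_rule(prefix, days),
    )
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire existing lifecycle configuration, so in practice you would merge this rule with any rules already in place before applying it.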
About the author:
Dan Sullivan, M.Sc., is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.