Avoid AWS data lock-in with these key methods

It's easy to migrate data to the cloud, but not as easy to get data out. AWS has eased some lock-in concerns with data export and transfer techniques.

Cloud data lock-in is a perennial concern of IT execs who fear that once they move applications and data to an infrastructure as a service provider, technical constraints will make it hard to switch vendors in the future. In several surveys, IT pros ranked lock-in among the top inhibitors to cloud adoption. Fears that a vendor such as AWS will obstruct data and resource migration have inhibited the majority of enterprises from taking full advantage of cloud services.

And these concerns aren't unjustified, as cloud providers make it easy to deploy cloud services, but migrating elsewhere is invariably an afterthought. The worry is that migrating data will become a one-way street -- easy to get data in, but getting data out requires more effort.

As the leading cloud provider, AWS has addressed some of the objections to data lock-in. Although its documentation emphasizes inbound workload migration, the company hasn't ignored the need for bidirectional data movement and export. There are several ways to efficiently move large amounts of data to and from AWS, including physically shipping disks -- the cloud version of sneakernet -- on-premises storage gateway appliances and private network connections. These techniques help ensure that AWS doesn't turn into a roach motel for your data.

AWS Storage Gateway

A storage gateway provides perhaps the simplest way to locally access AWS-stored data, bridging AWS object and block storage services with a local storage area network. Developers can establish a gateway by deploying a virtual appliance -- AWS provides one -- or the gateway can be part of a physical storage system, like those from Avere, CTERA, Nasuni, Panzura and others, which expose AWS storage as iSCSI volumes on a local network.

Using the AWS Storage Gateway, primary storage can either be on AWS in Simple Storage Service (S3) buckets or on local storage volumes. When AWS holds the primary copy, the gateway automatically keeps local cached copies of hot data on whatever disk volumes -- direct-attached storage, network-attached storage (NAS) or a storage area network -- are available to the gateway server. When AWS serves as secondary storage, the gateway automatically replicates local volumes as snapshots stored in S3. Those snapshots can then seed Elastic Block Store (EBS) volumes that are accessed by Elastic Compute Cloud (EC2) instances.

Working with hybrid storage architectures

One alternative to exporting data from AWS involves keeping primary data on premises and using storage virtualization software to expose repositories as file, object or block stores. These stores can replicate to AWS as needed.

AWS has several hybrid cloud partners that specialize in storage services that use AWS for backup, long-term archive, disaster recovery data mirrors or data integration and replication. Organizations with vast data repositories understand that data gravity makes it difficult to migrate from one infrastructure platform to another, which makes software-defined storage an intriguing option to keep primary data local, while also enabling it for use on AWS.

Users can schedule snapshot updates to mirror data between their data center and AWS. Shared volumes can copy incremental changes to the AWS snapshot, and when an EC2-based application changes an EBS volume, developers can restore the snapshots to the on-premises gateway. Even though the gateway uses a particular S3 volume as the transfer point, users can migrate data out by copying other S3 buckets or EBS snapshots to the gateway volume and then downloading it using the on-premises appliance.

Each gateway appliance costs $125 per month, while cached and snapshot volumes are charged based on the amount of S3 or EBS snapshot storage used per month, at three and five cents per gigabyte, respectively. The higher cost is actually associated with getting data back out of AWS, particularly if a business has a lot of bidirectional traffic. Getting data into AWS is free, but copying it back costs $900 for the first 10 TB per month.
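
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration using the prices quoted in this article; the function and constant names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and because the article gives only the first 10 TB egress tier, the sketch refuses to guess beyond it.

```python
# Prices as quoted in this article (USD); they are not a live AWS price list.
GATEWAY_FEE = 125.00      # per gateway appliance per month
CACHED_RATE = 0.03        # per GB-month of cached volume storage
SNAPSHOT_RATE = 0.05      # per GB-month of snapshot storage
EGRESS_RATE = 0.09        # per GB out, i.e. $900 / 10,000 GB
FIRST_TIER_GB = 10_000    # the only egress tier the article describes


def monthly_gateway_cost(cached_gb, snapshot_gb, egress_gb, gateways=1):
    """Estimate one month of Storage Gateway spend from the quoted prices."""
    if egress_gb > FIRST_TIER_GB:
        # Rates above 10 TB are not given in the article, so do not invent them.
        raise ValueError("egress beyond the first 10 TB tier is not modeled")
    total = (
        gateways * GATEWAY_FEE
        + cached_gb * CACHED_RATE
        + snapshot_gb * SNAPSHOT_RATE
        + egress_gb * EGRESS_RATE   # inbound transfer is free, so only egress counts
    )
    return round(total, 2)


# One gateway, 1 TB cached, 500 GB of snapshots, 10 TB copied back out.
print(monthly_gateway_cost(1_000, 500, 10_000))
```

In this hypothetical workload the egress line dwarfs the appliance and storage fees, which is the point the paragraph above makes.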

AWS Direct Connect

Organizations building hybrid cloud architectures with a lot of bidirectional data traffic flowing between AWS and private data centers should consider AWS Direct Connect to create a fast, low-latency, private network link between the two. Users still incur egress charges of three cents per GB, but Direct Connect substantially improves network performance, reliability and security when using S3 gateways for bulk data transfer.

Snowball and AWS Import/Export

Although AWS doesn't charge to move data into the cloud, moving large repositories off of S3 over the network is costly and time consuming -- as much as nine cents per GB. In response, AWS developed a cloud version of sneakernet. AWS Import/Export Snowball, a ruggedized storage appliance that holds up to 80 TB, is suitable for one-time data migration -- mailing a large storage device back and forth can beat the bandwidth of most network connections and eases data lock-in concerns.

Snowball is a combination of hardware and software designed to simplify the migration of tens to hundreds of terabytes. The hardware piece features a 50 or 80 TB drive in a hardened case with a 10 GbE interface and built-in cabling for copper connections. The Snowball client software is a standalone terminal application that handles data transfer from a local workstation or NAS. Developers copy a data set to Snowball and then ship the appliance back to AWS using the provided box and shipping label. AWS techs load it into one or more S3 buckets, depending on the configurations that a business specifies.
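
To see why shipping a device can compete with a network link, it helps to estimate how long the same data would take over the wire. The following is a rough sketch, not a benchmark: the function name, the link speeds and the `utilization` parameter are illustrative assumptions, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, link_gbps, utilization=1.0):
    """Days needed to move `terabytes` over a link of `link_gbps` gigabits/s.

    `utilization` is the fraction of the nominal line rate actually achieved;
    1.0 is an idealized best case that real links rarely sustain.
    """
    bits = terabytes * 1e12 * 8                       # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)  # bits / (bits per second)
    return seconds / SECONDS_PER_DAY


# A full 80 TB device's worth of data over three hypothetical links.
for gbps in (0.1, 1, 10):
    print(f"{gbps:>4} Gbps: {transfer_days(80, gbps):6.1f} days")
```

Under these idealized assumptions, 80 TB needs a little over a week on a dedicated 1 Gbps link and more than two months at 100 Mbps, while the appliance's 10 GbE interface can take the same data locally in under a day.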

For organizations that want to break out of the AWS ecosystem, Snowball can also export data. IT teams set up Snowball export jobs in the AWS Management Console that specify the region and S3 buckets to copy. Developers don't have to export an entire bucket; they can specify a beginning and ending S3 key range to narrow the data set. AWS starts a Snowball export job within 24 hours and ships the device within a week. As with the importing process, Snowball encrypts all data with keys generated and managed using the AWS Key Management Service.
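
The effect of a beginning and ending key on the exported data set can be illustrated locally. This sketch does not call any AWS API; it assumes the range is inclusive at both ends and that keys are compared in UTF-8 byte order, and the function name and sample keys are hypothetical.

```python
def keys_in_range(keys, begin=None, end=None):
    """Return the keys that fall within [begin, end], in UTF-8 byte order.

    Assumption: both bounds are inclusive, and a missing bound means
    "no limit on that side". This mirrors how a key range narrows an
    export; it is not the Snowball service's own implementation.
    """
    lo = begin.encode("utf-8") if begin is not None else None
    hi = end.encode("utf-8") if end is not None else None
    selected = []
    for key in sorted(keys, key=lambda k: k.encode("utf-8")):
        raw = key.encode("utf-8")
        if lo is not None and raw < lo:
            continue
        if hi is not None and raw > hi:
            continue
        selected.append(key)
    return selected


# Hypothetical bucket listing.
bucket_keys = [
    "logs/2016-01.gz",
    "logs/2016-02.gz",
    "logs/2016-03.gz",
    "media/a.mp4",
]
print(keys_in_range(bucket_keys, begin="logs/2016-02", end="logs/2016-03.gz"))
```

Running a check like this against a bucket listing before creating the job is one way to confirm that the chosen bounds select the intended objects and nothing more.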

Once a Snowball appliance arrives, connect it to a workstation and reverse the import process, using the Snowball client to retrieve the data. Each S3 bucket maps to a directory tree rooted at the bucket name, with subdirectories named after each key, but S3 metadata does not copy over. The entire process is the cloud equivalent of copying files to a USB key and taking it to another machine.
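
The bucket-to-directory mapping can be pictured with a small sketch. The exact on-disk layout the Snowball client produces is not specified here, so the `export` root, the function name and the handling of leading slashes are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def local_path_for(bucket, key, root="export"):
    """Map an S3 bucket and key to a path under a local export directory.

    The "/" separators in the key become directory levels beneath the
    bucket name. Leading slashes are stripped so the key cannot escape
    the export root (an assumption, not documented Snowball behavior).
    """
    return PurePosixPath(root) / bucket / key.lstrip("/")


# Hypothetical bucket and key.
print(local_path_for("my-bucket", "logs/2016/01.gz"))
# -> export/my-bucket/logs/2016/01.gz
```

Because S3 metadata is not carried over, anything an application depends on beyond the key and the object body, such as content types or custom headers, would need to be recorded separately before the export.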

Each Snowball import or export job costs $200 for a 50 TB device or $250 for an 80 TB device; both prices include ten days of on-site use, and each additional day costs $15. Jobs also incur normal S3 data transfer charges: inbound transfer is free, while outbound transfer costs three cents per GB in North America and Europe and four cents per GB in Asia Pacific. Thus, migrating 80 TB out of AWS using Snowball costs $2,650: the $250 device fee plus $2,400 for 80,000 GB at three cents per GB.
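
That calculation generalizes to other data sizes and regions. The sketch below uses only the prices quoted in this article; the names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and so is the simplification that a job larger than one device pays one fee per device and keeps every device on site for the same number of days.

```python
import math

# Prices as quoted in this article (USD); they are not a live AWS price list.
DEVICE_FEE = {50: 200, 80: 250}   # device capacity in TB -> fee per job
INCLUDED_DAYS = 10                # on-site days covered by the device fee
EXTRA_DAY_FEE = 15                # per device, per day beyond the included days
EGRESS_PER_GB = {"na_eu": 0.03, "asia_pacific": 0.04}


def snowball_export_cost(data_tb, device_tb=80, onsite_days=10, region="na_eu"):
    """Estimate the cost of exporting `data_tb` terabytes via Snowball."""
    devices = math.ceil(data_tb / device_tb)           # assumes full nominal capacity
    extra_days = max(0, onsite_days - INCLUDED_DAYS)
    device_cost = devices * (DEVICE_FEE[device_tb] + extra_days * EXTRA_DAY_FEE)
    transfer_cost = data_tb * 1_000 * EGRESS_PER_GB[region]
    return round(device_cost + transfer_cost, 2)


print(snowball_export_cost(80))                         # 2650.0, the article's figure
print(snowball_export_cost(80, region="asia_pacific"))  # 3450.0
print(snowball_export_cost(80, onsite_days=12))         # 2680.0
```

The per-GB transfer charge accounts for most of the total, so the device fee matters far less than the region and the volume of data being moved.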

Transfer recommendations

AWS' export options prevent strict data lock-in, but egress costs do make large-scale exports less financially attractive. For these situations, AWS users should:

  • Consider the data migration path -- public internet, Direct Connect or Snowball -- and costs before moving large repositories to AWS.
  • Design and test a data migration path for native cloud apps built on AWS.
  • Consider a hybrid storage architecture for large data repositories of 100 TB or more, in which the primary data store remains local and cached copies of data subsets are used on AWS for specific applications.
  • Consider using a service for long-term archival storage, such as Amazon Glacier, for logs or other data with relatively short retention lifetimes. This type of data likely won't move back on site or to another provider. In this scenario, log processing and analysis with tools like Loggly, Splunk or Sumo Logic are an option.
