AWS backup and DR prevent workload disruptions

Many enterprises are turning to AWS for its suite of disaster recovery tools and capabilities. Those enterprises must understand their application needs before constructing a DR architecture.

Traditional backup methods are carried out over a local area network by agents installed on machines. If components fail, replacing them is time-consuming, and the result is a complicated backup environment. These difficult-to-manage environments require plentiful resources and technologies to operate, including virtual tape libraries and methods such as data deduplication, to handle ever-growing workloads.

These complications can be untangled now that businesses function more in the cloud. Organizations can simply provision resources for AWS backup purposes at a specific time or in a specific place, using the cloud provider's flexible infrastructure and global presence. With the AWS console or command-line interface tools, IT teams can easily automate snapshots and recovery for their entire application stack.

Let's examine two different scenarios: on-premises deployments that use AWS as a secondary site for purposes of AWS backup and disaster recovery (DR), and applications that are run and backed up in AWS. We'll discuss cloud methods and AWS building blocks that can help streamline your AWS backup and DR.

Understand your DR business needs

Outages are an inevitable occurrence. Whether your environment is in the public cloud, on premises or both, it is a basic assumption that something, at some point, will go wrong. To prepare for such events, you'll need to have an effective DR strategy in place. The basic processes and needs of backup and DR remain the same, but the methods and tools involved have changed.

In both of the deployment cases discussed here, it is important to know how to synchronize AWS backup and DR with your business requirements, such as recovery time objective (RTO) and recovery point objective (RPO).
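To make the RPO requirement concrete, here is a minimal Python sketch. It is not an AWS API; the function names are hypothetical, and it assumes backups run at a fixed interval, so the worst-case data loss is roughly one interval plus the time a backup takes to complete.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on lost data if a failure hits just before the next backup completes."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True when the backup schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hourly backups against a four-hour RPO (example numbers only).
print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))
```

A daily backup checked against the same four-hour RPO would fail this test, which signals that the schedule, not the tooling, needs to change.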

You can plan your implementation only after you understand your requirements, taking potential costs into account. For example, in a mission-critical service, deploying an active DR site is ideal. Noncritical environments, such as ones used for development and testing, may require only a lightweight AWS backup that can be achieved by spinning up machines on demand.

Two types of secondary failover

Whether you use AWS as a secondary DR site for your on-premises environment or are running all of your operations on AWS, there are two types of configurations you can implement: hot standby or pilot light.

Hot standby is an AWS secondary site that is continuously up and running. It is kept updated and ready for user requests in case the main environment fails. The two sites are automatically synchronized. With hot standby, an entire configuration is replicated "as is" in the cloud, across AWS availability zones (AZs) and regions. Because failover is seamless, the user experience is maintained.

While hot standby offers the most seamless failover, a constantly running replica means that you will nearly double the cost of running your application in the cloud.

In the pilot light scenario, a secondary site holds the minimum amount of resources in an idle state, effectively serving as a spare tire for your service. The infrastructural elements of your application are saved as images and snapshots, and are ready to spin up another environment if disaster strikes. For example, instances can be in a stopped state or launched from ready-made Amazon Machine Images (AMIs).

Pilot light results in minimal costs. Since nothing is really running in your secondary stack, you're paying to cover only its maintenance and storage.
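The cost gap between the two configurations can be estimated with simple arithmetic. The Python sketch below is illustrative only: the function names, instance counts and prices are hypothetical placeholders, not AWS pricing, and it assumes the secondary site's bill consists of just compute hours and stored gigabytes.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def hot_standby_monthly_cost(instance_count: int, hourly_rate: float,
                             storage_gb: float, storage_rate_gb_month: float) -> float:
    """Secondary site runs a full replica: pay for compute around the clock plus storage."""
    compute = instance_count * hourly_rate * HOURS_PER_MONTH
    storage = storage_gb * storage_rate_gb_month
    return compute + storage


def pilot_light_monthly_cost(storage_gb: float, storage_rate_gb_month: float,
                             always_on_instances: int = 0,
                             hourly_rate: float = 0.0) -> float:
    """Secondary site keeps images and snapshots, plus any small always-on core."""
    compute = always_on_instances * hourly_rate * HOURS_PER_MONTH
    storage = storage_gb * storage_rate_gb_month
    return compute + storage


# Placeholder numbers: 4 instances at 0.10 per hour, 500 GB at 0.05 per GB-month.
hot = hot_standby_monthly_cost(4, 0.10, 500, 0.05)
pilot = pilot_light_monthly_cost(500, 0.05)
print(f"hot standby: {hot:.2f}, pilot light: {pilot:.2f}")
```

Running the same calculation with your own instance counts and current prices gives a first-order view of what faster failover would cost.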

Typically, AWS users choose a combination of the two approaches. For example, you may want to autoscale your Web servers to keep up with demand, and keep AMI replicas in a separate AZ and active database instances in the same zone. Ultimately, your decision will be based on how much you are willing to invest to fulfill your business needs and compliance requirements.

AWS building blocks and capabilities

There are five key features to help automate AWS backup and DR processes.

EBS volume snapshots. Snapshots of Amazon Elastic Block Store (EBS) volumes are stored in Amazon Simple Storage Service (S3). EBS snapshots are considered the base object for keeping a backup on AWS. Snapshot backups are incremental, meaning they save only the blocks that have changed since the last backup.
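The incremental idea can be shown with a toy model in Python. This is a conceptual sketch only and does not reflect how EBS stores snapshots internally: it assumes a volume is a fixed-size list of blocks, and the function names are hypothetical.

```python
def take_snapshot(volume_blocks, previous_view=None):
    """Return (delta, full_view).

    delta holds only the blocks that differ from the previous snapshot's view;
    full_view is the complete block map after applying the delta.
    """
    previous_view = previous_view or {}
    delta = {
        index: data
        for index, data in enumerate(volume_blocks)
        if previous_view.get(index) != data
    }
    full_view = {**previous_view, **delta}
    return delta, full_view


def restore(deltas, block_count):
    """Rebuild a volume by layering snapshot deltas from oldest to newest."""
    view = {}
    for delta in deltas:
        view.update(delta)
    return [view[index] for index in range(block_count)]


day1 = [b"a", b"b", b"c"]
first_delta, first_view = take_snapshot(day1)              # first snapshot stores every block
day2 = [b"a", b"X", b"c"]
second_delta, _ = take_snapshot(day2, first_view)          # second stores only block 1
print(len(first_delta), len(second_delta))
```

In this model the second snapshot is much smaller than the first, yet restoring from both deltas reproduces the latest volume in full.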

Back up and recover instances. AWS lets you either keep an AMI of your Elastic Compute Cloud (EC2) instance or take a snapshot of its root EBS volume. Either option allows you to recover an instance. Recovering from an EBS volume snapshot is better because it can ensure system consistency, though it is a bit trickier with Windows instances.

AZs and regions. Amazon EC2 allows you to place resources in multiple locations across separate AZs and global regions. That way, if there is a failure in one zone, the performance of your resource in another will not be affected.

Resource tagging. Resources, whether they are instances, EBS volumes or snapshots, can be tagged so that backup processes select them automatically, which raises the level of automation. If you tag your application's resources and the Auto Scaling mechanism adds a new instance to the pool, for example, that instance will still be automatically added to an existing backup policy.
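Tag-driven selection can be sketched in a few lines of Python. The resource records, the `backup-policy` tag key and the function name are all hypothetical; in a real deployment the resource list would come from the AWS API rather than a hard-coded list.

```python
def resources_in_policy(resources, tag_key, tag_value):
    """Return the IDs of resources whose tag matches the backup policy."""
    return [
        resource["id"]
        for resource in resources
        if resource.get("tags", {}).get(tag_key) == tag_value
    ]


inventory = [
    {"id": "i-web-1", "tags": {"backup-policy": "daily"}},
    {"id": "vol-data-1", "tags": {"backup-policy": "daily"}},
    {"id": "i-dev-1", "tags": {}},
]

# A newly scaled-out instance that carries the same tag is picked up
# on the next run without changing the policy itself.
inventory.append({"id": "i-web-2", "tags": {"backup-policy": "daily"}})
print(resources_in_policy(inventory, "backup-policy", "daily"))
```

The policy is defined once against the tag, so the set of protected resources follows the application as it grows or shrinks.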

CloudFormation. CloudFormation lets you treat and provision a stack of resources as one logical unit. With the CloudFormation template, you're able to restore your application's most current infrastructure stack in pilot light mode.
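As a rough illustration, a pilot light stack can be described as a small CloudFormation template that launches an instance from a previously saved AMI. The Python sketch below only builds and prints the template as JSON; it does not call AWS. The logical names `AmiId` and `WebServer`, the description text and the instance type are placeholders chosen for this example.

```python
import json


def build_pilot_light_template(instance_type: str = "t3.micro") -> dict:
    """Build a minimal template that launches one EC2 instance from a saved AMI."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Pilot light sketch: one instance restored from an AMI",
        "Parameters": {
            # The AMI to restore from is supplied when the stack is created.
            "AmiId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": instance_type,
                },
            },
        },
    }


print(json.dumps(build_pilot_light_template(), indent=2))
```

Keeping the template under version control alongside the application means the secondary stack can be recreated from the same definition each time it is needed.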

While these are the key features, there are others that assist with AWS backup and DR. Direct Connect, for example, lets you transfer data and resources over a dedicated, private network connection from your on-premises environment to your AWS secondary site.

Automate your backup tests

The purpose of your backup and disaster recovery site is, of course, to ensure that you can handle trouble if it arises. It is essential, therefore, to have automated routines in place to test recovery and to identify where problems might occur.

IT teams must also test consistency to ensure that policies, including RTO and RPO, are aligned. Make sure to document the guidelines for how recovery works and which actions would need to be performed manually. It's all about finding the balance between bringing as much automation as possible into the process and knowing when things need to be carried out manually.
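An automated recovery drill can be as simple as timing a restore and comparing the result with the RTO. The following Python sketch assumes you supply two callables of your own, `restore` and `verify`, which are hypothetical stand-ins for whatever starts your recovery and checks that the service is healthy; the `clock` parameter exists only so the drill can be tested without waiting.

```python
import time
from datetime import timedelta


def run_recovery_drill(restore, verify, rto: timedelta, clock=time.monotonic) -> dict:
    """Run a restore, check health, and report whether it finished within the RTO."""
    start = clock()
    restore()                      # e.g. launch the secondary stack
    healthy = bool(verify())       # e.g. probe the restored service
    elapsed = timedelta(seconds=clock() - start)
    return {
        "healthy": healthy,
        "elapsed": elapsed,
        "within_rto": healthy and elapsed <= rto,
    }


# Stand-in callables; a real drill would perform an actual restore.
report = run_recovery_drill(lambda: None, lambda: True, rto=timedelta(minutes=30))
print(report["within_rto"])
```

Scheduling a drill like this and recording each report gives you evidence of whether recovery time is drifting away from the objective.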

DR and backup are prime motivations for AWS adoption. Being able to recover your service -- at any point in time in almost any place in the world -- makes AWS a fine cloud option. Still, to build and automate backup and DR, it is important to evaluate third-party tools available through the AWS Marketplace, which can assist and accelerate implementation.
