
What's the best disaster recovery strategy for AWS outages?

With recent news of AWS outages, we want to ensure our enterprise is ready for anything. How should we design a disaster recovery plan to minimize outages?

AWS outages happen. And while the vast majority of outages are relatively short -- just a matter of hours over the course of a year -- each occurrence corresponds to a tangible loss of productivity and revenue for organizations that depend on the public cloud.

Providers such as Amazon Web Services (AWS) are quick to downplay outages, often describing them as "incidents" instead. But no matter how you label service outages, they often are unavoidable -- even with workloads in mature cloud providers. AWS customers must understand the disaster recovery (DR) and business continuity (BC) options available for applications running in public cloud.

One practical approach to DR and BC with cloud services is to implement a multisite strategy that runs critical workloads in an active-active configuration between the enterprise and cloud provider. This means a critical workload is configured to run simultaneously in both the local data center and public cloud, which is configured to duplicate the local production environment.

For example, consider a critical enterprise workload that requires both an application server and database server. A service such as Amazon Route 53 DNS can channel traffic to both local and cloud sites, and the enterprise can determine how much of that traffic should go to each location, allowing AWS to handle more or less of the total load. The traffic directed to each site is processed through a load balancer and proxy server, and then passed to an application server, which also interacts with a database server. During normal production, it's usually possible for both sites to share one site's database while the duplicate database at the other site is kept synchronized -- a master-slave database relationship.
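As a rough illustration of the traffic split described above, the following Python sketch builds the request body for a pair of weighted DNS records, one per site. The hostname, IP addresses, site labels and the 70/30 split are all hypothetical, and the helper `weighted_change_batch` is not part of any AWS library; it only assembles a dictionary in the shape Route 53 expects for weighted records, where each record receives a share of responses proportional to its weight.

```python
def weighted_change_batch(record_name, sites, ttl=60):
    """Build a Route 53 ChangeBatch with one weighted A record per site.

    sites maps a site label to (ip_address, weight).
    Route 53 weights are integers from 0 to 255.
    """
    changes = []
    for label, (ip_address, weight) in sites.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {label} must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": label,  # distinguishes records that share a name
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip_address}],
            },
        })
    return {"Comment": "active-active traffic split", "Changes": changes}


# Hypothetical values: roughly 70% of answers point at the data center, 30% at AWS.
batch = weighted_change_batch(
    "app.example.com.",
    {"on-premises": ("203.0.113.10", 70), "aws": ("198.51.100.20", 30)},
)
```

With the boto3 library installed and credentials configured, a dictionary like `batch` could be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, alongside the `HostedZoneId` of your own hosted zone. Shifting more or less load to AWS then means changing only the two weights.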

In enterprises that operate some workloads in AWS and some on-premises, data is synchronized and traffic is shared between the local data center and AWS. When a disruption occurs -- at either the local or cloud site -- all user traffic will fail over to the remaining site. When AWS outages are resolved, data is re-synchronized and traffic fails back -- allowing both sites to share the user load again.
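The failover and failback behavior can be sketched as a small decision function. This is an illustration of the logic only, under the assumption that some health check reports which sites are up; in practice, Route 53 health checks attached to weighted records can take an unhealthy site out of rotation without custom code. The function name and the site labels are hypothetical.

```python
def effective_weights(configured, healthy):
    """Return the traffic weights to apply, given which sites pass health checks.

    configured: dict mapping site name to its normal weight
    healthy: set of site names currently passing health checks
    """
    live = {site: weight for site, weight in configured.items() if site in healthy}
    if not live:
        # No site is healthy: keep the normal split rather than dropping all traffic.
        return dict(configured)
    if sum(live.values()) == 0:
        # The only surviving sites normally take no traffic, so give them some.
        live = {site: 1 for site in live}
    # Failed sites get weight 0; surviving sites keep their weight.
    return {site: live.get(site, 0) for site in configured}


normal = {"on-premises": 70, "aws": 30}
during_aws_outage = effective_weights(normal, {"on-premises"})
after_failback = effective_weights(normal, {"on-premises", "aws"})
```

Here `during_aws_outage` sends everything to the data center, and `after_failback` restores the original split once both sites report healthy. The sketch deliberately ignores data re-synchronization, which has to finish before traffic fails back.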

It's important for organizations to consider the costs of such an active-active configuration. Costs are usually less during normal operations because the AWS deployment is only handling a portion of the total traffic load, but the actual traffic level and corresponding costs can be adjusted over a wide range, depending on enterprise needs and preferences. There's no rule that says you need to split the traffic 50/50. AWS can handle most of the production traffic or only a small part of the production load, which affects the number of compute instances employed and the choice of database replication methods.

Organizations can invoke AWS Auto Scaling to ramp up compute resources when AWS needs to meet the full traffic load, and then scale back when the companion site is restored. However, the local data center must also have the scalability to handle the full traffic load in response to AWS outages -- or any other public cloud provider outage.
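To make the sizing trade-off concrete, here is a minimal Python sketch of the arithmetic. Every number in it is a made-up assumption (peak load, per-instance throughput, the 30% share and the data center's capacity); substitute figures from your own load tests.

```python
import math


def instances_needed(peak_rps, share_percent, rps_per_instance):
    """Instances required to serve a given percentage of peak traffic."""
    required = math.ceil(peak_rps * share_percent / (100 * rps_per_instance))
    return max(1, required)  # keep at least one instance running


# Hypothetical figures for illustration only.
peak_rps = 1200          # peak requests per second across both sites
rps_per_instance = 100   # sustained throughput of one application instance

normal_instances = instances_needed(peak_rps, 30, rps_per_instance)     # AWS takes 30%
failover_instances = instances_needed(peak_rps, 100, rps_per_instance)  # AWS takes 100%

# The local site needs the same headroom for the reverse case.
local_capacity_rps = 1000  # assumed maximum the data center can serve
local_shortfall_rps = max(0, peak_rps - local_capacity_rps)
```

With these assumptions, AWS runs 4 instances in normal operation and 12 when it carries the full load, so those two values are natural candidates for the minimum and maximum sizes of an Auto Scaling group. The nonzero `local_shortfall_rps` of 200 flags that the hypothetical data center could not absorb a full AWS outage at peak.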


This was last published in October 2015


How have AWS outages affected your company?
We have yet to be affected by AWS outages, but that could easily change as we migrate more of our applications and services to AWS. We will certainly be keeping an eye out, and incorporating disaster preparedness and recovery into our cloud strategy.
This really needs to be considered. Anyone who thinks that just because it's "in the cloud" they don't have to worry about it is fooling themselves. Good suggestions.
Organizations get a little complacent about planning for outages in AWS partly because of the hype surrounding it, and partly because of the hype around cloud in general. It’s in the cloud, it’s redundant and OK! S3 promotes its 99.99% reliability and 11 9’s of durability, not to mention all the talk about regions, availability zones, and endpoints that can make your head swim. And somewhere in that swirling cloud of assurances, organizations can develop a false sense of security.
A lot of AWS users know better; they've had outages even due to a simple (well, okay, a *big* one) thunderstorm.