Help me set up AWS DR to avoid performance loss

We run critical and non-critical workloads in AWS. How do we set up regions and availability zones to minimize latency and maintain app performance?

One approach to AWS DR and business continuity is to implement redundant application configurations across multiple availability zones, which are isolated locations within a single region, or across multiple geographic regions for greater separation.

For Amazon Web Services (AWS) disaster recovery (DR), an enterprise can create an active-active workload configuration using a DNS service like Amazon Route 53. The service directs a portion of the total traffic to each of the duplicate implementations, which run simultaneously and share the processing load. The principle is similar to splitting a workload between a local data center and AWS.

For example, a critical enterprise workload might require an application server and database server; IT administrators can set up duplicate deployments in two different availability zones (AZs) or, for wider separation, in two different regions. Amazon Route 53 can channel traffic to both deployments of the cloud application. The traffic is split evenly between the deployments, assuming both are identical. However, the split can be weighted differently, such as when the second AWS site is in another geographic region and greater latency is a concern.
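The split described above can be sketched as simple arithmetic. Route 53 weighted routing gives each record a share of traffic equal to its weight divided by the sum of all weights; the Python sketch below only models that calculation and does not call any AWS API. The site names and weight values are illustrative assumptions.

```python
def traffic_shares(weights):
    """Return each site's fraction of total traffic for the given weights."""
    total = sum(weights.values())
    if total == 0:
        # This sketch simply rejects an all-zero configuration.
        raise ValueError("at least one site needs a non-zero weight")
    return {site: weight / total for site, weight in weights.items()}


# Hypothetical sites: identical deployments share traffic evenly.
even_split = traffic_shares({"site-a": 50, "site-b": 50})

# Hypothetical skew toward the lower-latency site.
skewed_split = traffic_shares({"site-a": 70, "site-b": 30})
```

Shifting weight toward the closer site is one way to limit the share of users exposed to cross-region latency while keeping the second site active.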

The traffic directed to each AWS deployment is processed through a load balancer and proxy server, and then passed to an application server, which also interacts with a database server. The sites can share a common database during normal production while a duplicate database is kept synchronized, in a primary-secondary (master-slave) database relationship.

If an outage occurs in one region or availability zone, the duplicate workload deployment takes over. AWS Auto Scaling can typically scale out the compute resources at the remaining site to handle all of the workload's traffic. When the disruption at the first site is resolved, the data is re-synchronized and traffic can again be split between the two deployments. Auto Scaling then reduces the excess compute capacity at the second site to manage costs.
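The failover sequence can be illustrated with a rough capacity calculation. This is a minimal sketch, not how Auto Scaling works internally: the real service reacts to metrics such as CPU utilization, whereas the code below just shows why the surviving site needs more capacity. The request rate, per-server capacity and site names are hypothetical.

```python
import math


def redistribute(weights, healthy):
    """Drop unhealthy sites and renormalize traffic shares among the rest."""
    live = {s: w for s, w in weights.items() if s in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy site available")
    return {s: w / total for s, w in live.items()}


def servers_needed(total_rps, share, rps_per_server):
    """Servers a site needs to absorb its share of the request rate."""
    return math.ceil(total_rps * share / rps_per_server)


weights = {"site-a": 50, "site-b": 50}  # hypothetical even split

# Normal production: both sites healthy.
normal = redistribute(weights, healthy={"site-a", "site-b"})
normal_count = servers_needed(1000, normal["site-b"], 100)

# Outage at site-a: site-b carries all traffic and must scale out.
outage = redistribute(weights, healthy={"site-b"})
outage_count = servers_needed(1000, outage["site-b"], 100)
```

With these assumed numbers, the surviving site needs twice its normal capacity during the outage and can shrink back once traffic is split again.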

Replicating data between applications in AWS

When a DR plan involves the availability of multiple sites actively running the same application, the integrity of multiple data sources is an important consideration. Simple backup scenarios are forgiving; a data source can be replicated to another local volume or cloud site. But problems can occur when multiple sites share data processing. In AWS DR, this can include a multisite strategy that involves local and public cloud redundancy as well as duplicating a deployment across multiple availability zones and regions. Different traffic routed to different application instances will result in different data in the data store or database. When an outage occurs and all traffic is routed to the alternate site, differences in data can result in serious errors and disruptions.

To synchronize data between redundant application sites in real time, designate one data store or database as the "primary" and have both sites use that primary data source. Changes to the primary data store are then mirrored to a "secondary" data source running at the redundant site. This works well in AWS, where databases running on Elastic Compute Cloud (EC2) instances can be configured to replicate data between sites. When one site experiences an outage, the redundant site relies on a current version of the data store or database.
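The primary and secondary pattern can be sketched with an in-memory model. This is a toy illustration under simplifying assumptions: plain dictionaries stand in for the databases, mirroring is synchronous, and the class and method names are hypothetical rather than any AWS or database API.

```python
class ReplicatedStore:
    """Toy primary/secondary pair: every write to the primary is mirrored."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.primary_up = True

    def write(self, key, value):
        if self.primary_up:
            self.primary[key] = value
            self.secondary[key] = value  # mirror the change to the secondary
        else:
            self.secondary[key] = value  # secondary serves writes in an outage

    def read(self, key):
        source = self.primary if self.primary_up else self.secondary
        return source[key]

    def fail_primary(self):
        self.primary_up = False

    def restore_primary(self):
        # Re-synchronize from the secondary before the primary resumes.
        self.primary = dict(self.secondary)
        self.primary_up = True


store = ReplicatedStore()
store.write("order-1", "pending")
store.fail_primary()
value_during_outage = store.read("order-1")  # served by the secondary
store.write("order-2", "shipped")            # accepted during the outage
store.restore_primary()
```

Because both sites write through a single primary during normal operation, the secondary never diverges, which is what lets it take over with current data.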

It's important to pick availability zones and regions with careful consideration of latency and application performance, as well as AWS DR needs. Distributing application deployments across the right combination of availability zones and regions can meet both availability and performance needs.

For example, selecting different availability zones within the same region should provide low latency, ensuring fast data synchronization. But this strategy leaves the workload vulnerable to outages that affect the entire region. Deploying in different regions can overcome most geographic vulnerabilities, but can add latency that affects application performance.
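One way to reason about this tradeoff is to set a latency budget for data synchronization and choose the most isolated placement that fits it. The sketch below assumes that approach; the placement names and latency figures are placeholders for illustration, not measurements of any AWS region.

```python
def pick_placement(candidates, max_sync_latency_ms):
    """Prefer cross-region isolation when its latency fits the budget."""
    within_budget = [
        c for c in candidates if c["latency_ms"] <= max_sync_latency_ms
    ]
    if not within_budget:
        return None
    # Sort key: cross-region options first, then lowest latency.
    return min(
        within_budget,
        key=lambda c: (not c["cross_region"], c["latency_ms"]),
    )


# Placeholder values; measure real latency between candidate sites instead.
candidates = [
    {"name": "same-region-two-azs", "cross_region": False, "latency_ms": 2},
    {"name": "two-regions", "cross_region": True, "latency_ms": 70},
]

tight = pick_placement(candidates, max_sync_latency_ms=10)
relaxed = pick_placement(candidates, max_sync_latency_ms=100)
```

A latency-sensitive workload with a tight budget ends up in two AZs of one region, while a workload that tolerates slower synchronization can take the wider geographic separation.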
