

AWS availability benefits from traffic detours

Disruptions occur in the cloud, and developers need to be ready for them. Using AWS tools and geographic routing can keep services available.

Availability in the public cloud means companies must ensure their services see minimal downtime. Lost connections mean lost customers for many Web-scale businesses. To ensure availability, it's important to see -- and mitigate -- issues before they become outages. With AWS, developers can take a three-pronged approach to increase durability using tools such as Amazon Route 53, Elastic Load Balancer and Auto Scaling groups.

Direct request processes, such as video playback requests, are not driven by events. For this reason, parallelization is not as applicable as it is in other back-end processes. For direct requests that need to return immediate responses, developers must support high availability. If users get a "503 Service Unavailable" response when trying to play a video, that business is going to lose customers.

The days when a business could declare an overnight maintenance period and expect customers to accept losing access to the service are over. A business that promises 99.9% uptime can be down for only about 43 minutes a month; at 99.999%, the allowance shrinks to roughly 26 seconds a month.
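The downtime an uptime target allows is simple arithmetic. The following is a minimal sketch, assuming a 30-day month; the function name and the list of targets are illustrative and do not come from any AWS tool.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the minutes of downtime allowed in a period at a given availability.

    Assumption: the period defaults to a 30-day month (43,200 minutes).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the period the service may be unavailable.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; pick the ones your service agreement uses.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes ({minutes * 60:.0f} seconds) per month")
```

Running the same calculation with period_days=365 gives the yearly budget, which is often how availability targets are quoted.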

To achieve that type of AWS availability, developers must anticipate errors -- not just avoid them. Developers must have a process in place to recover from any sort of disruptive situation and they must be able to deal with all types of network and regional issues. They also need servers located close to international customers and must route those customers to the correct geographic locations.

Developers should focus on three levels to achieve global AWS availability. The top-most level is Amazon Route 53. At the regional level, developers want to use Elastic Load Balancer (ELB), and, at the zone level, they need to add Auto Scaling groups.

Figure 1: Typical AWS high-availability architecture

This architecture protects resources at three levels. Route 53 handles geographic issues by directing users to their closest network location, ELB handles individual zone issues by spreading traffic across zones, and Auto Scaling groups handle individual server issues. Developers would configure an Auto Scaling group to automatically terminate any instances that aren't responding to the ELB health checks.
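One way to express the Auto Scaling piece of this design is through the boto3 SDK for Python. The sketch below only builds the request for the create_auto_scaling_group call; the group name, launch template ID, zone names and load balancer name are hypothetical placeholders, and the sizes and grace period are assumptions rather than recommendations. Setting HealthCheckType to "ELB" is what tells the group to replace instances that fail the load balancer's health checks.

```python
from typing import Any, Dict, List


def build_asg_request(
    name: str,
    launch_template_id: str,
    zones: List[str],
    load_balancer_names: List[str],
    min_size: int = 2,
    max_size: int = 6,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's create_auto_scaling_group call."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        # Spread instances over several zones so one zone failure is survivable.
        "AvailabilityZones": zones,
        # Classic load balancers are attached by name.
        "LoadBalancerNames": load_balancer_names,
        # Use the load balancer's health checks, not only EC2 status checks.
        "HealthCheckType": "ELB",
        # Assumed warm-up time (seconds) before health checks count against an instance.
        "HealthCheckGracePeriod": 300,
    }


def create_group(region: str, request: Dict[str, Any]) -> None:
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("autoscaling", region_name=region).create_auto_scaling_group(**request)


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    request = build_asg_request(
        "video-playback-asg", "lt-0123456789abcdef0",
        ["us-east-1a", "us-east-1b"], ["video-playback-elb"],
    )
    print(request)
```

Keeping the request builder separate from the API call makes the configuration easy to review and test before anything is created in an account.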

Surviving regional Amazon Web Services issues

All of this assumes, however, that AWS will not have a regional outage. This is not always the case; there are documented cases in which an entire region has gone down for specific services, including DynamoDB and Elastic Compute Cloud (EC2). If Amazon has an issue in a region, a business could lose that entire region for its customers and would need to manually redirect traffic to another region, unless it has added Route 53 health checks.

Supporting geographic routing and health checks is as simple as setting up regional endpoints that have fallbacks to other endpoints. For example, if a website is at example.com, it could set up us-east.example.com, us-west.example.com and eu-west.example.com. Example.com would then be configured to use one of those endpoints based on the closest geolocation. But each of those endpoints would be configured to use one of the three ELBs, prioritizing the closest ones but using health checks to fall back on the others.
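The geolocation half of that setup can be sketched as a Route 53 change batch, the structure that boto3's change_resource_record_sets call accepts. This is a rough illustration, not a complete configuration: the mapping of locations to endpoints, the hosted zone ID and the function name are all assumptions, and the regional endpoint records themselves (with their health checks) would have to exist separately.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping of Route 53 GeoLocation values to the regional endpoints.
# The "*" country code marks the default record for locations that match nothing else.
GEO_TO_ENDPOINT: List[Tuple[Dict[str, str], str]] = [
    ({"ContinentCode": "EU"}, "eu-west.example.com."),
    ({"CountryCode": "US", "SubdivisionCode": "CA"}, "us-west.example.com."),
    ({"CountryCode": "*"}, "us-east.example.com."),
]


def build_geo_change_batch(apex: str, hosted_zone_id: str) -> Dict[str, Any]:
    """Build a ChangeBatch that aliases the apex name to one endpoint per location."""
    changes = []
    for geo, endpoint in GEO_TO_ENDPOINT:
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": apex,
                "Type": "A",
                # Each record sharing a name needs a unique identifier.
                "SetIdentifier": "geo-" + endpoint.split(".")[0],
                "GeoLocation": geo,
                "AliasTarget": {
                    # Aliases to records in the same zone use that zone's ID.
                    "HostedZoneId": hosted_zone_id,
                    "DNSName": endpoint,
                    # Let the health of the target decide whether this record is served.
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Comment": "Geolocation routing sketch", "Changes": changes}


if __name__ == "__main__":
    # "ZEXAMPLE123" is a placeholder, not a real hosted zone ID.
    batch = build_geo_change_batch("example.com.", "ZEXAMPLE123")
    for change in batch["Changes"]:
        record = change["ResourceRecordSet"]
        print(record["SetIdentifier"], "->", record["AliasTarget"]["DNSName"])
```

With boto3 installed and credentials configured, the batch would be passed as the ChangeBatch argument alongside the HostedZoneId of the example.com zone.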

Figure 2: In this chart of Route 53 configurations, black represents the ideal choice, blue represents the secondary choice and red is the third choice.

Figure 2 shows a Route 53 zone configured with three separate endpoints that are selected by geographic location. A user directed to the us-east endpoint would first be sent to the us-east load balancer. If that load balancer is not available, Route 53 would try the us-west load balancer. If that ELB is also down, it would fall back to the eu-west region. If all three regions are down, the business is in serious trouble. Configuring the health checks appropriately at the Route 53 level will help to minimize downtime if an entire region fails.
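The fallback behavior can be modeled in a few lines to check that a planned ordering does what is intended. This is a simulation of the decision logic only, not Route 53 itself. The us-east ordering follows the description above; the orderings for us-west and eu-west are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Preference order per entry endpoint. Only the us-east row comes from the
# article's description; the other two rows are assumed for this sketch.
FALLBACK_ORDER: Dict[str, List[str]] = {
    "us-east": ["us-east", "us-west", "eu-west"],
    "us-west": ["us-west", "us-east", "eu-west"],
    "eu-west": ["eu-west", "us-east", "us-west"],
}


def resolve_region(entry_endpoint: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy region for an entry endpoint, or None if all are down."""
    for region in FALLBACK_ORDER[entry_endpoint]:
        # A region missing from the health map is treated as unhealthy.
        if healthy.get(region, False):
            return region
    return None


if __name__ == "__main__":
    health = {"us-east": False, "us-west": True, "eu-west": True}
    print(resolve_region("us-east", health))  # us-west
    print(resolve_region("eu-west", health))  # eu-west
    print(resolve_region("us-east", {}))      # None
```

A small model like this is a cheap way to walk through outage scenarios before committing the ordering to DNS.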

This is known as increasing durability -- expecting outages in individual regions and having a plan for restoring service. But it is important to verify and support the individual viability of each region. For example, if an entire region goes down, the other regions should still be able to fully function on their own. This can be done by using database replication. Fortunately, Amazon already supports multiregion replication in DynamoDB. Many other databases also support master-master replication schemes, which can fall back to another domain to support regional isolation and durability.

When developers need to support high-availability access to applications, multilevel durability is a must. Fortunately, AWS offers three very good services that can be used jointly to address regional, zone and instance-level issues. Adding application monitoring through third-party services like New Relic can give businesses a great blend of alerting and automatic service healing to minimize downtime.
