AWS US-East-1 region scrutinized following service incidents

AWS US-East-1 makes headlines when there are outages and elevated error rates. But is it really Amazon's least reliable cloud region?

AWS US-East-1 has a less-than-stellar reputation for reliability among U.S. customers, but statistical analysis and monitoring of AWS availability worldwide tells a different story.

Amazon Web Services (AWS) US-East-1 in Northern Virginia is Amazon's largest and oldest region, with the highest number of data centers, availability zones and customers. Thus outages there attract the most attention, and often result in widely publicized news reports and post-mortem notes on Amazon's website (see sidebar).

Over the last four years, service disruptions, increased error rates and outages in the AWS US-East-1 region have pushed some customers to the US-West-1 region in California or the US-West-2 region in Oregon.

"It has been my experience that US-East does have the majority if not all of the issues," said Christopher Riley, a founding partner at HKM Consulting, who moved his AWS operations to US-West-2 despite being based in Rochester, Mass.

Recent high-profile outages in AWS

The experience that prompted this move was a multi-day Elastic Block Store outage in AWS US-East-1 in April 2011.

The AWS region in Oregon hasn't had any issues that have affected HKM, according to Riley.

Despite the problems Riley had, the US-East-1 region had just one more outage than US-West-1 in the past year, according to CloudHarmony Inc., a company located in Laguna Beach, Calif., that conducts independent third-party monitoring of cloud performance and uptime.

Other customers believe the AWS US-East-1 region has grown too large and busy for Amazon to effectively support it.

"Northern Virginia is just a massive environment, to the point it's dangerous," said a production operations staffer at a Washington, D.C.-area startup who requested anonymity. "Both Amazon and customers really should consider creating capacity in other and new regions."

AWS US-East-1 by the numbers

While AWS outages in US-East-1 tend to get the most attention, where Elastic Compute Cloud (EC2) is concerned, it's actually not Amazon's most problematic region globally, according to monitoring performed by CloudHarmony.

CloudHarmony maintains one VM in each AWS EC2 region worldwide to monitor availability. By that measure, AWS US-East-1 is far from the least reliable region, with 3.93 minutes of downtime in the last year.

Instead, the distinction of 'most unreliable' in CloudHarmony's analysis goes to the AWS region in Sao Paulo, Brazil (SA-East-1), with eight outages in the past year totaling 35.62 minutes of downtime.

Meanwhile, a VM running in US-West-1 (Northern California) has had 7.82 minutes of downtime due to scheduled maintenance. Scheduled maintenance is excluded from CloudHarmony's availability metric for the region, which still stands at 100%, but is included in the downtime summary. Other regions with 100% EC2 availability for the past year, according to CloudHarmony, include Asia Pacific (Singapore) and US-West-2 in Oregon.

"Many of the outages you hear about occur in US-East-1, but affecting only a subset of users," said Jason Read, founder of CloudHarmony. "The last outage [Sept. 20] didn't impact our VM, for example."

While both EC2 US-West regions show 100% availability compared to US-East-1's 99.9993%, availability in US-East-1 has generally been comparable to other regions, Read said -- a difference of about four minutes of downtime over the last year.
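To see how a few minutes of downtime translate into those availability percentages, here is a minimal Python sketch. It is an illustration, not CloudHarmony's actual method. It assumes a 365-day monitoring period, and it treats each region's reported downtime as a single record because the article gives only totals. The `Outage` class and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a non-leap year of continuous monitoring (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Outage:
    """One downtime record; hypothetical structure for illustration."""
    minutes: float
    scheduled: bool = False  # scheduled maintenance is excluded from availability


def total_downtime(outages):
    """Sum all downtime, scheduled or not, as a downtime summary would."""
    return sum(o.minutes for o in outages)


def availability_percent(outages, period_minutes=MINUTES_PER_YEAR):
    """Availability over the period, counting only unscheduled downtime."""
    unscheduled = sum(o.minutes for o in outages if not o.scheduled)
    return 100.0 * (1.0 - unscheduled / period_minutes)


# Figures from the article, modeled as one record per region (an assumption).
us_east_1 = [Outage(3.93)]
us_west_1 = [Outage(7.82, scheduled=True)]

print(f"US-East-1: {availability_percent(us_east_1):.4f}%")  # 99.9993%
print(f"US-West-1: {availability_percent(us_west_1):.4f}% "
      f"({total_downtime(us_west_1):.2f} min of scheduled downtime)")
```

Under these assumptions, 3.93 minutes of unscheduled downtime works out to roughly 99.9993% availability, while US-West-1 stays at 100% because its 7.82 minutes were scheduled maintenance.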

For Simple Storage Service (S3), however, US-East-1 and SA-East-1 have been the most problematic regions. Amazon S3 in US-East-1 has seen 26.55 minutes of downtime in the last year, while SA-East-1 S3 has been down 23.08 minutes, according to CloudHarmony's analysis. Amazon S3 in US-East-1 has also had the most storage outages of any region, with 21 incidents in the last year, while SA-East-1 has experienced nine incidents.

The US-West-1 and US-West-2 regions also show 100% availability for Amazon S3 over the past year, as do Asia Pacific (Sydney) and EU-Central-1 (Frankfurt).

It's worth keeping these numbers in broader market perspective -- Amazon had the least downtime overall among the major cloud service providers in 2014, with a total of 2.43 hours of downtime for the year across all regions. By comparison, Microsoft Azure had nearly 40 hours of downtime, according to CloudHarmony.

Amazon declined to comment for this story.

Beth Pariseau is senior news writer for SearchAWS. Follow @PariseauTT on Twitter.
