Cascading AWS outage stokes cloud fears

This weekend's AWS outage started with one service and snowballed from there, disrupting a number of cloud services.

An Amazon Web Services outage Sunday morning impacted consumer services such as Netflix, but the service disruptions were mild at most.

The issues began with the DynamoDB NoSQL database as a service. An Amazon Web Services (AWS) Service Health Dashboard update at about 5 a.m. Pacific Daylight Time (PDT) on Sept. 20 said the root cause began with a portion of the metadata service within DynamoDB -- an internal sub-service that manages table and partition information.

The exact problem with the metadata service was not identified, and Amazon declined to offer further details about what went wrong. AWS users including Netflix experienced intermittent errors, making the AWS outage a high-profile event.

Recovery efforts then focused on restoring metadata operations, and APIs were throttled as the recovery work took place, resulting in service disruptions and performance slowdowns in the U.S.-East-1 region of AWS. No other regions were affected.

Most of the disruptions amounted to increased error rates and slow performance when API calls were made between services; EC2 instances remained up and running, customers said.

"All of my EC2 instances and RDS instances stayed up, my websites, S3 buckets and CloudFront all continued to respond as well," said E.J. Brennan, a freelance developer based in Massachusetts who works with large enterprise clients.

The biggest effect Brennan saw was increased error rates on the Simple Queue Service (SQS).

"That prevented a handful of users from completing some non-critical tasks," Brennan said. "I will be looking into a redundancy option for that particular service down the road because that is not something I was prepared for, but no real harm was done."

Consultants working with dozens of clients also said all was relatively quiet on the AWS front Sunday morning.

"While most of our customers are in U.S.-East, not a single one was affected," said Glenn Grant, CEO of G2 Technology Group in Boston.

AWS outages used to be much more disruptive and frequent than they have been for the last year and a half; while Amazon suffered serious high-profile disruptions in its earlier years, it was tops for cloud uptime in a survey conducted by CloudHarmony last year, with 2.41 hours of downtime.

In fact, the most disruption enterprises felt from this AWS outage came from people who saw it as evidence that cloud computing is inherently more risky than on-premises deployments.

"We're going to have a bunch of flak today about AWS going down," said Jason McMunn, chief cloud architect at Ditech Mortgage Corp., based in Fort Washington, Pa. "There are some legacy people who don't understand the cloud, they're going to pick up on the headlines, and our point is that it's like seeing the freeway closed in California and people saying, 'You should not use freeways – freeways are scary, you should just use surface roads'."

Details of the AWS disruptions

Amazon first posted updates to its Service Health Dashboard at 3 a.m. PDT, indicating increased error rates with DynamoDB API requests in the U.S.-East Northern Virginia region. The first disruptions came at 2:13 a.m. PDT, with Cognito, DynamoDB, EC2, Kinesis, CodeCommit, CodeDeploy, Directory Service, Key Management Service and Elastic Load Balancer issues surfacing at that time.

For the next 30 minutes, the issues continued to cascade, affecting Lambda, Elastic MapReduce, CloudWatch, Amazon WorkSpaces and many other Amazon services. In fact, the list of services unaffected by the disruptions Sept. 20 is shorter than the list of services involved. Unaffected services included API Gateway, CloudFront, ElastiCache, Route 53, SimpleDB, CloudHSM, Data Pipeline, Direct Connect, IAM, S3 and Service Catalog.

All issues were resolved by 11 a.m. PDT.

Beth Pariseau is senior news writer for SearchAWS. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

Join the conversation

3 comments

How have you been affected by a recent AWS outage?
The Amazon Echo device was also affected. The alarm function went offline. It uses AWS in the background. Did other voice-operated Amazon devices lose any or all functionality?

All Amazon voice-controlled devices such as the Echo were also affected. This gives support to those who think that Amazon may be rolling out this interface too quickly. We saw several aspects of the Echo going effectively offline.
