Amazon Web Services suffered a power failure at its flagship data center in the wee hours of the morning today, reporting connectivity issues from about 4AM to 10AM Eastern Standard Time.
The cloud computing giant reported an "underlying power issue" that affected some instances in a single availability zone of its US-EAST-1 region. Amazon has four availability zones in the US, two on the east coast and two on the west.
The outage was irksome to users, but many gave Amazon points for better disclosure and a fast response. Independent cloud monitoring services tracked the issue accurately, a sign that public cloud services are gaining traction.
CloudClimate.com, a site started by German software firm Paessler, showed detailed timelines for the bobble. Network monitoring firm Apparent Networks, measuring from multiple locations across the country, clocked the outage at 44 minutes and either 42 or 44 seconds depending on location, a more accurate figure than the one Amazon disclosed.
"It shows up like a big hole," said Apparent's president, Jim Melvin. Apparent runs its own real-time 'Cloud Provider Scorecard' for the public. Melvin said his firm is an AWS user as well as a provider of monitoring software to customers for exactly this kind of occurrence. He said outages like this happen once in a while, and sooner or later they happen to everyone.
"Overall [AWS] did great, excepting that they had an outage, but they were really responsive, really fast," said Melvin. He said it's simply part of the shift into the cloud. His customers now use Amazon, Rackspace and their own hardware so interchangeably that there are more potential points of failure. He pointed out that interrupted internet access is a far more common problem than Amazon's service going down, and he likened the transition to the one enterprises made when they moved from frame relay connections to internet-based connectivity between sites.
"The enterprise is entirely dependent on the relatively unpredictable nature of the internet," he said, but enterprises have learned to roll with the punches when there are major outages. They'll do the same with cloud computing, said Melvin, partly by using tools like his and partly by learning to allow for a little more unpredictability in their 'virtual server rooms' in the cloud.
"It's that very dynamic nature that makes it so resilient, but also unpredictable," he said.

By press time, Amazon had not confirmed whether the Dec 8 storm caused the outage, but the company's northern Virginia data center suffered a similar outage in July when it was struck by lightning.