AWS EC2 instances can now be recovered from certain failure conditions through CloudWatch, but the feature is not meant to be a full substitute for resilient application architectures in the cloud.
Auto Recovery allows users to set thresholds within Amazon Web Services' (AWS) CloudWatch monitoring service. If a problem occurs on Amazon's side of the operating system, such as an underlying hardware failure, a loss of system power, loss of network connectivity or software issues on the physical host, the instance will be automatically recovered with its instance ID, IP address, Elastic Block Store (EBS) attachments and other configuration details intact.
Auto Recovery is available for the C3, C4, M3, R3, and T2 lines of instances in the US-East (Virginia) region only, though a post on the AWS official blog indicated it will be offered in more regions soon. Instances must be running in a Virtual Private Cloud (VPC) and be attached to an EBS volume for Auto Recovery to work. Users of Auto Recovery are charged according to CloudWatch pricing, but there is no charge for the EC2 or EBS resources used in the recovery process.
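As a rough illustration of the setup described above, the following Python sketch checks the stated eligibility rules and builds the parameters for a CloudWatch alarm that triggers the EC2 recover action. The metric name `StatusCheckFailed_System` and the `arn:aws:automate:<region>:ec2:recover` action are the ones AWS documents for this feature; the helper names, the alarm name, and the period and evaluation values are assumptions chosen for illustration, not recommended settings.

```python
from typing import Any, Dict

# Eligibility rules as reported in this article at launch (likely to change).
SUPPORTED_FAMILIES = {"c3", "c4", "m3", "r3", "t2"}
SUPPORTED_REGIONS = {"us-east-1"}  # US-East (Virginia)


def is_eligible(instance_type: str, region: str,
                in_vpc: bool, ebs_backed: bool) -> bool:
    """Return True if an instance meets the launch-time requirements."""
    family = instance_type.split(".", 1)[0].lower()
    return (family in SUPPORTED_FAMILIES
            and region in SUPPORTED_REGIONS
            and in_vpc
            and ebs_backed)


def recovery_alarm_params(instance_id: str,
                          region: str = "us-east-1",
                          period_seconds: int = 60,
                          evaluation_periods: int = 2) -> Dict[str, Any]:
    """Build keyword arguments for a CloudWatch alarm that recovers an instance.

    The period and evaluation count are hypothetical thresholds; tune them
    so that a brief blip does not trigger an unnecessary recovery.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        # System status check: fails on host-side problems, not guest OS issues.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in action that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed to `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)`. Keeping the parameter construction separate from the API call makes the thresholds easy to review before anything is created.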
The concept runs somewhat counter to AWS's mantra of building cloud resiliency into the application rather than the infrastructure. But it won't be a shortcut that takes application resiliency out of the picture, according to Glenn Grant, CEO of G2 Technology Group Inc., a cloud consultancy and managed services provider in Boston.
G2 Technology Group offers similar capabilities through its managed services for AWS, which include monitoring software that checks the health of the AWS cloud and reboots instances if necessary.
"I could see this feature being useful to automate that process," he said. "However, we'd need to evaluate it to fully understand the checks and thresholds, so as to not reboot instances unnecessarily."
For instance, Grant wondered if high network traffic could be mistaken for total loss of connectivity, or if a DoS attack scenario could cause an instance to continually reboot. But Auto Recovery has adaptive throttles in place to ensure that the same instance is not continually recovered.
Some industry observers had just one question: why didn't Amazon do this sooner? "I've wondered why they hadn't done this years back," said Carl Brooks, analyst with the 451 Group based in Boston. "I suppose it's because they made such a point of telling users they were on their own."
There's always a risk of users defaulting to this as a "quick and dirty way" around properly resilient application architecture, but those issues and practices are very well understood, Brooks said.
Others questioned the requirement to use EBS with Auto Recovery.
"What happens if an EBS volume dies during the recovery?" said James Staten, analyst with Forrester Research, based in Cambridge, Mass. AWS declined to comment on record in answer to this question.
Still, this kind of feature will be necessary as a new, less highly technical customer base adopts cloud computing, according to Staten. The first wave of cloud adoption was made up of DevOps types who were happy to script automated infrastructure recovery features or fold them into applications, but the newest generation may not have that kind of expertise or operational experience with cloud computing, he said.
Several of AWS's competitors already offer similar recovery features. Google manages the availability of the Google Compute Engine virtual machines that underpin its App Engine platform as a service through its Managed VMs service. Rackspace's Cloud Servers are monitored by support staff, who ensure their availability. VMware's vCloud Air service also offers VMware's Live Migration in the event of host failure. Microsoft Azure has service healing, which automatically detects problematic nodes and moves virtual machines to new hosts.