Amazon Web Services notified its customers that a massive AWS reboot of a substantial number of hosts in its EC2 fleet is coming.
The maintenance will start on September 26, 2014, at 2:00 UTC/GMT (September 25, 2014, at 7:00 PM PDT) and end on September 30, 2014, at 23:59 UTC/GMT (September 30, 2014, at 4:59 PM PDT), according to Thorsten von Eicken, CTO of Amazon partner RightScale Inc., in a blog post.
Experienced AWS admins took the news in stride.
"They are starting the process at a time of day when our traffic will be low and they have given us ample advanced notice," said Dave Tucker, senior director of platform development for Workiva, a financial reporting software developer based in Ames, Iowa.
Because Workiva does not have much persistent data on AWS and runs in multiple regions, Tucker's team can switch services to US-East, upgrade the servers in US-West, switch back to US-West and then upgrade US-East.
"It sounds like we will actually be able to do this just by doing this across AZs and would not necessarily need to switch regions -- fairly easy for us to do either way," Tucker added. "We treat our server images as disposable and have them configured via Chef or RightScale scripts at the time they start up so this should be a matter of simply restarting servers and altering our configuration to point at the new servers once they are upgraded."
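The zone-by-zone approach Tucker describes can be sketched roughly as follows. This is an illustration only, not Workiva's actual tooling: the `drain`, `restart` and `restore` callbacks are hypothetical stand-ins for whatever a load balancer, Chef run or RightScale script would do, and the server names and zones are invented.

```python
from collections import defaultdict


def plan_by_zone(servers):
    """Group (server_name, zone) pairs into an ordered list of (zone, names)."""
    by_zone = defaultdict(list)
    for name, zone in servers:
        by_zone[zone].append(name)
    return [(zone, sorted(names)) for zone, names in sorted(by_zone.items())]


def rolling_restart(servers, drain, restart, restore):
    """Restart one zone at a time so the other zones keep serving traffic.

    drain, restart and restore are caller-supplied (hypothetical) callbacks.
    """
    plan = plan_by_zone(servers)
    if len(plan) < 2:
        # With a single zone there is nowhere to shift traffic during the restart.
        raise ValueError("need at least two zones to stay up during restarts")
    for zone, names in plan:
        drain(zone)          # stop routing traffic to this zone
        for name in names:
            restart(name)    # replace or reboot each disposable server
        restore(zone)        # point configuration back at the refreshed servers
    return plan


# Example run that records each step instead of touching real infrastructure.
events = []
fleet = [("web-1", "us-west-2a"), ("web-2", "us-west-2b"), ("web-3", "us-west-2a")]
rolling_restart(
    fleet,
    drain=lambda zone: events.append(("drain", zone)),
    restart=lambda name: events.append(("restart", name)),
    restore=lambda zone: events.append(("restore", zone)),
)
```

The sketch only works because the servers are treated as disposable, as Tucker notes; a fleet with persistent local state would need extra steps before each restart.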
Things like this are "part of the cost of doing business," Tucker said.
"The thing I appreciate here is that Amazon is being proactive and methodical in their approach and is giving us a way to keep our services running, even if it takes some extra effort," he said.
Other customers reached for comment Thursday morning said the news was unexpected, but that they have the tools to make sure the maintenance event doesn't disrupt their customers.
"It's a nuisance, but we're already prepared," said Greg Arnette, co-founder and CTO of Sonian Inc., a cloud email-archiving service provider based in Dedham, Mass.
The AWS reboot news raised concerns on an AWS support forum, where, according to one participant, AWS said customers would not be able to stop/start or re-launch instances to avoid the maintenance update.
"We need to make sure that everything is working fine on reboot, and we don't have the resources to be available for 10 different four-hour maintenance windows," said the forum participant, posting under the name ackker.
AWS could not guarantee that users' re-launched instances would land on an updated host, since so many instances are apparently affected.
"We are periodically polling for new instances and those impacted will receive new maintenance notifications accordingly," wrote one AWS official in the forum.
Another AWS customer posted that it has 100 instances scheduled for reboot, and couldn't scramble its staff on short notice to monitor impacted servers.
"… Entire service clusters of machines are scheduled to be rebooted at once, although they are in different AZs, amounting to an event equivalent to the loss of an entire Region," the poster said. "On two days' notice we can't possibly prepare for this."
AWS customers don’t have an effective way to validate that their instances are subject to the reboot, Amazon officials admitted in the forum thread.
Update: As of 1 pm ET on September 25th, AWS officials have updated the forum thread to say that there is now tooling available that "continuously rechecks all running instances for missing scheduled maintenance data and re-populates it as needed."
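For customers who want to check their own fleets, scheduled maintenance appears in the output of the EC2 DescribeInstanceStatus API as a list of events per instance. The sketch below filters a response of that shape for pending reboots. It assumes the documented field names (`InstanceStatuses`, `Events`, `Code`, `NotBefore`) and the `[Completed]` description prefix EC2 uses for finished events; the sample data is invented, and nothing here calls AWS.

```python
from datetime import datetime, timezone

# Event codes that indicate a scheduled reboot of the instance or its host.
REBOOT_CODES = {"instance-reboot", "system-reboot"}


def scheduled_reboots(status_response):
    """Return (instance_id, zone, not_before) tuples, earliest window first."""
    found = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") not in REBOOT_CODES:
                continue
            # Skip events that have already completed.
            if event.get("Description", "").startswith("[Completed]"):
                continue
            found.append(
                (status["InstanceId"], status["AvailabilityZone"], event["NotBefore"])
            )
    return sorted(found, key=lambda item: item[2])


# Hypothetical response data, shaped like DescribeInstanceStatus output.
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-aaaa1111",
            "AvailabilityZone": "us-east-1a",
            "Events": [
                {
                    "Code": "system-reboot",
                    "Description": "scheduled reboot",
                    "NotBefore": datetime(2014, 9, 27, 4, 0, tzinfo=timezone.utc),
                }
            ],
        },
        {"InstanceId": "i-bbbb2222", "AvailabilityZone": "us-east-1b", "Events": []},
        {
            "InstanceId": "i-cccc3333",
            "AvailabilityZone": "us-east-1b",
            "Events": [
                {
                    "Code": "system-reboot",
                    "Description": "[Completed] scheduled reboot",
                    "NotBefore": datetime(2014, 9, 26, 3, 0, tzinfo=timezone.utc),
                }
            ],
        },
    ]
}

pending = scheduled_reboots(sample)
```

Grouping the result by Availability Zone would show whether an entire service cluster is scheduled to go down at once, the scenario one forum poster raised.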
"We are working non-stop to attempt to deliver something useful that you can rely on as soon as possible," posted Doug@AWS, who identified himself as the director of the EC2 fleet – presumably Doug Grismore, director of EC2 fleet operations and reliability, according to his LinkedIn profile.
AWS officials have been mum so far on why so many of the company's systems need a patch. The vendor may be hesitant to call out the vulnerability it's patching for fear of attracting attacks.
"It seems obvious that the company is patching a security vulnerability, but it will not disclose which one until October 1 -- that is, after they have patched all hosts," according to the RightScale blog post.
There was a massive AWS reboot of this nature in December 2011.
UPDATE as of 9/26 – In a blog post published Thursday afternoon, AWS clarified that the reboot is necessary because of an undisclosed security issue with the Xen hypervisor. The reboot affects less than 10% of AWS instances worldwide, the company said.