

Mitigating the risks of cloud service outages

Cloud service outages can put a damper on business capabilities, services and wallets. Learn how to minimize the risks of outages.

Although cloud services in general and Amazon Web Services in particular are growing at unprecedented rates, periodic service failures continue to disrupt users' operations and raise questions about the best way to mitigate the risks created by these outages.

I've been a big advocate for cloud services since before these "on-demand" alternatives became popular. Despite well-publicized outages, today's leading cloud service providers (CSPs) still boast higher availability and performance records than most best-in-class corporate data centers, and that's without taking into account the unparalleled economies and agility that cloud services provide compared with the capital- and labor-intensive data centers managed by most major enterprises.

Yet, it seems like every summer AWS suffers a service disruption that is severe enough to attract press attention and generate a new round of debate about whether cloud services are truly ready to support mission-critical business processes.

Last August, for instance, AWS experienced an hour-long outage in its U.S.-East data center in northern Virginia. The incident created elevated API error rates in the region and a "degraded experience" as a result of a "small number of EC2 instances [becoming] unreachable due to packet loss in a single [availability zone]," according to AWS.

Although AWS considered this an isolated event, that was little comfort to the companies impacted by the service disruption, including Vine, Instagram, IFTTT, Airbnb and Flipboard.

Ironically, AWS experienced another service disruption around the same time, this one lasting 25 minutes. It didn't have a widespread impact on AWS customers, but it did prevent customers from accessing the site in the U.S. and Canada.

In today's on-demand world, customer loyalty is a fading concept. In the same way people have little tolerance for a restaurant or store that fails to meet their expectations, customers are equally intolerant of CSPs whose availability and performance issues adversely affect their businesses. The cost of downtime can be measured in real dollars, but lost trust and a tarnished reputation can be even more costly. An outage can be especially damaging when it gets the attention of major publications such as BusinessWeek and The Wall Street Journal.

As the market leader in the cloud, AWS gets its share of attention and criticism when its services fail. But Google, Microsoft and other CSPs are all susceptible to service outages.

Mitigating the risks of cloud service failures is deceptively difficult because today's cloud services are particularly appealing due to their perceived economies and ease of deployment. The ongoing price war in the infrastructure as a service (IaaS) segment of the cloud market is being driven by AWS, with Google doing all it can to keep pace. While acquiring cloud services from AWS is a no-brainer from a cost perspective, figuring out how to properly configure AWS to satisfy your business requirements and minimize potential downtime takes more expertise and experience than most businesses possess or can afford.

Unless your organization is planning to be a big AWS customer, you can expect limited customer support to help you plan, deploy and manage your AWS resources. AWS is able to promote commodity prices by keeping its support services to a minimum. Therefore, you either have to hire or train in-house experts, or turn to third-party consulting companies with the skills and experience to help you assemble the right combination of AWS offerings.

And assembling the right combination of AWS offerings, along with a variety of third-party cloud management solutions, is essential to safeguard against potential service outages. It requires the right types of cloud instances, locations, load balancing, monitoring, measurement and manipulation to continuously adjust your AWS resources to support your needs. It also requires service redundancy and backup services to respond if service issues arise.
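
To make the redundancy point concrete, here is a minimal Python sketch of priority-ordered failover: try the primary deployment first, then fall back to a redundant one if its health check fails. Everything named here is a hypothetical illustration rather than an AWS API: the Endpoint type, the pick_healthy function, the example.com URLs and the simulated probe are all assumptions. In practice the probe would be a real HTTP or TCP check, and a managed load balancer or DNS failover service would usually do this job for you.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str  # hypothetical label, e.g. a region, zone or data center
    url: str   # hypothetical health-check URL


def pick_healthy(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first endpoint, in priority order, that passes its health check."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises (timeout, DNS failure) counts as unhealthy.
            continue
    return None


# Hypothetical deployment: two cloud locations plus an in-house backup.
ENDPOINTS = [
    Endpoint("primary", "https://primary.example.com/health"),
    Endpoint("secondary", "https://secondary.example.com/health"),
    Endpoint("on-premises", "https://dc.example.com/health"),
]


def simulated_probe(endpoint: Endpoint) -> bool:
    """Stand-in for a real health check; pretends the primary is down."""
    return endpoint.name != "primary"


if __name__ == "__main__":
    chosen = pick_healthy(ENDPOINTS, simulated_probe)
    print(chosen.name if chosen else "no healthy endpoint")
```

Passing the probe in as a function keeps the selection logic testable without any network access, which matters when you want to rehearse an outage before one actually happens.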

It also makes sense to hedge your cloud bets. First, acquire cloud services incrementally so you can test their reliability and resiliency before you make a "big bet" on them to support your ongoing business operations. Second, don't discard your internal IT operations entirely; you may need them as a backup when a cloud service fails.
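
One simple way to test reliability during an incremental trial is to probe the service at regular intervals and turn the results into an availability figure. The Python sketch below assumes you have already collected a list of pass/fail probe results; the function names and the 30-day month are my own assumptions for illustration, not anything AWS publishes.

```python
from typing import Sequence

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def measured_availability(probe_results: Sequence[bool]) -> float:
    """Fraction of probes that succeeded (True means the service responded)."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    return sum(probe_results) / len(probe_results)


def implied_monthly_downtime_minutes(availability: float) -> float:
    """Downtime per 30-day month implied by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_30_DAY_MONTH


if __name__ == "__main__":
    # Hypothetical trial: 1,000 probes, 5 of which failed.
    results = [True] * 995 + [False] * 5
    availability = measured_availability(results)
    downtime = implied_monthly_downtime_minutes(availability)
    print(f"availability: {availability:.3%}")
    print(f"implied downtime: {downtime:.0f} minutes per month")
```

Comparing the implied downtime with what your business can actually tolerate gives you a concrete basis for deciding whether to expand the deployment or add more redundancy first.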

Remember, you get what you pay for. Buying cheap cloud services will inevitably cost you. So, be sure you fully understand the risks and thoroughly examine the full price of a more reliable approach to acquiring cloud services.
