The ability to scale workloads according to demand is one of the biggest benefits of public cloud. Most enterprises use one of three public cloud scaling methods: reactive scaling, predictive scaling and analytic scaling. Choosing the right one for your AWS environment depends on how much you know about your workloads and their patterns.
AWS and reactive cloud scaling
Amazon Web Services (AWS) Auto Scaling is a reactive service that adds new servers in response to increased load on the system. It acts on a lagging indicator, so there is a time delay between the new load being placed on the system and the system's response. Depending on how long it takes for new resources such as Elastic Compute Cloud (EC2) instances to come online, the lag can be measured in minutes or tens of minutes.
Lag time depends on a number of factors. If you create on-demand instances, they spin up within tens of seconds. If, on the other hand, you are using spot instances, they take (on average) five minutes to start. Once your instance is running, it still may not be ready to perform a task, depending on whether you are using cooked Amazon Machine Images (AMIs) or not.
A cooked instance strategy means your AMI includes all the tools you need, so as soon as the machine starts, you can start working. Non-cooked images are bare-bones and require the use of Puppet or Chef scripts to download and install packages before the system is available.
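The lag described above can be thought of as a sum of delays: the time to notice the load, the time for the instance to start, and the time for any bootstrap scripts to finish. The following is a minimal sketch of that arithmetic. `LaunchProfile`, `reactive_lag_seconds` and every timing in it are hypothetical values chosen for illustration, not measured AWS figures.

```python
from dataclasses import dataclass


@dataclass
class LaunchProfile:
    """Hypothetical timings for bringing one new instance into service."""
    boot_seconds: int       # time until the instance reports as running
    bootstrap_seconds: int  # time for Puppet/Chef-style setup; 0 for a cooked AMI


def reactive_lag_seconds(detection_seconds: int, profile: LaunchProfile) -> int:
    """Total delay between load arriving and new capacity being usable."""
    return detection_seconds + profile.boot_seconds + profile.bootstrap_seconds


# Illustrative numbers only (assumptions, not benchmarks).
cooked = LaunchProfile(boot_seconds=60, bootstrap_seconds=0)
bare_bones = LaunchProfile(boot_seconds=60, bootstrap_seconds=480)

cooked_lag = reactive_lag_seconds(120, cooked)          # 180 seconds
bare_bones_lag = reactive_lag_seconds(120, bare_bones)  # 660 seconds
```

Under these assumed timings, the bare-bones image more than triples the lag, which is why the choice of AMI strategy matters as much as the instance type.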
Netflix's predictive approach to cloud scaling
Some companies take a different approach. Netflix, for example, uses predictive scaling to anticipate load and preemptively launch additional instances. This approach is based on the notion that load patterns are often repeatable and therefore predictable. For example, you don't have to go too far out on a limb to predict that Friday night will see a higher load on the Netflix servers than Friday afternoon.
Using this approach, Netflix provisions extra resources ahead of the anticipated additional load. This may incur a small increase in cost, as some resources are online before they are needed. But it may also spare end users a temporary slowdown while new capacity comes online.
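A predictive scaler of this kind might be sketched as follows. This is not Netflix's implementation, which the article does not describe in detail; it is a simplified illustration that assumes you already have an hourly load forecast. The names `planned_capacity` and `forecast_by_hour`, the per-instance capacity and the one-instance spare are all hypothetical.

```python
import math


def planned_capacity(forecast_by_hour, hour, lead_hours,
                     capacity_per_instance, spare_instances=1):
    """Instances to have online at `hour`, looking `lead_hours` ahead.

    Sizing for the larger of the current and upcoming forecast means
    capacity for the next peak is already running before the load arrives.
    """
    upcoming_hour = (hour + lead_hours) % 24
    now_load = forecast_by_hour.get(hour, 0)
    upcoming_load = forecast_by_hour.get(upcoming_hour, 0)
    peak = max(now_load, upcoming_load)
    return math.ceil(peak / capacity_per_instance) + spare_instances


# Hypothetical forecast in requests per second, keyed by hour of day.
friday_forecast = {16: 300, 17: 400, 18: 900}

# At 17:00 the scaler already sizes for the 18:00 evening peak.
at_five_pm = planned_capacity(friday_forecast, hour=17, lead_hours=1,
                              capacity_per_instance=100)  # 10 instances
```

The extra instances running between 17:00 and 18:00 in this example are the small cost increase mentioned above; the payoff is that no user waits on a launch during the ramp-up.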
Analytic scaling works from within
Analytic scaling is a third approach to public cloud scaling. Some systems generate their own load rather than having it created externally. A streaming video system, for example, might download new movies that need to be transcoded. A financial system might download new stock data every night. In both cases, the "load" on the system is a known value because the system itself creates the load.
In these examples, the system itself can add or remove instances based on its own understanding of the load.
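Because the system knows its own backlog, the instance count can be computed directly rather than inferred from metrics. Here is a minimal sketch for the transcoding example, assuming each job takes roughly the same time and must finish before a deadline. The function name, parameters and numbers are hypothetical.

```python
def workers_for_backlog(pending_jobs, minutes_per_job,
                        deadline_minutes, max_workers):
    """Number of workers needed to clear a known backlog by a deadline."""
    if pending_jobs <= 0:
        return 0  # nothing queued, so no workers are needed
    jobs_per_worker = deadline_minutes // minutes_per_job
    if jobs_per_worker == 0:
        # A single job cannot finish before the deadline; use everything allowed.
        return max_workers
    needed = -(-pending_jobs // jobs_per_worker)  # ceiling division
    return min(needed, max_workers)


# 40 new movies, 30 minutes of transcoding each, all due within 4 hours.
transcode_workers = workers_for_backlog(pending_jobs=40, minutes_per_job=30,
                                        deadline_minutes=240, max_workers=20)
```

With these assumed figures each worker can finish eight jobs before the deadline, so five workers cover the 40-job backlog, and the count drops back to zero once the queue is empty.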
Combining all of these cloud scaling methods may be the best approach. If you use AWS Auto Scaling, the component doing the scaling is the AWS system itself. You don't have to worry about monitoring the auto scaler, because it's part of the AWS platform. However, if your system does its own scaling, the component that makes the scaling decisions must itself be made resilient, because if your custom auto scaler dies, nothing is left to adjust capacity.
For this reason -- and because your predictive or analytic scaling algorithms might have errors -- many systems use AWS Auto Scaling to establish a minimum number of instances. Set up an Auto Scaling group with a rule such as, "Keep at least one EC2 instance running," and let your custom scaling rules handle the upper limits. The asymmetry of the AWS Auto Scaling rule mirrors the asymmetry of having too many instances (extra cost, but good response time) versus having too few instances (less cost, but potentially poor response time).
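The division of labor can be illustrated with a small sketch: the Auto Scaling group's minimum acts as a floor that holds even when the custom scaler is down or wrong, while the custom scaler's request governs everything above it. The function below is hypothetical and does not call any AWS API; `asg_min` and `asg_max` stand in for the group's configured minimum and maximum sizes, and `None` represents a custom scaler that has stopped reporting.

```python
def effective_capacity(custom_desired, asg_min=1, asg_max=20):
    """Instance count that results from combining both scaling layers."""
    if custom_desired is None:
        # Custom scaler is unavailable: the platform-level floor still holds.
        return asg_min
    # Clamp the custom request between the group's minimum and maximum.
    return max(asg_min, min(custom_desired, asg_max))


scaler_down = effective_capacity(None)   # floor applies
buggy_zero = effective_capacity(0)       # a faulty request for 0 is raised to the floor
normal_peak = effective_capacity(7)      # custom rule wins above the floor
runaway = effective_capacity(50)         # capped by the assumed maximum
```

A bug that requests zero instances therefore costs nothing in availability, while a bug that requests too many costs only money, which matches the asymmetry described above.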
About the author:
Brian Tarbox has been doing mission-critical programming since he created a timing-and-scoring program for the Head of the Connecticut Regatta back in 1981. Though primarily an Amazon Java programmer, Brian is a firm believer that engineers should be polylingual and use the best language for the problem. Brian holds patents in the fields of UX and VideoOnDemand with several more in process. His Log4JFugue open source project won the 2010 Duke's Choice award for the most innovative use of Java; he also won a JavaOne best speaker Rock Star award as well as a Most Innovative Use of Jira award from Atlassian in 2010. Brian has published several dozen technical papers and is a regular speaker at local Meetups.