The inner workings of AWS Auto Scaling

Reactive programming may sound like a buzzword, but the concept is not new. Auto Scaling can add EC2 instances automatically as workloads increase.

Reactive programming, or the ability of a program to automatically react to load changes in its environment and continue running, is one of many IT buzzwords thrown around. Many programs that were always supposed to be resilient in the face of change are now labeled "reactive." Amazon Web Services includes an Auto Scaling feature that allows a system to react to such changes.

Reactivity is really a systems concept: each program should be resilient to change, and the overall system should add, remove and modify programs and resources as needed.

Auto Scaling allows a system to add resources, typically AWS EC2 instances, when the load increases and removes instances when the load diminishes. There are actually four types of Auto Scaling: fixed, manual, scheduled and dynamic.

Fixed Auto Scaling ensures that AWS replaces instances that fail, while manual Auto Scaling lets you add or remove instances by hand. Scheduled Auto Scaling is used when your load varies predictably by date and time, and dynamic Auto Scaling is for when your load changes unpredictably. This tip covers only dynamic Auto Scaling.

You may think the term "load" is predefined to mean something like CPU utilization, Linux load average or web response time, but cloud admins actually have full control over what constitutes load. You use CloudWatch metrics, either the predefined ones or custom metrics you create for specific system needs, to define these conditions, and then group them into one or more scaling policies.
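As a sketch of what publishing a custom load metric looks like, here are the parameters you might pass to boto3's put_metric_data. The namespace, metric name and value are hypothetical examples, and the API call itself is left commented out so the snippet runs without AWS credentials.

```python
# Sketch: parameters for publishing a custom "load" metric to CloudWatch.
# Namespace, metric name and value are hypothetical examples.
metric_data = {
    "Namespace": "MyApp/Performance",
    "MetricData": [
        {
            "MetricName": "QueueDepth",  # whatever "load" means for your system
            "Value": 42.0,
            "Unit": "Count",
        }
    ],
}

# With boto3 installed and credentials configured, the call would be:
# import boto3
# boto3.client("cloudwatch").put_metric_data(**metric_data)
```

Your application would publish a data point like this on a regular schedule; CloudWatch then has a time series it can evaluate against alarm thresholds.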

To understand AWS Auto Scaling, you need to understand two important concepts: hysteresis and EC2 charge-by-hour.

Hysteresis means adding a lag time to the response a system has to an event. Take your home thermostat, for example. If you set it to 68 degrees in the winter, the furnace doesn’t turn on until the temperature is slightly below 68, and then it runs until it reaches slightly above 68. This occurs to avoid a situation in which the furnace runs for a minute, shuts off and then turns back on again.

Charge-by-hour refers to the practice in which AWS charges for instances by the hour -- if you start an instance and then kill it after 10 minutes, you’re charged for a full hour. If, within a single hour, you terminate and start a new instance five times, then you are charged for five hours.
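The arithmetic behind the hourly model described above is just a ceiling function per instance launch, which makes the five-restart example easy to check:

```python
import math

# Under hourly billing, each instance run is billed in whole hours:
# a 10-minute run costs one full hour.
def billed_hours(run_minutes):
    return math.ceil(run_minutes / 60)

# One 10-minute run: 1 billed hour.
single = billed_hours(10)

# Five separate 10-minute runs within the same hour: 5 billed hours,
# even though total runtime is only 50 minutes.
five_restarts = sum(billed_hours(10) for _ in range(5))
```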

Combine these concepts with Auto Scaling and the design makes sense: You want the system to react to changes in load, but if that load is highly dynamic, you should moderate your response to it. This applies more often to removing instances than to adding them. When load increases, you often have no option but to add capacity by adding instances. When load decreases, you have some flexibility in deciding when to remove instances. If an instance has been running for only a fraction of the current billed hour, it might make sense to leave it running for a while in case the load picks up again.
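One common heuristic that follows from this reasoning, sketched below under the hourly billing assumption: since a partial hour is billed as a full hour anyway, delay terminating a surplus instance until it approaches its next hour boundary. The five-minute window is an arbitrary example value.

```python
# Scale-in heuristic under hourly billing: a partial hour is billed as a
# full hour anyway, so keep a surplus instance running until it nears its
# next billable hour boundary, in case load returns.
def should_terminate_now(minutes_since_launch, window=5):
    minutes_into_hour = minutes_since_launch % 60
    # Terminate only in the last `window` minutes of the current billed hour.
    return minutes_into_hour >= 60 - window

# 10 minutes into the hour: the hour is already paid for, so wait.
# 57 minutes in: a new billed hour is about to start, so terminate now.
```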

Table 1 shows the primary AWS entities involved in Auto Scaling. You can configure AWS CloudWatch alarms to fire only after a condition has existed for a specified amount of time. For example, you could create an alarm that triggers the removal of instances after some load metric dips below a threshold for more than 30 minutes. This would protect you from removing instances during brief dips in load only to have to recreate them within the same charging hour.
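A "below threshold for more than 30 minutes" alarm is expressed in CloudWatch as a number of consecutive evaluation periods. Below is a sketch of the parameters for boto3's put_metric_alarm; all names and thresholds are hypothetical, and the call is commented out so the snippet runs without AWS credentials.

```python
# Sketch of a CloudWatch alarm that fires only after the metric has been
# below the threshold for 30 minutes: six consecutive 5-minute periods.
# Names and values are hypothetical.
alarm = {
    "AlarmName": "scale-in-low-load",
    "Namespace": "MyApp/Performance",
    "MetricName": "QueueDepth",
    "Statistic": "Average",
    "Period": 300,               # seconds per evaluation period
    "EvaluationPeriods": 6,      # 6 x 5 minutes = 30 minutes sustained
    "Threshold": 10.0,
    "ComparisonOperator": "LessThanThreshold",
    # "AlarmActions": [scale_in_policy_arn],  # ARN of the policy to trigger
}

# With boto3 and credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)

sustained_seconds = alarm["Period"] * alarm["EvaluationPeriods"]
```

Lengthening either the period or the number of evaluation periods widens the hysteresis band, at the cost of reacting more slowly to genuine drops in load.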

An Auto Scaling Policy determines the speed at which instances ramp up and down. You can have policies that add/remove one instance at a time and others that add/remove a set percentage of instances all at once.
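The two adjustment styles just mentioned correspond to different AdjustmentType values in a scaling policy. Here is a sketch of the parameters for boto3's put_scaling_policy, with hypothetical group and policy names:

```python
# Sketch: one-at-a-time adjustment vs. percentage adjustment, as
# parameters for boto3's put_scaling_policy. Names are hypothetical.
step_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "add-one-instance",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 1,        # add exactly one instance
    "Cooldown": 300,               # seconds to wait before scaling again
}

percent_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "remove-ten-percent",
    "AdjustmentType": "PercentChangeInCapacity",
    "ScalingAdjustment": -10,      # remove 10% of current capacity
    "Cooldown": 300,
}

# With boto3 and credentials configured:
# boto3.client("autoscaling").put_scaling_policy(**step_policy)
```

The Cooldown field is another piece of the hysteresis story: it keeps a policy from firing again before the previous adjustment has had time to take effect.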

CloudWatch metric: a thing to measure

CloudWatch alarm: a condition to trigger on

Auto Scaling policy: how to add or remove instances (by absolute number or by percentage)

Auto Scaling group: minimum/maximum/desired instances, availability zones

Auto Scaling launch configuration: how to create instances (AMI, InstanceType)

Table 1. Auto Scaling entities

The Auto Scaling group specifies the availability zone of the instances as well as the minimum, maximum and desired number of instances. You can also think of this as your safety net; no matter how incorrectly you configure things, you cannot accidentally create more instances than the group maximum.
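A sketch of what such a group definition looks like as parameters for boto3's create_auto_scaling_group; the group name, launch configuration name and availability zones are hypothetical:

```python
# Sketch of an Auto Scaling group definition: the Min/Max bounds are the
# safety net described above. Names and zones are hypothetical.
group = {
    "AutoScalingGroupName": "web-asg",
    "LaunchConfigurationName": "web-lc",
    "MinSize": 2,                 # never scale in below 2 instances
    "MaxSize": 10,                # never scale out above 10, no matter what
    "DesiredCapacity": 2,
    "AvailabilityZones": ["us-east-1a", "us-east-1b"],
}

# With boto3 and credentials configured:
# boto3.client("autoscaling").create_auto_scaling_group(**group)
```

Whatever the policies and alarms do, capacity stays pinned between MinSize and MaxSize.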

In the Auto Scaling launch configuration, you describe what type of instances you want, specifying, among other things, the AMI and InstanceType. Instance creation accepts many options, and you can specify any or all of them in the launch configuration.
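A sketch of a launch configuration as parameters for boto3's create_launch_configuration; the AMI ID, key pair and security group are hypothetical placeholders:

```python
# Sketch of a launch configuration. The AMI ID, key pair and security
# group below are hypothetical placeholders.
launch_config = {
    "LaunchConfigurationName": "web-lc",
    "ImageId": "ami-0123456789abcdef0",   # which AMI to boot
    "InstanceType": "t2.micro",           # what size instance to create
    "KeyName": "my-keypair",              # optional, like most fields here
    "SecurityGroups": ["web-sg"],
}

# With boto3 and credentials configured:
# boto3.client("autoscaling").create_launch_configuration(**launch_config)
```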

Overall, the process is: you put data into a metric (if you use a custom metric), the metric triggers an alarm, and the alarm triggers a policy that adds or removes instances within the limits configured in the Auto Scaling group.
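The pieces are wired together in a specific order, because each step references a name or ARN produced by an earlier one; in particular, the alarm points at the policy through its AlarmActions field. A sketch of that ordering:

```python
# The wiring order, sketched. Each step references a name or ARN from an
# earlier step; put_scaling_policy returns a PolicyARN that the alarm's
# AlarmActions field uses to close the loop.
steps = [
    "create_launch_configuration",  # how to build an instance
    "create_auto_scaling_group",    # references the launch configuration
    "put_scaling_policy",           # returns a PolicyARN
    "put_metric_alarm",             # AlarmActions=[PolicyARN]
    "put_metric_data",              # your app feeds the custom metric
]
```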

About the author:
Brian Tarbox has been doing mission-critical programming since he created a timing-and-scoring program for the Head of the Connecticut Regatta back in 1981. Though primarily an Amazon Java programmer, Brian is a firm believer that engineers should be polylingual and use the best language for the problem. Brian holds patents in the fields of UX and VideoOnDemand with several more in process. His Log4JFugue open source project won the 2010 Duke's Choice award for the most innovative use of Java; he also won a JavaOne best speaker Rock Star award as well as a Most Innovative Use of Jira award from Atlassian in 2010. Brian has published several dozen technical papers and is a regular speaker at local Meetups.
