
Balance cloud costs, performance with AWS Auto Scaling

Overprovisioning cloud resources can mean overpaying, but underprovisioning can create performance dives. Balance extremes with AWS Auto Scaling.

The pay-as-you-go model of cloud computing favors developers and architects who can design applications that run with the optimal amount of resources at all times. Overprovisioning a server means paying more than necessary to meet your needs. Erring in the other direction can be even worse: Underprovisioning resources causes an application to suffer in performance. Variations in application demand make the challenge of server sizing even more difficult, as the optimal set of resources does not stay constant.

One way to address these variations when using the Amazon Web Services (AWS) cloud is to create a load-balanced cluster of servers, then add or remove servers according to demand. You can manage the load-balanced cluster yourself, but another option is the AWS Auto Scaling service. The service maintains adequate performance levels without overprovisioning, while also alleviating some administrative overhead. Good candidates for the service are workloads that vary significantly and can be readily distributed across multiple servers.

AWS Auto Scaling relies on CloudWatch, Amazon's monitoring service, for the performance data it needs to make scaling decisions. CloudWatch collects performance statistics -- including CPU utilization, disk activity and data transfer -- from servers and other AWS resources at five-minute intervals for no charge. (For an additional fee, you can have performance metrics collected once per minute.) System administrators can then specify configuration policies to add or remove servers based on these metrics. For example, a policy could state that if average CPU utilization exceeds 70%, an additional virtual instance should be brought online.
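The 70% rule can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the service's actual implementation; the function name and the sample readings are hypothetical.

```python
from statistics import mean

CPU_THRESHOLD_PERCENT = 70.0  # threshold taken from the example policy above


def should_scale_out(cpu_samples, threshold=CPU_THRESHOLD_PERCENT):
    """Return True when average CPU utilization exceeds the threshold."""
    if not cpu_samples:
        return False  # no data collected yet, so take no action
    return mean(cpu_samples) > threshold


# Hypothetical readings from three instances over one monitoring interval.
samples = [82.0, 75.5, 68.0]
print(should_scale_out(samples))  # the average is about 75.2, so this prints True
```

Note that the policy acts on the average across the group, so a single busy server (82% here) does not trigger scaling on its own.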

Implementing autoscaling groups to assist with configuration policies

The first step to using AWS Auto Scaling is to implement an autoscaling group, a set of Amazon Elastic Compute Cloud (EC2) instances that are logically managed together. Each group has a specified minimum and maximum number of instances -- the actual number is determined by the configuration policies you specify based on the CloudWatch metrics. When manual intervention is required, AWS provides an ExecutePolicy command that allows system administrators to execute a policy without waiting for particular performance conditions.
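One way to picture the minimum and maximum limits is as a clamp on whatever a policy requests. The sketch below assumes a hypothetical ScalingGroup class with made-up sizes; it models the bookkeeping only and makes no calls to AWS.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Hypothetical stand-in for an autoscaling group's size settings."""
    min_size: int
    max_size: int
    desired: int

    def adjust(self, change):
        """Apply a policy's requested change, clamped to the group's bounds."""
        requested = self.desired + change
        self.desired = max(self.min_size, min(self.max_size, requested))
        return self.desired


group = ScalingGroup(min_size=2, max_size=6, desired=2)
print(group.adjust(+1))   # a scale-out policy fires: 3
print(group.adjust(+10))  # an oversized request is capped at max_size: 6
print(group.adjust(-20))  # a scale-in request never drops below min_size: 2
```

Executing a policy manually, as with ExecutePolicy, would correspond to calling adjust directly rather than waiting for a metric to cross its threshold.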


Autoscaling groups help enterprises balance application performance against cost in several ways.

Autoscaling groups can span Availability Zones, a feature that supports applications with high availability requirements. Availability Zones are locations within an AWS region -- for example, U.S. East (Northern Virginia) or E.U. Region (Ireland) -- each with distinct infrastructure that isolates it from failures in the others. If one Availability Zone fails, AWS Auto Scaling starts new instances in a functioning zone in the same geographic region.
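The failover behavior can be modeled roughly as follows. The placement rule used here (send each replacement to the healthy zone with the fewest instances) is an assumption made for illustration, as are the function name and the zone names and counts.

```python
def rebalance(instances_by_zone, failed_zone):
    """Relaunch a failed zone's instances in the remaining healthy zones.

    instances_by_zone maps a zone name to its running instance count.
    Each replacement goes to the healthy zone that currently has the fewest.
    """
    healthy = {z: n for z, n in instances_by_zone.items() if z != failed_zone}
    if not healthy:
        raise ValueError("no healthy zone left in the region")
    for _ in range(instances_by_zone.get(failed_zone, 0)):
        # Ties are broken by zone name so the result is deterministic.
        target = min(healthy, key=lambda z: (healthy[z], z))
        healthy[target] += 1
    return healthy


# Illustrative group spread evenly over three zones in one region.
group = {"us-east-1a": 2, "us-east-1b": 2, "us-east-1c": 2}
print(rebalance(group, failed_zone="us-east-1a"))
```

The total instance count is preserved: the two instances lost with the failed zone reappear in the two surviving zones.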

Autoscaling groups can be configured with load balancers to distribute workloads across servers in the group. Amazon's Elastic Load Balancing service provides a single point of access for all traffic to your application. When a load balancer is used, autoscaling policies can reference load-balancing metrics (such as request latency), as well as EC2 instance metrics.

In addition to responding to varying workloads, Auto Scaling also supports other scaling options, including maintaining current instance levels at all times, manual scaling and scaling based on schedule.
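Schedule-based scaling, for instance, amounts to looking up a target instance count by time of day. The sketch below is a simplified model with an invented schedule; the windows, counts and names are hypothetical.

```python
from datetime import time

# Hypothetical schedule: (window start, window end, desired instance count).
SCHEDULE = [
    (time(8, 0), time(18, 0), 6),   # business hours
    (time(18, 0), time(23, 0), 3),  # evening
]
DEFAULT_CAPACITY = 2  # overnight baseline


def scheduled_capacity(now, schedule=SCHEDULE, default=DEFAULT_CAPACITY):
    """Return the capacity of the first window containing `now`."""
    for start, end, capacity in schedule:
        if start <= now < end:  # the window end is exclusive
            return capacity
    return default


print(scheduled_capacity(time(9, 30)))  # business hours: 6
print(scheduled_capacity(time(2, 0)))   # outside every window: 2
```

This approach suits predictable demand, such as a weekday peak, where waiting for a metric to cross a threshold would add capacity later than needed.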

Addressing potential issues with AWS Auto Scaling

An action specified in an autoscaling policy takes some time to complete, so AWS implements a cool-down period after a trigger. The cool-down prevents a series of further actions from executing in response to that trigger while the initial response is still taking effect. The cool-down period starts when the policy action is executed.
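The effect of a cool-down can be illustrated with a small gate that ignores triggers arriving too soon after the last action. This is a sketch of the idea only; the class name, the 300-second value and the trigger times are assumptions for the example.

```python
class CooldownGate:
    """Suppress scaling actions that arrive during the cool-down period."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self.last_action_at = None  # time the last policy action was executed

    def try_execute(self, now):
        """Return True and restart the cool-down if an action may run at `now`."""
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown_seconds
        )
        if in_cooldown:
            return False
        self.last_action_at = now
        return True


gate = CooldownGate(cooldown_seconds=300)  # assumed cool-down length
# The same alarm fires at t = 0, 60, 120 and 400 seconds.
results = [gate.try_execute(t) for t in (0, 60, 120, 400)]
print(results)  # [True, False, False, True]
```

Only the first trigger and the one after the cool-down expires lead to an action, which keeps the group from adding several servers in response to a single spike.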

Autoscaling is useful when application loads can be distributed across multiple servers, which is the case with Web servers and some application servers. Some systems, such as relational databases, can be configured to run in clusters, but there can be disadvantages. Commercial RDBMS vendors may charge additional licensing fees for clustering options, and those fees may exceed the savings from running the database on varying numbers of smaller servers rather than on a single large server. Also consider the administrative overhead of managing a cluster of database servers versus a single server.

About the author:
Dan Sullivan holds a Master of Science degree and is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence, and has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.
