What exactly is auto-scaling on Amazon Web Services (AWS)? In what kinds of circumstances would it be useful?
AWS auto-scaling lets you define a "group" of servers that is responsible for a specific task and have that group scale automatically, adding new servers or removing existing ones based on a set of parameters you define. Essentially, AWS auto-scaling lets you specify when servers should be automatically started or terminated.
AWS auto-scaling is one of AWS's most underused functions, yet it is an incredibly powerful feature. For example, if you have a group of Web servers, you could put them in an auto-scaling group that automatically adds new servers when you get hit with a lot of traffic. You could also have the group scale back down, terminating instances, after your traffic drops below a certain point.
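To make the scale-up and scale-down behavior concrete, here is a minimal sketch of the decision such a group makes on each evaluation. It is plain Python, not the AWS API: the ScalingRule class, the desired_capacity function, and the threshold values are all hypothetical names and numbers chosen for illustration, and the metric is assumed to be something like average CPU percentage.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical thresholds, standing in for a group's scaling parameters."""
    min_size: int
    max_size: int
    scale_out_above: float  # metric value above which one server is added
    scale_in_below: float   # metric value below which one server is removed


def desired_capacity(current: int, metric: float, rule: ScalingRule) -> int:
    """Return the new server count for one evaluation of the rule."""
    if metric > rule.scale_out_above:
        target = current + 1
    elif metric < rule.scale_in_below:
        target = current - 1
    else:
        target = current
    # Never leave the configured bounds, whatever the metric says.
    return max(rule.min_size, min(rule.max_size, target))


# Assumed example values: between 2 and 6 servers, thresholds at 70 and 30.
rule = ScalingRule(min_size=2, max_size=6, scale_out_above=70.0, scale_in_below=30.0)
print(desired_capacity(3, 85.0, rule))  # heavy traffic: 3 servers become 4
print(desired_capacity(3, 10.0, rule))  # quiet period: 3 servers become 2
print(desired_capacity(2, 10.0, rule))  # already at the minimum: stays at 2
```

The clamp on the last line of desired_capacity is the important design choice: the minimum keeps the site up during quiet periods, and the maximum caps how much a traffic spike can cost.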
In addition, if you know you always want a fixed number of servers, X, running a certain task, you could have an auto-scaling group that simply states that X servers should be running. Then, if one is terminated for some reason -- for instance, an AWS failure, or you terminate it because it's no longer responding -- the AWS auto-scaling group would automatically launch a replacement without you having to request it.
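The fixed-size case amounts to a reconciliation loop: compare the number of healthy servers with the desired count and launch the difference. The sketch below illustrates that idea in plain Python under stated assumptions. The function names, the "healthy" and "terminated" state strings, and the "replacement-N" server names are hypothetical, and no real instances are launched.

```python
from typing import Dict


def replacements_needed(instances: Dict[str, str], desired: int) -> int:
    """Count how many new servers are needed to reach `desired` healthy ones."""
    healthy = sum(1 for state in instances.values() if state == "healthy")
    return max(0, desired - healthy)


def reconcile(instances: Dict[str, str], desired: int) -> Dict[str, str]:
    """Drop servers that are not healthy and add placeholders for replacements."""
    kept = {name: state for name, state in instances.items() if state == "healthy"}
    for i in range(replacements_needed(instances, desired)):
        # A real group would launch an instance here; this only records one.
        kept[f"replacement-{i}"] = "healthy"
    return kept


# Assumed example: three servers wanted, one has been terminated.
fleet = {"web-1": "healthy", "web-2": "terminated", "web-3": "healthy"}
print(replacements_needed(fleet, 3))
print(sorted(reconcile(fleet, 3)))
```

Because the group only ever compares the current state with the desired count, it does not matter why a server disappeared. A hardware failure and a deliberate termination are handled the same way.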
Auto-scaling groups are most commonly used behind elastic load balancers (ELBs). An ELB can measure how much throughput it is handling, and even how much latency it encounters when passing requests through to the servers behind it. If your Web servers are behind an ELB, consider putting those servers in an auto-scaling group so that you don't have to manage provisioning new servers whenever your site-traffic patterns change.
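One way load balancer measurements could drive a scaling decision is to act only when latency stays above a threshold for several consecutive readings, so that a single slow request does not launch a server. The sketch below shows that check in plain Python. The sustained_breach function, the 0.8-second threshold, and the sample values are assumptions made up for illustration, not figures from AWS.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], threshold: float, periods: int) -> bool:
    """True when the last `periods` samples all exceed `threshold`."""
    if periods <= 0 or len(samples) < periods:
        # Not enough data yet to call it a sustained problem.
        return False
    return all(value > threshold for value in samples[-periods:])


# Assumed latency readings in seconds, oldest first.
steady_climb = [0.2, 0.9, 1.1, 1.3]
single_spike = [0.2, 0.9, 0.3, 1.3]

print(sustained_breach(steady_climb, threshold=0.8, periods=3))
print(sustained_breach(single_spike, threshold=0.8, periods=3))
```

Requiring several readings in a row trades a slower reaction for fewer unnecessary launches, which matters because each new server takes time to start and costs money while it runs.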
Some major AWS cloud computing clients, such as Netflix, have policies that any production server launched in AWS must be launched using an AWS auto-scaling group. That practice ensures servers are always automatically managed and that the instances are always up and running. Any server with a problem that can't be easily fixed can simply be terminated, and a replacement will automatically appear for it. That capability also ensures that you're only running the servers you need at the time you need them, which helps reduce the waste of running underutilized servers.
Related Q&A from Chris Moyer
Event-driven computing means no IaaS provisioning and no data center to run. Can I migrate all enterprise apps to be event-driven?
What is runtime as a service and how does it differ from platform as a service and infrastructure as a service?
The DevOps model is taking off as cloud adoption grows. But what exactly are the key responsibilities of a DevOps team in the enterprise?