AWS CloudWatch is a system for monitoring your account and the various resources within it. And while most things you create within AWS come with standard CloudWatch metrics, you can -- and should -- create new ones.
Cloud application monitoring is a critical capability, but it comes with some complexity. Because of that, AWS CloudWatch has spawned an entire sub-industry of companies selling monitoring tools built on top of it. At AWS re:Invent 2013, more than half of the vendors were selling CloudWatch extensions or replacements.
Depending on your needs, there are several ways to interact with CloudWatch metrics. One option -- a graphical chart in the AWS console -- lets you plot one or more metrics over various time ranges. Keep in mind that, by default, all times are shown in UTC, not local time.
While this display is great for spotting trends, you might also want to be notified proactively when a metric crosses a threshold. Many admins would find it useful to receive a text message when a critical server goes down or when system response time goes up, rather than learning that something is wrong only when a customer complains. AWS CloudWatch can do this, but it takes some effort to set up.
One of the complexities with CloudWatch is that each of the various objects you use can be created in multiple ways: through the AWS Console, through command-line tools and through programmatic APIs. In a perfect world, you could do each action through any of the three mechanisms, but reality is far from perfect.
Before diving further into CloudWatch, it's important to understand all the terms you'll encounter with the tool. Here's a glossary to help you understand the world of AWS CloudWatch.
AWS CloudWatch metrics
The metric name shown in the AWS console may not be the name you use programmatically. For example, the amount of data read from disk is shown in the console as "Disk Reads" but is referred to as "DiskReadBytes" on the command line or in a Java program. This mapping is not available in the command-line help or the Javadocs; it's only available in the CloudWatch Developer Guide.
Metric: A metric is a value you want to measure. It can be a built-in metric, such as CPU utilization on AWS EC2 instances, or any value specific to your application. If you add a custom metric, you are responsible for publishing its values, which you can do with the command-line tools or the various APIs. A metric has at least a name and a value, and can optionally have units and dimensions.
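A minimal sketch of what one custom-metric data point looks like when you publish it programmatically. The keys (MetricName, Value, Unit, Dimensions) follow CloudWatch's PutMetricData API as used by boto3's put_metric_data(Namespace=..., MetricData=[...]); the build_metric_datum helper and the CassandraServerCount metric are hypothetical, and the code only builds the payload without contacting AWS.

```python
from typing import Dict, Optional


def build_metric_datum(metric_name: str, value: float,
                       unit: Optional[str] = None,
                       dimensions: Optional[Dict[str, str]] = None) -> dict:
    """Build one entry of the MetricData list for a PutMetricData request.

    Name and value are always present; unit and dimensions are optional.
    """
    datum: dict = {"MetricName": metric_name, "Value": float(value)}
    if unit is not None:
        datum["Unit"] = unit  # e.g. "Count", "Seconds", "Percent"
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [{"Name": k, "Value": v}
                               for k, v in dimensions.items()]
    return datum


# Hypothetical application metric: live Cassandra nodes in one cluster.
datum = build_metric_datum("CassandraServerCount", 6, unit="Count",
                           dimensions={"Cluster": "prod"})
print(datum)
```

With boto3 installed and credentials configured, you would pass [datum] as MetricData together with a Namespace.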
Namespace: AWS creates a lot of standard metrics, especially if you create many transient objects such as AWS EC2 instances. Each metric remains listed for two weeks after its last data point, so you can easily end up with thousands of built-in metrics. If you hold any hope of finding your shiny new metrics, you need to put them in their own namespaces.
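Because AWS reserves namespaces that begin with "AWS/" for its own services, a custom namespace needs a prefix of your own. The helper below is a hypothetical, partial check (empty, longer than 255 characters, or using the reserved prefix); the CloudWatch documentation remains the authority on the full rules, including which characters are allowed. The MyCompany/OrderService/Prod naming scheme is only an illustration.

```python
RESERVED_PREFIX = "AWS/"      # namespaces starting with this belong to AWS services
MAX_NAMESPACE_LENGTH = 255


def check_custom_namespace(namespace: str) -> str:
    """Return `namespace` unchanged, or raise ValueError if it is unusable.

    Partial check only: length and the reserved prefix, not the character set.
    """
    if not namespace:
        raise ValueError("namespace must not be empty")
    if len(namespace) > MAX_NAMESPACE_LENGTH:
        raise ValueError(f"namespace longer than {MAX_NAMESPACE_LENGTH} characters")
    if namespace.startswith(RESERVED_PREFIX):
        raise ValueError(f"{namespace!r} uses the reserved {RESERVED_PREFIX!r} prefix")
    return namespace


# Hypothetical convention: company/application/environment.
print(check_custom_namespace("MyCompany/OrderService/Prod"))
```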
Figure 2 shows that my AWS system has tens of thousands of standard metrics and five custom metrics. Just as with metric names, the names of built-in namespaces shown in the console are not necessarily the ones you use in code. For example, the console shows the namespace "EC2," while the actual name you must use is "AWS/EC2." The CloudWatch Developer Guide is the only definitive source for this information.
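Since console labels and programmatic names differ, one pragmatic approach is to keep your own lookup table, filled in from the CloudWatch Developer Guide. The sketch below seeds it with only the two pairs mentioned in this article; the tables and the api_name helper are hypothetical.

```python
from typing import Dict

# Console label -> name the CLI and SDKs expect.
# Only the pairs mentioned in this article; extend from the Developer Guide.
CONSOLE_TO_API_NAMESPACE: Dict[str, str] = {"EC2": "AWS/EC2"}
CONSOLE_TO_API_METRIC: Dict[str, str] = {"Disk Reads": "DiskReadBytes"}


def api_name(console_label: str, table: Dict[str, str]) -> str:
    """Translate a console label, failing loudly if it has not been recorded."""
    if console_label not in table:
        raise KeyError(f"no API name recorded for {console_label!r}; "
                       "check the CloudWatch Developer Guide")
    return table[console_label]


print(api_name("EC2", CONSOLE_TO_API_NAMESPACE))      # AWS/EC2
print(api_name("Disk Reads", CONSOLE_TO_API_METRIC))  # DiskReadBytes
```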
Alarm: An alarm watches a metric and changes state when the metric's value meets a condition you specify. The condition can be "monthly billing exceeds x dollars" or "Cassandra server count is less than six for three measurement intervals in a row."
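A minimal sketch of how the second example condition is evaluated, assuming one data point per period and ignoring refinements the real service offers, such as the DatapointsToAlarm and TreatMissingData settings. The comparison operator names and the three states (OK, ALARM, INSUFFICIENT_DATA) are the ones CloudWatch uses; the evaluate_alarm function and the sample data are hypothetical.

```python
from typing import Callable, Dict, List

# Comparison operators as named in CloudWatch's PutMetricAlarm API.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "GreaterThanOrEqualToThreshold": lambda v, t: v >= t,
    "GreaterThanThreshold": lambda v, t: v > t,
    "LessThanThreshold": lambda v, t: v < t,
    "LessThanOrEqualToThreshold": lambda v, t: v <= t,
}


def evaluate_alarm(datapoints: List[float], comparison: str,
                   threshold: float, evaluation_periods: int) -> str:
    """Return the alarm state based on the most recent `evaluation_periods` points.

    ALARM if every one of them breaches the threshold, OK if at least one
    does not, INSUFFICIENT_DATA if there are not enough points yet.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaches = COMPARATORS[comparison]
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(breaches(v, threshold) for v in recent) else "OK"


# "Cassandra server count is less than six for three intervals in a row."
counts = [6, 6, 5, 5, 5]  # made-up sample data, oldest first
print(evaluate_alarm(counts, "LessThanThreshold", 6, 3))  # ALARM
```

In a real alarm definition, the same condition corresponds to ComparisonOperator "LessThanThreshold", Threshold 6 and EvaluationPeriods 3.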
Topic: This is the resource that gets notified when an alarm triggers. You create it in Amazon Simple Notification Service (SNS), not in CloudWatch.
Subscription: A subscription is a destination, such as your cell phone or email address, that is subscribed to the topic and delivers the actual notification that your alarm has triggered.
ARN: An Amazon Resource Name, or ARN, uniquely identifies an AWS resource so that it can be referenced across services. For example, an alarm refers to the SNS topic it notifies by that topic's ARN.
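The relationship between a topic and its subscriptions is a simple fan-out: one message published to the topic goes to every subscription. The sketch below models that with hypothetical Topic and Subscription classes (not an AWS API). The dictionary returned by subscribe uses the parameter names of the SNS Subscribe call (TopicArn, Protocol, Endpoint). Real SNS does not deliver to an email subscription until the recipient confirms it; for simplicity, this toy requires confirmation for every subscription. The ARN and addresses are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subscription:
    protocol: str            # e.g. "email" or "sms"
    endpoint: str            # an email address or a phone number
    confirmed: bool = False  # toy model: nothing is delivered until confirmed


@dataclass
class Topic:
    arn: str
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, protocol: str, endpoint: str) -> dict:
        """Record a subscription; return the matching SNS Subscribe parameters."""
        self.subscriptions.append(Subscription(protocol, endpoint))
        return {"TopicArn": self.arn, "Protocol": protocol, "Endpoint": endpoint}

    def publish(self, message: str) -> List[Tuple[str, str]]:
        """Toy fan-out: one (endpoint, message) pair per confirmed subscription."""
        return [(s.endpoint, message) for s in self.subscriptions if s.confirmed]


topic = Topic("arn:aws:sns:us-east-1:123456789012:ops-alerts")  # hypothetical ARN
params = topic.subscribe("email", "oncall@example.com")
topic.subscribe("email", "backup@example.com")
topic.subscriptions[0].confirmed = True  # only the on-call address clicked the link
delivered = topic.publish("Cassandra server count below 6")
print(delivered)
```

An unconfirmed subscription is one easy-to-miss reason a notification never arrives.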
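ARNs follow a fixed colon-separated layout, arn:partition:service:region:account-id:resource, so taking one apart is straightforward as long as you remember that the resource part may itself contain colons. A small sketch; the parse_arn helper, the account ID and the resource names are hypothetical.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str   # usually "aws"
    service: str     # e.g. "sns", "cloudwatch"
    region: str      # empty for global services such as IAM
    account_id: str
    resource: str    # may contain ':' or '/'


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its parts without breaking up the resource field."""
    parts = arn.split(":", 5)  # at most 6 pieces; the last keeps any extra ':'
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


topic = parse_arn("arn:aws:sns:us-east-1:123456789012:ops-alerts")
alarm = parse_arn("arn:aws:cloudwatch:us-east-1:123456789012:alarm:LowCassandraCount")
print(topic.service, topic.resource)  # sns ops-alerts
print(alarm.service, alarm.resource)  # cloudwatch alarm:LowCassandraCount
```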
To put these terms in context: You create a metric and periodically give it values. At some point, the value causes an alarm to trigger that sends a message to a topic. Then the email address or cell phone number with a subscription to the topic receives a notification.
As you might imagine, if your metric's value changed and you didn't get a text message or email, tracking down what went wrong can be difficult, because the failure could lie in any of these links.
Figure 3 shows how all these pieces work together. Two of the parts -- the metric and alarm -- are configured directly in AWS CloudWatch; the other two -- topic and subscription -- are configured in SNS.
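When a notification does not arrive, it helps to walk the chain in order and stop at the first broken link. The sketch below encodes that checklist over a hypothetical PipelineSnapshot; in practice you would gather each fact yourself from the CloudWatch and SNS consoles, the command-line tools or the APIs. The class, the function and the ARN are illustrations, not part of any AWS SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineSnapshot:
    """Hypothetical facts about the four links, collected by hand."""
    metric_has_recent_data: bool
    alarm_state: str                                   # "OK", "ALARM" or "INSUFFICIENT_DATA"
    alarm_action_topic_arns: List[str] = field(default_factory=list)
    topic_arn: str = ""
    confirmed_subscriptions: List[str] = field(default_factory=list)


def first_broken_link(s: PipelineSnapshot) -> Optional[str]:
    """Check metric -> alarm -> topic -> subscription; return the first problem."""
    if not s.metric_has_recent_data:
        return "metric: no recent data points published"
    if s.alarm_state != "ALARM":
        return f"alarm: state is {s.alarm_state}, so no ALARM notification is sent"
    if s.topic_arn not in s.alarm_action_topic_arns:
        return "alarm: no action points at the SNS topic"
    if not s.confirmed_subscriptions:
        return "topic: no confirmed subscriptions"
    return None  # every link looks healthy


TOPIC = "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # hypothetical ARN
snapshot = PipelineSnapshot(
    metric_has_recent_data=True,
    alarm_state="ALARM",
    alarm_action_topic_arns=[TOPIC],
    topic_arn=TOPIC,
    confirmed_subscriptions=[],  # e.g. the email confirmation link was never clicked
)
print(first_broken_link(snapshot))  # topic: no confirmed subscriptions
```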
About the author:
Brian Tarbox has been doing mission-critical programming since he created a timing-and-scoring program for the Head of the Connecticut Regatta back in 1981. Though primarily an Amazon Java programmer, Brian is a firm believer that engineers should be polylingual and use the best language for the problem. Brian holds patents in the fields of UX and VideoOnDemand with several more in process. His Log4JFugue open source project won the 2010 Duke's Choice award for the most innovative use of Java; he also won a JavaOne best speaker Rock Star award as well as a Most Innovative Use of Jira award from Atlassian in 2010. Brian has published several dozen technical papers and is a regular speaker at local Meetups.