
Why do we receive so many CloudWatch alarms?

We receive too many alarms from Amazon CloudWatch, which creates clutter and confusion. How can we identify and remove unnecessary CloudWatch alarms?

Identifying unused or unimportant alerts in AWS can be time-consuming and difficult, especially when an IT team creates hundreds of Amazon CloudWatch alarms.

IT teams must operate lean and efficiently. A single operations team needs to be able to automate provisioning and configuration for hundreds, or even thousands, of machines. But if done incorrectly, automation can create time-consuming tasks for IT teams. Automated alerts, for example, are intended to provide visibility into the environment, but they can instead create clutter and confusion.

The AWS Command Line Interface (AWS CLI) can reduce the time and effort admins spend manually identifying and removing unneeded CloudWatch alarms.

Amazon CloudWatch alarms alert administrators when a metric crosses a preconfigured threshold. AWS administrators can create CloudWatch alarms and use them with a variety of AWS services, including Amazon Simple Notification Service (SNS), Auto Scaling, AWS CloudTrail and Identity and Access Management. But while CloudWatch alarms can help admins detect underutilized Elastic Compute Cloud (EC2) instances, for example, receiving too many alerts can be distracting. Finding and turning off noncritical alarms frees up valuable time.

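A CloudWatch alarm can be in one of three states: OK, ALARM and INSUFFICIENT_DATA. An alarm is in the INSUFFICIENT_DATA state when it has just started, when its metric is not available, or when not enough data is available to determine the alarm state. Alarms that remain in this state are often good candidates for review, which makes it a useful starting point for identifying unused or unimportant alarms.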

For example, if an IT team sets an alarm's dimensions based on an Auto Scaling group name and later deletes that group, the metric stops receiving data points, so the associated CloudWatch alarms move to the INSUFFICIENT_DATA state. In this case, the following AWS CLI command lists the alarms that are in the INSUFFICIENT_DATA state:

 $ aws cloudwatch describe-alarms --state-value "INSUFFICIENT_DATA"

The command's output also includes useful details for each alarm, such as its name (AlarmName), its state (StateValue), the reason for that state (StateReason) and the actions that send notifications (AlarmActions), such as SNS topics. Based on the StateReason value, an admin can decide whether to delete a given alarm.
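To review only the fields needed for that decision, the output can be narrowed with the AWS CLI's --query option, which takes a JMESPath expression. The following is a minimal sketch, assuming bash and an installed, configured AWS CLI; the function name list_insufficient_alarms is hypothetical, and the query covers metric alarms only, not composite alarms.

```shell
#!/usr/bin/env bash
# Sketch: list metric alarms in the INSUFFICIENT_DATA state, one per line,
# as "<AlarmName><TAB><StateReason>". list_insufficient_alarms is a
# hypothetical helper name; composite alarms are not included.
list_insufficient_alarms() {
  aws cloudwatch describe-alarms \
    --state-value INSUFFICIENT_DATA \
    --query 'MetricAlarms[].[AlarmName,StateReason]' \
    --output text
}
```

Each output line pairs an alarm name with its state reason, separated by a tab, which makes the list easy to scan or to pipe into other shell tools.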

For example, a StateReason value like the following tells admins that the alarm could not evaluate its recent data points:

Insufficient data: three data points were unknown.
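If the unknown data points trace back to a deleted Auto Scaling group, as in the earlier example, admins may want to find every alarm that still references that group. The following is a minimal sketch, assuming bash and a configured AWS CLI. The function name alarms_for_asg is hypothetical, and the JMESPath filter keeps only metric alarms that have an AutoScalingGroupName dimension matching the name passed in.

```shell
#!/usr/bin/env bash
# Sketch: print the names of metric alarms whose AutoScalingGroupName
# dimension equals $1 (for example, a group that has since been deleted).
# alarms_for_asg is a hypothetical helper name. The inner filter returns a
# non-empty list only when a matching dimension exists, and JMESPath treats
# an empty list as false, so non-matching alarms are dropped.
alarms_for_asg() {
  local asg_name="$1"
  aws cloudwatch describe-alarms \
    --query "MetricAlarms[?Dimensions[?Name=='AutoScalingGroupName' && Value=='${asg_name}']].AlarmName" \
    --output text
}
```

For example, running alarms_for_asg my-old-asg, where my-old-asg is a placeholder group name, prints the matching alarm names separated by tabs.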

To delete the alarm, for example one named awsec2-CPU-UTIL-HIGH, the admin would use the following AWS CLI command:

 $ aws cloudwatch delete-alarms --alarm-names awsec2-CPU-UTIL-HIGH

This command deletes the alarm, so it no longer sends alert notifications.
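Because --alarm-names accepts several space-separated names, the same command can remove a whole set of alarms at once. The following sketch, assuming bash and a configured AWS CLI, wraps the command in a hypothetical delete_alarms_in_batches function that sends at most 100 names per call. The 100-name figure is our understanding of the DeleteAlarms per-request limit, so confirm it in the API reference. The DRY_RUN variable is also an assumption of this sketch: with DRY_RUN=1, the function prints the commands instead of running them, so admins can review the list before anything is deleted.

```shell
#!/usr/bin/env bash
# Sketch: delete the CloudWatch alarms named in the arguments, sending at
# most batch_size names per delete-alarms call (100 is an assumed limit).
# DRY_RUN=1 is a hypothetical convention of this sketch: it prints each
# command instead of executing it.
delete_alarms_in_batches() {
  local batch_size=100
  local i
  for (( i = 1; i <= $#; i += batch_size )); do
    # Take the next slice of positional parameters, starting at $i.
    local batch=( "${@:i:batch_size}" )
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo aws cloudwatch delete-alarms --alarm-names "${batch[@]}"
    else
      aws cloudwatch delete-alarms --alarm-names "${batch[@]}"
    fi
  done
}
```

For example, to preview deleting every metric alarm currently in the INSUFFICIENT_DATA state, an admin could collect the names with `mapfile -t names < <(aws cloudwatch describe-alarms --state-value INSUFFICIENT_DATA --query 'MetricAlarms[].AlarmName' --output text | tr '\t' '\n')` and then run `DRY_RUN=1 delete_alarms_in_batches "${names[@]}"`. Splitting on tabs rather than spaces keeps alarm names that contain spaces intact.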

