CloudWatch Logs Insights won't replace third-party tools -- yet

Organizations struggle to gain insights from their deluge of log data. An updated CloudWatch service can help, but don't expect it to be a comprehensive tool at this stage.

Developers and ops teams can use CloudWatch Logs to debug their applications, but log analysis can quickly become complicated. AWS recently added CloudWatch Logs Insights to speed up that process -- though there are still ways its functionality can improve.

Searching through log files and plotting statistics is the most basic way to diagnose problems and dig deeper into potential application issues. The logs are the first place everyone looks whenever there's any sort of attack on an application. The same is true when you detect unexpected behavior, such as a sudden increase in 500 errors or customers getting kicked out of your application for unknown reasons.

But logs have been notoriously expensive to parse at scale and often take a significant amount of time to sort through. This is especially true if they come from multiple sources, some of which -- such as an API gateway -- might be out of your control. Prior to this latest update to CloudWatch, developers could use CloudWatch Logs to set up analytics based on preconfigured queries, but they couldn't dig into logs that were already stored to plot statistics.

For example, I log how long it takes to deliver a file from when it's received on our FTP servers to when it's delivered to a customer's FTP servers. These logs come from legacy systems that run on EC2 instances and output JSON-formatted logs to CloudWatch Logs. While that lets me see how long it takes for a specific file to be delivered, it doesn't let me plot the average delivery time, unless I set up a filter in advance. Because of this limitation, I've always recommended that logs be sent from CloudWatch Logs to a third-party log aggregation platform, such as Loggly or Logstash.

However, with CloudWatch Logs Insights, these features now become native to CloudWatch Logs, so developers can quickly build ad hoc queries. For example, I can promptly see how many stories of each publication type we've received in the last hour if I enter a simple query:

[Figure: Insights query. CloudWatch Logs Insights lets developers build ad hoc queries.]

This works because I have a pub_type field already being parsed -- since it's sent in via JSON -- but it's also trivial to parse a message that's provided as text. For example, the following query parses a log line that includes a story ID, which is a publisher ID followed by a publication ID and a unique ID. I can parse this using the parse command and then graph how many stories for each publication were received, grouping the results into 30-minute intervals:

filter @message like /STORY Created/
| parse 'STORY Created *-*-* *' as publisher, publication, story_id, external_id
| stats count() by bin(30m), publication

This gives me a list of events that can be exported to Excel and graphed. At this time, however, CloudWatch Logs Insights does not support graphing anything other than basic time series graphs. You can filter for a specific publisher or publication but can't plot all of them on one graph. I can also use the Visualization tab if I remove the trailing ", publication" from the stats command:

[Figure: Insights graph. At this point, CloudWatch Logs Insights only offers basic graph functionality.]

You can also add the query to CloudWatch Dashboards so key metrics are easy to spot at a glance in the future.
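To see what the parse, bin and count stages of the story query above are doing, it can help to emulate them offline. The Python below is a minimal sketch for intuition, not how CloudWatch implements these commands: the sample log lines and timestamps are made up, the helper names are hypothetical, and treating each * as a shortest-match wildcard is an assumption.

```python
import re
from collections import Counter

BIN_SECONDS = 30 * 60  # mirrors bin(30m)

def glob_to_regex(pattern):
    """Compile a parse-style glob into a regex with one capture group per '*'."""
    literals = [re.escape(part) for part in pattern.split("*")]
    return re.compile("(.*?)".join(literals) + "$")

STORY = glob_to_regex("STORY Created *-*-* *")

def count_by_bin(events):
    """events: iterable of (epoch_seconds, message) pairs (hypothetical shape)."""
    counts = Counter()
    for timestamp, message in events:
        match = STORY.search(message)  # stands in for the filter and parse stages
        if match is None:
            continue
        publisher, publication, story_id, external_id = match.groups()
        bin_start = timestamp - timestamp % BIN_SECONDS  # start of the 30-minute bucket
        counts[(bin_start, publication)] += 1  # stats count() by bin(30m), publication
    return counts

# Made-up events: three story lines and one unrelated line.
SAMPLE_EVENTS = [
    (1000, "STORY Created PUB1-NEWS-0001 ext-1"),
    (1700, "STORY Created PUB1-NEWS-0002 ext-2"),
    (1900, "STORY Created PUB2-SPORT-0003 ext-3"),
    (2000, "Heartbeat OK"),
]

print(count_by_bin(SAMPLE_EVENTS))
```

With this sample, two NEWS stories land in the first 30-minute bucket and one SPORT story lands in the second; the heartbeat line is ignored because it doesn't match the pattern.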

It's important to note that this is just the beginning for CloudWatch Logs Insights. The biggest takeaway here is that the analytics are fast. It's incredibly easy to perform complex math, such as plotting differences between two timestamps, right within the CloudWatch Logs platform. For example, I can easily find how long it takes for us to receive an article after it's published, by publication type, averaged over 30 minutes:

filter @message like /Indexing Story/
| stats avg(received_at - date) by bin(30m), pub_type

I could also filter to a specific publication and graph that information over time:

filter @message like /Indexing Story/
| filter publication = "XXXXXX"
| stats avg(received_at - date) by bin(30m)

Or I could identify any publications that have a number of stories delayed more than 15 minutes. The threshold has to be in the same units as the timestamps; the value below assumes epoch milliseconds, where 15 minutes is 900,000:

filter @message like /Indexing Story/
| fields (received_at - date) as delay_time, @message
| filter delay_time > 900000
| stats count() as delayed_stories by publication
| sort delayed_stories desc

This gives us a lot of power, because we can pipe from one command to another. You can use the output of the fields command to filter on a newly created field and the output of the stats command to sort the publications so those with the highest number of delayed stories come first.
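The piping idea can be sketched in plain Python, where each stage consumes the output of the one before it. This is only an illustration of the data flow, not the service's implementation: the record layout is hypothetical, and the sketch assumes both timestamps are epoch milliseconds, which the queries themselves don't state.

```python
from collections import Counter

# 15 minutes in milliseconds; use 15 * 60 instead if the timestamps are in seconds.
FIFTEEN_MINUTES_MS = 15 * 60 * 1000

def delayed_stories(records, threshold=FIFTEEN_MINUTES_MS):
    """Return (publication, delayed_count) pairs, most delayed first."""
    # filter @message like /Indexing Story/
    matching = (r for r in records if "Indexing Story" in r["@message"])
    # fields (received_at - date) as delay_time
    with_delay = ({**r, "delay_time": r["received_at"] - r["date"]} for r in matching)
    # filter delay_time > threshold
    late = (r for r in with_delay if r["delay_time"] > threshold)
    # stats count() as delayed_stories by publication
    counts = Counter(r["publication"] for r in late)
    # sort delayed_stories desc
    return counts.most_common()

# Made-up records: publication A has two late stories, B has one.
SAMPLE_RECORDS = [
    {"@message": "Indexing Story 1", "publication": "A", "date": 0, "received_at": 1_000_000},
    {"@message": "Indexing Story 2", "publication": "A", "date": 0, "received_at": 950_000},
    {"@message": "Indexing Story 3", "publication": "B", "date": 0, "received_at": 901_000},
    {"@message": "Indexing Story 4", "publication": "B", "date": 0, "received_at": 60_000},
    {"@message": "Something else", "publication": "C", "date": 0, "received_at": 2_000_000},
]

print(delayed_stories(SAMPLE_RECORDS))
```

Writing the threshold as 15 * 60 * 1000 rather than a bare number makes the intended units obvious, which is easy to lose track of in a one-line filter.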

What CloudWatch Logs is still missing

CloudWatch Logs is a lot more powerful than it may initially seem, but it still lacks quite a bit before it can fully replace third-party systems, like Loggly. Two major features are missing, which AWS is well aware of and plans to add in the near future: the ability to search across multiple log groups and visualizations beyond simple time series charts.

The ability to aggregate searches across log groups is the most glaring of the missing pieces. In order for any insights to be run, developers first need to pick a single log group. Since each Lambda function is a different log source, this means you can't easily search across all Lambda functions to identify a log line or trace logs from one function to another.
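Until cross-group search exists, one possible workaround is to run the same query against each log group through the API and merge the rows yourself. The sketch below assumes a boto3 CloudWatch Logs client (boto3.client("logs")) is passed in and relies on its start_query and get_query_results calls as I understand them; the function name, the polling defaults and the log group names are hypothetical.

```python
import time

# Statuses after which a query will not change any further.
TERMINAL = {"Complete", "Failed", "Cancelled", "Timeout"}

def query_log_groups(logs_client, log_groups, query, start, end,
                     poll_seconds=1.0, max_polls=60):
    """Run one Insights query per log group, one at a time, and merge the rows.

    start and end are epoch seconds. Each returned row is a dict of
    field name -> value, plus a "log_group" key naming its source.
    """
    rows = []
    for group in log_groups:
        query_id = logs_client.start_query(
            logGroupName=group, startTime=start, endTime=end, queryString=query
        )["queryId"]
        for _ in range(max_polls):
            response = logs_client.get_query_results(queryId=query_id)
            if response["status"] in TERMINAL:
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"query on {group} did not finish in time")
        if response["status"] != "Complete":
            raise RuntimeError(f"query on {group} ended as {response['status']}")
        for result in response["results"]:
            row = {cell["field"]: cell["value"] for cell in result}
            row["log_group"] = group
            rows.append(row)
    return rows
```

Running the groups one after another is slower than starting every query at once, but it keeps the sketch simple and avoids piling up concurrent queries. Any aggregation across groups, such as summing counts, still has to happen on the merged rows afterward.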

Additionally, Insights doesn't seem to have anything but basic visualization options. For example, you can't make bar charts or pie charts to see what your top publication is or build graphs that split publication types by date. While all of the raw data is available and can be exported into something like Excel, this means, once again, third-party integration has a leg up on AWS.
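If Excel isn't handy, the exported results can be reshaped with a few lines of Python before they go to whatever charting tool you prefer. This is a rough sketch that assumes the export is CSV with one column per query field; the column names ("bin(30m)", "publication", "count()"), the sample rows and the function names are all assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

def pivot_counts(csv_text, time_col="bin(30m)", series_col="publication",
                 value_col="count()"):
    """Return {series: {time_bucket: count}} from exported query results."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row[series_col]][row[time_col]] = int(row[value_col])
    return dict(table)

def totals(pivot):
    """Collapse each series to a single total, e.g. to find the top publication."""
    return {name: sum(buckets.values()) for name, buckets in pivot.items()}

# Made-up export: counts per 30-minute bucket and publication.
SAMPLE_EXPORT = """bin(30m),publication,count()
2019-01-01 10:00:00.000,NEWS,4
2019-01-01 10:00:00.000,SPORT,1
2019-01-01 10:30:00.000,NEWS,2
"""

pivot = pivot_counts(SAMPLE_EXPORT)
print(pivot)
print(totals(pivot))
```

The pivoted form gives one time series per publication, which is the split-by-publication view the Visualization tab can't draw, and the totals are what a bar or pie chart of top publications would need.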

Still, it looks like this release of Insights is meant to gauge developers' first reactions, and AWS may soon add the features CloudWatch Logs needs to fully replace a tool like Loggly.
