When working with a single server, DevOps teams often use local tools such as grep or tail to search through log files and identify problems when they occur. With cloud-scale applications, which may spread across hundreds of thousands of servers, logging into each server to search for a log line isn't feasible.
DevOps teams can use log aggregation systems, such as Loggly, Logentries or the popular open source solution ELK, to search these files. Developers use ELK on AWS to aggregate logs across multiple servers and to search and plot both system and application logs, which helps them identify patterns and troubleshoot individual issues.
An ELK stack is a combination of three programs (Elasticsearch, Logstash and Kibana) that together provide log aggregation and analysis across multiple systems. At the core of an ELK stack is Logstash, a highly configurable data-processing engine specifically tailored to handle massive amounts of log data.
Logstash accepts data input streams from multiple methods, including syslog processes, direct TCP connections or basic file inputs. It uses configurable rules to process data and allows developers to output data into any number of search systems, the most popular of which is Elasticsearch.
Elasticsearch is popular because it supports searching through full text and flexible metadata; it also enables faceted searches without a predesigned schema. Developers can send almost any JSON-formatted document into Elasticsearch to index it for searching. Because a lot of log data is unstructured, this creates an excellent back-end search system for Logstash.
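As a rough illustration of indexing a JSON document without a predesigned schema, the sketch below builds (but does not send) an HTTP request for one log event. The endpoint URL, the index name app-logs and the field names are hypothetical, and the /_doc path assumes a reasonably recent Elasticsearch version; older versions expect a type name in that position.

```python
import json
import urllib.request

# Hypothetical endpoint and index name, for illustration only.
ENDPOINT = "https://search-example.us-east-1.es.amazonaws.com"
INDEX = "app-logs"


def build_index_request(doc, endpoint=ENDPOINT, index=INDEX):
    """Build (but do not send) a request that would index one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_doc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# No schema is declared up front; the fields below are arbitrary examples.
log_event = {
    "level": "ERROR",
    "message": "payment gateway timeout",
    "service": "checkout",
}

req = build_index_request(log_event)
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) would also require that the cluster's access policy permits the caller.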
The Kibana user interface enables developers to search through log data stored in Elasticsearch. Logstash sends data to Elasticsearch and then Kibana makes that data accessible for analysis. Kibana supports searching, graphing and charting.
Setting up ELK on AWS
There are many ways to configure an ELK stack for AWS, but because Amazon enabled support for managed Elasticsearch, the simplest way is through Amazon Elasticsearch Service (ES). Amazon ES automatically configures an Elasticsearch cluster with a Kibana plug-in that's accessible through the web.
The first step to creating an ELK stack in AWS is to set up a new AWS Identity and Access Management (IAM) role for AWS Lambda called logstash, making sure to add the AWSLambdaBasicExecutionRole managed policy. Next, add an inline policy that gives the IAM role permission to submit content to Amazon ES.
A Lambda function running under the logstash IAM role will stream data from CloudWatch Logs to the Elasticsearch cluster. Developers can send all application and server logs to CloudWatch Logs and then let that function stream them to the cluster.
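The inline policy for the logstash role might look something like the output of the sketch below. This is not the article's original policy; the region, account ID and domain name are placeholders, and the choice of es:ESHttpPost and es:ESHttpPut is an assumption about the minimum access needed to submit content.

```python
import json

# Hypothetical values: substitute your own region, account ID and domain name.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
DOMAIN_NAME = "my-elk-domain"


def build_es_inline_policy(region, account_id, domain_name):
    """Return an IAM policy document allowing writes to one Amazon ES domain."""
    domain_arn = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed minimum: HTTP POST and PUT against the domain.
                "Action": ["es:ESHttpPost", "es:ESHttpPut"],
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


policy = build_es_inline_policy(REGION, ACCOUNT_ID, DOMAIN_NAME)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the inline policy editor for the role in the IAM console.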
Visit the Elasticsearch console in the AWS Management Console and then set up a new Elasticsearch cluster, following the prompts with the desired configuration.
On the Set up access policy page, choose Allow or deny access to one or more AWS accounts or IAM users.
In the pop-up box, enter the IAM role created for Logstash and then choose OK.
This allows the Logstash stream to send data into Elasticsearch. To view data from Elasticsearch, add a specific IP address as an additional access statement: append a JSON statement to the access policy that allows requests from MY_IP_ADDRESS, replacing MY_IP_ADDRESS with your external IP address.
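A minimal sketch of such an IP-based access statement follows. It is an assumption about the statement's shape rather than the article's original JSON; the domain ARN is a placeholder and 203.0.113.10 is a documentation-only address standing in for MY_IP_ADDRESS.

```python
import ipaddress
import json

# Hypothetical domain ARN; replace with the ARN of your own cluster.
DOMAIN_ARN = "arn:aws:es:us-east-1:123456789012:domain/my-elk-domain"


def build_ip_access_statement(ip_address, domain_arn=DOMAIN_ARN):
    """Return an access-policy statement that allows one source IP address."""
    # Raises ValueError if the address is malformed.
    ipaddress.ip_address(ip_address)
    return {
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "es:*",
        "Resource": f"{domain_arn}/*",
        # Only requests originating from this address match the statement.
        "Condition": {"IpAddress": {"aws:SourceIp": [ip_address]}},
    }


# 203.0.113.10 stands in for MY_IP_ADDRESS.
statement = build_ip_access_statement("203.0.113.10")
print(json.dumps(statement, indent=2))
```

The resulting object would be appended to the Statement array of the cluster's access policy, alongside the statement for the IAM role.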
Follow through the remaining prompts to create the Elasticsearch cluster.
Streaming logs into Amazon ES
The Elasticsearch cluster takes a few minutes to initialize. While that process completes, head over to the CloudWatch console. Under CloudWatch Logs, choose a stream to send logs to the new Elasticsearch cluster. Click the checkbox next to any Log Group name and choose Stream to Amazon Elasticsearch Service.
On the next page, select the Elasticsearch cluster from the dropdown and the Lambda execution role created for Logstash. This streaming connection takes the place of Logstash in a traditional ELK stack implementation. For the log format, choose JSON and click Next. Finally, verify the details and choose Start Streaming. This process enables developers to easily stream logs from microservices written as Lambda functions.
Most traditional log aggregation methods require a separate component to read the log files and stream data to a service. But developers using ELK on AWS simply stream logs directly from the default output of the Lambda function right into Elasticsearch.
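To make this concrete, here is a hedged sketch of a Lambda function that writes JSON-formatted log lines to its default output, which Lambda forwards to CloudWatch Logs. The handler name, the event shape and the log field names are all hypothetical.

```python
import json
import time


def format_log_line(level, message, **fields):
    """Serialize one log record as a single JSON line."""
    record = {
        "level": level,
        "message": message,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
    }
    record.update(fields)
    return json.dumps(record)


def handler(event, context):
    """Hypothetical Lambda entry point that logs one structured event."""
    # Lines printed to standard output end up in the function's log group.
    print(format_log_line("INFO", "order received", order_id=event.get("order_id")))
    return {"status": "ok"}


if __name__ == "__main__":
    # Local invocation with a made-up event, outside of Lambda.
    handler({"order_id": 42}, None)
```

Because each line is valid JSON, choosing the JSON log format for the stream lets the individual fields become searchable in Elasticsearch.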
Accessing the logs with Kibana
Two new links will appear on the Amazon ES console for the new ES cluster: one for the root of the Elasticsearch cluster and one for an automatically installed Kibana plug-in. Kibana enables developers to quickly search through logs, and provides several visualization tools to plot and identify patterns.
There is a small amount of setup work needed before Kibana becomes useful. Initially, the tool asks a developer to set up an index pattern. CloudWatch Logs submits log data to Elasticsearch in daily indexes named cwl-YYYY.MM.DD. So, for the Index Pattern on the initial setup page, enter cwl-*.
Set the Time-field name to @timestamp and then click Create. Once this is set up, head over to the Discover tab to start browsing and searching logs.
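The index naming scheme described above can be illustrated with a short sketch that formats a date as cwl-YYYY.MM.DD and checks it against the cwl-* pattern. The function name and the sample date are made up for the example, and fnmatch only approximates how Kibana matches wildcards.

```python
from datetime import date
from fnmatch import fnmatch


def cwl_index_name(day):
    """Format a date as a daily index name of the form cwl-YYYY.MM.DD."""
    return day.strftime("cwl-%Y.%m.%d")


name = cwl_index_name(date(2016, 3, 7))
print(name)                    # cwl-2016.03.07
print(fnmatch(name, "cwl-*"))  # True
```

Because each day's logs land in a separate index, the wildcard pattern is what lets Kibana search across all of them at once.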
What about Logstash?
Despite its name, Logstash doesn't actually store logs. The tool simply enables developers to parse logs. This setup uses CloudWatch Logs to send log data directly to Elasticsearch, so Logstash isn't necessary. However, if you install Logstash on a server and have it send log data to the Elasticsearch cluster, you'll need to set up an additional index pattern in Kibana for that data. The Elasticsearch cluster can receive data from any number of inputs, but CloudWatch Logs is the preferred input here, as it's simple to set up with AWS and isolates the logging mechanism from the software. This makes it easy to customize log aggregation after services are already running.