There are numerous ways to monitor the performance of Amazon cloud apps and the cloud itself, and options are available from AWS as well as third-party providers. There are three main aspects of Amazon cloud apps that are worth monitoring: logs, analytics and health checks. Each has its own set of tools.
Manage cloud logs
Log management is difficult when working with multiple servers. The complexity is compounded because servers continually start and stop for common tasks, such as upgrading or scaling. It's no longer feasible to log in to individual machines and grep through their logs. Additionally, if developers migrate their services to something such as Amazon EC2 Container Service (ECS), log management becomes even more complex.
There are two main types of logs: application level and system level. Both types are useful for identifying issues, and both can be delivered via standard services, such as syslog, to Amazon CloudWatch Logs. CloudWatch Logs provides basic alerts and alarms; however, the service falls short for more complex tasks, such as graphing and retroactive searching. For real-time events -- such as monitoring for a specific log pattern -- Amazon CloudWatch Logs works well. But if you need to drill into what happened after the fact, a full-featured log management system, such as Loggly or Logentries, is more appropriate. For this reason, a hybrid approach that uses both Amazon CloudWatch Logs and a tool such as Loggly is preferable to relying on just one tool.
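As a rough illustration of monitoring for a specific log pattern, the sketch below builds the request for a CloudWatch Logs metric filter, which turns each matching log event into a metric data point. It assumes the third-party boto3 library and valid AWS credentials for the final call; the log group name, metric name and the "MyApp/Logs" namespace are hypothetical placeholders, not values from any real setup. An alarm on the resulting metric would still need to be configured separately.

```python
def build_metric_filter(log_group, pattern, metric_name, namespace="MyApp/Logs"):
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call.

    All names passed in are illustrative; the namespace default is an assumption.
    """
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Each matching log event adds 1 to the metric.
                "metricValue": "1",
            }
        ],
    }


def create_metric_filter(params):
    """Send the request to AWS. Requires boto3 and credentials (assumed)."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("logs").put_metric_filter(**params)


# Count log events that contain the term "timeout" (hypothetical names).
params = build_metric_filter("app-logs", '"timeout"', "TimeoutCount")
```

Keeping the request builder separate from the AWS call makes the filter definition easy to review and test before anything is created in an account.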
Mining valuable data analytics with Amazon cloud apps
Logs are helpful for monitoring back-end processing, but what about things that can't be logged? For questions like, "Are we processing a normal amount of requests?" or "Do we have a backlog of request processing?" the best answer may be found in analytics data.
AWS tools such as CloudWatch are effective for tracking metrics, but it can quickly become expensive to track everything end users do. CloudWatch is very much a developer's tool, yet the data inside it is valuable to the wider business and should be understandable to non-developers.
This is where third-party products, such as Klipfolio, come in. Klipfolio allows developers to integrate with nearly any data source available via API, including built-in tools for integration with SQL, Google Analytics, and sales platforms, such as Stripe. With the Kliprouter adapter, it's also easy to tie both CloudWatch and DynamoDB into Klipfolio, and its dashboards keep track of key performance indicators. For example, a dashboard can monitor Simple Queue Service queue counts, story counts and "next read" times.
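The backlog question can be reduced to a simple rule over queue counts. The sketch below is one possible rule, not a recommendation: it flags a backlog when recent queue depth is high on average or has risen at every sample. The thresholds are arbitrary assumptions, and the samples are presumed to come from a source such as the SQS queue-depth metric in CloudWatch.

```python
from statistics import mean


def has_backlog(depth_samples, max_depth=1000, window=3):
    """Return True if the most recent queue depths suggest a backlog.

    depth_samples: queue depths ordered oldest to newest.
    max_depth and window are illustrative defaults, not tuned values.
    """
    if not depth_samples:
        return False
    recent = depth_samples[-window:]
    # Rule 1: the queue is simply too deep on average.
    over_limit = mean(recent) > max_depth
    # Rule 2: depth grew at every step across a full window of samples.
    rising = (
        window >= 2
        and len(recent) == window
        and all(later > earlier for earlier, later in zip(recent, recent[1:]))
    )
    return over_limit or rising


print(has_backlog([10, 50, 200]))  # steadily rising depth
print(has_backlog([5, 3, 4]))      # small and fluctuating
```

A dashboard can then show a single yes/no indicator alongside the raw queue counts, which is easier for non-developers to read.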
Performance and availability of Amazon cloud apps
Amazon offers health checks via Elastic Load Balancing (ELB) and Route 53, but these checks are limited to pass/fail results and say nothing about overall performance. Additionally, the checks offer no help in determining exactly why something is failing.
Tools such as those from New Relic bring features that are more powerful than those within ELB or Route 53. With the addition of New Relic Synthetics, developers can also program high-level, end-to-end tests that check an application -- without any users accessing it. Combined with application performance metrics, developers can monitor how updates or changes to Amazon cloud apps affect performance. New Relic also provides alerting capabilities and real user monitoring, which help track page speed in the browser and front-end errors.
New Relic also enables developers to perform checks from multiple locations, so it's easy to tell whether an outage was caused by something within the application or by a network glitch in a particular area. It also allows developers to measure performance throughout the world and to optimize for where end users are concentrated. For example, if there is a concentration of users in Europe, it may make sense to launch a cluster of servers there.
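The reasoning behind multi-location checks can be sketched in a few lines. The example below is a simplified illustration, not how New Relic or any other vendor classifies outages: if every location fails, the application itself is the likely cause; if only some fail, a regional network problem is more likely. The location names are hypothetical.

```python
def classify_outage(results):
    """Classify check results gathered from several locations.

    results: dict mapping a location name to True (check passed) or False.
    Returns a (kind, failed_locations) tuple, where kind is one of
    "healthy", "application" or "regional".
    """
    failed = sorted(loc for loc, ok in results.items() if not ok)
    if not failed:
        return ("healthy", [])
    if len(failed) == len(results):
        # Every vantage point fails: suspect the application itself.
        return ("application", failed)
    # Only some vantage points fail: suspect a network issue in those areas.
    return ("regional", failed)


checks = {"us-east": True, "eu-west": False, "ap-south": True}
print(classify_outage(checks))
```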
In addition to Route 53 and ELB health checks, Pingdom can be useful for availability monitoring. This tool offers high-level "is-this-service-running" type of checks and is a good secondary verification to ensure that everything runs smoothly. When it comes to uptime and availability monitoring, there's no such thing as too many checks.
When supporting a web-scale application, don't rely on just one avenue of monitoring. Redundancy and failover support are necessary in applications, and it should be the same with a monitoring platform. To combine these tools, use a log to accept alerts and have fallback alerts send email messages. Monitor dashboards regularly during business hours to ensure that everything is running smoothly. Use Amazon cloud apps and other AWS built-in tools when available, but don't be afraid to supplement them with outside sources.
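Fallback alerting can be as simple as trying each channel in order until one succeeds. The sketch below assumes each channel is a plain Python callable that raises an exception on failure; the channel names and the stand-in functions are hypothetical, and a real setup would wrap an actual paging or email service.

```python
def send_alert(message, channels):
    """Try each (name, send) pair in order; return the first name that works.

    Raises RuntimeError if every channel fails.
    """
    errors = []
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception as exc:  # any failure moves on to the next channel
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all alert channels failed: " + "; ".join(errors))


# Stand-in channels for illustration only.
def broken_pager(message):
    raise ConnectionError("pager service unreachable")


sent_emails = []


def email(message):
    sent_emails.append(message)


used = send_alert("queue backlog detected", [("pager", broken_pager), ("email", email)])
print(used)
```

Ordering the channels from most to least immediate means email is only used when the faster route is unavailable.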