Open source cloud tools offer risk, reward with AWS

Logging AWS resources can be cumbersome, but it is necessary to ensure nothing goes awry. Open source tools help aggregate and visualize AWS resource data.

Developers can use a variety of open source cloud tools to customize their cloud environments and maintain portability for a particular application or workload. But with hundreds of open source tools available, choosing among similar services can be difficult.

Enterprises must first determine what type of open source cloud tools they need and what level of expertise they have for working with those tools. For example, which tools will enable a more efficient cloud operation or provide extra functionality for a native AWS product? Is the in-house operations team more knowledgeable about one particular tool than another?

It's important to remember that open source tools typically come with less support than a paid service or tool. Developers also need to run servers and store all the data, both of which cost money. In the end, when considering the total cost of ownership, it may be cheaper to just use a paid service. In either case, keep a close eye on other adopters and make sure the tool is updated regularly. If it is not, IT staff may end up with something unsupported after integration.

Open source log aggregation alternatives

Once a number of instances are running in a production cloud environment, it's not feasible to dig through log files server by server. This is where log aggregation comes in handy.

Log aggregation services, such as Loggly, Logentries and Papertrail, can be useful, but sending all of your data to those services could come at a steep price. These services sometimes charge outrageously large amounts for things that should be relatively cheap, and their pricing often seems to have no real meaning at all -- a GB of "storage" could mean anything, because it's not clear how items are indexed.

Logstash and Kibana are two choices for developers looking for open source alternatives for log aggregation on cloud instances. The most popular indexing database for Logstash is Elasticsearch, so installing the full ELK stack (Elasticsearch, Logstash, Kibana) could be helpful.

The OpenStack advantage

It's impossible to talk about open source and cloud computing without mentioning OpenStack. Because OpenStack runs on-premises -- and is a common platform for hybrid public/private cloud architectures -- IT loses many of the public cloud's advantages, such as avoiding upfront capital costs and scaling automatically. Still, OpenStack is a strong choice for developers looking to start small or those who already have hardware to run new applications.

Logstash and Elasticsearch require logs to be relatively well-formatted and, at the very least, consistently formatted. For example, if you send your logs in JSON format, make sure you keep the keys consistent, like always using {"error": "Error Message"} and not occasionally using {"err": "Error Message"}. Keep the value types consistent, too: switching a key from a plain string to a complex object, such as {"error": { "code": 400, "msg": "Invalid Request"}}, could cause some of those messages to be dropped.

Be careful when crafting log messages and make sure everything lines up.
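
One way to keep keys and value types from drifting is to build every log line in a single place. The following is a minimal sketch in Python, assuming the application logs through the standard logging module; the class name JsonFormatter, the helper build_logger and the flat "error_code" key are hypothetical choices for illustration, not something Logstash or Elasticsearch requires.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with a fixed set of keys."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            # "error" is always a plain string, never a nested object.
            "error": record.getMessage(),
            # Extra detail gets its own flat key (hypothetical name)
            # instead of changing the type of "error".
            "error_code": getattr(record, "error_code", None),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(name="app"):
    """Return a logger whose output always goes through JsonFormatter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as build_logger().error("Invalid Request", extra={"error_code": 400}) then emits the same five keys as every other message, with the status code carried in its own field.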

With Logstash, developers can parse generic data into a timestamp and structured fields, so system logs and items from Apache, Redis and more can all be piped into one central location and indexed. On top of Logstash and Elasticsearch, some developers will probably want to install Kibana to interact with data stored in Elasticsearch, produce dashboards and graphs, and generally search through logs. Alerts can also be set up directly through Logstash via output plugins.
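
To make the parsing step concrete, here is a rough Python sketch of the kind of work a log pipeline does on each line: splitting an Apache access log entry into a timestamp plus named fields. It is an illustration only, not Logstash's own implementation; the function name parse_access_line and the field names are assumptions, and the pattern covers only the Apache "common" log format.

```python
import re
from datetime import datetime

# Apache "common" log format:
#   host ident user [time] "request" status size
COMMON_LOG = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict with a parsed timestamp and named fields, or None."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None  # line is not in the expected format
    event = match.groupdict()
    # Convert the bracketed timestamp into a timezone-aware datetime.
    event["time"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    # Apache writes "-" when no bytes were sent.
    event["size"] = 0 if event["size"] == "-" else int(event["size"])
    return event
```

Once every source is reduced to the same shape, a timestamp plus typed fields, entries from different services can be indexed and searched together.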

Keep in mind that running extra instances creates additional costs. However, when transferring data internally within a cloud provider, data transfer costs are typically low or nonexistent, and the developer also has full control over how much data is stored. For example, a developer could choose to store DEBUG information for a day, INFO logs for a week and WARNING and ERROR messages for a few months.
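
The tiered retention example above could be expressed as a small policy table. This is a minimal sketch in Python under stated assumptions: "a few months" is taken to mean 90 days, each entry is a dict with hypothetical "level" and "logged_at" keys, and actually deleting expired entries from the index is left to whatever cleanup job the team runs.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the example; 90 days is an assumed value
# for "a few months".
RETENTION = {
    "DEBUG": timedelta(days=1),
    "INFO": timedelta(weeks=1),
    "WARNING": timedelta(days=90),
    "ERROR": timedelta(days=90),
}


def is_expired(level, logged_at, now=None):
    """Return True when an entry is older than its level's window."""
    now = now or datetime.now(timezone.utc)
    # Unknown levels fall back to the longest window, so nothing
    # unexpected is deleted early.
    window = RETENTION.get(level.upper(), max(RETENTION.values()))
    return now - logged_at > window


def expired_entries(entries, now=None):
    """Filter a list of entries down to those that are safe to purge."""
    return [e for e in entries if is_expired(e["level"], e["logged_at"], now)]
```

Keeping the windows in one table makes the storage trade-off explicit and easy to adjust when costs change.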

Open source dashboards

Kibana is a great dashboard suite for logs and other data pulled directly from Elasticsearch. However, for combining data from other sources, such as a NoSQL database, Amazon CloudWatch and StatsD processes on servers, the open source dashboard product Freeboard is a good option.

Freeboard monitors metrics in a cloud platform and is available as both an open source project and a hosted product. Developers can get started using the hosted offering and, if it turns out to be too expensive, move to the open source version on their own servers. Dashboards such as Freeboard and Geckoboard (a similar dashboard product) help IT professionals gather information across multiple data sources to be presented to non-technical staff, such as marketing and sales teams. This type of information can be used to present high-level overviews at weekly staff meetings or to monitor the status of services and the overall health of the business.

These types of dashboards can also monitor service status, such as the number of errors users have encountered, tasks waiting to be processed, user conversion rates and other end-user information. By hooking into other analytical sources, business users could quickly view the most popular site content based on data from outside the analytics databases -- without having to log into an analytics system.
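
As a rough illustration of feeding such a dashboard, the Python sketch below condenses several sources into one small JSON document that a dashboard widget could poll. Everything here is an assumption made for the example: the function names, the input shapes and the metric names are hypothetical, and how the JSON is served to a particular dashboard product is left open.

```python
import json
from collections import Counter


def summarize(log_events, pending_tasks, signups, visitors):
    """Combine several data sources into one flat set of metrics."""
    levels = Counter(event["level"] for event in log_events)
    # Guard against division by zero when there is no traffic yet.
    conversion = round(signups / visitors, 4) if visitors else 0.0
    return {
        "error_count": levels.get("ERROR", 0),
        "warning_count": levels.get("WARNING", 0),
        "tasks_waiting": len(pending_tasks),
        "conversion_rate": conversion,
    }


def to_dashboard_json(summary):
    """Serialize the summary with stable key order for a dashboard to poll."""
    return json.dumps(summary, sort_keys=True)
```

Keeping the payload flat and small means non-technical viewers see a handful of headline numbers rather than raw log data.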
