The Amazon Web Services (AWS) infrastructure is growing in popularity for deploying NoSQL back ends for enterprise applications, said Robert Treat, CEO of OmniTI, a Web architecture and engineering firm. Enterprise architects need to weigh a variety of challenges, and the best practices for addressing them, when developing a database architecture that gets the most benefit from the cloud. These challenges include the ephemeral nature of AWS instances, problems caused by noisy neighbors, subtle differences within the underlying AWS hosting infrastructure, and differences between NoSQL software implementations.
Brian Bulkowski, founder and CTO of Aerospike, a database software provider, said his company decided to run on AWS despite these challenges because the platform scales so well. The company recently processed 1 million transactions per second in RAM on one Aerospike database server running on a single Amazon C3.8xlarge instance for just $1.68 per hour. "That is an incredibly affordable model for achieving high performance, which we think presents an opportunity for startups and larger enterprises alike to roll out new Web and mobile apps that respond in real time," Bulkowski explained.
But Bulkowski found that one of the biggest challenges with Amazon is running on virtualized, shared infrastructure with noisy neighbors. Performance becomes inconsistent because it depends on what other applications on the shared hardware are doing. Developers also face reliability challenges, which require maintaining servers in different data centers to ensure uptime.
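One rough way to see the inconsistency that noisy neighbors cause is to compare median latency with tail latency for the same operation. The Python sketch below is only an illustration and is not taken from Aerospike or any other vendor quoted here: the function names (sample_latencies, percentile, summarize) are hypothetical, and the operation being timed is whatever database call the reader chooses to pass in.

```python
import math
import time
from typing import Callable, Dict, List


def sample_latencies(operation: Callable[[], object], samples: int = 200) -> List[float]:
    """Call `operation` repeatedly and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; `pct` is expected to be in the range (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Report median and 99th-percentile latency plus the ratio between them."""
    p50 = percentile(latencies, 50)
    p99 = percentile(latencies, 99)
    ratio = p99 / p50 if p50 > 0 else float("inf")
    return {"p50_ms": p50, "p99_ms": p99, "tail_ratio": ratio}
```

Running summarize(sample_latencies(some_read)) at different times of day, where some_read is a placeholder for a real query, gives a simple picture of variability. A tail ratio that swings widely between runs on the same instance is consistent with contention on shared hardware, although it does not prove it.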
The AWS platform provides a number of services for building database applications. The challenge for developers is choosing the right combination of them, said Dan Skatov, head of development at Starcounter, a database software provider. For example, developers need to understand the differences between Amazon SimpleDB and Amazon DynamoDB, and account for those specifics early in development.
The ephemeral cloud
Another challenge with using AWS is that instances rely on ephemeral, high-performance storage, said Shane Johnson, senior product marketing manager at Couchbase, a NoSQL database software company. If an instance is stopped, the data on that storage is lost. Without proper tooling, adding new instances to an existing deployment can also be a manual process.
The Couchbase server can be configured to replicate data to multiple instances so that data is not lost in the event of a failover. Ansible and AWS CloudFormation can be used to automate the addition of new instances to an existing deployment. The Couchbase server backup tools (cbbackup, cbrestore and cbepctl) can be used to create snapshots of the data. The snapshots can be placed on persistent Amazon Elastic Block Store (EBS) volumes for disaster recovery.
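The idea of replicating data across instances so that a single failure does not lose it can be sketched in a few lines of Python. This is a generic, simplified placement scheme and not Couchbase's actual algorithm: the function place_replicas, the node names and the zone names in the usage note are all hypothetical, and a real deployment would rely on the database's own replication settings.

```python
from typing import Dict, List


def place_replicas(
    nodes_by_zone: Dict[str, List[str]],
    partitions: int,
    replicas: int = 1,
) -> Dict[int, List[str]]:
    """Give each partition one active copy plus `replicas` extra copies.

    Every copy of a partition lands in a different zone, so losing one
    zone (or one ephemeral instance) never removes all copies.
    """
    zones = sorted(nodes_by_zone)
    if any(not nodes_by_zone[zone] for zone in zones):
        raise ValueError("every zone must contain at least one node")
    if replicas + 1 > len(zones):
        raise ValueError("need at least replicas + 1 zones to keep copies apart")

    placement: Dict[int, List[str]] = {}
    for partition in range(partitions):
        copies = []
        for offset in range(replicas + 1):
            # Walk the zones round-robin so consecutive copies never share a zone.
            zone = zones[(partition + offset) % len(zones)]
            nodes = nodes_by_zone[zone]
            copies.append(nodes[partition % len(nodes)])
        placement[partition] = copies
    return placement
```

For example, calling place_replicas({"zone-a": ["a1", "a2"], "zone-b": ["b1"], "zone-c": ["c1"]}, partitions=6) spreads each partition over two of the three made-up zones. The first node in each list is the active copy in this sketch, and the rest are replicas.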
One of the benefits of using AWS is that it provides a simple abstraction for developing and deploying applications. That abstraction can create problems for operations and performance because it is difficult to see what is going on under the hood. Every cloud is heterogeneous, despite the common assumption that distributed databases run on non-specialized, identical commodity compute nodes, said Ofer Bengal, co-founder and CEO of Redis Labs, a database software service.
Tuning the system for the resources that have been provisioned is essential, but can be time-consuming. Every cloud is unique and offers different compute, storage, network and other services. It is also important to understand that the AWS infrastructure is made up of distributed data centers that can differ from one another, and that resources can differ even within the same data center.
"To get an optimal result in each, you really need to know what you're doing," Bengal added. One way to address the differences in data centers is to solve them with internal enterprise IT resources. This is the recommended conservative route, Bengal said. Alternatively, a number of managed NoSQL services and products can handle these complexities as a service.
Automate when possible
One of the most important techniques for working within a turbulent environment like AWS is the use of heavy scripting and automation for setting up new environments, said OmniTI's Treat. This includes developing a tool set for recreating environments in non-production settings so developers can see how systems operate together.
Additionally, deep monitoring is recommended to increase visibility into each of the different components of these architectures, especially for troubleshooting purposes. If the architecture has been designed correctly, it should be possible to roll new nodes and code updates in and out of the system with full visibility into any impact those changes have. This capability gives engineers confidence in the changes they are making and helps to minimize disruption to production services, Treat said.
Automation has always been a favorite technique of system administrators, and tools such as Chef and Ansible are taking this up a notch, giving systems engineers a common platform to develop on. Not only are these tools good for automation, they also provide de facto documentation in the form of code, including working code snippets available on GitHub, and they provide strong repeatability for creating new environments when needed.
On the monitoring side, new tools like Circonus and open source options like StatsD (a daemon for statistics aggregation) and Ganglia (a scalable distributed monitoring system) are focusing on making it easy for application developers to write more operable code that exposes metrics that can be consumed, alerted on and trended against for monitoring and troubleshooting. These tools integrate well with systems monitors to provide a more comprehensive picture of what is going on behind the scenes while minimizing the impact on production systems.
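To make the idea of exposing metrics from application code concrete, here is a minimal Python sketch of the plain-text line format that StatsD accepts over UDP (name:value|type, with an optional |@rate suffix for sampling). The helper names format_metric and send_metric and the metric names are hypothetical, and the host and port are assumptions: 8125 is StatsD's usual default, but a given deployment may differ. Production code would normally use an existing StatsD client library rather than hand-rolled helpers.

```python
import socket

# Metric types used in this sketch: counter, gauge and timer (milliseconds).
VALID_TYPES = {"c", "g", "ms"}


def format_metric(name: str, value, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line such as 'db.read:1|c' or 'db.read.latency:12.5|ms'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Tell the daemon that only a fraction of events is being reported.
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP; delivery is fire-and-forget."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

An application could call send_metric(format_metric("db.read.latency", 12.5, "ms")) after each query. Because UDP does not wait for an acknowledgment, this kind of instrumentation adds little overhead to production code paths, at the cost of occasionally dropped samples.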
Beware of the differences between NoSQL tools
There are now over 20 NoSQL databases, each with its own strengths and specialties. "It's very hard for a newcomer to distinguish which database to use for a particular need," said Matt Heitzenroder, COO and co-founder of Orchestrate.io, a database-as-a-service provider that can run on AWS. Each database requires a level of expertise to build and manage effectively.
It is important to understand what questions need to be asked of the data. The most common requirements are full-text search, time-series analysis (analyzing data based on time), graph structures, geospatial analysis and ad hoc reporting. In addition to the query type, it is also important to determine which database has the most production use cases similar to the intended application. Another consideration is the failure modes of each database and how each would affect the business.