NoSQL databases enable scalability and can help reduce development time for Web applications. Unlike relational databases with fixed schemas, many NoSQL databases are schema-less, which also gives developers more flexibility. Two popular NoSQL databases are Amazon Web Services' SimpleDB and DynamoDB.
Amazon DynamoDB and SimpleDB are fully managed nonrelational databases that provide simple application programming interfaces (APIs) to store, query and manage data. Both databases suit applications that benefit from flexible database design, but they differ in several ways and fit different use cases.
Managing smaller databases with SimpleDB
SimpleDB is most appropriate for small databases that require only basic storage and query operations and won't exceed 10 GB per domain. If you expect your data to grow beyond that limit and still plan to use SimpleDB, you'll need to partition it across two or more domains and manage that partitioning yourself. Manual partitioning is possible, but the additional management overhead undermines the benefits of SimpleDB.
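To make the partitioning overhead concrete, here is a minimal sketch of one way an application might route items to domains. The domain names and the hashing scheme are assumptions made for illustration; SimpleDB does not prescribe either, and the actual read and write calls are left out.

```python
import hashlib

# Hypothetical domain names; SimpleDB itself knows nothing about this scheme.
DOMAINS = ["customers_0", "customers_1", "customers_2", "customers_3"]


def domain_for(item_name: str, domains: list[str] = DOMAINS) -> str:
    """Pick the domain that should hold item_name.

    A stable hash (not Python's built-in hash(), which can vary between
    runs) keeps the mapping identical across processes and restarts.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(domains)
    return domains[bucket]


def group_by_domain(item_names: list[str]) -> dict[str, list[str]]:
    """Group item names by target domain, for example before batch writes."""
    groups: dict[str, list[str]] = {}
    for name in item_names:
        groups.setdefault(domain_for(name), []).append(name)
    return groups
```

Every read and write has to go through a routing function like `domain_for`, and changing the number of domains remaps most items. That bookkeeping is the kind of overhead the application takes on.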
The service is designed for small database applications where flexibility, availability and durability are key considerations, while scalability is less of a requirement. The ability to change the attributes of a table on the fly, without having to modify a schema, re-index data or manipulate a table structure offline, is a good example of that flexibility. SimpleDB spreads data across multiple data centers within a region, which provides availability and durability.
SimpleDB databases are organized around domains, which are analogous to relational tables. Domains contain multiple items, each of which is a set of key-value pairs. Think of items as rows in a relational table; each key-value pair is an attribute, with the key as the attribute's name and the value as its contents. Data is added to domains and queried using a basic API or console.
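The domain, item and attribute structure can be pictured with plain Python containers. This is only an in-memory sketch of the data model, not SimpleDB client code; the `put_attribute` helper and the sample customer data are hypothetical.

```python
# domain -> item name -> attribute name -> set of string values
Domain = dict[str, dict[str, set[str]]]


def put_attribute(domain: Domain, item_name: str, name: str, value: str) -> None:
    """Add one value to an item's attribute, creating the item if needed."""
    item = domain.setdefault(item_name, {})
    item.setdefault(name, set()).add(value)


customers: Domain = {}
put_attribute(customers, "cust-001", "last_name", "Nguyen")
put_attribute(customers, "cust-001", "city", "Portland")
# A second item can carry attributes the first one lacks; no schema change.
put_attribute(customers, "cust-002", "last_name", "Okafor")
put_attribute(customers, "cust-002", "phone", "555-0100")
# An attribute may hold more than one value.
put_attribute(customers, "cust-002", "phone", "555-0199")
```

Each item carries only the attributes it needs, so adding a `phone` attribute to one customer requires no change to any other item.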
SimpleDB supports a simple select statement that any SQL programmer would understand. There are significant differences, however. SimpleDB does not support joins across domains. If you need to combine data from multiple domains, you have to query each domain separately and combine the results in your own code. This shouldn't be difficult for simple joins, but if your application needs to support more complex joins, use a relational database such as MySQL or PostgreSQL. Both are available through Amazon’s Relational Database Service.
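A simple client-side join might look like the sketch below. It assumes the results of two separate select calls have already been fetched and converted to lists of dictionaries; the `hash_join` helper, the field names and the sample rows are all hypothetical.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`, as a client-side
    substitute for a join across two domains."""
    # Index the right-hand rows by join key for constant-time lookups.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge both rows; left-hand values win on name clashes.
            joined.append({**match, **row})
    return joined


# Stand-ins for the results of two separate select calls.
customers = [
    {"customer_id": "c1", "last_name": "Nguyen"},
    {"customer_id": "c2", "last_name": "Okafor"},
]
orders = [
    {"customer_id": "c1", "order_id": "o-10"},
    {"customer_id": "c1", "order_id": "o-11"},
    {"customer_id": "c3", "order_id": "o-12"},
]

rows = hash_join(orders, customers, "customer_id")
```

This covers a single equality join. Multi-way joins, outer joins or joins over data too large to hold in memory get complicated quickly, which is where a relational database becomes the better fit.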
An advantage of SimpleDB is that it indexes every attribute in a domain, which is useful for applications that let users query on any attribute. Querying a customer domain by last name, city, state or ZIP code would be equally fast, since all of those attributes are indexed.
Large volume databases call for Amazon DynamoDB
Amazon DynamoDB is designed for more demanding applications that require scalable data stores and more advanced data management features. Instead of using hard disks, DynamoDB uses solid-state drives for consistent, low-latency reads and writes. It's designed to scale to large volumes while maintaining consistent performance, though that performance comes with a more restrictive query model.
Because DynamoDB is built for larger enterprise databases, those databases may require additional data management services. AWS integrates DynamoDB with Elastic MapReduce (EMR) -- the AWS Hadoop service -- and Redshift, its data-warehousing service. Use Amazon Redshift or EMR for large-scale ad hoc querying or analysis and use DynamoDB for more targeted queries based on hash and hash-and-range keys. DynamoDB also removes the extra overhead of managing partitioned domains: it places no limit on table size and partitions data as needed.
DynamoDB indexes on primary keys and allows for secondary indexes. Both primary and secondary indexes are based on hash or hash-and-range keys. Instead of a single select statement, the service uses query and scan operations. Queries run against a primary or secondary hash or hash-and-range key. Scans read every item in a table, which offers more flexibility but can be much slower than a query, especially on large tables. The responsiveness of your application is also determined, in part, by the read and write capacity provisioned for the table.
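The difference between the two operations shows up in the request parameters. The sketch below builds parameter dictionaries in the shape accepted by the current low-level DynamoDB API (for example, the `query` and `scan` methods of a boto3 client). The table layout is an assumption: a hypothetical orders table with `customer_id` as the hash key, `order_date` as the range key and a non-key `order_status` attribute.

```python
def query_params(table, customer_id, since):
    """Parameters for a Query: equality on the hash key plus a
    condition on the range key. Only matching items are read."""
    return {
        "TableName": table,
        "KeyConditionExpression": "customer_id = :c AND order_date >= :d",
        "ExpressionAttributeValues": {
            ":c": {"S": customer_id},
            ":d": {"S": since},
        },
    }


def scan_params(table, status):
    """Parameters for a Scan: every item in the table is read, and the
    filter is applied afterward to decide what is returned."""
    return {
        "TableName": table,
        "FilterExpression": "order_status = :s",
        "ExpressionAttributeValues": {":s": {"S": status}},
    }


# Possible usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.query(**query_params("orders", "c1", "2014-01-01"))
#   client.scan(**scan_params("orders", "shipped"))
```

A scan can filter on any attribute, but because the filter runs after the items are read, the scan consumes read capacity for the whole table rather than just the items returned.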
Developers can use DynamoDB Local to build and test code with a local database rather than against a live production database. The DynamoDB and DynamoDB Local APIs are compatible, so the same code should run in both environments.
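Because the APIs match, switching between the two environments can come down to client configuration. The helper below is a hypothetical sketch that returns keyword arguments suitable for a boto3 client. It assumes DynamoDB Local is listening on its default port, 8000, on the same machine; the region and the placeholder credentials are arbitrary values chosen for illustration.

```python
def dynamodb_client_kwargs(use_local: bool, region: str = "us-east-1") -> dict:
    """Build keyword arguments for a DynamoDB client.

    With use_local=True the client is pointed at DynamoDB Local and given
    placeholder credentials; otherwise the normal AWS endpoint and the
    usual credential lookup apply.
    """
    kwargs = {"region_name": region}
    if use_local:
        kwargs.update(
            endpoint_url="http://localhost:8000",  # assumed default port
            aws_access_key_id="local",             # placeholder value
            aws_secret_access_key="local",         # placeholder value
        )
    return kwargs


# Possible usage, assuming boto3 is installed:
#   import boto3
#   client = boto3.client("dynamodb", **dynamodb_client_kwargs(use_local=True))
```

Keeping the endpoint choice in one place means the rest of the code does not need to know which environment it is running against.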
About the author:
Dan Sullivan holds a Master of Science degree and is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.