For most admins, data normalization is a key concept that comes to mind when they think of a relational database.
But for users of Amazon DynamoDB -- which follows a NoSQL, nonrelational database model -- normalization could also play a role.
In a relational database, normalization helps ensure data integrity, as it involves structuring data in such a way that it's not stored multiple times. To achieve this, admins store data in different tables and connect those tables via relationships.
Data normalization and DynamoDB
While the normalization process is largely associated with relational databases, there are exceptions. For example, one type of NoSQL database, called a wide-column store, uses tables, rows and columns. To accommodate unstructured data, the names and formats of the columns in these databases can vary from row to row. In essence, a wide-column store is a two-dimensional key-value store. Amazon DynamoDB is often grouped with these databases, although AWS itself describes it as a key-value and document database.
So, when would data normalization make sense with Amazon DynamoDB? Here are three examples.
1. To store large items
The maximum item size in Amazon DynamoDB is 400 KB. To store items larger than this, admins can either use an S3 bucket or a separate, normalized DynamoDB table. If they use the DynamoDB table, they can break the larger items into smaller chunks and then organize relationships between those chunks to re-create the item in an application.
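As a rough illustration of the chunking approach, the following Python sketch splits a large payload into chunk items and reassembles them. It runs purely in memory and does not call DynamoDB. The attribute names (`item_id`, `chunk_no`, `chunk_count`, `data`) and the 350 KB chunk size are assumptions chosen for the example, not names or values prescribed by AWS.

```python
import math

# Hypothetical chunk size: kept well under the 400 KB item limit so the key
# attributes and attribute names also fit inside each item.
CHUNK_BYTES = 350 * 1024


def split_into_chunks(item_id, payload, chunk_bytes=CHUNK_BYTES):
    """Break one large payload into chunk items that share a partition key."""
    total = max(1, math.ceil(len(payload) / chunk_bytes))
    return [
        {
            "item_id": item_id,    # partition key (assumed name)
            "chunk_no": n,         # sort key (assumed name)
            "chunk_count": total,  # lets a reader detect missing chunks
            "data": payload[n * chunk_bytes:(n + 1) * chunk_bytes],
        }
        for n in range(total)
    ]


def reassemble(chunks):
    """Re-create the original payload from its chunk items."""
    ordered = sorted(chunks, key=lambda c: c["chunk_no"])
    if not ordered or len(ordered) != ordered[0]["chunk_count"]:
        raise ValueError("one or more chunks are missing")
    return b"".join(c["data"] for c in ordered)


payload = b"x" * (1024 * 1024)  # a 1 MB payload, too large for a single item
chunks = split_into_chunks("report-42", payload)
restored = reassemble(chunks)
```

In a real table, each dictionary would be written as its own item, and the application would query by the shared partition key to fetch all chunks before reassembling them.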
2. Frequent data updates
Admins provision DynamoDB tables with capacity units for read and write operations. A write capacity unit covers one write per second for an item up to 1 KB in size, while a read capacity unit covers one strongly consistent read per second for an item up to 4 KB. If an organization constantly updates data, it will quickly consume the provisioned write units and will need to raise the limits, which isn't cheap, to avoid throttling and performance issues.
In some situations, an application might become slow or even unreachable. If this is the case, update normalized data (smaller items that hold only the necessary fields) rather than large unstructured items, as Amazon DynamoDB calculates the cost of an update based on the size of the entire item, not the portion of that item that changes.
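To make the write-cost difference concrete, here is a small Python sketch of the arithmetic, assuming standard (non-transactional) writes that consume one write capacity unit per 1 KB of item size, rounded up. The item sizes and the update rate are hypothetical figures picked for illustration.

```python
import math

WRITE_UNIT_BYTES = 1024  # one write capacity unit covers an item up to 1 KB


def write_units(item_bytes):
    """Write capacity units consumed by one standard write of an item."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))


# Hypothetical sizes: a 50 KB document-style item versus a 1 KB normalized
# item that holds only the frequently updated fields.
large_item_cost = write_units(50 * 1024)
small_item_cost = write_units(1 * 1024)

updates_per_second = 20  # assumed update rate

print(large_item_cost * updates_per_second)  # 1000 write units per second
print(small_item_cost * updates_per_second)  # 20 write units per second
```

Even if only one small attribute changes, the update against the 50 KB item is charged for all 50 KB, which is why moving hot fields into their own small items can reduce the write capacity a table needs.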
3. Expected application behavior
If admins can separate application data into frequently accessed and infrequently accessed tables, they can normalize the data accordingly and save money by giving each table its own read and write capacity unit configuration. This isn't easy in most modern web and mobile applications, but admins can monitor how an application uses Amazon DynamoDB to help optimize performance and cut costs.
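The potential saving can be estimated with a short Python sketch. It assumes strongly consistent reads, which consume one read capacity unit per 4 KB of item size, rounded up, and it assumes that a read is charged for the full item even when the application needs only a few attributes. The sizes and request rates below are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one strongly consistent read unit covers up to 4 KB


def read_units(item_bytes):
    """Read capacity units consumed by one strongly consistent read."""
    return max(1, math.ceil(item_bytes / READ_UNIT_BYTES))


# Hypothetical workload: each record has 1 KB of frequently read ("hot")
# attributes and 20 KB of rarely read ("cold") attributes.
HOT_BYTES = 1 * 1024
COLD_BYTES = 20 * 1024
HOT_READS_PER_SEC = 100
COLD_READS_PER_SEC = 1

# Single table: every read is charged for the full 21 KB item.
single_table_units = (HOT_READS_PER_SEC + COLD_READS_PER_SEC) * read_units(
    HOT_BYTES + COLD_BYTES
)

# Two tables: each read is charged only for the item it actually touches.
hot_table_units = HOT_READS_PER_SEC * read_units(HOT_BYTES)
cold_table_units = COLD_READS_PER_SEC * read_units(COLD_BYTES)

print(single_table_units)                  # 606 read units per second
print(hot_table_units + cold_table_units)  # 105 read units per second
```

With these assumed numbers, the split layout needs roughly a sixth of the read capacity, and each table can be provisioned separately to match its own traffic.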
Of course, planning is key to any good database design, so an organization should review the official AWS documentation on relational modeling in Amazon DynamoDB before it changes a table's layout, to avoid unnecessary rework.