
Overcome legacy DynamoDB limits with a hot migration

Performing a hot migration in DynamoDB can be a stressful but often necessary process. Here are the steps to upgrade tables without causing downtime or service interruptions.

Amazon DynamoDB has evolved quite a bit over the years, and early adopters of AWS' NoSQL database now face a few challenges when working around previously set DynamoDB limits.

When we first started with DynamoDB, our company indexed the table by publication and then story ID, which let us quickly look up all stories in a publication. However, this setup also imposed a limit of 10 GB per publication, because DynamoDB caps the total size of items that share a hash key when a table uses local secondary indexes.

Several years later, AWS notified me that I had a publication with too much content, and I had to scramble to figure out what was going on. As it turned out, I just had one publication with twice as much content as the next biggest publication. But I didn't want to run into these DynamoDB limits again.

My mistake was choosing a publication ID as the primary hash key instead of a globally unique, randomly distributed ID. A globally unique key would have let me add global secondary indexes for the other attributes I wanted to search on. I also could have indexed everything in another system, such as Amazon CloudSearch or Algolia.
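
For comparison, here is a minimal sketch of how a table keyed on a globally unique ID, with a global secondary index for publication lookups, might be created with the AWS SDK for JavaScript. The index name and throughput values are assumptions for illustration, not the exact schema I ended up with.

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

dynamodb.createTable({
  TableName: 'Articles',
  AttributeDefinitions: [
    { AttributeName: 'guid', AttributeType: 'S' },
    { AttributeName: 'pub', AttributeType: 'S' },
  ],
  // The primary key is a globally unique, randomly distributed ID
  KeySchema: [
    { AttributeName: 'guid', KeyType: 'HASH' },
  ],
  // Publication lookups move to a global secondary index,
  // which isn't subject to the 10 GB item collection limit
  GlobalSecondaryIndexes: [{
    IndexName: 'pub-index',
    KeySchema: [
      { AttributeName: 'pub', KeyType: 'HASH' },
    ],
    Projection: { ProjectionType: 'ALL' },
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
  }],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
}, function(err, data) {
  if (err) console.error(err);
  else console.log(data.TableDescription.TableStatus);
});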

Amazon later added support for global secondary indexes -- which don't carry the 10 GB limit -- but it was too late for me to start over with new keys and identifiers. I needed to fix my problem table and support a production system that had to stay up while migrating to the new table structure.

One way to do this is through a hot migration -- adapting tables without any downtime to the system. During a hot migration, two DynamoDB tables are actively used at the same time, which means multiple sources of record. If two systems write to the same record in different tables, referential integrity breaks down. A hot migration also takes a lot longer to carry out, but it was ultimately necessary to keep the production system running.

Hot migration of DynamoDB tables

DynamoDB doesn't allow users to change or delete local secondary indexes after creating a table, nor does it allow you to change a table's primary key. My table received writes multiple times per second, so I couldn't delete it and recreate it with a new structure. Instead, I created a new table and transitioned my code to write to it while reading from both the old and new tables. My code hit DynamoDB in multiple places, which meant I had to migrate all of it before I could disable the old table.

But disabling the old table wasn't a priority; I just needed to make sure the new writes worked properly. As a temporary stopgap, I cleaned up the old table and removed stories that were six months old or older from the problem publication. This bought me enough time to perform a hot migration without losing any new content.
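
A rough sketch of that cleanup -- assuming the Abstracts table keys on pub and external_id and each story carries an ISO-8601 published attribute (that attribute name and the publication ID are placeholders) -- might look like this:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

var cutoff = new Date();
cutoff.setMonth(cutoff.getMonth() - 6);

docClient.query({
  TableName: 'Abstracts',
  KeyConditionExpression: 'pub = :pub',
  // 'published' is an assumed date attribute on each story
  FilterExpression: 'published < :cutoff',
  ExpressionAttributeValues: {
    ':pub': 'problem-publication',
    ':cutoff': cutoff.toISOString(),
  },
}, function(e, r) {
  if (e) throw new Error(e);
  // BatchWriteItem accepts at most 25 requests per call
  // (pagination of the query results is omitted for brevity)
  var requests = r.Items.map(function(item) {
    return { DeleteRequest: { Key: { pub: item.pub, external_id: item.external_id } } };
  });
  for (var i = 0; i < requests.length; i += 25) {
    docClient.batchWrite({
      RequestItems: { Abstracts: requests.slice(i, i + 25) },
    }, function(e) {
      if (e) throw new Error(e);
    });
  }
});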

Convert apps to read from both tables

Before allowing applications to write exclusively to the new table, I needed to update them so they could read from both tables. I use Algolia as a search engine, so I didn't need to support a new search method -- only a new way to look up records. I searched through all the source code for any DynamoDB queries or lookups against my Abstracts table.

Once I located a reference to the Abstracts table, I modified the code to read first from the new table, which I named Articles, and then to check the old Abstracts table if the reference didn't exist in the new one. Initially, this resulted in failed lookups and double-lookup operations, but once write operations shifted to the new table, it became the primary source of data. If a write occurred on the new table but not the old one, the old table would be out of date, so the new table had to serve as the primary authority for any record.

Here is how the source code looks:

// Assumes the AWS SDK for JavaScript and a configured DocumentClient
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

function lookupAbstract(params, callback) {
  // Try the new Articles table first; it is the authority for any record
  docClient.get({
    TableName: 'Articles',
    Key: {
      guid: params.guid,
    },
  }, function(e, r) {
    if (e) {
      throw new Error(e);
    } else if (r.Item) {
      return callback(r.Item);
    }

    // Not found in Articles, so fall back to the old Abstracts table
    docClient.get({
      TableName: 'Abstracts',
      Key: {
        pub: params.pub,
        external_id: params.external_id,
      },
    }, function(e, r) {
      if (e) {
        throw new Error(e);
      } else {
        callback(r.Item);
      }
    });
  });
}

This code attempts to look up an item by its GUID in the Articles table. If that item isn't there, it falls back to the pub and external_id on the old Abstracts table. Previously, I only needed to supply a pub ID and an external ID; now every call to this function must also supply a GUID, which meant changing all references to my lookupAbstract call to include the extra data.
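
For example, a caller that previously passed only the old key now needs to look something like this (the values are placeholders):

lookupAbstract({
  guid: 'a1b2c3d4',          // new primary key on the Articles table
  pub: 'example-pub',        // old hash key on the Abstracts table
  external_id: 'story-1234', // old range key on the Abstracts table
}, function(item) {
  console.log(item);
});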

Migrate writes to the new DynamoDB table

After verifying that everything was read properly from the new and old tables, I wrote a few test records to the new table that didn't exist in the old one. Then I started switching every process that wrote content to the old Abstracts table over to the new Articles table. This was much simpler than the first step, because the writes didn't need any fallback operations against the old table -- with one exception.

For updates, as opposed to save operations, I needed to do a lookup first: if the item didn't already exist in the new table, it had to be copied over from the old table before the update could proceed. The following code checks whether the record exists by verifying that a pub value is set.

function updateAbstract(key, updates) {
  docClient.update({
    TableName: 'Articles',
    Key: {
      guid: key.guid,
    },
    AttributeUpdates: updates,
    // Make sure this record exists in the new table first
    // (NOT_NULL checks that the pub attribute is set)
    Expected: {
      pub: {
        ComparisonOperator: 'NOT_NULL',
      },
    },
  }, function(e, r) {
    // Only copy from the old table if
    // this is a conditional check failure
    if (e && e.code === 'ConditionalCheckFailedException') {
      copyItemFromOldTable(key)
        .then(updateAbstract.bind(this, key, updates));
    } else {
      // Standard handling here…
    }
  });
}

If the conditional update fails, another function copies the old item into the new table, and then the update is retried.
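
A minimal sketch of copyItemFromOldTable -- assuming the key object carries the old pub and external_id alongside the new guid, and that items can be copied into Articles as-is -- could look like this:

function copyItemFromOldTable(key) {
  return new Promise(function(resolve, reject) {
    // Read the item from the old table by its old composite key
    docClient.get({
      TableName: 'Abstracts',
      Key: {
        pub: key.pub,
        external_id: key.external_id,
      },
    }, function(e, r) {
      if (e || !r.Item) {
        return reject(e || new Error('Not found in Abstracts'));
      }
      // Write it to the new table under the new primary key,
      // without clobbering a record that already made it over
      var item = Object.assign({}, r.Item, { guid: key.guid });
      docClient.put({
        TableName: 'Articles',
        Item: item,
        ConditionExpression: 'attribute_not_exists(guid)',
      }, function(e) {
        if (e && e.code !== 'ConditionalCheckFailedException') {
          return reject(e);
        }
        resolve();
      });
    });
  });
}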

Copy all data to new table

Once new writes were landing on the new table, I needed to verify that no writes were still hitting the old table. To do this, I subscribed an AWS Lambda function to all write operations on the Abstracts table and verified that none had been missed. The final step was to copy any old data from the Abstracts table to the new Articles table, which would let me eventually remove the old Abstracts table. Leaving the old table around doesn't cause any damage, but it costs extra money and can confuse other developers who access the content.
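
A minimal sketch of that check, assuming DynamoDB Streams is enabled on the Abstracts table and the stream is mapped to this Lambda function:

// Flags any write that still lands on the old Abstracts table
exports.handler = function(event, context, callback) {
  event.Records.forEach(function(record) {
    if (record.eventName === 'INSERT' || record.eventName === 'MODIFY') {
      console.log('Unexpected write to Abstracts:',
        JSON.stringify(record.dynamodb.Keys));
    }
  });
  callback(null, 'done');
};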

I used a simple script to copy every item from the old table to the new one, making sure a record didn't already exist in the Articles table before performing each write. I verified that everything came through and moved on to the cleanup stage.
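
A rough sketch of that copy script -- assuming each old item already carries the guid it should use in the new table -- could scan Abstracts and do a conditional put per item:

function copyAllItems(startKey) {
  docClient.scan({
    TableName: 'Abstracts',
    ExclusiveStartKey: startKey,
  }, function(e, r) {
    if (e) throw new Error(e);
    r.Items.forEach(function(item) {
      docClient.put({
        TableName: 'Articles',
        Item: item,
        // Skip anything that already exists in the new table
        ConditionExpression: 'attribute_not_exists(guid)',
      }, function(e) {
        if (e && e.code !== 'ConditionalCheckFailedException') {
          console.error('Copy failed for', item, e);
        }
      });
    });
    // Scan results are paginated; keep going until the table is exhausted
    if (r.LastEvaluatedKey) {
      copyAllItems(r.LastEvaluatedKey);
    }
  });
}

copyAllItems();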

Cleaning up

Before deleting the Abstracts table, I had to remove all references to it in the code. If the code failed to find an article in the Articles table, I logged it, but I didn't want the code to then look up the record in a table I was about to delete. So I removed those fallback reads and replaced them with an alert that would let me find and recover any lost content.

Before deleting the Abstracts table, I took a snapshot of it and stored it in Amazon Simple Storage Service in case something went wrong. I then reduced the table's throughput to the bare minimum and monitored it for a few weeks before finally deleting it.
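
Dialing the old table's throughput down to the minimum is a single call on the low-level client; the capacity values here are just the smallest provisioned settings:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

dynamodb.updateTable({
  TableName: 'Abstracts',
  ProvisionedThroughput: {
    ReadCapacityUnits: 1,
    WriteCapacityUnits: 1,
  },
}, function(err, data) {
  if (err) console.error(err);
  else console.log(data.TableDescription.TableStatus);
});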

When starting from scratch, keep DynamoDB limits in mind; you may need to perform a hot migration one day to move from one table to another. And because a hot migration is more complicated than designing keys for scalability up front, only use one when absolutely necessary.

Next Steps

Keep database tables in sync with DynamoDB

DynamoDB and Lambda are better together

Manage services and be aware of DynamoDB limits

This was last published in March 2017
