
Amazon DynamoDB helps IT shop track S3 objects

Amazon's NoSQL database as a service is used as an index for more than a million S3 objects at Robert Half International.

Amazon's database as a service is being put to use by an international staffing firm to keep track of objects in Simple Storage Service.

As a NoSQL database, Amazon DynamoDB is a key-value store that, unlike traditional relational databases, has no fixed schema beyond its key attributes. This makes it easier to scale resources dynamically, according to James Fogerson, senior solutions architect for Robert Half International, Inc., based in Menlo Park, Calif.
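To make the "no schema" point concrete, here is a minimal Python sketch of the request a metadata index like this might send to DynamoDB's CreateTable API. The table name `s3-object-index` and the key attributes `app_id` and `s3_key` are hypothetical; the article does not describe Robert Half's actual table layout. Only the key attributes are declared up front, and provisioned read and write capacity can be changed later with an UpdateTable request.

```python
# Hypothetical names: the article does not describe Robert Half's table layout.
TABLE_NAME = "s3-object-index"


def create_table_request(read_units: int, write_units: int) -> dict:
    """Build a DynamoDB CreateTable request for an S3 metadata index.

    Only the key attributes are declared; every other attribute
    (size, content type, timestamps and so on) can vary item by item.
    """
    return {
        "TableName": TABLE_NAME,
        # AttributeDefinitions lists key attributes only, not a full schema.
        "AttributeDefinitions": [
            {"AttributeName": "app_id", "AttributeType": "S"},
            {"AttributeName": "s3_key", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "app_id", "KeyType": "HASH"},   # partition key
            {"AttributeName": "s3_key", "KeyType": "RANGE"},  # sort key
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def update_throughput_request(read_units: int, write_units: int) -> dict:
    """Build an UpdateTable request that changes provisioned capacity later."""
    return {
        "TableName": TABLE_NAME,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }
```

With the boto3 SDK, these dictionaries could be passed as keyword arguments, for example `boto3.client("dynamodb").create_table(**create_table_request(5, 5))` and, once more data arrives, `update_table(**update_throughput_request(50, 50))`.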

"We chose DynamoDB because we weren't sure when we started the process how much data we'd be ingesting," Fogerson said. "We can add capacity or throughput as needed."

The first use for Amazon DynamoDB at Robert Half is to keep track of Simple Storage Service (S3) objects associated with a particular application.

"Mostly it's pointers to S3 buckets, the metadata for S3 storage," Fogerson said. This doesn't constitute much data -- half a gigabyte total -- but there are 1.5 million of these records, making it impractical to track them manually. Searching for data in S3 is very time-consuming and expensive, Fogerson said, so any significant amount of data stored there needs a place to keep metadata that can be searched quickly.
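A rough sketch of the index pattern Fogerson describes: each S3 object gets a small DynamoDB item holding a pointer (bucket and key) plus metadata, so a lookup becomes a keyed Query rather than a search through S3 itself. The table name `s3-object-index` and the attribute names `app_id`, `s3_key`, `bucket` and `size_bytes` are hypothetical. The helpers only build request payloads in DynamoDB's low-level attribute-value format and make no network calls.

```python
from typing import Optional

TABLE_NAME = "s3-object-index"  # hypothetical table name


def put_item_request(app_id: str, bucket: str, key: str, size_bytes: int,
                     extra: Optional[dict] = None) -> dict:
    """Build a PutItem payload that stores a pointer to one S3 object."""
    item = {
        "app_id": {"S": app_id},               # partition key (hypothetical)
        "s3_key": {"S": key},                  # sort key: the object's S3 key
        "bucket": {"S": bucket},
        "size_bytes": {"N": str(size_bytes)},  # DynamoDB numbers travel as strings
    }
    # Optional string metadata; other items in the table need not have it.
    for name, value in (extra or {}).items():
        item[name] = {"S": str(value)}
    return {"TableName": TABLE_NAME, "Item": item}


def query_by_prefix_request(app_id: str, key_prefix: str) -> dict:
    """Build a Query payload for one application's objects under a key prefix."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "app_id = :a AND begins_with(s3_key, :p)",
        "ExpressionAttributeValues": {
            ":a": {"S": app_id},
            ":p": {"S": key_prefix},
        },
    }
```

Because the table has no fixed schema, the optional `extra` attributes can differ from one item to the next. With boto3, the payloads map onto `client.put_item(**request)` and `client.query(**request)`.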

"Since it's just pointers, it's a relatively simple collection of data, it's not like we have to do complex joins or anything like that," he said.

Robert Half could have chosen to manage a different NoSQL database, such as MongoDB on EC2 instances. But Fogerson said DynamoDB was easy to deploy.

"We're not a NoSQL shop and trying to set up, configure and manage our own NoSQL environment was just a lot of expense and time that we didn't have," Fogerson said.

This is just the beginning for Robert Half and Amazon DynamoDB -- a number of other projects within the company are considering it as a data store. Overall, the company manages more than 1,000 instances in EC2 and has tens of millions of objects in S3.

Amazon DynamoDB limitations

Meanwhile, there are a couple of items on the wish list for Amazon DynamoDB. One is autoscaling, which is available through tools on GitHub rather than as a built-in feature. Another is backup, which has required some help from AWS support to get right.
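Autoscaling tools of this kind typically compare the capacity a table actually consumes against what is provisioned and then raise or lower the provisioned figure. The sketch below shows one simple threshold rule of that sort; the 80% and 30% thresholds, the 50% step and the unit bounds are illustrative assumptions, not the settings of any particular GitHub project.

```python
import math


def next_capacity(provisioned: int, consumed: float,
                  upper: float = 0.80, lower: float = 0.30,
                  step: float = 0.50, min_units: int = 1,
                  max_units: int = 1000) -> int:
    """Return a new provisioned capacity based on utilization thresholds.

    All thresholds and bounds are hypothetical defaults for illustration.
    """
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    utilization = consumed / provisioned
    if utilization > upper:
        # Running hot: grow capacity by `step` (rounded up).
        target = math.ceil(provisioned * (1 + step))
    elif utilization < lower:
        # Mostly idle: shrink capacity by `step` (rounded down).
        target = math.floor(provisioned * (1 - step))
    else:
        target = provisioned
    # Clamp to the allowed range.
    return max(min_units, min(max_units, target))
```

A real scaler would read consumed capacity from CloudWatch metrics and would also have to respect DynamoDB's limit on how often a table's provisioned capacity can be decreased per day.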

"Even for our small database, it was taking me 24 hours to export it to S3 and then to import it was taking 17 hours," Fogerson said. "I had to work with support to figure out the levers and knobs I needed to change to get it to work right."

For example, users can set the percentage of a table's throughput that import and export jobs are allowed to consume. Speeding up the import required setting this percentage above 100%, which was not intuitive, Fogerson said.
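Back-of-the-envelope arithmetic shows why that percentage matters. This sketch assumes one write capacity unit covers one write per second for an item up to 1 KB, and treats the percentage as a simple multiplier on the table's provisioned write capacity. Throttling, retries and burst capacity are ignored, and the function name `import_hours` is hypothetical.

```python
import math


def import_hours(item_count: int, avg_item_bytes: int,
                 provisioned_wcu: int, throughput_ratio: float) -> float:
    """Rough hours needed to write item_count items into a DynamoDB table.

    Assumes each write of an item up to 1 KB consumes one write capacity
    unit, and that the import may use provisioned_wcu * throughput_ratio
    units per second. Throttling, retries and burst capacity are ignored.
    """
    wcu_per_item = max(1, math.ceil(avg_item_bytes / 1024))
    items_per_second = provisioned_wcu * throughput_ratio / wcu_per_item
    return item_count / items_per_second / 3600
```

With 1.5 million items of about 360 bytes and a hypothetical 200 write capacity units, a 25% setting works out to roughly 8.3 hours, while 150% brings it down to roughly 1.4 hours. These are illustrative numbers, not Robert Half's configuration.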

In theory, Amazon could also make S3 more searchable natively, and its job listings hint at just such a project. But DynamoDB also offers DynamoDB Streams, which can trigger processes when new records are inserted into a table. For example, when a new object is put into S3 and its metadata is written to DynamoDB, the resulting stream record could kick off an AWS Lambda function to notify someone or process the record.
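A minimal sketch of the Streams-plus-Lambda idea in Python. It assumes the table's stream is configured to include new images (the NEW_IMAGE or NEW_AND_OLD_IMAGES view type) and that items carry hypothetical `bucket` and `s3_key` attributes. Instead of actually notifying anyone, the handler just returns the messages it would send.

```python
from typing import Any


def handler(event: dict, context: Any = None) -> list:
    """Collect a notification message for each newly inserted index record."""
    messages = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # this sketch ignores MODIFY and REMOVE events
        # NewImage holds the inserted item in low-level attribute-value format.
        image = record.get("dynamodb", {}).get("NewImage", {})
        bucket = image.get("bucket", {}).get("S", "?")
        key = image.get("s3_key", {}).get("S", "?")
        messages.append(f"New object indexed: s3://{bucket}/{key}")
    return messages
```

In practice the messages might be published to a service such as Amazon SNS, or the function might process the object directly; that wiring is left out here.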

"Dynamo's going to have more capabilities than we could get out of S3," Fogerson said.

Beth Pariseau is senior news writer for SearchAWS. Write to her at [email protected] or follow @PariseauTT on Twitter.

