

Working with graph databases and AWS

Facebook and LinkedIn use graph databases for social applications. But enterprises should look into how a database like Titan can work with AWS.

Titan is a graph database that runs on top of several databases that Amazon Web Services supports. Many people think graph databases are useful only for social applications, such as Facebook or LinkedIn. But Titan is also the primary database that Amazon's Kiva Systems uses to manage Amazon's retail warehouses. Because Amazon's warehouse system is arguably one of the largest in the world, it's worth looking at how the retail giant uses the technology, and at how enterprises could incorporate Titan into an existing AWS deployment. Its uses extend well beyond social apps and warehouses.

Graph databases suit many applications beyond social apps that manipulate lists of "friends." Any app whose data is naturally expressed as relationships can make extensive use of graph-style queries; recommendation systems embedded in social applications, for example, are often graph-based. A graph database consists of nodes and edges: each node represents an entity, and each edge represents a connection or relationship between two nodes. Graph databases, and Titan in particular, are generally easy to install and integrate.
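To make the node-and-edge model concrete, here is a minimal sketch in plain Java. It is not Titan's API: the class `SimpleGraph` and its methods are hypothetical, vertices are plain strings, and each undirected "friend" edge is recorded on both endpoints in an adjacency map. The `recommendFriends` method shows the friends-of-friends pattern behind many graph-based recommendations.

```java
import java.util.*;

// Hypothetical in-memory graph illustrating nodes and edges; not Titan's API.
public class SimpleGraph {
    // Adjacency list: each vertex name maps to the names it is connected to.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adds an undirected "friend" edge, creating either vertex if it is new.
    public void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    public Set<String> neighbors(String v) {
        return adjacency.getOrDefault(v, Collections.emptySet());
    }

    // Friends-of-friends: vertices two hops away that are neither the vertex
    // itself nor already a direct friend, returned in name order.
    public SortedSet<String> recommendFriends(String v) {
        SortedSet<String> result = new TreeSet<>();
        Set<String> direct = neighbors(v);
        for (String friend : direct) {
            for (String candidate : neighbors(friend)) {
                if (!candidate.equals(v) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        g.addEdge("alice", "bob");
        g.addEdge("bob", "carol");
        g.addEdge("bob", "dave");
        g.addEdge("alice", "dave");
        System.out.println(g.recommendFriends("alice")); // [carol]
    }
}
```

With the edges in `main`, the only vertex two hops from alice that she isn't already connected to is carol. A real recommender would typically also rank candidates, for instance by the number of mutual friends.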

Titan is a NoSQL database. NoSQL is an umbrella term for non-relational databases, and it covers several distinct models, each with its own strengths and weaknesses. Cassandra, another NoSQL database, is a natural fit for time-series data but is poorly suited to ad hoc queries against networks of related nodes. That kind of query is exactly where a graph database fits naturally.

Titan components and storage engines

Technically, Titan isn't a database; it's a client library that sits on top of one. It relies on an underlying storage engine, such as Cassandra or HBase (the Hadoop database), to store its data. It also relies on an indexing engine, such as Lucene, Elasticsearch or Solr, to perform range-based queries. Therefore, as long as these technologies are already in your stack, you can add Titan on top of them without deploying yet another distributed database system. This reduces overhead and can speed the adoption of a new technology.
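The layering described above can be sketched in Java as a graph layer that talks only to an abstract storage interface. Everything here is hypothetical and is not Titan's storage API: `WideRowStore` stands in for a back end like Cassandra or HBase, and the in-memory implementation exists only so the example runs. Storing each vertex as one row whose columns are its edges loosely mirrors the adjacency-list layout Titan's documentation describes, but the column-naming scheme is an assumption made for illustration.

```java
import java.util.*;

// Hypothetical sketch of a graph library layered over a pluggable store.
// These types are illustrative and are not Titan's actual storage API.
public class LayeredGraph {

    // The storage back end only knows rows of sorted columns, roughly the
    // wide-row model that Cassandra and HBase both expose.
    interface WideRowStore {
        void put(String rowKey, String column, String value);
        SortedMap<String, String> row(String rowKey);
    }

    // In-memory stand-in for Cassandra or HBase, for illustration only.
    static class InMemoryStore implements WideRowStore {
        private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

        public void put(String rowKey, String column, String value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
        }

        public SortedMap<String, String> row(String rowKey) {
            return rows.getOrDefault(rowKey, new TreeMap<>());
        }
    }

    private final WideRowStore store;

    // The graph layer receives whatever store is configured, so the back end
    // can be swapped without touching the graph code.
    LayeredGraph(WideRowStore store) {
        this.store = store;
    }

    // Each vertex is one row; each outgoing edge is one column in that row,
    // named "edge:<label>:<target>" so edges with the same label sort together.
    void addEdge(String from, String label, String to) {
        store.put(from, "edge:" + label + ":" + to, "");
    }

    void setProperty(String vertex, String key, String value) {
        store.put(vertex, "prop:" + key, value);
    }

    // Reads a vertex's edges of one label with a single row lookup plus a
    // scan over the columns that share the label's prefix.
    List<String> out(String vertex, String label) {
        String prefix = "edge:" + label + ":";
        List<String> targets = new ArrayList<>();
        for (String column : store.row(vertex).tailMap(prefix).keySet()) {
            if (!column.startsWith(prefix)) {
                break;
            }
            targets.add(column.substring(prefix.length()));
        }
        return targets;
    }

    public static void main(String[] args) {
        LayeredGraph g = new LayeredGraph(new InMemoryStore());
        g.setProperty("hercules", "name", "hercules");
        g.addEdge("hercules", "father", "jupiter");
        g.addEdge("hercules", "mother", "alcmene");
        System.out.println(g.out("hercules", "father")); // [jupiter]
    }
}
```

Because `LayeredGraph` receives its store through the constructor, replacing the in-memory class with a client for a real database would leave the graph code unchanged, which is the property that lets a library like Titan reuse storage you already run.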

Titan requires a storage engine because that is where the nodes and edges are stored. Both Cassandra and HBase run on AWS and support big-data-style scaling. Amazon Relational Database Service (RDS) and Aurora, however, are not among the supported storage engines, although many AWS users at a recent Boston AWS Meetup asked Amazon to add Aurora support to Titan. An index back end such as Elasticsearch, Lucene or Solr is optional for regular operation, but it's required for range-based queries. The Titan download includes a ready-made configuration that runs an embedded storage engine and index back end, so you can experiment without setting up either one separately.
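To see why an index back end matters for range queries, compare a full scan with a lookup in a sorted index. In this hypothetical sketch, a Java `TreeMap` plays the role of Elasticsearch, Lucene or Solr; real search engines work very differently internally, but the contrast holds: without an index every vertex is examined, while a sorted index can jump straight to the matching range. The class name, vertex IDs and `age` values are illustrative assumptions.

```java
import java.util.*;

// Hypothetical sketch: range query by full scan versus by sorted index.
// The TreeMap stands in for an index back end; it is not how Elasticsearch,
// Lucene or Solr is implemented.
public class RangeIndexDemo {

    // vertex id -> age property, as it might sit in the storage back end
    private final Map<String, Integer> ageByVertex = new HashMap<>();

    // age -> ids of vertices with that age, kept sorted by age
    private final TreeMap<Integer, SortedSet<String>> ageIndex = new TreeMap<>();

    // Writes the property to storage and updates the index alongside it.
    void addVertex(String id, int age) {
        ageByVertex.put(id, age);
        ageIndex.computeIfAbsent(age, k -> new TreeSet<>()).add(id);
    }

    // Without an index: every vertex has to be examined.
    SortedSet<String> rangeByScan(int low, int high) {
        SortedSet<String> hits = new TreeSet<>();
        for (Map.Entry<String, Integer> e : ageByVertex.entrySet()) {
            if (e.getValue() >= low && e.getValue() <= high) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // With a sorted index: read only the keys in [low, high], inclusive.
    SortedSet<String> rangeByIndex(int low, int high) {
        SortedSet<String> hits = new TreeSet<>();
        for (SortedSet<String> ids : ageIndex.subMap(low, true, high, true).values()) {
            hits.addAll(ids);
        }
        return hits;
    }

    public static void main(String[] args) {
        RangeIndexDemo d = new RangeIndexDemo();
        d.addVertex("v1", 25);
        d.addVertex("v2", 41);
        d.addVertex("v3", 33);
        d.addVertex("v4", 67);
        System.out.println(d.rangeByIndex(30, 50)); // [v2, v3]
        System.out.println(d.rangeByScan(30, 50));  // [v2, v3]
    }
}
```

Both methods return the same answer; the difference is how much data each must touch, which grows with the whole graph for the scan but only with the matching range for the index.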

Developers can also attach properties and semantics to edges, such as direction and cardinality (which Titan's schema expresses as edge-label multiplicity). Properties let developers search for specific kinds of relationships; direction and cardinality let them enforce domain rules on the data, for example that a person has at most one father.
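The following sketch shows how direction and cardinality rules can be enforced as edges are added, and how an edge property narrows a search. It is plain Java, not Titan's API; the `Multiplicity` names are modeled loosely on Titan's edge-label multiplicity options, and the class name, labels and `time` values are illustrative assumptions.

```java
import java.util.*;

// Hypothetical sketch of edge semantics; not Titan's API.
public class EdgeSchemaDemo {

    // A subset of multiplicity rules, named after Titan's edge-label options.
    enum Multiplicity { MULTI, SIMPLE, MANY2ONE }

    // A directed edge: it always points from 'from' to 'to'.
    static final class Edge {
        final String from, label, to;
        final Map<String, Object> props;

        Edge(String from, String label, String to, Map<String, Object> props) {
            this.from = from;
            this.label = label;
            this.to = to;
            this.props = props;
        }
    }

    private final Map<String, Multiplicity> schema = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    void defineLabel(String label, Multiplicity m) {
        schema.put(label, m);
    }

    // Adds a directed edge, rejecting it if it breaks the label's multiplicity.
    void addEdge(String from, String label, String to, Map<String, Object> props) {
        Multiplicity m = schema.get(label);
        if (m == null) {
            throw new IllegalArgumentException("undefined edge label: " + label);
        }
        for (Edge e : edges) {
            if (!e.label.equals(label) || !e.from.equals(from)) {
                continue;
            }
            // MANY2ONE: at most one outgoing edge of this label per vertex.
            if (m == Multiplicity.MANY2ONE) {
                throw new IllegalStateException(from + " already has an outgoing '" + label + "' edge");
            }
            // SIMPLE: at most one edge of this label from 'from' to 'to'.
            if (m == Multiplicity.SIMPLE && e.to.equals(to)) {
                throw new IllegalStateException("duplicate '" + label + "' edge " + from + " -> " + to);
            }
        }
        edges.add(new Edge(from, label, to, props));
    }

    // Targets of outgoing edges with the given label whose property 'key' equals 'value'.
    List<String> outWhere(String from, String label, String key, Object value) {
        List<String> hits = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(from) && e.label.equals(label) && value.equals(e.props.get(key))) {
                hits.add(e.to);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        EdgeSchemaDemo g = new EdgeSchemaDemo();
        g.defineLabel("father", Multiplicity.MANY2ONE);
        g.defineLabel("battled", Multiplicity.MULTI);
        g.addEdge("hercules", "father", "jupiter", Map.of());
        g.addEdge("hercules", "battled", "hydra", Map.of("time", 2));
        g.addEdge("hercules", "battled", "cerberus", Map.of("time", 12));
        System.out.println(g.outWhere("hercules", "battled", "time", 12)); // [cerberus]
        try {
            g.addEdge("hercules", "father", "neptune", Map.of());
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Declaring `father` as MANY2ONE encodes the rule that a vertex has at most one father, so the second `father` edge is rejected, while `battled` stays MULTI because one hero can fight many monsters.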

Getting started with Titan graph databases

For enterprises planning to use Titan on AWS, a good way to start is to draw an application's main data structures on a whiteboard and then use the Gremlin command-line console to create the nodes and edges in your diagram. At that point, you can experiment with Gremlin queries and may discover that a graph-oriented approach simplifies them.

Another option is to use "The Graph of the Gods," a sample graph that ships with Titan. It populates a graph with a full set of gods and the relationships between them, which you can then query. To load it, start the Gremlin console and, with g bound to an open Titan graph, run:

gremlin> GraphOfTheGodsFactory.load(g)

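You can then run all kinds of queries against it. To find a particular node, such as Saturn:

gremlin> saturn = g.V.has('name','saturn').next()

To find Saturn's grandchild (Hercules):

gremlin> saturn.in('father').in('father').name

To find Hercules' parents and list their vertex labels, first bind hercules the same way you bound saturn:

gremlin> hercules = g.V.has('name','hercules').next()
gremlin> hercules.out('father','mother')*.getVertexLabel()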
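For readers without a Titan install at hand, the sketch below re-creates the three queries above in plain Java over a hand-built, four-vertex subset of the Graph of the Gods. It is not Gremlin or Titan code: the names `GodsQueries`, `find`, `in` and `out` are hypothetical, and the vertex labels (titan, god, demigod, human) are assumed to match the sample data.

```java
import java.util.*;

// Plain-Java analogue of the Gremlin queries above on a hypothetical subset
// of the Graph of the Gods; not Titan or Gremlin code.
public class GodsQueries {

    static final class Vertex {
        final String name, label;
        final Map<String, List<Vertex>> out = new HashMap<>(); // outgoing edges by label
        final Map<String, List<Vertex>> in = new HashMap<>();  // incoming edges by label

        Vertex(String name, String label) {
            this.name = name;
            this.label = label;
        }
    }

    final Map<String, Vertex> byName = new LinkedHashMap<>();

    Vertex addVertex(String name, String label) {
        Vertex v = new Vertex(name, label);
        byName.put(name, v);
        return v;
    }

    // Records a directed edge on both endpoints so it can be walked either way.
    void addEdge(Vertex from, String label, Vertex to) {
        from.out.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
        to.in.computeIfAbsent(label, k -> new ArrayList<>()).add(from);
    }

    // Rough analogue of g.V.has('name', name).next()
    Vertex find(String name) {
        Vertex v = byName.get(name);
        if (v == null) {
            throw new NoSuchElementException(name);
        }
        return v;
    }

    // Rough analogue of .in(label): follow incoming edges of one label.
    static List<Vertex> in(List<Vertex> vs, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Vertex v : vs) {
            result.addAll(v.in.getOrDefault(label, List.of()));
        }
        return result;
    }

    // Rough analogue of .out(label1, label2, ...): follow outgoing edges.
    static List<Vertex> out(Vertex v, String... labels) {
        List<Vertex> result = new ArrayList<>();
        for (String label : labels) {
            result.addAll(v.out.getOrDefault(label, List.of()));
        }
        return result;
    }

    static GodsQueries sample() {
        GodsQueries g = new GodsQueries();
        Vertex saturn = g.addVertex("saturn", "titan");
        Vertex jupiter = g.addVertex("jupiter", "god");
        Vertex hercules = g.addVertex("hercules", "demigod");
        Vertex alcmene = g.addVertex("alcmene", "human");
        g.addEdge(jupiter, "father", saturn);
        g.addEdge(hercules, "father", jupiter);
        g.addEdge(hercules, "mother", alcmene);
        return g;
    }

    public static void main(String[] args) {
        GodsQueries g = sample();
        Vertex saturn = g.find("saturn");
        // Analogue of saturn.in('father').in('father').name
        for (Vertex v : in(in(List.of(saturn), "father"), "father")) {
            System.out.println(v.name); // hercules
        }
        // Analogue of hercules.out('father','mother'), printing each vertex label
        for (Vertex v : out(g.find("hercules"), "father", "mother")) {
            System.out.println(v.label); // god, then human
        }
    }
}
```

Keeping both incoming and outgoing edge lists on each vertex is what lets `in` walk from a parent to its children as cheaply as `out` walks from a child to its parents.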

About the author:
Brian Tarbox has been doing mission-critical programming since he created a timing-and-scoring program for the Head of the Connecticut Regatta back in 1981. Though primarily an Amazon Java programmer, Brian is a firm believer that engineers should be polylingual and use the best language for the problem. Brian holds patents in the fields of UX and VideoOnDemand, with several more in process. His Log4JFugue open source project won the 2010 Duke's Choice award for the most innovative use of Java; he also won a JavaOne best speaker Rock Star award, as well as a Most Innovative Use of Jira award from Atlassian in 2010. Brian has published several dozen technical papers and is a regular speaker at local Meetups. Brian also writes for LinkedIn Pulse News.
