A recent update to Amazon Aurora merges transactional and analytical database models to rev up results for some AWS users.
The Amazon Aurora Parallel Query feature, which had been in preview since February and is now generally available, could double querying speeds for the relational database service. Aurora was previously limited to transactional queries, but this feature enables functionality that's typical of analytical databases for large numbers of transactions and complicated analytics.
"[Aurora is] starting to expand the envelope on the types of problems it can tackle," said Henry Cook, an analyst at Gartner.
In the analytical database model, parallelism breaks a query into many pieces and spreads them over many processors, which can make queries as much as 100 times faster. Transactional systems typically have a one-to-one relationship between queries and processors, so running queries in parallel only increases the number of requests processed at any given time, not the speed at which each one is resolved.
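To make the idea concrete, here is a minimal Python sketch of that split-and-merge pattern. It is an illustration only, not Aurora's implementation: the function names, the `orders` data and the worker count are all hypothetical, and Python threads are used to show the structure of the work rather than to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, pieces):
    """Split rows into roughly equal contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(rows) // pieces))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(chunk, min_amount):
    """Work done by one worker: filter and aggregate only its own chunk."""
    return sum(amount for amount in chunk if amount >= min_amount)


def parallel_sum(rows, min_amount, workers=4):
    """Scatter the query over the workers, then merge the partial results."""
    chunks = split_rows(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_sum(c, min_amount), chunks))
    return sum(partials)


def serial_sum(rows, min_amount):
    """The one-query, one-processor equivalent, for comparison."""
    return sum(amount for amount in rows if amount >= min_amount)


# Made-up data standing in for a table column.
orders = list(range(1, 1001))
print(parallel_sum(orders, 500) == serial_sum(orders, 500))
```

Both paths return the same answer; the difference is that each worker in the parallel version only scans its own slice of the data.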
Transactional databases, such as Aurora, are designed for simple queries that quickly summarize the results. Parallel processing is typically reserved for data warehouse platforms based on analytical databases, such as AWS' own Redshift data warehousing service and those from Teradata, Microsoft and Oracle.
Historically, databases have served very specialized purposes, but this Aurora feature is the latest example of vendors' attempts to marry different approaches under a single, distributed system. MemSQL, Oracle and Microsoft offer this same sort of dual functionality, but under the covers, they're actually separate engines that merge the results for the user, Cook said.
The Amazon Aurora Parallel Query feature is designed to tap into the processing power of the storage layer, where AWS spreads users' data across storage nodes that span availability zones. AWS plans to extend those scale-out concepts to other layers of the database stack, but don't expect enterprise Redshift users to ditch that service anytime soon, Cook said. They'd still want to use a data warehouse to run billions of queries, but Parallel Query could eliminate the need for new users to deploy both services -- particularly if they expect moderate to heavy queries on thousands or millions of rows of data.
The number of parallel queries users can run depends on the type of instance they use, with a maximum of 16 concurrent parallel queries available through the R4 Memory-Optimized 16xlarge instances. There are no additional fees for this feature, though its access to storage nodes may increase I/O costs. Aurora has an optimizer tool that automatically determines whether to run a parallel query, based on table size and memory, though users can override that tool.
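The decision described above can be pictured as a simple rule with a manual override. The Python sketch below is a guess at the general shape only: AWS does not spell out the optimizer's logic here, so the function name, the 1 GB size threshold and the cached-fraction cutoff are invented for illustration.

```python
def should_use_parallel_query(table_bytes, cached_bytes, override=None,
                              min_table_bytes=1 * 1024**3,
                              max_cached_fraction=0.5):
    """Decide whether to run a query in parallel (hypothetical heuristic).

    table_bytes:  size of the table being scanned
    cached_bytes: how much of that table is already in memory
    override:     True or False to force the choice, None to decide automatically
    """
    # A user-supplied override always wins over the automatic choice.
    if override is not None:
        return override
    # Small tables are assumed to be cheap enough to scan the ordinary way.
    if table_bytes < min_table_bytes:
        return False
    # If most of the table is already in memory, assume the ordinary path is fine.
    return cached_bytes / table_bytes < max_cached_fraction


ten_gb = 10 * 1024**3
print(should_use_parallel_query(ten_gb, cached_bytes=0))
print(should_use_parallel_query(ten_gb, cached_bytes=0, override=False))
```

The point of the sketch is the ordering: the override is checked first, and the automatic rule applies only when the user has not expressed a preference.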
Feature, functionality tradeoff
Braze, a global consumer engagement company based in New York, began to use Aurora this year with some of its email marketing tools because of the database's fast write performance, and it now wants to extend that use to internal workloads. The feature is too new to know how it will work in some of those cases, but parallel queries could speed up processes if the company hooks up Aurora with Looker, a tool it uses for business intelligence, said Jon Hyman, Braze's CTO.
Still, Braze uses the PostgreSQL flavor of Aurora, so it'll have to wait for that to be supported. And while the company could benefit from the Parallel Query feature, this latest update also raises concerns for Hyman.
"I've seen a number of databases continue to tack on new functionality while not focusing on the core behavior or even the behavior of the ecosystem around the database," he said.
This problem isn't unique to Aurora or AWS, but Amazon could have spent more time on the management aspect of the service, Hyman said. For example, Braze deployed its own PgBouncer load balancers to pool connections, because open database connections eat up a lot of RAM, but AWS should handle that type of issue, he said.
"The new features are nice. But, at our scale, we care a lot [more] about stability and availability," Hyman said.