AWS says its latest database feature is as close as IT shops can come to having an "undo" button. Users, however, should tread lightly.
Amazon Aurora customers can now "backtrack" their relational database to nearly any chosen point in time within the previous 72 hours. The feature is designed to erase a particularly egregious error and to reduce downtime compared with rollbacks in more traditional database systems. AWS cites several examples of serious mistakes that this tool could remediate, such as when an administrator makes a typo, neglects to add a WHERE clause or drops the wrong table.
Virtually all enterprise-grade systems, Amazon Aurora included, have rollback capabilities such as point-in-time recovery and continuous backup. What's different about backtracking is that it's not dependent on a backup storage system. Depending on the size of the database, restoration through traditional approaches can take hours to complete, as opposed to seconds or minutes with this new feature.
Amazon Aurora's storage system generates a log record every time it makes a change to a cluster, but it does not overwrite previous data. The backtrack feature creates a FIFO buffer from those sequenced logs, so the cluster can be rewound to an earlier point in that sequence. If a user goes back too far, they can push the cluster forward in time to a more appropriate version.
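The idea of rewinding through a bounded, sequenced change log can be illustrated with a toy model. The sketch below is not Aurora's storage engine; the class name `ToyCluster`, the use of sequence numbers in place of timestamps, and the rule that a write after a rewind discards the rewound records are all assumptions made to keep the example small.

```python
from collections import deque


class ToyCluster:
    """Toy key-value store with a bounded change log (not Aurora's real design)."""

    def __init__(self, window):
        self.window = window   # max number of change records retained
        self.base = {}         # state just before the oldest retained record
        self.log = deque()     # FIFO of (sequence_number, key, value)
        self.applied = 0       # how many retained records are currently visible
        self.next_seq = 1

    def write(self, key, value):
        # Toy simplification: writing after a rewind drops the rewound records.
        while len(self.log) > self.applied:
            self.log.pop()
        if len(self.log) == self.window:
            # The oldest record leaves the buffer and is folded into the base.
            _, old_key, old_value = self.log.popleft()
            self.base[old_key] = old_value
        seq = self.next_seq
        self.log.append((seq, key, value))
        self.next_seq += 1
        self.applied = len(self.log)
        return seq

    def backtrack(self, seq):
        """Make the state just after record `seq` visible (backward or forward)."""
        oldest = self.log[0][0] - 1 if self.log else self.next_seq - 1
        newest = self.log[-1][0] if self.log else self.next_seq - 1
        if not oldest <= seq <= newest:
            raise ValueError("target is outside the retained window")
        self.applied = sum(1 for s, _, _ in self.log if s <= seq)

    def state(self):
        # Replay the visible part of the log on top of the base state.
        current = dict(self.base)
        for _, key, value in list(self.log)[: self.applied]:
            current[key] = value
        return current
```

In this model, a bad write at sequence number 3 is undone with `backtrack(2)`. If the user overshoots with `backtrack(1)`, calling `backtrack(2)` again moves the state forward, because the records were never overwritten. A target older than the retained window raises an error.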
The backtrack feature shouldn't replace snapshots or fuller database backups, and users should do as much as possible to automate systems and limit administrators' activities around production databases, said Erik Peterson, CEO and co-founder of CloudZero, a serverless reliability management company that focuses on AWS. Cost considerations should be taken into account, but this tool can be a great option for overcoming potentially catastrophic mistakes, he said.
"That has huge ramifications for being able to roll back the clock and turn a mistake that could have been a full-blown outage into a blip on the radar," Peterson said.
For now, the feature is limited to newly launched Amazon Aurora clusters or clusters restored from a backup, but users can add it to existing frameworks either via API or the AWS Command Line Interface. Backtrack does not currently support PostgreSQL.
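As a rough illustration of the API route, the sketch below builds a rewind request with the AWS SDK for Python (boto3), whose RDS client exposes a `backtrack_db_cluster` operation that takes `DBClusterIdentifier` and `BacktrackTo`. The helper names `backtrack_request` and `rewind` and the cluster name in the usage note are hypothetical. The 72-hour check reflects the limit described above, although a given cluster may have been configured with a shorter window.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKTRACK_HOURS = 72  # upper limit cited in the article


def backtrack_request(cluster_id, minutes_ago, now=None):
    """Build keyword arguments for the RDS backtrack_db_cluster call."""
    if not 0 < minutes_ago <= MAX_BACKTRACK_HOURS * 60:
        raise ValueError("target must fall within the last 72 hours")
    now = now or datetime.now(timezone.utc)
    return {
        "DBClusterIdentifier": cluster_id,
        "BacktrackTo": now - timedelta(minutes=minutes_ago),
    }


def rewind(rds_client, cluster_id, minutes_ago):
    # rds_client is assumed to be boto3.client("rds"); a real call needs AWS
    # credentials and a cluster that has backtracking enabled.
    request = backtrack_request(cluster_id, minutes_ago)
    return rds_client.backtrack_db_cluster(**request)
```

A call such as `rewind(boto3.client("rds"), "my-aurora-cluster", 15)` would ask the service to rewind that cluster by 15 minutes. Keeping the request builder separate from the network call lets the time arithmetic be checked without touching a live database.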
AWS charges $0.012 per 1 million change records per hour, so a company's cost depends on the number of change records it produces and how far back it goes.
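To see how those two factors interact, here is a back-of-the-envelope estimate. It assumes a steady write rate and that every change record made inside the backtrack window is retained and billed for each hour it is stored. That billing interpretation and the function name `hourly_backtrack_cost` are assumptions for illustration, not AWS's published formula.

```python
PRICE_PER_MILLION_RECORD_HOURS = 0.012  # USD, the rate quoted in the article


def hourly_backtrack_cost(change_records_per_hour, window_hours):
    """Estimate the hourly charge for keeping a backtrack window of a given size."""
    # Assumption: at steady state the buffer holds one window's worth of records.
    stored_records = change_records_per_hour * window_hours
    return stored_records / 1_000_000 * PRICE_PER_MILLION_RECORD_HOURS
```

Under these assumptions, a cluster that generates 500,000 change records per hour with a 24-hour window holds about 12 million records, which works out to roughly $0.14 per hour. Extending the window to the full 72 hours triples that figure.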
A worst-case scenario database tool
During a backtrack, the application must be turned off; any open connections are lost and uncommitted writes are dropped. Once the rewind completes, the application can resume normal operations and accept requests. That's why it's a "very useful feature that no one should ever want to use," according to Lee Atchison, senior director of strategic architecture at New Relic, a software analytics company and AWS partner.
"The biggest value is on testing and developing databases," he said. "I would be scared to death to be any way, shape or form risk-dependent on it to protect my production database."
The nature of the backtrack feature means there will be lost transactions and customer impact, so the better risk management approach would be for administrators not to touch production databases and only use verified, tested code, Atchison said.
Braze, a global consumer engagement company, tested Amazon Aurora for a new workload it will soon move into production. The company handles millions of events a minute and interacts with billions of external applications for personalized messaging. For a company where seconds' worth of data matters, an undo feature with such specificity has some appeal, but they'd never want to use it in production, said Jon Hyman, CTO and co-founder.
"As soon as we start using [backtrack] we're going to have an outage," he said. "It's not something we would want to do willy-nilly."
The types of mistakes that AWS cites for the backtrack feature aren't the types of actions that should be taken in production, Hyman added. For this reason, he sees the tool as best suited for test and development, where these types of mistakes can happen, and for forensics on a secondary system. He would like to backtrack just on read replicas so SREs can investigate what caused an issue as the application continues to run.