Many enterprises have adopted the Amazon Redshift hosted data warehouse for big data analytics projects. Is it a good fit for this type of technology, and is it right for your organization?
Amazon Redshift is primarily designed for very large data warehousing and analysis, where it can serve as a cost-effective alternative to an on-premises data warehouse. But the reverse is also true: Amazon has other data storage options that might be better suited to the tasks smaller organizations require.
Redshift has a good reputation among users. Its fast performance typically results in a positive Amazon Redshift review, according to Michael Fauscette, chief research officer at G2 Crowd, an online software review platform. "Some users also feel that [Redshift] is a better solution for large enterprise than for smaller businesses, where performance was more important than scalability," Fauscette said. Overall, Redshift is well suited to, and integrates with, popular analytics tools. And its columnar database format is preferred for managing and querying massive data sets, Fauscette added.
Redshift is a powerful product for getting data warehouses up and running in the cloud, said Laith Al-Saadoon, lead senior solutions architect at CorpInfo. "Redshift performs at its best with carefully designed queries and schemas," he said.
Despite its enterprise benefits, Redshift also has its quirks. One of the most common complaints involves how it handles large updates. In particular, the process of moving massive data sets across the internet requires substantial bandwidth. While Redshift is set up for high performance with large data sets, "there have been some reports of less than optimal performance" for the largest data sets, Al-Saadoon said.
Redshift also supports only a single distribution key per table, so queries that join on multiple columns can suffer in performance.
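To illustrate why joins on a column other than the distribution key can be slower, the following is a minimal toy model, not Redshift's actual implementation. It assumes four slices and a simple modulo in place of a real hash function; the table names, column names and the `colocated_fraction` helper are all hypothetical. It measures how many matching row pairs already sit on the same slice, which is the case where no data has to move between nodes.

```python
NUM_SLICES = 4  # assumed slice count, for illustration only


def slice_for(value: int) -> int:
    """Toy stand-in for a hash function: map a key value to a slice."""
    return value % NUM_SLICES


def colocated_fraction(left, right, join_col, left_dist, right_dist):
    """Fraction of matching row pairs whose two rows sit on the same slice."""
    pairs = [(l, r) for l in left for r in right if l[join_col] == r[join_col]]
    if not pairs:
        return 0.0
    same = sum(slice_for(l[left_dist]) == slice_for(r[right_dist]) for l, r in pairs)
    return same / len(pairs)


# Hypothetical tables: orders is distributed on customer_id only.
orders = [
    {"order_id": i, "customer_id": i % 50, "product_id": i % 17}
    for i in range(1000)
]
customers = [{"customer_id": c} for c in range(50)]  # distributed on customer_id
products = [{"product_id": p} for p in range(17)]    # distributed on product_id

# Join on the distribution key: every matching pair is already on one slice.
on_customer = colocated_fraction(
    orders, customers, "customer_id", "customer_id", "customer_id"
)

# Join on a different column: many matching pairs live on different slices,
# so rows would have to be moved between slices before the join can run.
on_product = colocated_fraction(
    orders, products, "product_id", "customer_id", "product_id"
)

print(f"co-located on customer_id join: {on_customer:.0%}")
print(f"co-located on product_id join:  {on_product:.0%}")
```

In this sketch the join on `customer_id` is fully co-located, while the join on `product_id` is not, because the table can be laid out for only one of the two columns.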
In addition, the service has some limitations with data sources and concurrency. Redshift isn't compatible with all databases, most notably MongoDB and non-AWS cloud databases. And the concurrent query limit in Redshift is low relative to the requirements of some analytics tools.
Enterprises have some other Redshift concerns, including:
- The differences between standard PostgreSQL and the version Amazon uses with Redshift were a concern for one G2 Crowd reviewer. "There are enough dissimilarities with PostgreSQL that it led to some 'gotcha' moments for us. For instance, the default for 'VACUUM' is 'FULL,' whereas on PostgreSQL it's 'STANDARD' (does not recalculate all indexes). This led to some serious I/O hits that we were not expecting," the anonymous reviewer noted.
- Scalability is limited at very large data volumes, and performance suffers, noted another Amazon Redshift review. "[Redshift] has a hard time scaling when you get to big data levels. We have had to migrate some of our datasets into a Hadoop/Spark cluster to get the performance we needed."
- According to one Amazon Redshift review, the query interface is not modern. "The 'Queries' interface is a bit behind. And their metrics tracking of COPY command is misleading. My experience with COPY is longer than it's reported on the dashboard," noted the reviewer.
- Redshift needs more flexibility to create user-defined functions. Although others have offered kudos to AWS for enhancing this capability, at least one Amazon Redshift review expressed disappointment, noting that "upgrading/adding new nodes to the existing cluster takes a long time and is not a simple task."
- Because of the nature of managed services, access to the underlying operating system and to certain database functions and capabilities isn't available, added Patrick Hannah, vice president of engineering with CloudHesive, a cloud services consulting firm and managed services provider based in Fort Lauderdale, Fla.
- Starting sizes may be too large for some use cases, Hannah noted. Amazon defines these elements based on multiple factors relating to the particular configuration a company requires. On the other hand, Redshift is limited because it does not automatically scale and has generous but finite scaling limits. For instance, according to Amazon's website, the maximum number of tables that you can create per cluster is 9,900 and the maximum number of user-defined databases you can create per cluster is 60.
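As a rough illustration of planning around the finite limits mentioned in the last item, the sketch below checks a planned schema against the two figures quoted above. The constants are taken from the article's quoted numbers and may no longer match Amazon's current documentation; the `check_cluster_plan` function is hypothetical and is not part of any AWS tool.

```python
# Limits as quoted in the article; verify against current AWS documentation.
MAX_TABLES_PER_CLUSTER = 9_900
MAX_USER_DATABASES_PER_CLUSTER = 60


def check_cluster_plan(planned_tables: int, planned_databases: int) -> list:
    """Return a list of problems with the plan; an empty list means it fits."""
    problems = []
    if planned_tables > MAX_TABLES_PER_CLUSTER:
        problems.append(
            f"{planned_tables} tables exceeds the limit of {MAX_TABLES_PER_CLUSTER}"
        )
    if planned_databases > MAX_USER_DATABASES_PER_CLUSTER:
        problems.append(
            f"{planned_databases} databases exceeds the limit of "
            f"{MAX_USER_DATABASES_PER_CLUSTER}"
        )
    return problems


# Example: a plan that fits, and one that breaks both limits.
print(check_cluster_plan(500, 10))
print(check_cluster_plan(12_000, 61))
```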
Redshift also resides in a single AWS availability zone, though that doesn't affect all customers. "Given the nature of data warehouses, this may not be a deal-breaker," Hannah added, as the service is available in most regions.
Lastly, there is a human challenge. Redshift uses SQL-based database infrastructure, which means it uses a fairly standard tool set, said Vadim Vladimirskiy, CEO of cloud hosting company Nerdio, based in Chicago. That means these skills are largely transferable; however, many small and medium-sized businesses will need to invest in training or hiring talent to manage and configure Redshift.