The AWS cloud is lowering the barriers to entry in data science and analytics. Access to scalable compute and storage resources is an obvious advantage of the public cloud, but there are other reasons to use the cloud for analytics. Using predefined Amazon Machine Images from the AWS Marketplace can reduce the time and effort needed to deploy and maintain an analytics environment.
The Amazon Web Services (AWS) Marketplace offers a wide range of cloud analytics tools for developers and analysts, from data collection and ingestion tools to advanced analytics and visualization platforms.
Splunk Enterprise is useful for ingesting machine-generated data, such as logs. Splunk users may also be interested in Hunk, an analytics platform for Splunk. Hunk supports deployment of dashboards and can be used for analysis and visualization. Both Splunk and Hunk can be used in AWS under the bring-your-own-license model.
Yhat Inc. is a data science company that offers ScienceCluster, a distributed system for computationally intensive operations, and ScienceOps, a platform for deploying data science models in production environments. The Linux-based ScienceOps is available through the AWS Marketplace and supports Ruby, Python, Node.js and R clients, as well as a RESTful API. ScienceOps is also offered under a bring-your-own-license model.
Enterprises looking for established, well-supported platforms might want to consider Tibco JasperSoft Reporting and Analytics for AWS, which includes support for ad hoc querying, reporting, data analysis, visualization and dashboards. Tibco offers professional support to assist with planning and deployment. Costs range from $1.08 per hour for an m1.large server to $4.06 per hour for an m3.2xlarge.
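To put the hourly figures above in perspective, here is a minimal Python sketch that converts an hourly rate into a rough monthly estimate. It assumes a 30-day month of continuous use and treats the quoted rate as the full hourly charge; whether that rate includes the underlying EC2 instance cost is not stated here, so the results are ballpark numbers only. The function name and the rate table are illustrative and not part of any AWS tool.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month of continuous use


def estimate_cost(hourly_rate_usd: float, hours: float) -> float:
    """Return the cost in USD of running one instance for the given hours."""
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return round(hourly_rate_usd * hours, 2)


# Hourly rates quoted above for the two ends of the instance range.
QUOTED_RATES = {"m1.large": 1.08, "m3.2xlarge": 4.06}

if __name__ == "__main__":
    hours = HOURS_PER_DAY * DAYS_PER_MONTH
    for instance_type, rate in QUOTED_RATES.items():
        monthly = estimate_cost(rate, hours)
        print(f"{instance_type}: about ${monthly:,.2f} per 30-day month")
```

An always-on server is the worst case. An environment that runs only during business hours would cost a fraction of these amounts, which is one reason hourly billing suits exploratory analytics work.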
Hadoop is almost synonymous with big data and cloud analytics tools. Amazon Elastic MapReduce (EMR) is a popular way to run Hadoop in the cloud, but it is not the only choice; do-it-yourself Hadoop deployments are also an option. MapR offers several different packages of its Hadoop platform, including MapR Enterprise Edition Plus Spark, MapR Enterprise Database Edition and MapR Database Edition Plus. Prices range from $0.07 per hour to $0.97 per hour, depending on the MapR edition and the size of the machine instance. Machine instances for MapR range from m3.large to d2.8xlarge.
Some other options for analytics are not listed in the AWS Marketplace. For example, Databricks provides Spark with advanced security features and support for visualization tools and job pipelines. Databricks runs in AWS and integrates with AWS products, such as Identity and Access Management.