
Amazon Elastic MapReduce (Amazon EMR)

This definition is part of our Essential Guide: An insider's look at AWS re:Invent 2014
Contributor(s): Matthew Haughn

Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big data processing and analysis. Amazon EMR is an expandable, low-configuration service that offers an easier alternative to running cluster computing in-house.

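Amazon EMR is based on Apache Hadoop, a Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Hadoop's processing layer implements MapReduce, a programming model and software framework that lets developers write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. A MapReduce job divides its work into a map phase, which turns input records into intermediate key-value pairs, and a reduce phase, which combines all of the values that share a key. MapReduce was developed at Google for indexing web pages, where it replaced the company's original indexing algorithms and heuristics in 2004.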
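To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job in Python. A map step emits (word, 1) pairs, a shuffle step groups the pairs by word, and a reduce step sums each group. Everything runs in memory on one machine; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not part of the Hadoop or Amazon EMR APIs, which distribute these phases across cluster nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine all values that share a key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver that chains the three phases in one process."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call looks at only one line and each reduce call looks at only one key, both phases can be spread across many machines without coordination; that independence is what lets a Hadoop cluster scale the same job out to very large inputs.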

Amazon EMR processes big data across a Hadoop cluster of virtual servers running on Amazon Elastic Compute Cloud (EC2), typically reading input data from and writing results to Amazon Simple Storage Service (S3). The "elastic" in EMR's name refers to its ability to resize the cluster dynamically, adding or removing resources depending on demand at any given time.
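As an illustration of how a cluster with EC2 nodes, S3 logging and elastic resizing might be described, the sketch below builds the keyword arguments for the boto3 EMR client's run_job_flow call without contacting AWS. The cluster name, bucket name, instance type, node limits and the release label "emr-6.15.0" are all assumptions chosen for the example; the ManagedScalingPolicy block is a later EMR feature used here to show elastic resizing, and the Amazon EMR documentation should be checked for exactly how its capacity limits are counted.

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    log_bucket: str,
    release_label: str = "emr-6.15.0",  # assumption: use a release your account supports
    instance_type: str = "m5.xlarge",   # assumption: illustrative EC2 instance type
    min_nodes: int = 2,                 # assumption: illustrative lower scaling limit
    max_nodes: int = 10,                # assumption: illustrative upper scaling limit
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call (no AWS call is made)."""
    if not 1 <= min_nodes <= max_nodes:
        raise ValueError("require 1 <= min_nodes <= max_nodes")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Cluster logs are written to S3 rather than kept only on the EC2 nodes.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # One primary (MASTER) node coordinates the cluster.
                {"Name": "Primary", "Market": "ON_DEMAND", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # CORE nodes run tasks and store HDFS data; start at the lower limit.
                {"Name": "Core", "Market": "ON_DEMAND", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": min_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Managed scaling lets EMR grow or shrink the cluster within these limits.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": min_nodes,
                "MaximumCapacityUnits": max_nodes,
            }
        },
        # Default IAM roles, as created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With AWS credentials configured, the dictionary would be passed as boto3.client("emr").run_job_flow(**request). Keeping the request in a plain function makes it easy to review and test without creating billable resources.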

Processing big data with Amazon EMR

Amazon EMR is used for workloads such as log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation and bioinformatics. EMR also supports workloads based on Apache Spark, Presto and Apache HBase; HBase integrates with Apache Hive and Apache Pig for additional functionality.
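For a Spark workload such as log analysis, work is commonly submitted to a running EMR cluster as a "step". The sketch below builds one step definition for the boto3 EMR client's add_job_flow_steps call, using the EMR-provided command-runner.jar to run spark-submit against a PySpark script stored in S3. The function name build_spark_step, the S3 path and the script arguments are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_spark_step(name: str, script_s3_uri: str, script_args: List[str]) -> Dict[str, Any]:
    """Describe one EMR step that runs a PySpark script with spark-submit (no AWS call is made)."""
    if not script_s3_uri.startswith("s3://"):
        raise ValueError("script_s3_uri must be an s3:// URI")
    return {
        "Name": name,
        # Keep the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs the given command on the cluster's primary node.
            "Jar": "command-runner.jar",
            # In cluster deploy mode, the Spark driver runs inside the cluster under YARN.
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *script_args],
        },
    }


if __name__ == "__main__":
    step = build_spark_step(
        "daily-log-report",
        "s3://example-bucket/jobs/log_report.py",  # hypothetical script location
        ["--date", "2017-01-01"],                  # hypothetical script arguments
    )
    print(step["HadoopJarStep"]["Args"])
```

The step would then be submitted with boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step]), where cluster_id is the ID of an existing cluster.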


This was last updated in January 2017
