Resizable Clusters

When you run your Hadoop cluster on Amazon EMR, you can easily expand or shrink the
number of virtual servers in your cluster depending on your processing needs. Adding
or removing servers takes minutes, which is much faster than making similar changes
in clusters running on physical servers.

Pay Only for What You Use

By running your cluster on Amazon EMR, you pay only for the computational resources you
use. You do not pay ongoing overhead costs for hardware maintenance and upgrades, and
you do not have to pre-purchase extra capacity to meet peak needs. For example, if
the amount of data you process in a daily cluster peaks on Monday, you can increase
the number of servers in that day's cluster to 50, and then scale back to 10
servers in the clusters that run on the other days of the week. You won't have to pay to
maintain those additional 40 servers during the rest of the week as you would with
physical servers. For more information, see Amazon Elastic MapReduce Pricing.
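
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: it counts server-days rather than dollars, the function names are hypothetical, and it assumes a fixed physical installation would have to be sized for the Monday peak all week.

```python
# Server counts per day, taken from the example above:
# 50 servers on Monday, 10 on every other day of the week.
DAILY_SERVERS = {
    "Mon": 50, "Tue": 10, "Wed": 10, "Thu": 10,
    "Fri": 10, "Sat": 10, "Sun": 10,
}


def elastic_server_days(daily_servers):
    """Server-days used when each day's cluster is sized to that day's load."""
    return sum(daily_servers.values())


def fixed_server_days(daily_servers):
    """Server-days maintained when capacity is fixed at the weekly peak."""
    return max(daily_servers.values()) * len(daily_servers)


elastic = elastic_server_days(DAILY_SERVERS)  # 50 + 6 * 10
fixed = fixed_server_days(DAILY_SERVERS)      # 50 * 7
print(f"elastic: {elastic} server-days, fixed at peak: {fixed} server-days")
```

Under these assumptions the elastic schedule uses 110 server-days per week, compared with 350 for capacity fixed at the peak.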

Easy to Use

When you launch a cluster on Amazon EMR, the web service allocates the virtual server
instances and configures them with the needed software for you. Within minutes you
can have a cluster configured and ready to run your Hadoop application.

Use Amazon S3 or HDFS

The version of Hadoop installed on Amazon EMR clusters is integrated with Amazon S3, which
means that you can store your input and output data in Amazon S3, on the cluster in HDFS,
or a mix of both. Amazon S3 can be accessed like a file system from applications running
on your Amazon EMR cluster.
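
As a rough illustration of mixing the two storage options, the sketch below classifies data locations by URI scheme. The bucket name, the HDFS path, and the helper function are hypothetical, and the sketch assumes locations are written with `s3://` and `hdfs://` prefixes; it does not call any Amazon EMR or Hadoop API.

```python
from urllib.parse import urlparse


def storage_backend(uri):
    """Classify a data location by its URI scheme (illustrative helper)."""
    scheme = urlparse(uri).scheme
    if scheme == "s3":
        return "Amazon S3"
    if scheme == "hdfs":
        return "HDFS"
    raise ValueError(f"unrecognized storage scheme: {uri!r}")


# A job that reads its input from Amazon S3 and writes its output to HDFS.
job = {
    "input": "s3://example-bucket/logs/",      # hypothetical bucket
    "output": "hdfs:///user/hadoop/results/",  # hypothetical HDFS path
}
backends = {role: storage_backend(uri) for role, uri in job.items()}
print(backends)
```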

Parallel Clusters

If your input data is stored in Amazon S3, you can have multiple clusters accessing the
same data simultaneously.

Hadoop Application Support

Save Money with Spot Instances

Spot Instances are a way to purchase virtual servers for your cluster at a
discount. Excess capacity in Amazon Web Services is offered at a fluctuating price,
based on supply and demand. You set a maximum bid price that you are willing to pay for a
certain configuration of virtual server. While the Spot Price for that
type of server is below your bid price, the servers are added to your cluster and
you are billed at the Spot Price rate. When the Spot Price rises above your bid price,
the servers are terminated.
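
The bidding rule described above can be modeled with a short simulation. This is a simplified sketch, not a description of how AWS implements Spot pricing: the hourly prices are made up, the function name is hypothetical, and a Spot Price exactly equal to the bid is assumed to keep the servers running.

```python
def simulate_spot(spot_prices, max_bid):
    """Return a (running, billed_rate) pair for each pricing period.

    Servers run while the Spot Price does not exceed the bid, and are
    billed at the Spot Price, not at the bid. When the Spot Price rises
    above the bid, the servers are terminated and nothing is billed.
    """
    timeline = []
    for price in spot_prices:
        running = price <= max_bid  # assumption: an equal price still runs
        timeline.append((running, price if running else 0.0))
    return timeline


hourly_prices = [0.10, 0.12, 0.30, 0.11]  # hypothetical Spot Prices per hour
timeline = simulate_spot(hourly_prices, max_bid=0.15)
total_billed = sum(rate for _, rate in timeline)

for hour, (running, rate) in enumerate(timeline):
    state = "running" if running else "terminated"
    print(f"hour {hour}: {state}, billed {rate:.2f}")
```

In this run the servers are terminated only in the third hour, when the hypothetical price of 0.30 exceeds the 0.15 bid; in the other hours the charge is the Spot Price itself.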

AWS Integration

Amazon EMR is integrated with other Amazon Web Services such as Amazon EC2, Amazon S3, DynamoDB,
Amazon RDS, CloudWatch, and AWS Data Pipeline. This means that you can easily access data stored in AWS
from your cluster, and you can use the functionality offered by other Amazon
Web Services to manage your cluster and store its output.

Instance Options

When you launch a cluster on Amazon EMR, you specify the size and capabilities of the
virtual servers used in the cluster. This way you can match the virtual servers
to the processing needs of the cluster. You can choose virtual server instances to
reduce cost, speed up performance, or store large amounts of data.

For example, you might launch one cluster with high storage virtual servers to
host a data warehouse, and launch a second cluster on virtual servers with high
memory to improve performance. Because you are not locked into a given hardware
configuration as you are with physical servers, you can adjust each cluster to your
requirements. For more information about the server configurations available using
Amazon EMR, see Choose the Number and Type of Instances.

Management Tools

You can manage your clusters using the Amazon EMR console (a web-based user interface),
a command line interface, web service APIs, and a variety of SDKs. For more
information, see What Tools are Available for Amazon EMR?.

Security

You can run Amazon EMR in an Amazon VPC in which you configure networking and security rules.
Amazon EMR also supports IAM users and roles, which you can use to control access to your
cluster and to set permissions that restrict what others can do on the cluster. For more
information, see Configuring Access to the Cluster.