Step 10: Name your MapReduce job and specify whether it is your own application or a sample application. In this instance we will create our own MapReduce job and select Streaming as the Job Type.

Why Streaming: A Streaming job flow runs a single Hadoop job consisting of map and reduce functions that you have uploaded to Amazon S3. The functions can be implemented in any of the following supported languages: Ruby, Perl, Python, PHP, R, Bash, C++.
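To make the Streaming model concrete, here is a minimal, hypothetical word-count mapper/reducer pair in Python (any of the languages above follows the same contract: read lines from stdin, write tab-separated key/value pairs to stdout). In practice the mapper and reducer would be two separate scripts uploaded to S3; they are shown together here for brevity.

```python
#!/usr/bin/env python
# Hypothetical Streaming word-count scripts, combined into one file for
# illustration. Hadoop Streaming pipes input lines to the mapper's stdin,
# sorts the mapper's output by key, and pipes it to the reducer's stdin.
import sys
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" pair per word in the input.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(pairs):
    # Input arrives sorted by key, so equal words are adjacent.
    for word, group in groupby(pairs, key=lambda kv: kv.split("\t")[0]):
        total = sum(int(kv.split("\t")[1]) for kv in group)
        yield "%s\t%d" % (word, total)

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    lines = (l.rstrip("\n") for l in sys.stdin)
    for record in mapper(lines) if role == "map" else reducer(lines):
        print(record)
```

Because both functions just transform text streams, you can test them locally with a pipe before uploading anything to S3.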

Step 11: Click on ‘Continue’ to specify the parameters for your MapReduce job:

Here is an explanation of the parameters:

Input location: The S3 location of your dataset.

Output location: The S3 location where your MapReduce output should be written.

Mapper: The location of your Mapper; make sure to add quotation marks around the statement.

Reducer: The location of your Reducer; make sure to add quotation marks around the statement.

Extra Args: No need for extra arguments in this case.
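As a concrete illustration, the filled-in parameters might look like the following sketch. The bucket name and script paths here are made up; substitute your own S3 locations.

```python
# Hypothetical Streaming job parameters; the bucket and file names are
# invented for illustration only.
job_params = {
    "input": "s3n://my-bucket/input/",              # Input location
    "output": "s3n://my-bucket/meanVar001Log/",     # Output location
    "mapper": "s3n://my-bucket/scripts/mapper.py",  # Mapper script
    "reducer": "s3n://my-bucket/scripts/reducer.py",# Reducer script
    "extra_args": "",                               # none needed here
}
```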

Step 12: Configure your Amazon Elastic MapReduce instance

Note: In my case I selected the Small instance type with an instance count of 1, as this job is small. For large jobs, select the Large or XLarge instance types.

Note on usage: Amazon bills per instance-hour, with partial hours rounded up. If your job runs for 2 minutes, you will still be billed for a full hour, so it is most economical to batch several jobs into the same hour.
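The round-up rule above is easy to sketch; the function below only models the hour counting, not actual prices:

```python
import math

def billed_hours(runtime_minutes):
    # Partial instance-hours are rounded up to the next full hour,
    # with a minimum of one hour per run.
    return max(1, math.ceil(runtime_minutes / 60.0))

# A 2-minute job and a 59-minute job each cost one instance-hour;
# a 61-minute job already costs two.
```

This is why chaining a few short jobs into the same hour is cheaper than launching each one separately.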

Going forward, look at the three pricing options: On-Demand Instances, Reserved Instances, and Spot Instances. Depending on your usage and the current demand on AWS, a different option may suit you best.

Step 13: Specify advanced options

If you have created any specific key pairs, you can select them here. Also, specify a directory for AWS to place your log files in; this directory will be created automatically.

Note: Bootstrap Actions are a feature of Amazon Elastic MapReduce that lets you run custom set-up steps prior to the execution of your job flow, for example to install software or configure the instances.

In this case we don’t need any bootstrap action to take place.

Step 15: Review your configuration

Click the ‘Create Job Flow’ button to start your MapReduce job.

You will see your job move through the states STARTING, then CLUSTER SHUTTING DOWN, and finally COMPLETED.

Step 16: Check the result of your MapReduce job by opening meanVar001Log, the Output Location you specified.
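Streaming output is plain text, one tab-separated key/value pair per line. A small sketch of reading such a result back; the statistic names and values here are invented, not the actual contents of meanVar001Log:

```python
def parse_streaming_output(lines):
    # Each output line is "key<TAB>value"; collect them into a dict.
    results = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        results[key] = float(value)
    return results

# Hypothetical contents of a mean/variance result file:
sample = ["mean\t4.25", "variance\t1.75"]
```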