The figure below shows the steps the Hadoop MapReduce framework takes after your map function emits a key/value output record. Note that the figure reflects Hadoop 1.x and earlier; Hadoop 2.x introduced some changes, which will be discussed in a future blog post.

Chapter 6 of my book, Hadoop in Practice (Manning Publications), discusses how to tune some of the configuration values shown in the figure once you start working with mid- to large-size Hadoop clusters.

About the author

Alex Holmes is a senior software engineer with over 15 years of experience developing large-scale distributed Java systems. Since 2008 he has used Hadoop to solve Big Data problems across a number of projects. He is the author of Hadoop in Practice, a book published by Manning Publications, and has presented at JavaOne and Jazoon.

If you want to see what Alex is up to, you can check out his work on GitHub, or follow him on Twitter or Google+.