Designing MapReduce Algorithms

Also, the 460 could come from many mappers and many documents across the entire corpus.

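These co-occurrences from every mapper are delivered to the corresponding reducer under a special key.

Each one is delivered as the special key item <(wi, *), count>, which arrives as the first <k,v> pair for wi.

The magic is that the reducer processes <(wi, *), count> before any <(wi, wj), count> pair, so the marginal for wi is already known when the joint counts arrive.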

At the reducer (blue: reducer 1, orange: reducer 2)

4 different reducers

Requirements

Emitting a special key-value pair for each co-occurring word pair in the mapper to capture its contribution to the marginal.

Controlling the sort order of the intermediate key so that the key-value pairs representing the marginal contributions are processed by the reducer before any of the pairs representing the joint word co-occurrence counts.

Defining a custom partitioner to ensure that all pairs with the same left word are shuffled to the same reducer.

Preserving state across multiple keys in the reducer, first to compute the marginal from the special key-value pairs and then to divide the joint counts by the marginal to arrive at the relative frequencies.
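
The four requirements above can be sketched as a small in-memory simulation. This is a minimal illustration, not Hadoop code: the function names (mapper, partition, sort_key, reducer, relative_frequencies), the use of zlib.crc32 as the partitioning hash, and the choice to treat every ordered pair of words on one line as a co-occurrence are all assumptions made for the example.

```python
import zlib
from collections import defaultdict
from itertools import groupby

MARGINAL = "*"  # special right-hand symbol marking a marginal contribution


def mapper(line):
    """Emit one joint pair and one special marginal pair per co-occurrence."""
    words = line.split()
    for i, left in enumerate(words):
        for j, right in enumerate(words):
            if i != j:
                yield (left, right), 1     # joint count contribution
                yield (left, MARGINAL), 1  # marginal contribution


def partition(key, num_reducers):
    """Custom partitioner: route on the left word only."""
    left, _ = key
    return zlib.crc32(left.encode("utf-8")) % num_reducers


def sort_key(key):
    """Sort order: for each left word, the special key comes first."""
    left, right = key
    return (left, right != MARGINAL, right)


def reducer(sorted_items):
    """Keep the marginal as state across keys, then divide joint counts by it."""
    marginal = 0
    for (left, right), group in groupby(sorted_items, key=lambda kv: kv[0]):
        total = sum(value for _, value in group)
        if right == MARGINAL:
            marginal = total  # seen before any (left, word) key
        else:
            yield (left, right), total / marginal


def relative_frequencies(lines, num_reducers=2):
    """Simulate map, shuffle (partition + sort), and reduce in memory."""
    partitions = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)].append((key, value))
    result = {}
    for items in partitions.values():
        items.sort(key=lambda kv: sort_key(kv[0]))
        result.update(reducer(items))
    return result


freqs = relative_frequencies(["a b", "a c", "a b"])
```

Because the reducer sees (wi, *) first, it never has to buffer the joint counts for wi in memory while waiting for the marginal; that is the point of inverting the order.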

Let's generalize this

<(var34, left), value>

<(var34, right), value>

<(var34, middle), value>

All of these are delivered to the same reducer. What can you do with this?
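
One possible reading of the generalization, sketched under assumptions: partition on the first component of the composite key only, and use the second component to control the order in which values reach the reducer. The names shuffle and reduce_bucket, the tag_order parameter, and the crc32-based partitioner are hypothetical choices for this illustration, not part of any MapReduce API.

```python
import zlib
from itertools import groupby


def partition(key, num_reducers):
    """Route on the first component so every tag of a variable shares a reducer."""
    natural, _tag = key
    return zlib.crc32(natural.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers, tag_order):
    """Partition the pairs, then sort each bucket by (variable, tag position)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: (kv[0][0], tag_order.index(kv[0][1])))
    return buckets


def reduce_bucket(bucket):
    """Yield each variable with its (tag, value) pairs in the controlled order."""
    for natural, group in groupby(bucket, key=lambda kv: kv[0][0]):
        yield natural, [(key[1], value) for key, value in group]


pairs = [
    (("var34", "right"), 3),
    (("var34", "left"), 1),
    (("var34", "middle"), 2),
    (("var7", "left"), 9),
]
buckets = shuffle(pairs, num_reducers=4, tag_order=["left", "middle", "right"])
grouped = dict(item for bucket in buckets for item in reduce_bucket(bucket))
```

With the tags arriving in a known order at a single reducer, the reducer can combine them as a stream, which is the same idea the (wi, *) key exploits for the marginal.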