Intro to Map-Reduce Feb 21, 2014. map-reduce? A programming model or abstraction. A novel way of thinking about designing a solution to certain problems…

1
Intro to Map-Reduce Feb 21, 2014

2
map-reduce?
A programming model or abstraction.
A novel way of thinking about designing a solution to certain problems…
Feb 21, 2014 | CS512 | Spring 2014

8
Long long ago…
LISP, 1958: a programming language that introduced several innovative ideas.
Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. John McCarthy, MIT, April 1960.

12
Simple composition
Map’s output is a list of values, which reduce can accept as one of its arguments.
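As a minimal sketch of this composition, the snippet below uses Python's built-in `map` and `functools.reduce`. The squaring and adding functions are illustrative choices, not taken from the slides.

```python
from functools import reduce

# map applies a function to every element and produces a list of values.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# reduce accepts that list as one of its arguments and folds it into one value.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```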

13
Analogy
- Break the large problem into small pieces
- Code m_f to solve one piece
- Run map to apply m_f to the small pieces and generate nuggets of solutions
- Code r_f to combine the nuggets
- Run reduce to apply r_f to the nuggets and output the complete solution
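The steps of the analogy can be sketched in a few lines of Python. Here `m_f` and `r_f` stand for the map function and the reduce function named on the slide. The problem (finding the largest number) and the `split` helper are assumptions made only for illustration.

```python
from functools import reduce

def split(problem, size):
    """Break the large problem into pieces of at most `size` items."""
    return [problem[i:i + size] for i in range(0, len(problem), size)]

def m_f(piece):
    """Solve one small piece: find the largest number in it."""
    return max(piece)

def r_f(best_so_far, nugget):
    """Combine two nuggets: keep the larger one."""
    return max(best_so_far, nugget)

problem = [7, 3, 19, 4, 11, 8, 2, 15]
pieces = split(problem, 3)        # [[7, 3, 19], [4, 11, 8], [2, 15]]
nuggets = list(map(m_f, pieces))  # one partial answer per piece
solution = reduce(r_f, nuggets)   # combine the partial answers

print(nuggets)   # [19, 11, 15]
print(solution)  # 19
```

This only works because the combining step gives the same answer however the pieces are grouped; a problem without that property would not fit the pattern as directly.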

14
Example
1 TB file split into 100,000 chunks
Count the number of lines in each chunk
Add the counts together to output the final line count
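A scaled-down sketch of this example, assuming a small in-memory byte string in place of the 1 TB file and a hypothetical `split_into_chunks` helper. Counting newline bytes, rather than whole lines, means a chunk boundary can fall in the middle of a line without any line being counted twice or missed.

```python
from functools import reduce

def split_into_chunks(data: bytes, chunk_size: int):
    """Cut the input into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def count_lines(chunk: bytes) -> int:
    """Map step: count the newline bytes inside one chunk."""
    return chunk.count(b"\n")

data = b"alpha\nbeta\ngamma\ndelta\n"   # stand-in for the large file
chunks = split_into_chunks(data, 5)

counts = list(map(count_lines, chunks))                 # per-chunk counts
total_lines = reduce(lambda a, b: a + b, counts, 0)     # reduce step: add them

print(counts)       # [0, 1, 1, 1, 1]
print(total_lines)  # 4
```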

15
A slightly different map-reduce
Map: copies a function onto a number of machines and applies each copy to a different piece of the input.
Reduce: combines the map outputs from the different machines into a final solution.
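To illustrate the distributed flavor on a single computer, the sketch below lets worker threads from Python's `concurrent.futures.ThreadPoolExecutor` stand in for machines. That substitution, the word-count task, and the sample input are all assumptions; a real cluster ships the function and the data over a network.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(piece: str) -> int:
    """The function that is 'copied' to every worker."""
    return len(piece.split())

# Each string plays the role of one piece of the input.
pieces = ["the quick brown fox", "jumps over", "the lazy dog"]

# Map: every worker applies the same function to a different piece.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, pieces))

# Reduce: combine the outputs from all workers into the final answer.
total_words = reduce(lambda a, b: a + b, partial_counts, 0)

print(partial_counts)  # [4, 2, 3]
print(total_words)     # 9
```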

16
Map-reduce reintroduced…
Google created the awareness.
Hadoop made it into a sensation.
Hadoop is an open-source map-reduce implementation based on Google’s paper.
MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation. December, 2004.

22
Copying the job
JobTracker
- Hadoop service that copies the map and reduce code to available machines.
- Feeds the input to the mappers and connects their outputs to the reducers.

23
Maps in parallel
- Maps run in parallel.
- Each map operates on a set of chunks assigned to it by the job tracker.
- Maps write to local disk.
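The three points above can be imitated with a toy simulation. This is not Hadoop's API: `assign_chunks`, `run_mapper`, and `run_job` are hypothetical names, worker threads stand in for machines, a temporary directory stands in for each machine's local disk, and the round-robin assignment is an assumption of this sketch rather than a description of how the JobTracker schedules work.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def assign_chunks(chunks, num_mappers):
    """Toy 'job tracker': deal the chunks out to mappers round-robin."""
    assignments = [[] for _ in range(num_mappers)]
    for i, chunk in enumerate(chunks):
        assignments[i % num_mappers].append(chunk)
    return assignments

def run_mapper(mapper_id, assigned, workdir):
    """Count lines in the assigned chunks and write the count to a local file."""
    count = sum(chunk.count("\n") for chunk in assigned)
    out = Path(workdir) / f"map-{mapper_id}.out"
    out.write_text(str(count))
    return out

def run_job(chunks, num_mappers):
    """Run all mappers in parallel, then add up what they wrote to disk."""
    with tempfile.TemporaryDirectory() as workdir:
        assignments = assign_chunks(chunks, num_mappers)
        with ThreadPoolExecutor(max_workers=num_mappers) as pool:
            outputs = list(pool.map(run_mapper,
                                    range(num_mappers),
                                    assignments,
                                    [workdir] * num_mappers))
        # Stand-in for the reduce side: read each mapper's file and sum.
        return sum(int(path.read_text()) for path in outputs)

chunks = ["a\nb\n", "c\n", "d\ne\nf\n", "g\n"]
total = run_job(chunks, num_mappers=2)
print(total)  # 7
```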