Contents

1 Introduction

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad.

The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it requires writing Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

2 Why a monad?

What the monadic implementation lets us do is the following:

- Map and reduce look the same.
- You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
- All of the guts of MapReduce are hidden in the monad's bind function.
- The implementation is naturally parallel.
- Making a MapReduce program is trivial:

      ...>>= wrapMR mapper >>= wrapMR reducer >>=...

3 Details

Full details of the implementation and sample code can be found here; below I give only the highlights.

3.1 Generalised mappers / reducers

One can generalise MapReduce a bit, so that each stage (map, reduce, etc.) becomes a function of signature

    a -> ([(s,a)] -> [(s',b)])

where s and s' are data types and a and b are key values.
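For instance, a word-count map stage fits this shape. In the sketch below (illustrative only; the name wcMapper is mine, not the library's) s is a line of text, a = () is the single input key, and both s' and b are the word, which becomes the new key:

```haskell
-- A word-count map stage with signature a -> ([(s,a)] -> [(s',b)]),
-- instantiated as () -> ([(String, ())] -> [(String, String)]).
-- Illustrative only; not taken from the library.
wcMapper :: () -> [(String, ())] -> [(String, String)]
wcMapper _ records = [ (w, w) | (line, _) <- records, w <- words line ]
```

Because the stage emits each word as its own key, a subsequent reduce stage will receive all occurrences of one word together.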

3.2 Generalised Monad

Now, this is suggestive of a monad, but we can't use a monad per se, because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.
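To make the idea concrete, here is a minimal sketch of one possible realisation. The names (MapReduce, runMR, bind, wrapMR) and the grouping strategy are my illustration of the technique, not the library's actual code:

```haskell
import Data.List (nub)

-- Illustrative sketch only: a stage transforms keyed records,
-- with key and value types allowed to change.
newtype MapReduce s a s' b = MR { runMR :: [(s, a)] -> [(s', b)] }

-- A generalised bind: run the first stage, then, for each distinct
-- key it emitted, pass that key's records to the next stage. The
-- grouping by key is hidden here, and each key's group could in
-- principle be processed in parallel.
bind :: Eq b => MapReduce s a s' b -> (b -> MapReduce s' b s'' c)
     -> MapReduce s a s'' c
bind m f = MR $ \xs ->
  let ys = runMR m xs
      ks = nub (map snd ys)
  in concat [ runMR (f k) [ y | y <- ys, snd y == k ] | k <- ks ]

-- Wrap a plain function over values, so mapper / reducer authors
-- see only the values for one key and need know nothing of the
-- framework.
wrapMR :: ([s] -> [(s', b)]) -> (a -> MapReduce s a s' b)
wrapMR f _ = MR (f . map fst)

-- Word count as a pipeline of wrapped, framework-agnostic stages.
mapper :: [String] -> [(String, String)]
mapper ls = [ (w, w) | l <- ls, w <- words l ]

reducer :: [String] -> [((String, Int), ())]
reducer ws = [ ((head ws, length ws), ()) ]

wordCount :: [String] -> [(String, Int)]
wordCount ls =
  map fst (runMR (MR id `bind` wrapMR mapper `bind` wrapMR reducer)
                 [ (l, ()) | l <- ls ])
```

Note that bind here cannot be Monad's (>>=), precisely because the key and value types change from stage to stage; the next section's generalisation addresses this.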