Tuesday, July 26, 2011

Since I've been working on Zarkov, I've been writing a few
map/reduce jobs. One of the things I noticed about map/reduce, at
least as I've implemented it in Zarkov, is that it's pretty
inefficient if you want to generate several aggregate views of the
same data (say, an event stream). In order to meet Zarkov's
performance requirements without making my operations team too
angry, I decided to "extend" map/reduce as it's implemented in
Zarkov. I'm not terribly creative, so I called this command
xmapreduce. Here I'll briefly describe map/reduce, show the problem
with it, and explain the solution implemented in Zarkov.

Map/Reduce

The map/reduce algorithm is nice because a) it's
extremely scalable and b) it's pretty easy to grasp the basics. The
idea is that you write two functions, map and reduce, which will be
applied to a large dataset to get yourself some interesting data.
For our purposes here, I'll illustrate using Zarkov map/reduce.
Let's say you want to see how many timestamped objects in some
collection exist for each date. Here's how you'd write your Zarkov
map/reduce functions:

The Problem with Map/Reduce

The problem arises when you have several output collections that
all depend on a single input collection (or even depend on the same
documents in the input collection). Map/reduce can get you
there, but it'll execute the query once for each output collection,
and it needs to distribute the data to the map()
functions once for every output collection. This isn't too
efficient.
Let's extend our example above by tracking the date, month, and
year of each object. In classic map/reduce, we'd need three
map functions:

Now, if we treat these three jobs separately, we get a lot of
duplicated effort. Particularly, Zarkov must query the input
collection three times and transfer the data to map workers three
times. (There's also duplicate serialization/deserialization of the
input objects, though the Python bson library is quite
fast.)

XMapReduce to the Rescue

The solution Zarkov uses for this case is to allow the
map function to return a target collection
along with the key and value. What xmapreduce does is
transform the map functions above into one xmap
function and tweak our reduce function just a bit
to take the collection as an input:

Now, we can invoke a single job to calculate all three
aggregates, with close to a 3x reduction in data transfer and a
significant speedup. Assuming that a) your jobs can be
combined into a single xmapreduce job and b) you find it acceptable
to code the target collection in your map jobs, xmapreduce should
give you a significant speedup over classic map/reduce. In our
case, we started with 12 map/reduce jobs and ended up with 4
xmapreduce jobs, with a tremendous speedup, but of course your
mileage may vary. Happy hacking!

A Note on the Shuffle

Very briefly, the shuffle is how you guarantee that all the data for a given key is sent to the correct reduce method. Zarkov originally handled this by sorting (and grouping) the map results by key before sending them out for reduce, which turned out to be a bottleneck.

More recent versions use a fixed number of reduce workers and have the map phase hash each key to determine which reduce worker the job goes to. During the reduce phase, each reduce worker groups its objects by key, effectively parallelizing the previously serial sort.
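The key-hashing step can be sketched in a few lines (this partition helper is hypothetical, not Zarkov's actual code; the point is only that hashing is deterministic, so every value for a given key lands on the same reduce worker):

```python
def partition(key, n_workers):
    # Deterministically route a key to one of n_workers reduce
    # workers. Within a single process, hash() is stable, so all
    # values for the same key always go to the same worker.
    return hash(key) % n_workers
```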
