Alan
added a comment - Jun 08 2010 08:30:37 AM +00:00 Thanks for your answer. In this particular (and, judging by the source code, very simple) example the speed difference is really large! I guess we will stick to the "naive" Python implementation for now.


Stephen Nelson
added a comment - Dec 13 2011 06:46:17 PM +00:00 Why was this issue closed without the problem being corrected? I'm using mongo (version 2.0) for my dissertation research. A MongoDB map/reduce function runs between 10 and 100 times slower than a Java implementation that does the same thing.
To test this I hand-coded a naive Java implementation of map/reduce which maps by iterating over a collection, performing the map operation, then storing any emits in a temporary collection. I create an index on the temporary collection, then call reduce, which iterates over the temporary collection finding keys, retrieves all entries for that key in batches, stores the result in a new collection, and deletes all entries for that key before moving on. When the temporary collection is empty, I'm done.
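For reference, the naive scheme described above can be sketched like this (a Python sketch with plain in-memory lists standing in for the Mongo collections; all names and the batching parameter are illustrative, not the poster's actual code):

```python
from collections import defaultdict

def naive_map_reduce(docs, map_fn, reduce_fn, batch_size=100):
    """Hand-rolled map/reduce mirroring the approach above:
    map into a temporary collection, index it, then repeatedly
    pick a key, fetch its entries in batches, reduce them, store
    the result, and delete the entries for that key."""
    # Map phase: every emit lands in a temporary collection.
    temp = []  # list of (key, value) pairs
    for doc in docs:
        for key, value in map_fn(doc):
            temp.append((key, value))

    # "Create an index" on the temporary collection, keyed by emit key.
    index = defaultdict(list)
    for i, (key, _) in enumerate(temp):
        index[key].append(i)

    # Reduce phase: drain the temporary collection key by key.
    results = {}
    while index:
        key, positions = next(iter(index.items()))
        values = []
        for start in range(0, len(positions), batch_size):
            batch = positions[start:start + batch_size]
            values.extend(temp[i][1] for i in batch)
        results[key] = reduce_fn(key, values)
        del index[key]  # "delete all entries for that key"
    return results

# Example: word counts over two tiny documents.
docs = [{"text": "a b a"}, {"text": "b c"}]
emit_words = lambda d: [(w, 1) for w in d["text"].split()]
counts = naive_map_reduce(docs, emit_words, lambda k, vs: sum(vs))
# counts == {"a": 2, "b": 2, "c": 1}
```

Against a real database each phase would be driver calls rather than list operations, but the control flow (map, index, batch-fetch, reduce, delete) is the same.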
This naive approach took my map/reduce function operating on several hundred million documents from many days down to hours. I've since written an implementation which uses a cache in Java and sequential traversal of the temporary collection without deletes, which takes it down by another factor of 10.
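The faster variant can be sketched the same way (again an illustrative Python stand-in, not the poster's Java code): replace the delete-as-you-go temporary collection with a single sequential pass that accumulates values per key in an in-memory cache.

```python
def cached_map_reduce(docs, map_fn, reduce_fn):
    """One sequential traversal of the emits, no per-key deletes.

    A cache (dict keyed by emit key) buffers values until the end,
    so the temporary data is only ever walked once."""
    cache = {}
    for doc in docs:
        for key, value in map_fn(doc):
            cache.setdefault(key, []).append(value)
    # Reduce each key's buffered values in one final pass.
    return {key: reduce_fn(key, values) for key, values in cache.items()}

docs = [{"text": "a b a"}, {"text": "b c"}]
emit_words = lambda d: [(w, 1) for w in d["text"].split()]
counts = cached_map_reduce(docs, emit_words, lambda k, vs: sum(vs))
# counts == {"a": 2, "b": 2, "c": 1}
```

The design trade-off is memory for I/O: the cache must hold all keys (or be flushed periodically), but it removes the repeated index lookups and deletes of the naive version.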
Why is mongo's implementation so freaking slow? Are you loading an entirely new JavaScript VM for every application of map? Map/reduce's performance is completely at odds with the excellent performance of everything else.

Eliot Horowitz
added a comment - Dec 14 2011 06:16:24 AM +00:00 JavaScript is much slower than Java.
If it comes down to that, Java will always win.
The new aggregation framework is the long-term solution: SERVER-447
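For simple grouping workloads, the aggregation framework expresses the whole job declaratively instead of running JavaScript per document. A sketch of an equivalent pipeline in PyMongo-style dict syntax (the `words` collection and `word` field are hypothetical):

```python
# Count occurrences per word, equivalent to a trivial map/reduce.
# Collection and field names here are illustrative only.
pipeline = [
    {"$group": {"_id": "$word", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
# With a live connection this would run as, e.g.:
#   db.words.aggregate(pipeline)
```

Because the pipeline stages run in native code, they avoid the per-document JavaScript invocation cost that map/reduce pays.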


Stephen Nelson
added a comment - Dec 14 2011 05:11:05 PM +00:00 http://shootout.alioth.debian.org/u32/javascript.php
Nice try, but JavaScript (V8) is only 3x slower for the type of operations I'm performing. Mongo specifically is adding truly massive overhead to map/reduce operations.
The new aggregation framework is not adequate for the things I'm doing (nor, I'm sure, for many other users): my map functions need to make decisions about documents based on non-trivial dependent properties. The aggregation framework will not be able to replace m/r; it's not sufficient, as you seem to be aware from your comments on the linked issue.
Has anyone profiled mongo's map/reduce implementation to determine where the overhead is coming from?