With a name like Mongo, it has to be good

I just got back from MongoSF, which was awesome. Over 200 Mongo geeks, three tracks, and language-specific workshops all day.

I got to the conference early and took a picture of the venue... an hour later it was packed.

The highlight, for me, was Eliot Horowitz’s talk on sharding. He set up a MongoDB cluster of 25 large EC2 instances and started hammering them.

He pulled up an incredibly snazzy sharding GUI (okay, I wrote it, it’s not actually that snazzy) that displayed a row of stats for each shard. Each shard had about the same level of operations per second, so you could see that Mongo was doing pretty well distributing the data across the shards.

Eliot scrolled down to the bottom of the stats table where it showed the total number of operations per second across all the shards. The cluster was doing 8 million operations per second.

8,000,000 operations per second.

The whole audience burst into applause.

That’s about 320,000 operations per second per box, which is about what would be expected for Mongo on a powerful server (like a large EC2 instance).
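For anyone who wants to check the arithmetic, here is a minimal sketch. The function names are made up for illustration, and the capacity estimate assumes throughput scales linearly with the number of shards, which is what the demo suggested but not something it measured.

```python
import math


def per_node_throughput(total_ops_per_sec, node_count):
    """Average ops/sec per machine, given the cluster-wide total."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return total_ops_per_sec / node_count


def nodes_needed(target_ops_per_sec, ops_per_node):
    """Rough machine count for a target load, ASSUMING linear scaling."""
    if ops_per_node <= 0:
        raise ValueError("ops_per_node must be positive")
    return math.ceil(target_ops_per_sec / ops_per_node)


if __name__ == "__main__":
    per_box = per_node_throughput(8_000_000, 25)
    print(f"{per_box:,.0f} ops/sec per box")
    # Hypothetical target of 10 million ops/sec, purely as an example.
    print(nodes_needed(10_000_000, per_box), "boxes for 10M ops/sec")
```

Real clusters rarely scale perfectly linearly, so treat the second number as a lower bound on the machines you would need.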

Cool things about this:

If your app needs more than 8 million ops/sec, you can just add more machines and Mongo will take care of redistributing the load.

Your app doesn’t need to know if it’s talking to 1, 25, or 7,000 servers. You can focus on writing your app and let Mongo focus on scaling it.

Speakers' dinner the night before

If you missed out on MongoSF, there are a bunch of other Mongo conferences coming up: MongoNY at the end of May and MongoUK and MongoFR in June.

@Noah unfortunately, replica sets aren’t ready yet, so it was just 25 shards with no replication. That was actually kind of a problem: Amazon thought that we were DDoSing ourselves and kept shutting down our instances, which broke the demo.

About replica sets: I am totally in love with them, so if you’re interested, subscribe to my RSS feed, because I’ll be doing a blog post on how to use them ~3 seconds after they’re available in the nightly build.

8 million ops/s is 320,000 ops/s per instance.
As you wrote, this is far above a local Mongo, Redis or OrientDB instance.
And this on EC2, with more network latency??
Sounds impossible. What’s behind this magic?
Regards
Stefan Edlich

Those numbers sound amazing, but there are a lot of missing details ;-).

1. what was the concurrency level?
2. were there concurrent CRUD operations? (basically I expect things to behave quite differently when sets of READS are (almost completely) separated from sets of WRITES)
3. what was the average data size/op?

You know such benchmarks can be “fabricated” and missing information tends to encourage that kind of thought.

I think people are taking this post a bit too seriously… I just thought it was a really cool demo of what Mongo’s going to be able to do: scale horizontally to practically infinite capacity. It’s certainly not meant to be a benchmark.

I think we’re working on a more interesting application (one that has actual functionality other than hammering the database) that we’ll open source and use as a demo in the future. We’ll also do some real benchmarks against sharding before releasing 1.6, obviously.

Do you have any more details on the “incredibly snazzy sharding GUI” that you mentioned in the post? I'm interested in doing some load testing to see how MongoDB will perform for my system, and it sounds like that GUI was giving some pretty valuable data. Is there any way to get access to that program? Thanks!

Unfortunately, it's not publicly available yet, but we'll probably release it when 1.6 comes out. It doesn't give any information you can't get from the shell (it basically runs the serverStatus command once per second for each machine), it just presents it in a pretty way. I'm hoping to add a chunk viewer and stuff, too, but it's very, very, very alpha at the moment.
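In the meantime, here is a rough sketch of the same idea: poll serverStatus on each machine once per second and turn the cumulative opcounters into per-second rates. This is not the GUI's actual code. The function names and host strings are made up for illustration, and it assumes the third-party pymongo driver is installed and that each host is reachable.

```python
import time

# Fields of the "opcounters" section of serverStatus. Each one is a
# cumulative count since the server process started.
OP_KEYS = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(prev, curr, interval_seconds):
    """Convert two cumulative opcounter snapshots into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {k: (curr.get(k, 0) - prev.get(k, 0)) / interval_seconds
            for k in OP_KEYS}


def total_rate(rates_by_host):
    """Sum every operation type across every host."""
    return sum(sum(rates.values()) for rates in rates_by_host.values())


def fetch_opcounters(host):
    """Run serverStatus on one host (assumes pymongo is installed)."""
    from pymongo import MongoClient  # imported lazily: third-party
    client = MongoClient(host)
    try:
        return dict(client.admin.command("serverStatus")["opcounters"])
    finally:
        client.close()


def poll(hosts, fetch=fetch_opcounters, interval_seconds=1.0, rounds=1,
         sleep=time.sleep):
    """Yield (per-host rates, cluster total) once per interval."""
    prev = {h: fetch(h) for h in hosts}
    for _ in range(rounds):
        sleep(interval_seconds)
        curr = {h: fetch(h) for h in hosts}
        rates = {h: ops_per_second(prev[h], curr[h], interval_seconds)
                 for h in hosts}
        yield rates, total_rate(rates)
        prev = curr
```

To try it, something like `for rates, total in poll(["shard1.example.com:27017"], rounds=10): print(total)` would print a total once per second; the hostname is a placeholder. Opening a new connection on every poll keeps the sketch short, but a real tool would keep one client per host. The rates also ignore the time the fetches themselves take, so they are approximate.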

It seems that Eliot’s presentation of the 25 node cluster has been removed from the whole internet. I have doubts about the performance of 320,000 ops/sec/mongod (back in 2010) and would have loved to see his presentation to get some insight into his test setup.

Is there any chance to get the video back or is it gone forever? Thanks!

I have no idea about the video availability; the 10gen site’s been recreated a few times since this was posted. For advice on speeding things up, I recommend starting with http://docs.mongodb.org/manual/administration/optimization/, and MongoDB conferences often have talks about performance tuning.

(I know this is a fairly unsatisfying answer, but I was busy being sick with nerves for my talk and wasn’t there when they were setting up the cluster.)