Feb 28, 2010

TokyoTyrant vs MongoDB vs CouchDB, simple benchmarks

Jeffery Zhao published a simple benchmark of two 'NoSQL' databases recently. In that article only basic CRU operations are compared. On a MacBook Unibody + OSX, which is the platform Jeff uses, MongoDB got slightly better scores than TokyoTyrant in almost every aspect.

CouchDB is really slow compared to TT or MongoDB, so I just gave up on it after several rounds.

The only difference between Jeff's platform and mine seems to be the operating system: he uses OSX while I use Linux. I'm not sure whether this is the reason we get different results, or whether TT is simply well optimized by gcc on Linux.

Update: after changing from Net::HTTP to Curb, the CouchDB benchmarks improved by about 1/3. Setting CouchDB's [uuids] algorithm to sequential (in default.ini) has no effect on the result. All 3 drivers connect to the database over the network, but only CouchDB uses the HTTP protocol; this is a bottleneck, or rather a trade-off.
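For reference, this is the setting I toggled; it lives in the [uuids] section of CouchDB's default.ini (the exact set of supported algorithm values may vary by CouchDB version):

```ini
[uuids]
; random is the default; sequential makes new document ids
; monotonically increasing, so b-tree inserts stay localized
algorithm = sequential
```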

Great articles. And yes, as you pointed out, this is not a real test of what CouchDB is capable of. In the benchmarks I test CRU (without Delete) operations in an extremely simple environment/context, e.g. no concurrency, no optimizations, etc.

However, that's my intention. I'm not experienced enough to produce comprehensive benchmarks; I just want to check some basic things for now. The test is certainly not complete yet, and I'll add more benchmarks when I have time.

And it's surprising to see such a big discrepancy between machines even when the tests are so simple.

The gap between CouchDB and the other candidates is so big that I suspect it's my fault. Did I do anything wrong with CouchDB? Should I use pre-generated sequential ids instead of random UUIDs? (But I'd think the defaults should be good enough for all software ..)

I can find official ruby drivers on the MongoDB/TT sites, but CouchDB only provides a simple ruby module for interacting with couch through its RESTful API. Is this the reason? I failed to find a ruby driver for couch, though there are many ORMs.

Yeah, the ruby HTTP stack is sub-par; that is one reason :) Sequential ids are another thing that gives you better performance (CouchDB 0.11 will use them by default).
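To illustrate why sequential ids help, here is a hypothetical sketch (not CouchDB's actual id generator): random UUIDs scatter inserts across the whole b-tree, while sequential-style ids always sort after every id generated before them, so each insert appends at the tree's rightmost edge.

```ruby
require 'securerandom'

# Random document ids: no ordering relationship between consecutive ids.
def random_ids(n)
  Array.new(n) { SecureRandom.uuid }
end

# Sequential-style ids: a fixed random prefix plus a zero-padded
# incrementing counter, roughly the idea behind CouchDB's
# "sequential" algorithm (the real format differs).
def sequential_ids(n, prefix = SecureRandom.hex(8))
  (1..n).map { |i| format('%s%06x', prefix, i) }
end

seq = sequential_ids(1000)
# Sequential ids arrive already in sorted order, so every insert
# appends instead of splitting random interior b-tree nodes.
puts seq == seq.sort   # => true
```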

Another is that CouchDB is optimized for multi-reader/writer concurrency. The baseline might be a lot slower, but it won’t degrade as you crank up concurrency. It'll still be "fast enough" to not be a bottleneck in your application. I hope you only optimise these :)

CouchDB doesn't have "native" bindings*, only the HTTP API, because HTTP works everywhere and gets you a ton of benefits: you can add proxies and caches and all the other nifty things you already know from your web server stack.

So it's about trade-offs. You'll find that single query execution speed is rarely where your app needs optimising. But that is not generally true, of course.

Cheers,
Jan
--

(* there's a pure-erlang API that you can use from an Erlang program that is semi-supported)

CouchDB is optimized for concurrent performance. If you do the writes concurrently (as a webapp normally would under load), performance actually increases because the fsyncs are batched.

I can't actually tell what these numbers mean or what this test does because the original article is in Japanese, but I would warn against testing MongoDB write performance without including a read for each insert. By default MongoDB doesn't return a response for insert calls, so all you're really testing is the socket.write() time. You don't know whether what you've inserted is even available yet (it most likely is available quickly if you only have one writer, but under concurrent load you could have problems), and it won't be written to disk for possibly another minute.
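That pitfall can be sketched with a toy model in pure Ruby (this is a simulation, not the real MongoDB driver): a fire-and-forget insert only buffers the write, so a read issued right afterwards may not see it. With the Ruby driver of that era, the usual fix was to pass something like :safe => true to insert, which triggers a getLastError round-trip so the benchmark measures an acknowledged write.

```ruby
# Toy model (not the real driver): a client whose fire-and-forget
# insert only appends to an outgoing buffer, like socket.write().
class FireAndForgetClient
  def initialize
    @buffer = []   # writes "on the wire", not yet applied
    @store  = {}   # what the server has actually applied
  end

  # Unacknowledged insert: returns immediately after buffering,
  # so success is never confirmed by the server.
  def insert(doc)
    @buffer << doc
    nil
  end

  # Acknowledged insert: flush and wait for the server to apply the
  # write, analogous to :safe => true forcing a getLastError call.
  def safe_insert(doc)
    insert(doc)
    flush
  end

  # The server eventually applies buffered writes.
  def flush
    @buffer.each { |d| @store[d[:_id]] = d }
    @buffer.clear
  end

  def find(id)
    @store[id]
  end
end

client = FireAndForgetClient.new
client.insert(_id: 1, x: 'hello')
puts client.find(1).nil?    # => true (buffered, not yet visible)

client.safe_insert(_id: 2, x: 'world')
puts client.find(2)[:x]     # => world
```

The benchmark implication: timing unacknowledged inserts measures only how fast the client can buffer bytes, which is why pairing each insert with a read (or using safe mode) gives a more honest number.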