I'm hosting CouchDB databases on Amazon EC2, and I was wondering whether I could get better I/O performance (and better disk utilization, of course) by using a filesystem that supports compression. I remember reading somewhere that future versions of CouchDB would support data compression, and I was wondering if I could get that benefit now simply by compressing my filesystem. I am looking at using small instances, but large ones aren't out of the question either. I am afraid the compression would kill the CPU on those instances, but I can't tell until I test it.


I have not tried this (on the KISS principle); however, I think you would see some increase in performance.

CouchDB will use Google's Snappy compression algorithm. Filipe Manana introduced the feature in the issue tracker as COUCHDB-1120 and has since committed it to "trunk" (now the "master" branch since the Git migration). It is in the 1.2.x branch, so once CouchDB 1.2 is released, you'll know it has shipped.

In the meantime, yes: CouchDB is basically nothing but b-tree lookups, so its workload is dominated by disk I/O rather than CPU. Even the JavaScript "queries" (they're more like index definitions) run only once per document update. As a database, CouchDB benefits from good storage, and you'll likely end up over-provisioning CPU in order to meet your storage needs. Therefore it seems plausible that spending that spare CPU on compression will give you either a boost or, at worst, no net change.
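Since the question is whether the CPU cost of compression pays for itself on a given instance size, one rough way to check is to time compression of JSON documents on that instance. The sketch below is only an approximation under several assumptions: it uses `zlib` at level 1 as a stand-in for Snappy (Snappy is not in the Python standard library), it compresses each document individually, and the documents produced by the hypothetical `make_sample_docs` helper are synthetic, not real CouchDB data. Filesystem-level compression works on disk blocks rather than documents, so treat the numbers as a ballpark, not a prediction.

```python
import json
import time
import zlib


def make_sample_docs(n: int) -> list[bytes]:
    """Build n hypothetical CouchDB-style JSON documents (synthetic data, not real)."""
    docs = []
    for i in range(n):
        doc = {
            "_id": f"doc-{i:08d}",
            "type": "order",
            "customer": f"customer-{i % 100}",
            "items": [{"sku": f"sku-{j}", "qty": j + 1} for j in range(10)],
            "status": "shipped" if i % 2 else "pending",
        }
        docs.append(json.dumps(doc).encode("utf-8"))
    return docs


def benchmark(docs: list[bytes], level: int = 1) -> dict:
    """Compress each document on its own with zlib (a Snappy stand-in) and
    report the space saved and the CPU throughput."""
    raw_bytes = sum(len(d) for d in docs)
    start = time.perf_counter()
    compressed = [zlib.compress(d, level) for d in docs]
    elapsed = time.perf_counter() - start
    packed_bytes = sum(len(c) for c in compressed)
    # Sanity check: compression must be lossless.
    if not all(zlib.decompress(c) == d for c, d in zip(compressed, docs)):
        raise RuntimeError("round-trip mismatch")
    return {
        "raw_bytes": raw_bytes,
        "compressed_bytes": packed_bytes,
        "ratio": raw_bytes / packed_bytes,
        # Megabytes of input compressed per second of wall-clock time.
        "mb_per_s": raw_bytes / elapsed / 1e6 if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(benchmark(make_sample_docs(10_000)))
```

Running the same script on a small and a large instance gives a rough sense of whether compression throughput would comfortably exceed the disk throughput you actually see; if it does, the CPU is unlikely to be the bottleneck.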

If you run benchmarks, I am sure the CouchDB community would love to see them! Feel free to send them to the user list or just tweet them mentioning CouchDB. Good luck!