Archive

I added some new views to CouchDb on a development server the other day. The views included three or four emits and the full document in the view, and the disk space used exploded to several times the size of the actual data stored. Everything came to a screeching halt as the server had no space to write logs, no space to build views, no space to store the data we were pushing in, and no space for the compacts that were trying to run. It was one of those moments when you are really happy that the host you just messed up is a development box. This episode raised several lessons for me, some of which were new in the specific context, some of which were good reminders.

First, CouchDb needs lots of disk space. If you are working with views, lots of disk space can disappear very fast, as Couch copies the data files for compacting. This is not a bad thing, its what allows Couch to keep running and performing while it is doing these operations, but it is something to plan for when sizing resources. Running out of disk space is a bad thing. In my case, I had estimated the size of the data and allowed some room for compacts and views, but not nearly enough.

Second, this episode started up quite a debate within my team about designing views. I tend toward making views overly inclusive. My friend tends toward making the views as small as possible, and then throwing in an include_docs if you need more in the results. This is a trade-off, and in the end you have to consider it with every view you write. Small views save lots of disk space, and speed up writes and simple queries where you don’t need much in the results. Using include_docs is fairly cheap if you are getting a handful of docs. If you’re looking for results from hundreds or thousands of records, include_docs loses its appeal, because the server has to fetch each of those documents. Its a balancing act. If you argue for larger views for faster queries as the side on which you should err, I’d suggest making sure you have plenty of disk space. Turns out it is a little embarrassing when you make that argument and then badly under estimate how much disk space you really need, even on a development box.

Third, I got to learn how to clean up from dead compact jobs. When the server is out of disk space when it tries to compact a database, it can leave behind compact files that prevent compacts from working even after the space problem is solved. In my case on Ubuntu, these are in /var/lib/couchdb. If there are 0 size .compact files in your database directory you just need to delete them and restart the compact. I also noticed that there were some non-zero size .compact files in databases that were not actually compacting, and I removed these as well. Everything went back to humming after that.

CouchDb is a great tool. It does have some peculiarities it is good to keep in mind. Life was simple when we had basically no real choices. “Welcome to LAMP! Would you like MySQL or Postres with that?” Now we have all kinds of options on the menu, and we actually have to think about finding the right tool and understanding its strengths, weaknesses, and quirks.