Caching CouchDB requests with Nginx

Caching frequently-performed searches can make your app run much faster with very little change in your code

Apache CouchDB was born on the web. Its HTTP/HTTPS API is not a bolt-on afterthought: it is the way of interacting with the database, built in from the ground up. Let’s take the use case of CouchDB as the back-end database in a traditional client/server web app:

Web-app architecture schematic

Web users interact with a web page, sending HTTP requests to one of a number of application servers. The application, needing data to render the page, makes an HTTP request to CouchDB to fetch the data and then responds to the client in kind.

If the same request is being made to CouchDB over and over again in a short time frame, then the database simply answers each request. Under production loads and to avoid overworking the database, developers may choose to cache data in their app rather than make a round-trip to the database. This is suitable for:

data that doesn’t change very often e.g. a database of US zip codes

slices of data that are accessed frequently but where it doesn’t matter if the user sees a slightly stale version of the query results. This is very application-dependent, but let’s imagine your e-commerce site has a list of three special offers on its front page. As the front page is accessed frequently, it makes little sense to query the database for every page render.

There are many ways to implement a cache. In this article I’ll show how an Nginx proxy can be created to cache HTTP requests, taking some of the load off your CouchDB service and getting data to your app more quickly.

What is Nginx?

Nginx is an open-source web server. At its simplest it can serve a tree of static files over HTTP. It can also be configured as a “reverse proxy”, that is, it can sit between a client and a server and transparently route traffic between them, caching some of the content so that future repeat requests can be serviced from the locally cached data.

Web-app architecture with Nginx as a reverse proxy.

In our application we’ll be configuring Nginx as a reverse proxy and placing it between our application servers and CouchDB. Instead of our application connecting directly to CouchDB, it will connect to Nginx, which will either return cached content or make the CouchDB request and return the response.
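As a sketch, a minimal Nginx configuration for this arrangement might look like the following. The cache path, zone name, port numbers, sizes and expiry times here are illustrative assumptions, not values prescribed by this article:

```nginx
# define a cache: the keys_zone holds cache metadata, max_size caps disk usage
proxy_cache_path /var/cache/nginx keys_zone=couch_cache:100m max_size=1g inactive=60m;

server {
  listen 8080;

  location / {
    proxy_pass http://localhost:5984;   # the CouchDB service behind the proxy
    proxy_cache couch_cache;            # use the cache defined above
    proxy_cache_valid 200 10m;          # keep successful responses for 10 minutes
    add_header X-Cache-Status $upstream_cache_status;  # HIT/MISS, handy for debugging
  }
}
```

With this in place, an application pointed at port 8080 sees the same API as CouchDB itself, and the `X-Cache-Status` response header reveals whether a request was served from cache.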

Nginx can be installed in two places:

on the same machine as your application code (your app will connect to a port on “localhost”)

or, on a separate machine on your network, shared between multiple instances of your application server.

The former approach is simpler, but the latter allows multiple application servers to share the same cache pool.

Putting cache to work in your app

Using the Nginx-powered cache in your own app is as simple as feeding a different URL to the Node.js library:

sample code that directs reads to Nginx and writes to the database directly.

The above code creates two objects: one handles read-only requests via the Nginx proxy, the other connects directly to the database for writes. The root path of this app performs a query via the proxy, outputting the result.
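The two-object arrangement described above can be sketched as follows. This is a minimal illustration using Node 18+’s built-in fetch rather than a dedicated CouchDB client library; the port numbers, database name and document are assumptions for illustration:

```javascript
// A sketch only: assumes an Nginx cache listening on localhost:8080 and
// CouchDB itself on localhost:5984. The 'cities' database is hypothetical.

// tiny client factory: returns an object bound to one base URL
function couchClient(baseUrl) {
  const post = (path, body) =>
    fetch(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    }).then((r) => r.json())
  return {
    baseUrl,
    // POST /db/_find is a read: safe to serve from cache
    find: (db, query) => post(`/${db}/_find`, query),
    // POST /db creates a document: a write, so it must bypass the cache
    insert: (db, doc) => post(`/${db}`, doc)
  }
}

// reads go via the Nginx proxy, writes go directly to CouchDB
const readonly = couchClient('http://localhost:8080')
const writer = couchClient('http://localhost:5984')

// usage (requires both services to be running):
// readonly.find('cities', { selector: { country: 'US' } }).then(console.log)
// writer.insert('cities', { name: 'Anytown', country: 'US' })
```

The only change from a direct-to-CouchDB setup is the base URL handed to the read-only client, which is what makes this approach so cheap to adopt.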

Running this app has the same performance profile as the curl tests: cached data is retrieved much faster than running a query on a database cluster on the other side of the world.

When to use caching

Employing caching is a trade-off between the speed of returning results and the freshness of the data returned. If you know your data isn’t changing frequently, then a generous cache window (say an hour or a day) may be used. If it’s important that fresh data is surfaced to your users quickly, then a shorter window (say 5 or 10 minutes) may be better.
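In Nginx terms the cache window is the `proxy_cache_valid` directive; the values below are illustrative, not recommendations:

```nginx
# a short window: fresh data surfaces within ten minutes
proxy_cache_valid 200 10m;

# a generous window for rarely-changing data, e.g. zip codes
# proxy_cache_valid 200 24h;
```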

Caching works well when handling “peaky” traffic: let’s say a particular page on your site becomes popular because of the success of a marketing campaign. It’s better in this case to cache the pertinent content and deliver the results quickly, rather than wasting your database resources producing the same results over and over again.

Caching can help take the load off your expensive primary data store by bringing cheaper and faster resources to bear instead. Oh, and cached data is returned faster.

The Nginx configuration caches all GET & HEAD requests by default. I added POST to the proxy_cache_methods configuration to catch query API calls, which use the POST /db/_find endpoint. This may have unintended consequences if you route writes through this proxy, e.g. POST /db/_bulk_docs or POST /db. I would recommend sending only read requests through the proxy; any API calls that modify data should be sent directly to CouchDB.
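That change is a single directive inside the caching `location` block (shown here as a fragment; GET and HEAD are Nginx’s defaults):

```nginx
# cache GET and HEAD (the defaults) plus POST, so that
# POST /db/_find query results are cached too
proxy_cache_methods GET HEAD POST;
```

This is exactly why keeping writes out of the proxy matters: once POST is cacheable, a cached POST /db response could silently swallow a document creation.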