Introduction

Apache CouchDB is an Open Source database management system published by the Apache Software Foundation. It is developed as a community project, with several commercial vendors supporting the core development as well as offering support and services.

CouchDB is written in Erlang, a functional programming language with a focus on writing robust, fault-tolerant and highly concurrent applications. CouchDB uses HTTP as its main programming interface and JSON for data storage. It has a sophisticated replication feature that powers most of the more interesting use-cases for CouchDB.

First released in 2005, CouchDB has been in development for nearly a decade and is used all over the world by everyone from independent enthusiasts to industry giants. It has healthy developer and support communities and a steadily growing fan-base. Its main website is couchdb.apache.org.

What sets CouchDB apart from other databases?

Traditionally, databases are the single source of truth for an application or a set of applications. They manage data storage and access, data integrity, schemata, permissions and so on. Usually, the boundary for a database system is a single server, a single piece of hardware with global access to all storage, memory and CPU in a coherent fashion. Database systems like these include MySQL and PostgreSQL.

With the advent of the web in the late 90s and its mass-success in the 2000s, the requirements for backends of websites changed dramatically. And with them changed the ways people were using these databases. The main requirements can be placed along two axes: Reliability and Capacity.

A common MySQL setup in 2002 consisted of a single database server. When a particular application became popular, or the source of business, reliability became a must. To address this, people started setting up MySQL replication from the single database, now called the “primary”, to a “hot spare secondary” database. Should the primary database server crash or become otherwise unavailable, the hot spare secondary database server could be promoted to be the new primary. This could happen quickly and often with little or no interruption to the application.

On the other axis, websites and web apps of the early and mid-2000s had a particular access pattern: 85%-99% of all requests were read requests, i.e. requests that only retrieved existing information from the application, and thus the database. Only a small percentage of requests would actually create any new data in the database.

At the same time, the number of requests for a website could easily exceed the resources of a single hardware server. This led to a plethora of solutions to address the problem of using more hardware to serve the load. As a first tier, caching with things like memcached was (and still is) often deployed. And then, as this wouldn’t suffice to keep up with the number of requests, people turned to MySQL’s replication feature again, for creating what are called “read-only secondary” databases. These are databases that would continuously read all changes going into the primary database server. The application then could direct all write requests to the primary server and balance read requests to one of the many read-only secondary servers, thus effectively distributing the load.

With this setup, some new problems come up, now that CPU, RAM and I/O no longer live in a single machine and a network is involved. With replication lag, it could take some time for a write request to arrive at a secondary, and if an application’s user was unlucky, it might appear to them that the write request had not succeeded. But overall, this was fairly reliable and good monitoring could help keep this and other issues at bay.

In the late 2000s the web turned from a read-mostly medium to a full read-write web and the above solutions started to show their limits. Companies like Google and Amazon started building custom in-house database systems that were designed to address these and more upcoming issues head-on. Both companies published notable papers about their work, and they are the foundation for many database systems that are today known as “NoSQL” or “Big Data” databases, including, but not limited to: Riak, Cassandra and HBase.

The Outlier: CouchDB

At the same time, but, at first, without being influenced by the above developments, CouchDB was started as an Open Source and from-scratch implementation of the database behind Lotus Notes (of all things!). In 2007 CouchDB added an indexing and querying system based on the MapReduce paper, but other than that, the design of CouchDB is mainly influenced by Lotus Notes. Before you stop reading in anger: Nobody likes to use Lotus Notes the application, but its underlying database has some remarkable features that CouchDB inventor Damien Katz thought would be worth preserving.

If CouchDB shares little history with other databases of the same era or the era before, why the long introduction? It is easier to explain CouchDB’s features in the context of the larger database ecosystem.

What makes CouchDB special?

CouchDB is more like git than MySQL.

If you remember SVN, CVS, or similar centralised version control systems: for every interaction you had to talk to the server and wait for the result. svn log, send the request to the server, wait for the server to process the request, wait for the result to come back, display the result. Every. Time.

In git, *snaps fingers*, interactions are instant. Why is that so? All operations are local. Your copy of the repository includes all the information the remote server has and therefore, there is no need to run every interaction over the network to some server. In this view of the world, CouchDB works like git: you can have a copy of all your data on your local machine, as well as on a remote server, or multiple servers, and a continuous integration setup, spread all over the globe if you want. All of this is powered by CouchDB’s replication feature (while it has the same name as the feature in, say, MySQL, CouchDB’s replication is significantly different; we’ll soon see why).

With this in mind, let’s revisit the various scenarios that other databases were first augmented for, and are now struggling with:

The single database server: like a classic MySQL setup, CouchDB supports this just fine. It is the default for many users.

The primary-secondary setup with a hot failover for reliability: CouchDB’s replication lets us set up a hot failover easily. A standard HTTP proxy in front can manage the failover scenario.

The primary-many-secondaries scenario for scaling out read requests: as trivial as before, nothing to see here.

So we’ve got all that covered, but with CouchDB, we don’t have to stop here. Say you have an office in London and a Customer Relationship Management (CRM) software that uses CouchDB. Everybody in the London office can access the application and its data with local LAN speeds. All is well.

Now you open an office in Tokyo and the people there need access to the same CRM data. But making requests around the globe adds significant latency to each and every request (remember the SVN scenario above?) and your colleagues in Tokyo are quickly frustrated. In addition, if your London office network connection, or any network in between, has any issues, Tokyo is effectively cut off from the data they need to do their work with their customers.

Luckily, you know that CouchDB has replication built in, and you start setting up an application and database server in the Tokyo office that people there can access using the local LAN. In the background, both CouchDB instances can replicate changes to each other, so that data added in Tokyo eventually makes its way to London and vice versa.

All employees are productive and happy and the extent of your software configuration work can be summed up in these two simple curl commands:

(the line breaks are only there for readability)
# replicate all data in the “crm” database from London to Tokyo, continuously
curl -X POST https://tokyo.office/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"https://london.office/crm", "target":"crm","continuous":true}'
# replicate all data in the “crm” database from Tokyo to London, continuously
curl -X POST https://london.office/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"https://tokyo.office/crm", "target":"crm","continuous":true}'

When you open your New York office, you already know what to do, and you can make sure that the people there don’t have the same painful experience that their colleagues in Tokyo had to go through.

And you don’t have to trust your data to one of the cloud-providers that may or may not take your data’s confidentiality seriously.

Before we continue with interesting use-cases like this, let’s look at one of the major steps in CouchDB’s development history.

BigCouch

BigCouch started as a fork of Apache CouchDB by the company Cloudant, who operate a big-data-as-a-service platform based on CouchDB. Their platform includes more things, but at its core sits BigCouch.

After running the platform in production for a while, Cloudant decided to release its core, BigCouch, as an Open Source project. Fast forward a few years and BigCouch is now actually being added into the main Apache CouchDB codebase. The upcoming CouchDB 2.0 release will include the full BigCouch feature set and make CouchDB a fully clusterable database solution.

Why is this significant? Easy: The “C” in “CouchDB” stands for “Cluster”, the full name is “Cluster Of Unreliable Commodity Hardware”, and until BigCouch, CouchDB did not have any cluster management features built in. CouchDB’s features, however, were carefully designed so that, should someone use CouchDB in a cluster (either behind a simple HTTP proxy, or a sophisticated system like BigCouch), its semantics would stay the same.

The promise CouchDB made was that should you start on a single machine setup and at some point outgrow the capacity of that machine, you could move to a cluster without having to rewrite your application. A big promise that can now be fulfilled.

BigCouch is an implementation of the aforementioned Dynamo paper by Amazon. Dynamo defines a cluster and data management system that allows any number of machines to behave as if they were one while handling orders of magnitude more data and requests than a single server could handle, and on top of that to be very resilient against server faults. In other words: BigCouch addresses both the reliability axis and the capacity axis.

BigCouch achieves that by splitting up each database into a number of shards. Each shard can live on a separate hardware server. Each server can return data from anywhere in the database: if the data is local to the server, it just returns it; if it lives on another server, it fetches it from there and returns it to the client as a proxy.

In addition to sharding databases, BigCouch also keeps replicas of each shard on other server nodes in the cluster. In case a shard becomes unavailable through a network, software or hardware failure, the replica shards will be able to continue to serve requests in the face of one or more missing shards.
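To make the idea of shards and replicas more concrete, here is a minimal sketch in JavaScript. It is not how BigCouch is implemented: the hash function, the modulo-based shard choice, the node names and the placement rule are all simplifying assumptions made for illustration.

```javascript
// Illustrative only: a toy hash, not the one BigCouch actually uses.
function toyHash(docId) {
  let hash = 0;
  for (const ch of docId) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return hash;
}

// Map a document id to one of `shardCount` shards.
function shardFor(docId, shardCount) {
  return toyHash(docId) % shardCount;
}

// Place each shard on `copies` consecutive nodes, so that a replica
// of the shard survives when a single node becomes unavailable.
function nodesForShard(shard, nodes, copies) {
  const placement = [];
  for (let i = 0; i < copies; i++) {
    placement.push(nodes[(shard + i) % nodes.length]);
  }
  return placement;
}
```

Any node that receives a request can run the same calculation, which is what allows it to either answer locally or forward the request to a node that holds the shard.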

Now that we’ve learned that CouchDB is more like git than other databases, and that it is designed to scale in a cluster with BigCouch, it is time for another ludicrous statement.

CouchDB is not just a database; it is a protocol

The promise of CouchDB, being able to store data close to where it is needed, is so attractive that people started porting CouchDB over to other languages and environments, to make sure they can benefit from its sophisticated replication feature.

The most notable implementations of The Couch Replication Protocol are PouchDB, Couchbase Lite (née TouchDB), and Cloudant Sync for Mobile. PouchDB is implemented in JavaScript and is designed to run in a modern web browser (including mobile browsers). Couchbase Lite and Cloudant Sync come in two flavours: one for iOS written in Objective-C and one for Android written in Java and both are meant to be embedded in native mobile applications. They are all Open Source projects separate from Apache CouchDB, but they share the same replication capabilities, although some implementation details that we explain for Apache CouchDB below differ in the various other projects.

Why would you want to have a database in your browser or on your phone? Well, you already do, but none of the existing databases in these places have a powerful replication feature built in and that can have a number of significant advantages:

With PouchDB, you can have a copy of your active user data in your browser and you can treat it as a normal database. That means that all operations from your web application only ever talk to your in-browser PouchDB database; communication with the server happens asynchronously.
The benefits here are manifold: because all end-user interaction only ever hits the local database, the user never has to wait for any action to take place; things happen immediately as far as they are concerned. This creates a significantly better user experience than having to wait for the network for every interaction. Both Amazon and Google have published studies that show that even 100ms worth of extra wait time turns people away from engaging with a website.
Another benefit is that, if you are on a spotty wifi connection, or even on a phone in the subway, you can just keep using the app as if you were online. Any changes that you make, or that are made on the server side, are synchronised when you are online again.

With the iOS and Android implementations you get the same benefits, but for native mobile apps. Imagine having something that works like IMAP for email clients, but for any app: full access to all data locally, working synchronisation back and forth between multiple clients and full offline functionality.
In the mobile use-case the latency of the network is even higher than on WiFi, and the connectivity is less consistent, even with the latest mobile broadband technologies. Waiting a second or two for every user interaction is frustrating at best.
In addition, the radios on battery powered devices are a huge drain on the power when they are active. That’s why mobile operating systems do all sorts of tricks to avoid having to power up the radio, and when the radio is running, make the most of it, so it doesn’t have to be powered up again any time soon.
With fully offline applications the radio does not have to be powered up a lot, let alone for every user interaction, resulting in significantly better battery life and thus user experience, happier users and happier operators, who now get more out of their infrastructure.

In short, having a database that implements The Couch Replication Protocol gives you the following advantages:

Improved user experience through zero latency access to user data

Network-independent app usage, apps work offline

Massive savings in battery power

All of the above can be summarised as “Offline First”, which is an initiative to promote the techniques and technologies with the above benefits.

The vision

All this is already very compelling, but things can go even further. Imagine a Couch replication capable database on every Linksys router, built into every phone and web browser: People would have better access to their data and more control of their data in a world that is increasingly centralising resources and power around a few major corporations.

Now that we understand why CouchDB and other implementations of The Couch Replication Protocol have a number of compelling features for a modern computing world, let’s have a look at how things work in detail.

Technical details

Fundamental to CouchDB are HTTP as the main access protocol and JSON as the data storage format. Let’s start with these.

HTTP

HTTP is the most widely deployed end-user visible protocol in existence. It is easy to understand, powerful, supported everywhere in programming environments and comes with its own fleet of custom hardware and software that handles everything from serving, routing and proxying to monitoring, measuring and debugging it. Little other software is as ubiquitous as HTTP.

The main way to do anything with CouchDB is via HTTP. Create a database: make an HTTP request; create some data: make an HTTP request; query your data: make an HTTP request; set up replication: make an HTTP request; configure the database: make an HTTP request. You get the idea.

Just to give you a hint of what this looks like:

curl -X PUT http://127.0.0.1:5984/my_database

This creates a database from your command line, simple as that.

A database is the bucket, the collection for data. Where relational databases store tables and rows in a database, CouchDB stores documents in a database. A document contains both the structure and the data for a specific data item. That is why CouchDB is often classified as a document oriented database. An easy way to think of a document from a programmer’s perspective is that of an object; or the serialisation of an object.

JSON Documents

CouchDB documents are encoded in JSON. JSON has this nice property that it doesn’t try to be all things to all people. It is not a superset: not all data structures that programming environments support can be adequately represented in JSON. What makes JSON so nice is that it is a subset of the native types that are shared among all programming environments: numbers, booleans, strings, lists and hashes (and a few odds and ends).

This makes JSON a great format for data interchange between different systems, because all you need is a JSON parser that translates JSON into the native types of a programming language. That is considerably simpler than translation layers that would map all sorts of sophisticated things between different environments where they really wouldn’t fit in.

In addition, JSON is native to web programming, it is fairly concise and it compresses well, so it is a natural choice for web and mobile application programming. With CouchDB, you get JSON out of the box.

CouchDB will happily store whatever JSON you send it. In that sense, CouchDB is a schemaless database by default. This helps with prototyping applications, as you don’t have to spend countless hours defining a data model upfront: you can just start programming, serialise your objects into JSON and store them in CouchDB.

This also cuts out the middle layer known as Object Relational Mappers (ORMs). Superficially speaking, an ORM turns a relational database into something that feels natural to object oriented programmers. With CouchDB, you get that same interface natively, leaving a whole class of problems behind you from the start. In addition, the source code of many popular ORMs is larger than CouchDB’s source code.

CouchDB also supports binary data storage. The mechanism is called attachments, and it works like email attachments: arbitrary binary data is stored under a name and with a corresponding content type in the _attachments member of a document. There is no size limit for documents or attachments.

Schema enforcement optional

Being able to store arbitrary data is of course a blessing when you start programming, but further down the development cycle of an app, you do want to be able to make sure that the database only allows writing of documents that have the properties you expect. For that reason, CouchDB supports optional schema enforcement. Sparing you a few details, all you have to do is provide CouchDB with a small JavaScript function that decides whether a document conforms to the expected standard or not.

This function is run every time you attempt to write a new document or update an existing document to the database.
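A validation function of this kind might look like the following sketch. In CouchDB, the function source is stored as a string in the validate_doc_update member of a design document, and a write is rejected by throwing an object with a forbidden member. The function is given a name here only so that it can be run on its own, and the required fields (type and created_at) are hypothetical rules chosen for illustration.

```javascript
// Hypothetical rule set: every document needs a string "type" and a "created_at" field.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  // Deletions carry no payload, so let them through unchecked.
  if (newDoc._deleted) {
    return;
  }
  if (typeof newDoc.type !== "string") {
    throw { forbidden: "Documents must have a string 'type' field." };
  }
  if (!newDoc.created_at) {
    throw { forbidden: "Documents must have a 'created_at' field." };
  }
}
```

Returning without throwing means the write is accepted; the message in the thrown object is what the client gets to see when its write is refused.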

You can also load supporting libraries that do more declarative schema enforcement using JSON Schema, for example.

Changes, or “What happened since?”

Imagine a groupware application that has a dashboard that includes a quick overview of what is currently happening. The dashboard would need to have information about what is currently going on, and it should update in real-time as more information arrives.

One of the more exciting features of CouchDB is what is called a Changes Feed. Think of it as git log, but for your database. The CouchDB changes feed is a list of all documents in a database, sorted by most recent change. It is stored in a fast index structure and can efficiently answer the question “What happened since?” for any range of the history of the database, be it from the beginning, or only the last 1000 changes that were made to the database.

The changes feed is available over HTTP in a few different modes, and it enables a few very interesting use cases:

Continuous mode: our dashboard can open a connection to the changes feed and CouchDB will just leave the connection open until a change occurs in the database.
Then it sends a single line of JSON with information about the document that was just written to the dashboard. The dashboard then can update its internal data structures and end-user views to represent this new data.
Another example is email: the changes feed could be used to implement email push. For a mobile messaging app, it could drive push notifications.

Continuous since: CouchDB understands itself as the main hub for your data. But it is a hub that is very easy to integrate with other pieces of software that would want to have access to that data.
On top of the information about new documents, document changes or deletions from the database, the changes feed also includes a sequence number that is a bit like an auto-increment integer that gets updated every time a change to the database occurs.
The changes feed is indexed by this sequence id, so asking CouchDB to send you the documents that changed since the last time you talked to it is a very efficient operation. All you have to do is remember the last sequence id you received from CouchDB and use that as the since parameter for your next request. That way you can maintain a copy of the data in your database, for example for a backup, or a full text search system, or whatever else you can envision (and people come up with the most remarkable things here), using efficient delta updates, where you only need to request the data that changed since the last time you talked to CouchDB.
In addition, this architecture is very resilient, because in case the receiving end terminates or crashes for whatever reason, it can just pick up where it left off when it starts up again. Sequence IDs are different from document IDs or revision IDs in that they are maintained per database, and are increased every time a change is made to a database.

The document state machine: another common pattern is to use a document to track the state of a multi-step operation, say sending an email. The steps could be: 1. end-user initiates the email-send procedure; 2. backend receives the user’s intent and email details; 3. sub-system responsible for sending email reserves email for sending (so other parallel sub-systems don’t send the email twice); 4. sub-system responsible for sending the email attempts SMTP delivery; 5. sub-system records state (success or failure); 6. frontend picks up email send state and updates user interface accordingly. All these discrete steps (and maybe there are more) can use a single document to ensure consistency of the operation (guaranteed sending, but no sending twice) and the changes feed can be used to loosely couple all the sub-systems required to perform the whole procedure. (Yes, this is a poor-person’s message queue, but for persistent queues, this is not a bad one.)
These are just a few examples of the various things that the changes feed enables.
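The “What happened since?” pattern can be sketched in a few lines of JavaScript. The sketch assumes the shape of a _changes response (a results array plus a last_seq value) and treats the sequence id as an opaque token. The function names, the base URL and the database name are placeholders, and the HTTP request itself is left out so that the logic can be followed without a running server.

```javascript
// Build the URL for a changes request. `baseUrl` and `db` are placeholders.
function changesUrl(baseUrl, db, since) {
  const params = new URLSearchParams({ since: String(since) });
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?${params}`;
}

// Fold one changes response into a local copy and remember where we stopped.
function applyChanges(state, response) {
  for (const change of response.results) {
    if (change.deleted) {
      state.docs.delete(change.id);
    } else {
      // Keep the latest revision id reported for this document.
      state.docs.set(change.id, change.changes[0].rev);
    }
  }
  // Use this value as the `since` parameter of the next request.
  state.since = response.last_seq;
  return state;
}
```

A consumer would start with a state such as { since: 0, docs: new Map() }, request changesUrl(base, db, state.since), apply the response, and persist state.since so that it can resume after a crash.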

Replication

We have already established that data synchronisation is what sets CouchDB apart from other databases. Let’s now take some time to dive into how it all works.

When we create a document by sending an HTTP PUT request to CouchDB, the JSON response tells us three things. It confirms that the document in question was indeed received and committed to disk ( "ok": true ). It returns the id that we chose in the URL ( "id":"my_document" ; if this doesn’t seem immediately useful to you, you can also POST a document to CouchDB and have its id auto-generated, and then you’d see which one you got). And finally, we get the revision of the document:

"rev":"1-2e7eef663cf24b39ac342a6627ecb879"

A revision signifies a specific version of a document. In our case it is the first version ( 1- ) and a hash over the contents of that document. With every change to the document, we get a new revision. In order to make an update to our document, we must provide the existing revision to prove that we know what we are changing. So technically, the revision is an MVCC token that ensures that no client can accidentally overwrite any data that they didn’t mean to.
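The role of the revision as an MVCC token can be illustrated with a small in-memory model. This is a sketch and not CouchDB code: the store, its put method and the toy digest are assumptions made for illustration, and CouchDB computes its revision hashes differently.

```javascript
// Toy stand-in for the content hash; CouchDB computes its own digest.
function toyDigest(text) {
  let h = 0;
  for (const ch of text) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Minimal in-memory store that enforces "name the revision you are changing".
function createStore() {
  const docs = new Map(); // id -> { rev, body }
  return {
    put(id, body, rev) {
      const current = docs.get(id);
      if (current && current.rev !== rev) {
        // The caller's view of the document is missing or stale: refuse the write.
        return { error: "conflict" };
      }
      const generation = current ? parseInt(current.rev, 10) + 1 : 1;
      const newRev = `${generation}-${toyDigest(JSON.stringify(body))}`;
      docs.set(id, { rev: newRev, body });
      return { ok: true, id, rev: newRev };
    }
  };
}
```

A first put without a revision creates revision 1; a second put to the same id succeeds only when it passes the revision returned by the first, which is the behaviour described above.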

Revisions are also what enable CouchDB replication. For example, during replication, if the target already has a document with the same id, CouchDB can compare revisions to see whether they are the same or whether they differ, or whether one revision is an ancestor of the other. But let’s start at the beginning.

Fundamentally, replication is an operation that involves a source database and a target database. The default mode for the operation is to take all documents that are in the source database and replicate them to the target database. Once all documents from the source are on the target, replication stops; it is a unidirectional, one-off operation.

There are various other modes. If source and target have replicated before, replication is smart enough to only replicate the documents that were added, changed or deleted on the source since the last replication to the target (cf. the changes feed).

Replication can also be continuous. It then behaves like regular replication, but instead of stopping when it has replicated all documents from the source, it keeps listening to the source database’s changes feed (that we learned about above) for further documents and replicates them as they appear in the feed.

There are various more options, but the most important one here is filtered replication. It allows you to specify, again, a JavaScript function that gets to decide whether a document should be replicated or not. In fact, this function operates on the changes feed, so you can use it outside of replication as well.

An example function that you would pass to CouchDB could forbid documents that have a type of 'horse' by returning false for them and true for everything else. A type, by the way, is nothing CouchDB would enforce out of the box, but it is a useful convention that many people have used. See validation functions above for how to enforce specific document formats.
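A filter with this behaviour could be sketched as follows. In CouchDB, the function source lives as a string in the filters member of a design document and receives the document and the request as arguments. The name withoutHorses is a placeholder that only exists so the function can be run on its own.

```javascript
// Return true to let a document through, false to hold it back.
function withoutHorses(doc, req) {
  return doc.type !== "horse";
}

// Applying the filter to a list of documents shows which ones would be replicated.
function replicable(docs) {
  return docs.filter((doc) => withoutHorses(doc, {}));
}
```

Documents without a type pass this filter as well, since only the exact value 'horse' is excluded.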

In a multiple-primary-databases situation, it would be nice if replication would work backwards as well. And it does, we can simply start a second replication where we switch the source and the target and CouchDB will do the right thing. CouchDB is smart enough to not keep replicating documents in a circle with this setup.

Automatic conflict management

The avid reader is now sitting on the edge of their seat and biting their nails, bursting with the one big question that comes up when talking about systems with multiple primary databases: “What about conflicts?” What a relief, we finally got to ask this question!

CouchDB prides itself on automatic conflict detection support. It is powered by one more data structure that CouchDB maintains that we haven’t explored yet: the revision tree for each document. Instead of only storing the latest version of a document, we also store a list of revisions (not the data, just the N-Hash information) in the order of their occurrence.

When we now replicate a source and a target and there is a document on the target that has the same id as one of the documents in the source database, CouchDB can simply compare the revision trees to figure out whether the version on the target is the same or an ancestor of the one on the source and whether it can do a simple update or has to create a conflict.

If CouchDB detects that a document is in a conflict state, it adds a new field into its JSON structure:

_conflicts: ['1-abc...', '1-def...']

In practice, CouchDB will keep both versions of the document and let us know which of the revisions are now in conflict. With that information, a client can go in and try to resolve the document conflict by picking one of the two revisions and deleting the other, or by merging the two into a new third resolved version. It works very similarly to conflicts in version control systems, where you get the

<<<<<<< HEAD

and

>>>>>>> VERSION

markers that you have to resolve before continuing your work.

In contrast to version control systems, where conflicts are marked up so that no compiler would accidentally allow them, CouchDB will arbitrarily and deterministically pick a winning revision, that will be returned by default, should a client ask for it. The determinism adds the nice property that after a conflict replicates through a whole cluster, all nodes will respond with the same default revision, ensuring data consistency even in the face of conflicts.
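The key property here is that every node picks the same winner without talking to any other node. The following simplified sketch shows one such deterministic rule: prefer revisions that are not deleted, then the higher revision number, then the higher hash. The function names and the exact ordering are assumptions for illustration and are not meant as a description of CouchDB’s internal code.

```javascript
// Split "3-abc" into its generation number and its hash.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  return { pos: Number(rev.slice(0, dash)), hash: rev.slice(dash + 1) };
}

// Simplified sketch: prefer live over deleted, then higher generation, then higher hash.
function pickWinner(leaves) {
  return leaves.slice().sort((a, b) => {
    const deletedA = Boolean(a.deleted);
    const deletedB = Boolean(b.deleted);
    if (deletedA !== deletedB) return deletedA ? 1 : -1;
    const ra = parseRev(a.rev);
    const rb = parseRev(b.rev);
    if (ra.pos !== rb.pos) return rb.pos - ra.pos;
    return ra.hash < rb.hash ? 1 : ra.hash > rb.hash ? -1 : 0;
  })[0];
}
```

Because the rule only looks at the revisions themselves, the order in which a node received them does not change the outcome.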

The conflict state is a natural state in CouchDB that is not more or less scary than any other, but it has to be dealt with, as keeping lots of conflicts around will make CouchDB less efficient over time. Client software is expected to add a basic conflict resolution mechanism. In theory, CouchDB could provide a few default cases here, but since it depends on the application how conflicts should be resolved, the developers have shied away from this so far. The worst case scenario is that applications and users lose data that they previously assumed to be safe and that is not something that fits with the philosophy of CouchDB.

We have mentioned the Changes Feed and filter functions already. Together, they allow you to create a real-time feed of all the conflicts that are occurring, and you can have the application logic that deals with conflicts subscribe to that feed and handle conflicts as they come in.

Queries

A database that is just a key-value store is relatively straightforward to build, but it is limited in its applicability. To be useful to a wide variety of applications, it should support some mechanism for querying. Relational databases are entirely based around the SQL query model. In CouchDB, the querying system, called Views, sits on top of the core database and makes use of the MapReduce paradigm and JavaScript functions to create indexes that provide access to the data in ways that are more useful to applications than just the core data store. Using MapReduce here allows the querying model to be clustered with the core data. More on that later.

Querying CouchDB is a two phase operation. First we create an index definition and second, we make requests against that index. As before, CouchDB allows you to write little JavaScript functions to do the work.

Let’s say we want a list of all documents in our database sorted by last name, the equivalent of a SELECT lastname FROM people ORDER BY lastname in SQL.

View definitions are specified on special documents, that CouchDB calls design documents. The only thing that makes them different from regular documents is that their ID starts with _design/.

To create an index that is sorted by last name, we need to write this JavaScript function:

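function(doc) {
  // Only index documents that have a last name; the last name becomes the key.
  if (doc.name && doc.name.last) {
    emit(doc.name.last, null);
  }
}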

This function is called a map function as it is run during the “Map”-part of MapReduce and we need to store it inside a design document under the name of people/by_last_name. See the CouchDB documentation for the exact details.
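To show how the pieces fit together, here is a sketch of a design document that holds a guarded variant of the map function as a string, plus a toy stand-in for the view engine that runs it. The runView helper is hypothetical and only mimics the general shape of a view result (total_rows, offset and rows); CouchDB’s real engine sorts keys with its own collation rules and stores the index on disk.

```javascript
// A design document holds the map function as a string.
const designDoc = {
  _id: "_design/people",
  views: {
    by_last_name: {
      map: "function (doc) { if (doc.name && doc.name.last) { emit(doc.name.last, null); } }"
    }
  }
};

// Toy view engine: run the map function over every document and sort rows by key.
function runView(docs, mapSource) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  // Compile the stored source with `emit` in scope.
  const map = new Function("emit", `return (${mapSource});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}
```

Against a real server, the same view would be queried with a GET request to a path of the form /my_database/_design/people/_view/by_last_name, where my_database is the example database name used earlier.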

We can now specify a multitude of query options to limit our result set to exactly what we need. The official documentation on views explains it all in painstaking detail.

The "offset" is useful when paginating over a view result, but we won’t cover this here, as the CouchDB documentation does a good job of explaining it.

Under the hood

What happens under the hood when you make that first HTTP request to a view after defining it in a design document is as follows:

CouchDB’s view engine notices that there is no index to answer this view yet, so it knows that it needs to create one. To do that, it opens the changes feed of the database that the view definition is stored in and reads the results from the changes feed one by one. For each result, it fetches the corresponding document from the database itself and passes it to the map function that we have defined. Every time the emit(key, value) function is called, the view engine creates a key-value pair in the view’s index. The index is stored in a file on disk that is separate from the main database file.

When it is done building the index, it opens the index file and reads it from top to bottom and returns the result as we see above.
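The index build described above can be sketched in a few lines. This is a simplified model, not CouchDB’s implementation: emit is passed in as an argument instead of being a global, a sorted array stands in for the index file, and plain string comparison stands in for CouchDB’s collation rules. The sample documents are invented.

```javascript
// Map function from the text, written as a named function so it can run here.
function byLastName(doc, emit) {
  emit(doc.name.last, null);
}

// Run the map function over every document and collect the emitted rows.
function buildIndex(docs, mapFn) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // Keep the rows ordered by key, as an index would.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

const people = [
  { _id: "1", name: { first: "Ada", last: "Lovelace" } },
  { _id: "2", name: { first: "Alan", last: "Turing" } },
  { _id: "3", name: { first: "Grace", last: "Hopper" } }
];

const index = buildIndex(people, byLastName);
console.log(index.map(function (row) { return row.key; }));
// [ 'Hopper', 'Lovelace', 'Turing' ]
```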

Now, for every subsequent request to the view, the view engine can just open the index and read the result back to the client. Before it does, though, it checks, using the database’s changes feed, whether there have been any new updates or deletions in the database since the last time the index was queried. If that’s the case, the view engine incorporates these new changes into the index before returning the new index result to the client.

That means CouchDB indexes are built lazily, when they are queried, instead of when new data is inserted into the database, as is common in other database systems. The benefit here is twofold: 1. having many indexes doesn’t slow down the insertion of new updates into the database and 2. updating an index with new changes in bulk instead of one by one is a lot more efficient in terms of space, time and computation.

This also means that the very first query to a view that has been created on a database that already has millions of documents in it can take a while to complete. This is the functional equivalent of a

CREATE INDEX

call in SQL.

Reduce

The above example only shows the “Map”-part of MapReduce. The “Reduce”-part allows you to do calculations on top of the result of the Map-part. Reduce is purely optional: if the Map-part does what you need, there is no need to bother with a Reduce.

An easy example is the _count reduce that simply counts all the rows of your view result, or your sub-selection of it. But more sophisticated things are possible.

With a group_level of 3 we get a group by the day of the expense. All results above are created from the same index, and they are all part of the index structure, with only a minimal part that needs to be calculated at query time. Large collections of time-indexed data can be queried very efficiently from a single index and grouped in many ways.
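To illustrate how grouping by key prefix works, the sketch below mimics a sum reduce over rows whose keys are [year, month, day] arrays. The rows and the helper function are invented for this example; in CouchDB the grouping happens inside the view index and is selected with the group_level query option.

```javascript
// Rows as a map function might emit them: key is [year, month, day],
// value is the amount. All data here is made up for illustration.
const expenseRows = [
  { key: [2014, 1, 5], value: 20 },
  { key: [2014, 1, 5], value: 5 },
  { key: [2014, 1, 9], value: 12 },
  { key: [2014, 2, 1], value: 30 },
  { key: [2015, 3, 7], value: 8 }
];

// Sum values per key prefix. Rows must already be sorted by key,
// which is how a view index stores them.
function groupSum(sortedRows, groupLevel) {
  const result = [];
  sortedRows.forEach(function (row) {
    const prefix = row.key.slice(0, groupLevel);
    const last = result[result.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += row.value;
    } else {
      result.push({ key: prefix, value: row.value });
    }
  });
  return result;
}

console.log(groupSum(expenseRows, 1)); // totals per year
console.log(groupSum(expenseRows, 2)); // totals per month
console.log(groupSum(expenseRows, 3)); // totals per day
```

With this data, level 1 yields 67 for 2014 and 8 for 2015, level 2 yields 37, 30 and 8 for the three months, and level 3 yields 25, 12, 30 and 8 for the four days.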

MapReduce and clustered queries

So far we have only looked at the case of a single CouchDB instance running on a single server, and you might be wondering why we are going through the hassle of learning MapReduce and how CouchDB uses it just to query some data.

Think back to the introduction of BigCouch, where we learned that before BigCouch/CouchDB 2.0 the whole system was a single-server system, but all its features were carefully designed to retain their semantics in case CouchDB was going to be used in a clustered system. The reason CouchDB uses MapReduce queries is exactly that: it allows the execution of semantically equivalent queries on a single server and on a cluster of servers of any size. So whether one computer, 10 computers or 100 computers are used to produce a query result, the result is always the same.
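A small sketch can show why this holds for a reduce such as a sum. The function below follows the (keys, values, rereduce) shape of a CouchDB reduce function; the way the values are split across three nodes is a hypothetical stand-in for a real cluster.

```javascript
// Sum reduce in the CouchDB style. On rereduce, the values are partial
// results from earlier reduce calls; for a sum both cases look the same.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

const emitted = [3, 1, 4, 1, 5, 9, 2, 6];

// One server reduces everything in one go.
const single = sumReduce(null, emitted, false);

// Three hypothetical nodes each reduce their own share of the rows ...
const shards = [emitted.slice(0, 3), emitted.slice(3, 5), emitted.slice(5)];
const partials = shards.map(function (shard) {
  return sumReduce(null, shard, false);
});
// ... and the partial results are combined in a rereduce step.
const clustered = sumReduce(null, partials, true);

console.log(single, clustered); // 31 31
```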

Transactions

Another hot topic with databases is transactions. CouchDB does not support transactions in the common sense, where you start one, do a bunch of operations, then end it, and only at the end know whether all operations succeeded. CouchDB’s scope for operations is a single HTTP request that does a document read, or a document write, or a view query, but there is no way to make these operations inter-dependent in any way. The reason is simple: as soon as replication kicks in, which is after any of the individual requests have been completed, there is no notion of a transaction. Replicating a set of documents in one piece is not something that CouchDB allows, and that’s on purpose: writing such a system is very complex, error prone, slow and not very scalable.

Does that mean CouchDB cannot express what other databases do with transactions? Not quite: for the most common things, CouchDB can emulate transactions on top of its document storage and view semantics. For certain scenarios, a little more work on the client is required to get the same behaviour, but this exceeds the scope of this document.

In 2007, Pat Helland, then Platform Architect at Microsoft and formerly at Amazon, wrote a seminal blog post titled “Accountants don’t use erasers”. The basic premise is that in traditional computer science education, database transactions are the cornerstone of any financial transaction, while actual, real-world financial applications couldn’t even function legally if they were using transactions.

In accounting, everything is written to a log. Want to transfer some money from A to B? It goes in the log. Your funds are sufficient and the receiving end exists? Money makes the move and a record goes in the log. Should your funds not suffice to complete the transaction, that new information goes into the log as well. Instead of erasing the log entry that started the transaction, we add another one that records its failure state. If that wasn’t the case, auditing banks and other money trails would be near impossible.

We can use that image to make transactions work in CouchDB. Instead of keeping a single document that has the balance of a bank account, we simply record all the transactions that make up the balance. Consider these four documents:

Now, the way that views work ensures that the result is always a consistent view of the balance. The index for a view is outside of the main database file. In order to produce results that are consistent with the database, views use this procedure:

1. A request to a view is made.

2. The view engine looks up the current index and reads its sequence id.

3. The view engine asks the database engine to send all changes that happened since the sequence id that was recorded with the view index.

4. The view engine incorporates all document additions, changes, and deletes into the view index and records the last sequence id.

5. The view engine returns the result of the view index request to the caller.
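The procedure above can be modelled with a toy database and a toy view index. Everything in this sketch is a simplification made for illustration: the database is an in-memory list of changes, the index is a plain object, and all function names are hypothetical.

```javascript
// Toy database: every write gets the next sequence number.
function createDb() {
  return { seq: 0, changes: [] };
}
function save(db, doc) {
  db.seq += 1;
  db.changes.push({ seq: db.seq, doc: doc });
}

// Toy view index that remembers the last sequence id it has processed.
function createView(mapFn) {
  return { lastSeq: 0, rowsById: {}, mapFn: mapFn };
}

function queryView(db, view) {
  // Ask for all changes since the sequence id recorded with the index.
  const pending = db.changes.filter(function (c) { return c.seq > view.lastSeq; });
  pending.forEach(function (change) {
    const doc = change.doc;
    delete view.rowsById[doc._id]; // drop rows from older versions
    if (!doc._deleted) {
      const rows = [];
      view.mapFn(doc, function (key, value) {
        rows.push({ id: doc._id, key: key, value: value });
      });
      view.rowsById[doc._id] = rows;
    }
    view.lastSeq = change.seq; // record the last sequence id
  });
  // Return the rows of the now up-to-date index (sorting omitted).
  const all = [];
  Object.keys(view.rowsById).forEach(function (id) {
    view.rowsById[id].forEach(function (row) { all.push(row); });
  });
  return { rows: all, processed: pending.length };
}

const db = createDb();
const view = createView(function (doc, emit) { emit(doc._id, doc.amount); });
save(db, { _id: "a", amount: 1 });
save(db, { _id: "b", amount: 2 });
const first = queryView(db, view);  // processes both writes
save(db, { _id: "a", _deleted: true });
const second = queryView(db, view); // processes only the deletion
console.log(first.processed, second.processed, second.rows.length); // 2 1 1
```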

In addition, single document write-, update-, or delete-operations in CouchDB have ACID semantics. This isn’t to claim that CouchDB is an ACID database, but the core storage operations adhere to the same principles, and views can give you a consistent view of your data, so as far as your application is concerned, CouchDB behaves like any other database you would expect.

That way, CouchDB can guarantee that the view result is consistent with the database at the time the request was made. There are some options where you can trade result latency for accuracy, but for our transaction example, we use the default case.
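To make the ledger idea concrete, here is a sketch that derives account balances from individual transaction documents. The documents, field names and helper functions are invented for illustration and are not the documents from the original example; a real setup would use a map function plus a sum reduce in a view.

```javascript
// Hypothetical ledger entries: one document per transaction, never updated.
const ledger = [
  { _id: "tx1", account: "alice", amount: 100 },
  { _id: "tx2", account: "alice", amount: -30 },
  { _id: "tx3", account: "bob", amount: 30 },
  { _id: "tx4", account: "alice", amount: -500, status: "failed" }
];

// Map step: emit the amount under the account. Failed attempts stay in
// the log but do not move any money, so they are skipped.
function mapEntry(doc, emit) {
  if (doc.status !== "failed") {
    emit(doc.account, doc.amount);
  }
}

// Reduce step: sum the emitted amounts per account.
function balances(docs) {
  const totals = {};
  docs.forEach(function (doc) {
    mapEntry(doc, function (key, value) {
      totals[key] = (totals[key] || 0) + value;
    });
  });
  return totals;
}

console.log(balances(ledger)); // { alice: 70, bob: 30 }
```

Nothing is ever erased here: the failed withdrawal remains on record, and the balance is always computed from the full history.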

Internals

This section is about CouchDB’s internals. It explains how the various features in CouchDB are implemented and how the combination of them all makes for a resilient, fast, and efficient database system.

Core data storage

CouchDB is a database management system. It can manage any number of logical databases. The only limitations are available disk space and file descriptors from the operating system. In fact, it is not uncommon to set up a database architecture with CouchDB where every single user gets their own database. CouchDB can handle the resulting potentially hundreds of thousands or millions of databases just fine.

Each database is backed by a single file in the filesystem. All data that goes into the database goes into that file. Indexes for views use the same underlying storage mechanics, but each view gets its own file in the file system that is separate from the main database file.

Both database files and view index files are operated in an append-only fashion. That means that any new data that comes into the database is appended to the end of the file. Even when documents are deleted, that is information that goes at the end of the file. The result is extreme resilience against data loss: once data has been committed to the file and that file has been flushed to the underlying disk hardware, CouchDB will never attempt to fully or partially overwrite that data. That means in any error scenario (software crash, hardware crash, power outage, disk full etc.) CouchDB guarantees that previously committed data is still pristine. The only way problems can creep in is when the underlying disk or the file system corrupts the data, and even then CouchDB uses checksums to detect these errors.
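The append-only idea can be sketched with an array that only ever grows. This is a toy model with hypothetical function names: it scans the log backwards to find a document, whereas CouchDB locates data through its B+-tree.

```javascript
// Toy append-only store: the "file" is an array that only ever grows.
function createStore() {
  return { log: [] };
}

// Writes, updates and deletes all append a new record at the end.
function put(store, id, body) {
  store.log.push({ id: id, deleted: false, body: body });
}
function remove(store, id) {
  store.log.push({ id: id, deleted: true, body: null });
}

// Reads scan from the end, so the most recent record for an id wins.
function get(store, id) {
  for (let i = store.log.length - 1; i >= 0; i -= 1) {
    if (store.log[i].id === id) {
      return store.log[i].deleted ? undefined : store.log[i].body;
    }
  }
  return undefined;
}

const store = createStore();
put(store, "a", { n: 1 });
put(store, "a", { n: 2 }); // an update appends, it does not overwrite
put(store, "b", { n: 9 });
remove(store, "b");        // a delete appends as well

console.log(get(store, "a"), get(store, "b"), store.log.length);
// { n: 2 } undefined 4
```

Earlier records are never touched, which is why a crash in the middle of a write cannot damage data that was committed before.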

Data safety is one of the big design goals of CouchDB and the above design ensures a maximal degree of resilience.

In addition, always operating at the end of the file allows the underlying storage layer to work in large and convenient bulks without many seeks, and it turns out that this is the best-case scenario for both spinning disks and modern SSD systems.

The one trade-off that CouchDB makes here, in lieu of a more complex storage subsystem such as the one found in InnoDB, is the need for a process called compaction to clean up extra database space and old document revisions. Compaction walks the changes feed of a database from the beginning and copies the most recent version of every document into a new file.

Once done, it atomically swaps the old and the new file and then deletes the old file.
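Compaction can be sketched along the same lines. The records below are invented, and the variable reassignment merely stands in for the atomic file swap; in this sketch, deletion markers are carried over into the new file.

```javascript
// Toy database file in append order; later records supersede earlier ones.
const oldFile = [
  { seq: 1, id: "a", rev: 1, deleted: false },
  { seq: 2, id: "b", rev: 1, deleted: false },
  { seq: 3, id: "a", rev: 2, deleted: false },
  { seq: 4, id: "b", rev: 2, deleted: true },
  { seq: 5, id: "c", rev: 1, deleted: false }
];

// Copy only the most recent record of each document into a new file.
function compact(file) {
  const latest = {};
  file.forEach(function (record) { latest[record.id] = record; });
  return file.filter(function (record) { return latest[record.id] === record; });
}

let dbFile = oldFile;
const newFile = compact(dbFile);
dbFile = newFile; // stands in for the atomic swap of old and new file

console.log(dbFile.map(function (r) { return r.id + "@" + r.rev; }));
// [ 'a@2', 'b@2', 'c@1' ]
```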

B+-trees

One level up, both databases and view indexes use a B+-tree variant to manage the actual data storage. B+-trees are very wide and shallow. In CouchDB’s case, a tree with a height of 3 can handle over 4 billion documents. There is no upper limit to the number of documents or amount of data stored in a single one of CouchDB’s B+-trees.
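A quick back-of-the-envelope calculation shows how wide such a tree has to be. The sketch assumes, purely for illustration, that every node has the same fan-out and takes 4 billion as the target; real node sizes vary, so the figure is only indicative.

```javascript
// Number of entries reachable in a tree of the given height when every
// node has the same fan-out (a simplifying assumption).
function capacity(fanout, height) {
  let total = 1;
  for (let level = 0; level < height; level += 1) {
    total *= fanout;
  }
  return total;
}

// Smallest uniform fan-out whose tree of this height can hold `documents`.
function minFanout(documents, height) {
  let fanout = 2;
  while (capacity(fanout, height) < documents) {
    fanout += 1;
  }
  return fanout;
}

console.log(minFanout(4e9, 3)); // 1588
```

In other words, under these assumptions each node would need to reference roughly 1,600 children for three levels to cover 4 billion documents.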

The advantage of a wide tree is operational speed. The upper layers of a tree do not hold any actual user data (a function of the +-ness of the B-tree) and always fit in the file-system cache. So for any read or write, even with hundreds of billions of documents, CouchDB only needs a single disk seek to find the data for a document, or a place to write a new document.

Concurrency

CouchDB is implemented in Erlang, a functional programming language and virtual machine that comes with a rich standard library that makes it easy to build large-scale, robust applications that support a high degree of concurrency.

Erlang’s heritage is the telecommunications industry, and a core application is telephone routers; to this day, Erlang powers many major telecom phone and SMS exchanges. It turns out that the problems the telecommunications industry had when it designed Erlang closely mirror the issues of the modern computing landscape: millions and billions of individual users, no possibility of maintenance windows for, e.g., software updates, and the requirement for extreme isolation: if a single user has an issue, it should not affect any of the other users that are using the system at the same time.

Erlang is built to solve all the above problems, and CouchDB makes full use of its solutions to them.

JavaScript

CouchDB embraces JavaScript as a first class language for in-database scripting tasks. It embeds Mozilla’s SpiderMonkey engine. At the time of writing this, there are a few experiments revolving around Google’s V8 engine as well as Node.js as a platform for embedded scripting needs.

Plugins

CouchDB is extensible with a comprehensive plugin system that allows users to augment CouchDB’s core features and semantics in any way they need. At this point, plugins need to be written in Erlang, and there are efforts underway to provide a common registry of plugins as well as a single-click installation process for end users.

A plugin, for example, could be a secondary query engine, like GeoCouch, a two-dimensional indexing and query engine that works much like views, but is optimised for geo-spatial queries.

The Apache Software Foundation

CouchDB is developed, maintained and supported under the umbrella of the Apache Software Foundation (ASF). The ASF is an organisation with a strong focus on building healthy communities for producing Open Source software under the Apache 2.0 License.

That means that CouchDB is available free of charge and can be used in conjunction with any commercial project. It also means that the development roadmap and management are not tied to any single corporation or person. A community of engineers, designers, documenters, project managers and community managers, who either work for companies that use or support CouchDB or work independently, works together in the open on the future of CouchDB. It is guaranteed that anyone can follow the ups and downs of the project on public mailing lists. As such, CouchDB is secured against any single vendor dominating the project or a lock-in by a particular party that has its own agenda. At the same time, the ASF provides a level playing field for commercial enterprises to enhance the larger CouchDB ecosystem on a strong independent core.

Hoodie is an open source web framework that allows people with only minimal frontend web design skills to build fully featured web applications. Its core is a friendly API that encapsulates all backend complexities in an easy-to-use way. In a way, it does for frontend developers what Ruby on Rails did for backend developers: hide many recurring problems that applications have behind a common abstraction in order to allow application builders to concentrate on what makes their apps special and not on re-inventing the millionth password-reset system.

One of the core features of Hoodie is that it allows applications to function offline. That is, all end-user interaction only occurs with a client-side, in-browser database. Communication with a server happens asynchronously and only when an internet connection is available and fast. This communication system is just CouchDB’s replication mechanism, wrapped in a nice API. Application developers don’t need to worry about the details. Hoodie chose CouchDB for the backend solution specifically because it enables this Offline First design for applications.

While the web-version of Hoodie is furthest along, there are also ongoing efforts to port the frontend bits to iOS and Android for use in native apps there.

The Hoodie developers believe that Offline First application design is going to allow application developers to build superior user interaction experiences in the face of the slow rollout of mobile broadband, network contention in densely populated areas or architecture that works as a Faraday cage.

While the technology exists to solve these problems, application developers are slow to adopt it, because it requires a rather massive rethinking of how they build their apps. That’s why Hoodie tries to reach user experience experts, designers and frontend developers who otherwise couldn’t build a full application, and takes care of all their backend needs, while giving them Offline First apps for free.

Conclusion

CouchDB is different from all other databases, but it borrows a good number of well-known and well-understood paradigms to provide a unique feature set to its users. Core data storage in JSON and over HTTP makes for a flexible and modern data store with a flexible query engine.

Replication allows data to live where it is needed, whether it is in a cluster spanning a number of data-centers or on the phone of an end-user.

Designed around data safety, CouchDB provides a very efficient data storage solution that is robust in the face of system faults, networking faults as well as sudden spikes in traffic. It is developed as an independent Open Source project under a liberal license.

About the Author

Jan Lehnardt is the Vice President of Apache CouchDB at The Apache Software Foundation. He is the longest-standing contributor and started working on CouchDB in 2007. He’s the co-author of CouchDB: The Definitive Guide. He lives in Berlin and likes cycling, drumming and perfecting the craft of making pizza in his wood-burning oven. Apache CouchDB is developed by a dedicated team of contributors at the Apache Software Foundation. Follow him: @janl. Reach him at jan@apache.org