Navigation

This is a collection of some common SQL queries and how to get the same result
in CouchDB. The key to remember here is that CouchDB does not work like an SQL
database at all and that best practices from the SQL world do not translate well
or at all to CouchDB. This documents’s “cookbook” assumes that you are familiar
with the CouchDB basics such as creating and updating databases and documents.

Using views is a two-step process. First you define a view; then you query it.
This is analogous to defining a table structure (with indexes) using
CREATETABLE or ALTERTABLE and querying it using an SQL query.

Defining a view is done by creating a special document in a CouchDB database.
The only real specialness is the _id of the document, which starts with
_design/ — for example, _design/application. Other than that, it is just a
regular CouchDB document. To make sure CouchDB understands that you are defining
a view, you need to prepare the contents of that design document in a special
format. Here is an example:

We are defining a view viewname. The definition of the view consists of two
functions: the map function and the reduce function. Specifying a reduce
function is optional. We’ll look at the nature of the functions later. Note that
viewname can be whatever you like: users, by-name, or by-date are
just some examples.

A single design document can also include multiple view definitions, each
identified by a unique name:

The name of the design document and the name of the view are significant for
querying the view. To query the view viewname, you perform an HTTP GET
request to the following URI:

/database/_design/application/_view/viewname

database is the name of the database you created your design document in. Next
up is the design document name, and then the view name prefixed with _view/.
To query anotherview, replace viewname in that URI with anotherview.
If you want to query a view in a different design document, adjust the design
document name.

MapReduce is a concept that solves problems by applying a two-step process,
aptly named the map phase and the reduce phase. The map phase looks at all
documents in CouchDB separately one after the other and creates a map result.
The map result is an ordered list of key/value pairs. Both key and value can
be specified by the user writing the map function. A map function may call the
built-in emit(key,value) function 0 to N times per document, creating a row
in the map result per invocation.

CouchDB is smart enough to run a map function only once for every document, even
on subsequent queries on a view. Only changes to documents or new documents need
to be processed anew.

Map functions run in isolation for every document. They can’t modify the
document, and they can’t talk to the outside world—they can’t have side effects.
This is required so that CouchDB can guarantee correct results without having
to recalculate a complete result when only one document gets changed.

It is a list of rows sorted by the value of key. The id is added automatically
and refers back to the document that created this row. The value is the data
you’re looking for. For example purposes, it’s the girl’s age.

The map function that produces this result is:

function(doc){if(doc.name&&doc.age){emit(doc.name,doc.age);}}

It includes the if statement as a sanity check to ensure that we’re operating
on the right fields and calls the emit function with the name and age as the key
and value.

Use case: get a result (which can be a record or set of records) associated
with a key (“searchterm”).

To look something up quickly, regardless of the storage mechanism, an index is
needed. An index is a data structure optimized for quick search and retrieval.
CouchDB’s map result is stored in such an index, which happens to be a B+ tree.

To look up a value by “searchterm”, we need to put all values into the key of a
view. All we need is a simple map function:

function(doc){if(doc.value){emit(doc.value,null);}}

This creates a list of documents that have a value field sorted by the data in
the value field. To find all the records that match “searchterm”, we query the
view and specify the search term as a query parameter:

/database/_design/application/_view/viewname?key="searchterm"

Consider the documents from the previous section, and say we’re indexing on the
age field of the documents to find all the five-year-olds:

Note that you have to emit a value. The view result includes the associated
document ID in every row. We can use it to look up more data from the document
itself. We can also use the ?include_docs=true parameter to have CouchDB
fetch the documents individually for us.

Use case: find all documents that have a field value that starts with
searchterm. For example, say you stored a MIME type (like text/html or
image/jpg) for each document and now you want to find all documents that are
images according to the MIME type.

The solution is very similar to the previous example: all we need is a map
function that is a little more clever than the first one. But first, an example
document:

The clue lies in extracting the prefix that we want to search for from our
document and putting it into our view index. We use a regular expression to
match our prefix:

function(doc){if(doc["mime-type"]){// from the start (^) match everything that is not a slash ([^\/]+) until// we find a slash (\/). Slashes needs to be escaped with a backslash (\/)varprefix=doc["mime-type"].match(/^[^\/]+\//);if(prefix){emit(prefix,null);}}}

We can now query this view with our desired MIME type prefix and not only find
all images, but also text, video, and all other formats:

We haven’t explained reduce functions yet. Reduce functions are similar to
aggregate functions in SQL. They compute a value over multiple documents.

To explain the mechanics of reduce functions, we’ll create one that doesn’t make
a whole lot of sense. But this example is easy to understand. We’ll explore more
useful reductions later.

Reduce functions operate on the output of the map function (also called the map
result or intermediate result). The reduce function’s job, unsurprisingly, is to
reduce the list that the map function produces.

This reduce function takes two arguments: a list of keys and a list of values.
For our summing purposes we can ignore the keys-list and consider only the value
list. We’re looping over the list and add each item to a running total that
we’re returning at the end of the function.

You’ll see one difference between the map and the reduce function. The map
function uses emit() to create its result, whereas the reduce function
returns a value.

For example, from a list of integer values that specify the age, calculate the
sum of all years of life for the news headline,
“786 life years present at event.” A little contrived, but very simple and
thus good for demonstration purposes. Consider the documents and the map view we
used earlier in this document.

The reduce function to calculate the total age of all girls is:

function(keys,values){returnsum(values);}

Note that, instead of the two earlier versions, we use CouchDB’s predefined
sum() function. It does the same thing as the other two, but it is such
a common piece of code that CouchDB has it included.

The result for our reduce view now looks like this:

{"rows":[{"key":null,"value":15}]}

The total sum of all age fields in all our documents is 15. Just what we wanted.
The key member of the result object is null, as we can’t know anymore which
documents took part in the creation of the reduced result. We’ll cover more
advanced reduce cases later on.

As a rule of thumb, the reduce function should reduce to a single scalar value.
That is, an integer; a string; or a small, fixed-size list or object that
includes an aggregated value (or values) from the values argument.
It should never just return values or similar. CouchDB will give you a warning
if you try to use reduce “the wrong way”:

{"error":"reduce_overflow_error","message":"Reduce output must shrink more rapidly: Current output: ..."}

Getting unique values is not as easy as adding a keyword. But a reduce view and
a special query parameter give us the same result. Let’s say you want a list of
tags that your users have tagged themselves with and no duplicates.

First, let’s look at the source documents. We punt on _id and _rev
attributes here:

As promised, these are all the tags, including duplicates. Since each document
gets run through the map function in isolation, it cannot know if the same key
has been emitted already. At this stage, we need to live with that. To achieve
uniqueness, we need a reduce:

function(keys,values){returntrue;}

This reduce doesn’t do anything, but it allows us to specify a special query
parameter when querying the view:

In this case, we can ignore the value part because it is always true, but the
result includes a list of all our tags and no duplicates!

With a small change we can put the reduce to good use, too. Let’s see how many
of the non-unique tags are there for each tag. To calculate the tag frequency,
we just use the summing up we already learned about. In the map function,
we emit a 1 instead of null:

Use case: your applications require that a certain value exists only once in a
database.

This is an easy one: within a CouchDB database, each document must have a
unique _id field. If you require unique values in a database, just assign
them to a document’s _id field and CouchDB will enforce uniqueness for you.

There’s one caveat, though: in the distributed case, when you are running more
than one CouchDB node that accepts write requests, uniqueness can be guaranteed
only per node or outside of CouchDB. CouchDB will allow two identical IDs to be
written to two different nodes. On replication, CouchDB will detect a conflict
and flag the document accordingly.