Creating a content store with Couchbase - The Learning Portal

Marty Schoch of Couchbase
Published October 8, 2012

Two weeks ago, McGraw Hill presented at CouchConf SF, and our users expressed so much interest that I thought I'd share more details in a blog post. Earlier this year, McGraw Hill and Couchbase teamed up to build a proof-of-concept application showing off the power of using Couchbase and ElasticSearch together.

The goal of the project was to build a self-adapting learning portal that delivers personalized results. Specifically, that meant:

Allow users to browse and search a variety of content (articles, images, and video)

Architecture

Couchbase Server was used to store all of the content metadata, as well as the full-text source of the text articles. This gives the application access to the primary data set with sub-millisecond latency.

ElasticSearch was chosen to handle the full-text search requirements for the application. ElasticSearch combines rich querying capabilities with excellent clustering capabilities, making it a great match with Couchbase. Integration between Couchbase Server and ElasticSearch was provided by the Couchbase Transport plug-in. This transport uses the Cross Data Center Replication feature of Couchbase Server 2.0 to reliably transfer all document mutations to the ElasticSearch index (Learn more about this here).

On the front end, the decision was made to build the application using Ruby on Rails. Our primary objective in the code was to clearly document the best practices for using Couchbase and ElasticSearch together.

Learning Portal

Here is what a user sees when they first log in to the application.

Fast Access to Documents using Couchbase Client SDK

When a user selects a particular piece of content, the data is loaded directly from Couchbase Server by its key. Here's a sample document in Couchbase:

When we access this view with a group_level of 1, we see each tag, and the number of times it has been used to describe a document.
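To make the grouping behavior concrete, here is a rough Ruby sketch that emulates a map step emitting one row per tag and a count-style reduce at a given group level. It is only an illustration: the sample documents, the "tags" field name, and the helper names map_tags and reduce_count are assumptions, and a real Couchbase view is defined on the server rather than in application code.

```ruby
# Hypothetical in-memory documents; the "tags" field name is an assumption.
DOCS = {
  "article_1" => { "title" => "Intro to Cells", "tags" => ["biology", "cells"] },
  "video_7"   => { "title" => "Mitosis",        "tags" => ["biology", "video"] },
  "image_3"   => { "title" => "Leaf",           "tags" => ["biology"] }
}

# Map step: emit one [key, value] row per tag, keyed by a one-element array.
def map_tags(docs)
  docs.flat_map do |_id, doc|
    doc.fetch("tags", []).map { |tag| [[tag], 1] }
  end
end

# Reduce step: group rows by the first `group_level` key elements and count them.
def reduce_count(rows, group_level)
  rows
    .group_by { |key, _value| key.first(group_level) }
    .map { |key, grouped| { "key" => key, "value" => grouped.size } }
    .sort_by { |row| row["key"] }
end

# One row per distinct tag, ordered by key, with its usage count.
TAG_COUNTS = reduce_count(map_tags(DOCS), 1)
```

With a group level of 1, each distinct tag becomes one row whose value is the number of documents carrying that tag, and the rows come back ordered by key rather than by count.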

Couchbase Views are sorted by the key, so we cannot directly query for the top 8 tags. Instead, we have a job that runs every 10 minutes, queries this view, sorts the results, and stores the top 8 results into another document in Couchbase. Here is what that document looks like:
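The periodic job described above could be sketched roughly as follows. This is not the application's actual code: the view rows are made-up sample data, the STORE hash stands in for a Couchbase bucket, and the "top_tags" key and the stored document shape are assumptions chosen for illustration.

```ruby
require "json"

# Hypothetical rows, shaped like a grouped view response (tag key, usage count).
VIEW_ROWS = [
  { "key" => ["algebra"],    "value" => 12 },
  { "key" => ["biology"],    "value" => 30 },
  { "key" => ["chemistry"],  "value" => 7 },
  { "key" => ["geometry"],   "value" => 12 },
  { "key" => ["history"],    "value" => 3 },
  { "key" => ["physics"],    "value" => 18 },
  { "key" => ["poetry"],     "value" => 1 },
  { "key" => ["statistics"], "value" => 9 },
  { "key" => ["video"],      "value" => 22 },
  { "key" => ["writing"],    "value" => 5 }
]

# Sort by count (descending), break ties by tag name, keep the first `limit`.
def top_tags(rows, limit = 8)
  rows
    .sort_by { |row| [-row["value"], row["key"].first] }
    .first(limit)
    .map { |row| { "tag" => row["key"].first, "count" => row["value"] } }
end

# Stand-in for the Couchbase bucket; a real job would write through the client SDK.
STORE = {}

# Compute the top tags and store them as JSON under an assumed key.
def refresh_top_tags(rows, store, limit = 8)
  store["top_tags"] = JSON.generate({ "tags" => top_tags(rows, limit) })
end
```

Calling refresh_top_tags(VIEW_ROWS, STORE) on a schedule means the page only ever reads one small precomputed document by key, instead of sorting view results on every request.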

The important thing to note here is that the full document body is not included in the response from ElasticSearch. This was done by design, as we configured the index to not store the full source documents. The reason is simple: we already have fast access to the documents in Couchbase. Using the Couchbase Client SDK, we can perform a multi-get operation and efficiently pull down the document bodies. This allows us to render the search results screen:
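The pattern of searching for IDs and then fetching bodies in one batch can be sketched like this. The BUCKET hash stands in for Couchbase, SEARCH_HITS imitates a search response that carries only IDs and scores, and multi_get and hydrate are hypothetical helpers, not SDK methods.

```ruby
# Stand-in for the Couchbase bucket, keyed by document ID (hypothetical data).
BUCKET = {
  "article_1" => { "title" => "Intro to Cells", "type" => "article" },
  "video_7"   => { "title" => "Mitosis",        "type" => "video" },
  "image_3"   => { "title" => "Leaf",           "type" => "image" }
}

# Hypothetical search response: IDs and scores only, no document bodies.
SEARCH_HITS = [
  { "id" => "video_7",   "score" => 1.8 },
  { "id" => "article_1", "score" => 1.2 },
  { "id" => "missing_9", "score" => 0.4 }
]

# Stand-in for an SDK multi-get: one batched lookup for many keys.
def multi_get(bucket, ids)
  ids.each_with_object({}) do |id, found|
    found[id] = bucket[id] if bucket.key?(id)
  end
end

# Attach bodies to hits in search order, skipping IDs that no longer exist.
def hydrate(hits, bucket)
  bodies = multi_get(bucket, hits.map { |hit| hit["id"] })
  hits.map do |hit|
    doc = bodies[hit["id"]]
    doc ? hit.merge({ "doc" => doc }) : nil
  end.compact
end
```

Keeping the search ranking as the outer loop preserves relevance order, and dropping hits with no matching body guards against documents that were deleted after they were indexed.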