The Technology Behind Couchbase

In January, we launched Couchbase Server 1.8, our high-performance and scalable database system. This will form the core of our forthcoming Couchbase Server 2.0, currently in developer preview, which combines a high-performance key-value store with indexing and querying. Here’s an overview of the technology that underpins the functionality of the two products.

What is Couchbase Server?

Couchbase Server is a database solution that builds on a number of different components to provide a persistent, scalable and high-performance database. It achieves this by keeping most of the data in RAM and distributing that data across the nodes in the cluster. If you need more storage space or performance, you can add a new node, rebalance the data, and get near-linear scaling.

Origins

Couchbase Server has a core that is based around memcached. Memcached is a completely RAM-based caching engine that has traditionally been used as the final caching layer for information loaded and processed from a database, to help improve application performance. Typical usage involves loading the data from your database (often from multiple tables), applying your business and other logic, and storing the result in memcached. Couchbase Server builds on this by supporting the same in-memory architecture as memcached, but adds persistence to disk and the ability to distribute and scale the information store more easily to improve performance.

Scaling

In memcached installations, server selection happens on the client side: the client uses a hash of the key and a list of servers to determine where to send the request for the key/value data. Within Couchbase Server, the distribution of information across servers is handled by an additional layer of abstraction called vbuckets. This layer acts as a plug-in to the original memcached and supports a multi-tenant architecture, allowing data to spread across multiple servers automatically. When you add a server to your Couchbase Server cluster, a process called rebalancing redistributes the data across the nodes. The overall server management architecture of Couchbase Server is supported by the ns_server component, written entirely in Erlang. Erlang's lightweight processes and built-in message passing make it well suited as the overall management environment for Couchbase, which has to handle multiple clients, communication channels and interactions with the other components.
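The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the actual Couchbase hashing or map layout: the server addresses, the vbucket count of 1024, the use of CRC32 and the round-robin map are all assumptions made for the example.

```python
import zlib

# Hypothetical server list and vbucket count, chosen for illustration.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
NUM_VBUCKETS = 1024


def memcached_style_server(key, servers):
    """Client-side selection: hash the key straight onto the server list."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Hash the key onto a fixed number of vbuckets, independent of servers."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign every vbucket to a server (simple round-robin assumption)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def vbucket_style_server(key, vbucket_map):
    """Two-step lookup: key -> vbucket -> server currently owning it."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]
```

In the first function the list of servers is part of the hash calculation, so changing the list changes where keys land. In the second scheme a key always maps to the same vbucket, and only the vbucket-to-server map changes when a node is added, which is what makes it possible to move data in whole vbuckets during a rebalance.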

Persistence

Although memcached is a popular solution, it has a major problem: if a server fails for any reason, you completely lose the data stored on it. This isn’t a problem when you are using it as a caching layer, because you can reload from your database solution, but you will take a performance hit doing so. In Couchbase Server, data stored in RAM is also persisted to disk. Saving the information not only allows the dataset to grow beyond the size of RAM, it also means that servers can be rebooted and then reload their dataset without having to be manually re-populated with data. Most important of all, though, is that with persistence, Couchbase Server can store your data, rather than just acting as a cache.
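The idea of keeping data in RAM while also saving it to disk can be illustrated with a toy store. This is a minimal sketch under stated assumptions, not how Couchbase Server implements persistence: the PersistentStore class, the JSON-lines file format and the synchronous write on every set are all invented for the example.

```python
import json
import os


class PersistentStore:
    """Toy key/value store: reads come from RAM, writes also go to disk."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On startup, replay the log so the in-memory dataset is restored
        # without anyone having to re-populate it by hand.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]

    def set(self, key, value):
        self.data[key] = value
        # Append the change to the log; later entries win on replay.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")

    def get(self, key):
        return self.data.get(key)
```

Creating a second PersistentStore on the same file simulates a restart: the values written by the first instance are available again as soon as the constructor returns. A pure in-memory cache would come back empty.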

Indexing and Querying

In Couchbase Server 2.0, the SQLite storage backend used in earlier releases has been replaced by the storage mechanics and the indexing and querying engine originally produced as part of the Apache CouchDB project. Apache CouchDB is written entirely in Erlang, and like Couchbase Server it makes use of the Mochiweb component for the REST API. In Couchbase Server, an extended memcached protocol is used as the data manipulation interface for the database create/read/update/delete operations, with the REST API exposed for both querying and administration.

Couchbase Server 2.0 adds the ability to more effectively query and index the documents that have been stored. To query the data, you create a View that processes each stored JSON document into an index, and the index can then be queried and searched. View definitions are written in JavaScript and run by an embedded SpiderMonkey JavaScript engine. JavaScript is a powerful language with a simple syntax, which makes view definitions easy to write and to understand.
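The way a view turns documents into a queryable index can be sketched as below. Real view definitions are written in JavaScript and run inside the server. This Python version only mimics the idea for illustration, and the names map_by_type, build_index and query, along with the sample documents, are hypothetical.

```python
import json


def map_by_type(doc, emit):
    """View definition: emit one index row per document that has a type."""
    if "type" in doc:
        emit(doc["type"], doc.get("name"))


def build_index(documents, map_fn):
    """Run the map function over every stored JSON document."""
    rows = []
    for doc_id, raw in documents.items():
        doc = json.loads(raw)
        # Each emitted row remembers which document produced it.
        map_fn(doc, lambda key, value, _id=doc_id: rows.append((key, _id, value)))
    # Keep the index sorted by key, then by document id.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query(index, key):
    """Return (document id, value) pairs for rows matching the key."""
    return [(doc_id, value) for k, doc_id, value in index if k == key]


# Sample documents, stored as JSON strings keyed by document id.
documents = {
    "beer_2": '{"type": "beer", "name": "Stout"}',
    "brewery_1": '{"type": "brewery", "name": "Hilltop"}',
    "beer_1": '{"type": "beer", "name": "Pale Ale"}',
    "note_1": '{"text": "no type field, so not indexed"}',
}
index = build_index(documents, map_by_type)
```

Here query(index, "beer") returns the two beer documents in document-id order, while the document without a type field never appears in the index because the map function emits nothing for it.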

Monitoring

The REST architecture makes scripting and extending Couchbase Server easy from any language. In fact, we use REST in the standard command-line tools and in the web-based user interface. The user interface also exposes the built-in statistics and monitoring information, making heavy use of the jQuery JavaScript library to support the basics of the interface and the live graphing. This gives you an instant view of the state and health of your cluster and individual nodes.

Most of this functionality would be impossible to build and develop without us collectively using GitHub as a distributed code versioning platform, and Gerrit, the code review system from Google that enables us to review code submissions. For an open source project with a distributed team, these tools are a vital part of what makes Couchbase Server what it is.

About the Author

A professional writer for over 15 years, Martin ‘MC’ Brown is the author of and contributor to over 26 books covering an array of topics, including the recently published Getting Started with CouchDB. His expertise spans myriad development languages and platforms, including Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Microsoft WP, Mac OS and more. He is a former LAMP Technologies Editor for LinuxWorld magazine and is a regular contributor to ServerWatch.com, LinuxPlanet, ComputerWorld and IBM developerWorks. As a Subject Matter Expert for Microsoft he provided technical input to their Windows Server and certification teams. He draws on a rich and varied background as founder member of a leading UK ISP, systems manager and IT consultant for an advertising agency and Internet solutions group, technical specialist for an intercontinental ISP network, database designer and programmer, and self-confessed compulsive consumer of computing hardware and software. MC is currently the VP of Technical Publications and Education for Couchbase and is responsible for all published documentation, training program and content, and the Couchbase Techzone. He can be reached at mcslp.net.