Couchbase Java SDK 1.4.0 - New and Noteworthy

About the Author

Michael Nitschinger is a JVM engineer at Couchbase. He is the architect and maintainer of the Couchbase Java SDK, one of the first completely reactive database drivers on the JVM. He also authored and maintains the Couchbase Spark Connector. Michael is active in the open source community, a core member of the Netty project, and also contributes to various other projects like RxJava.

With this blog post we're releasing the first developer preview of the 1.4.0 Java SDK. Aside from the usual bug fixes and enhancements, this new minor release adds support for optimized connection management, which was recently introduced in Couchbase Server 2.5.0. See below for more information on what's new.

You can download the preview either from Maven Central or as a zip archive with all JARs included.

Update: The developer preview has been refreshed to 1.4.0dp2 as of April 4, 2014.

Optimized Connection Management

A couple of weeks ago, Couchbase Server 2.5.0 introduced a new way of fetching the cluster configuration. In addition to loading it over HTTP (port 8091), the SDK can now load it directly through the underlying binary protocol (port 11210). Previously, to keep track of ongoing cluster changes, the client had to establish a streaming connection to the configuration port, which pushed new config chunks to the client. Now, the client receives new configurations along with data operation responses over the binary port. This makes bootstrap much faster and more efficient, which in turn makes large deployments easier to manage.

When using the Java SDK, nothing needs to change API-wise, but since the bootstrap process changes slightly, it's good to understand what is actually going on. For a general introduction to the topic, I recommend Mark Nunberg's blog post on the similar changes to libcouchbase, which covers lots of the surrounding bits and pieces.

The Java SDK takes the list of bootstrap nodes passed in, but for now ignores everything in each URI aside from the hostname. It tries to contact the target server on port 11210. If the server responds with a valid configuration (which happens if it is a 2.5.0 or later node and the bucket is a couchbase bucket), this configuration is immediately stored and used. No streaming connection is attached; instead, this binary connection is reused to fetch configuration updates on demand. If the server doesn't respond with a valid configuration, the other nodes in the bootstrap list are tried in the same way. If none of them returns a valid config (for example, if a memcached bucket is used or all of the nodes in the cluster are version 2.2 or older), the client falls back to the old HTTP bootstrap and streaming connection. We're aware that this changes the behavior and no longer matches what the bootstrap arguments imply is happening, but we think it is the right thing to do for compatibility with existing applications (though we're open to feedback).
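To make the order of attempts easier to follow, here is a minimal sketch in plain Java. It is not the SDK's implementation: `BinaryConfigLoader`, `HttpConfigLoader`, and the string "configs" are hypothetical stand-ins invented for this illustration, and only the sequence (binary port first on every node, HTTP as the fallback) is taken from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Illustrative model of the bootstrap order described above; not SDK code. */
final class BootstrapSketch {

    /** Hypothetical: asks one host for a config over the binary port. */
    interface BinaryConfigLoader {
        Optional<String> load(String hostname, int port);
    }

    /** Hypothetical: the older HTTP bootstrap plus streaming connection. */
    interface HttpConfigLoader {
        String load(List<String> hostnames);
    }

    static final int BINARY_PORT = 11210;

    /** Tries every node on the binary port first, then falls back to HTTP. */
    static String bootstrap(List<String> hostnames,
                            BinaryConfigLoader binary,
                            HttpConfigLoader http) {
        for (String host : hostnames) {
            Optional<String> config = binary.load(host, BINARY_PORT);
            if (config.isPresent()) {
                // First valid config wins; no streaming connection is needed.
                return "binary:" + config.get();
            }
        }
        // No node answered with a valid config, so use the old approach.
        return "http:" + http.load(hostnames);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2");
        HttpConfigLoader http = hosts -> "config-from-http";

        // Only node2 can answer over the binary port (say, node1 is an older node).
        BinaryConfigLoader onlyNode2 = (host, port) ->
            host.equals("node2")
                ? Optional.of("config-from-" + host)
                : Optional.<String>empty();
        System.out.println(bootstrap(nodes, onlyNode2, http)); // binary:config-from-node2

        // No node can answer (say, a memcached bucket): HTTP fallback.
        BinaryConfigLoader none = (host, port) -> Optional.<String>empty();
        System.out.println(bootstrap(nodes, none, http)); // http:config-from-http
    }
}
```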

The same process is repeated when the configuration connection gets lost or when other parts of the SDK indicate that the current configuration is outdated (for example, a high number of failing operations).

INFO-level logging has been added so that the logs show which bootstrap approach is used. If the new connection management facilities worked, this log message is shown:

Since this is a slightly larger change inside the SDK, please kick the tires on the developer preview (that's mainly why we opted for a preview instead of going directly to a final release) and give us feedback on various scenarios in your environment. Our test team is also putting it through its paces.

Total number of rows on the ViewResponse

Every non-reduced view exposes the total number of rows in the view, and this is now also available in the Java SDK. This is especially useful in pagination and unit testing scenarios. Here is an example (from the beer-sample dataset):

This has been a long-standing user request and is now in this release.
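For the pagination case, the arithmetic on top of that total can be sketched like this. The sketch is deliberately independent of the SDK: the total is passed in as a plain number (in an application it would come from the ViewResponse), the helper names are hypothetical, and the value 95 is an arbitrary illustration rather than a count from a real view.

```java
/** Illustrative pagination helpers; names and values are assumptions. */
final class PaginationSketch {

    /** Number of pages needed to show totalRows rows, pageSize at a time. */
    static long pageCount(long totalRows, int pageSize) {
        if (totalRows < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalRows >= 0 and pageSize > 0 required");
        }
        // Integer ceiling division: a partially filled last page still counts.
        return (totalRows + pageSize - 1) / pageSize;
    }

    /** Rows to skip in order to reach the given 1-based page. */
    static long skipFor(long page, int pageSize) {
        if (page < 1 || pageSize <= 0) {
            throw new IllegalArgumentException("page >= 1 and pageSize > 0 required");
        }
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        long totalRows = 95; // arbitrary example value
        int pageSize = 10;
        System.out.println(pageCount(totalRows, pageSize)); // 10
        System.out.println(skipFor(3, pageSize));           // 20
    }
}
```

Knowing the page count up front means a UI can render "page 3 of 10" without issuing a separate counting query.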

Enhanced replica read capabilities

Another user request, raised after we added replica-read capabilities, was for a way to retrieve the CAS value from the replica node. This basically resembles the well-known "gets" command, but this time for replicas. We added the capability through the asyncGetsFromReplica and getsFromReplica commands. Here is an example of how to use the new methods:

Keep in mind that the semantics are the same as with "getFromReplica": the returned value and its CAS could come either from the master or from one of the replica nodes, whichever responded first. This CAS value can be used for subsequent write commands that need to include the CAS value for optimistic locking.
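If optimistic locking is new to you, the following sketch shows the idea with a tiny in-memory model. None of it is SDK code: `InMemoryDocument` and `CasValue` are hypothetical classes made up for this illustration, and the only point is that a write carrying an outdated CAS is rejected, so the caller has to read again and retry.

```java
/** Illustrative in-memory model of CAS-based optimistic locking; not SDK code. */
final class CasSketch {

    /** A value together with the CAS it had when it was read. */
    static final class CasValue {
        final long cas;
        final String value;

        CasValue(long cas, String value) {
            this.cas = cas;
            this.value = value;
        }
    }

    /** Hypothetical single-document store whose CAS changes on every write. */
    static final class InMemoryDocument {
        private String value;
        private long cas = 1;

        InMemoryDocument(String initial) {
            this.value = initial;
        }

        /** Reads the value together with its current CAS. */
        synchronized CasValue gets() {
            return new CasValue(cas, value);
        }

        /** Writes only if the caller's CAS still matches the current one. */
        synchronized boolean cas(long expectedCas, String newValue) {
            if (expectedCas != cas) {
                return false; // someone else wrote in between
            }
            value = newValue;
            cas++;
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryDocument doc = new InMemoryDocument("v1");

        CasValue seen = doc.gets();                  // read value plus CAS
        doc.cas(seen.cas, "other-writer");           // a concurrent update sneaks in
        boolean stale = doc.cas(seen.cas, "mine");   // rejected: CAS is outdated

        CasValue fresh = doc.gets();                 // read again
        boolean retried = doc.cas(fresh.cas, "mine"); // accepted this time

        System.out.println(stale + " " + retried);   // false true
    }
}
```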

Typesafe status codes on OperationStatus

In the past, it has always been a bit of a hassle to deal with the status of a completed operation, because the only way to tell outcomes apart was to compare message strings. This minor release brings StatusCodes to the OperationStatus objects, which allows you to simply check against an enum.

The StatusCode provides all possible status codes that can be returned, and before the final release we'll also provide proper documentation on when they can occur (so you know what to look for when checking the codes).
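To illustrate why an enum is nicer to work with than strings, here is a small self-contained sketch. The `Code` enum and `Status` class are hypothetical stand-ins written for this post, and the constant names are placeholders rather than the SDK's actual StatusCode values; the point is only that a `switch` over an enum is checked by the compiler, while a mistyped string comparison fails silently.

```java
/** Illustrative enum-based status handling; types and constants are assumptions. */
final class StatusCodeSketch {

    /** Placeholder constants; the SDK's StatusCode enum defines its own set. */
    enum Code { SUCCESS, NOT_FOUND, EXISTS, TIMED_OUT }

    /** Hypothetical status object carrying a free-form message and a code. */
    static final class Status {
        final String message;
        final Code code;

        Status(String message, Code code) {
            this.message = message;
            this.code = code;
        }
    }

    /** Decides what to do next based on the code, never on the message text. */
    static String nextStep(Status status) {
        switch (status.code) {
            case SUCCESS:
                return "done";
            case NOT_FOUND:
                return "create the document";
            case EXISTS:
                return "reload and retry";
            case TIMED_OUT:
                return "retry later";
            default:
                return "report: " + status.message;
        }
    }

    public static void main(String[] args) {
        Status status = new Status("some free-form text", Code.EXISTS);
        System.out.println(nextStep(status)); // reload and retry
    }
}
```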

Next Steps

We've decided to do a developer preview for this minor release because we want to make sure that the new optimized connection management facilities are battle-tested before a final release. Please kick the tires and report any issues you find on our issue tracker. As soon as our test team has given us the green light, and provided there are no concerns from DP users, we'll release it as GA!