CRUD Document Operations Using the Java SDK with Couchbase Server

You can access documents in Couchbase using methods of the com.couchbase.client.java.Bucket object.

The methods for retrieving documents are get() and lookupIn(), and the methods for mutating documents are upsert(), insert(), replace(), and mutateIn().

Examples are shown using the synchronous API.
See the section on Async Programming for other APIs.

Additional Options

Update operations also accept a TTL (expiry) value on the passed document, which instructs the server to delete the document after the given amount of time.
This option is useful for transient data (such as sessions).
By default documents do not expire.
See Expiration Overview for more information on expiration.

Update operations can also accept a CAS (cas) value on the passed document to protect against concurrent updates to the same document.
See CAS for a description on how to use CAS values in your application.
Since CAS values are opaque, they are normally retrieved when a Document is loaded from Couchbase and then used, without modification, on subsequent mutation operations.
If a mutation succeeds, the returned Document will contain the new CAS value.
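As a sketch, a typical optimistic-locking cycle with CAS might look like this (the document ID and field name are illustrative):

// Load the document; the server populates its CAS value.
JsonDocument loaded = bucket.get("user::1");

// Modify the content locally; the CAS value is carried along unchanged.
loaded.content().put("lastLogin", System.currentTimeMillis());

// replace() only succeeds if nobody changed the document in the meantime;
// otherwise it throws a CASMismatchException.
JsonDocument updated = bucket.replace(loaded);
// updated.cas() now contains the new CAS value.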

Document Input and Output Types

Couchbase stores documents.
From an SDK point of view, those documents contain the actual value (like a JSON object) and associated metadata.
Every document in the Java SDK contains the following properties, some of them optional depending on the context:

id: The (per-bucket) unique identifier of the document.

content: The actual content of the document.

cas: The CAS (Compare And Swap) value of the document.

expiry: The expiration time of the document.

mutationToken: The optional MutationToken after a mutation.

There are a few different implementations of a Document.
Here are a few noteworthy document types:

JsonDocument: The default one in most methods, contains a JSON object (as a JsonObject).

RawJsonDocument: Represents any JSON value, stored as a String (useful for when you have your own JSON serializer/deserializer).

BinaryDocument: Used to store pure raw binary data (as a ByteBuf from Netty).

Because Couchbase Server can store anything and not just JSON, several document types exist to satisfy the general needs of an application.
You can also write your own Document implementations, which is not covered in this introduction.

Creating and Updating Full Documents

Documents may be created and updated using the Bucket#upsert(), Bucket#insert(), and Bucket#replace() family of methods.
Read more about the difference between these methods at Primitive Key-Value Operations in the Couchbase developer guide.

These methods accept a Document instance where the following values are considered if set:

id (mandatory): The ID of the document to modify (String).

content (mandatory): The desired new content of the document; this varies per document type used.
If JsonDocument is used, the content type is a JsonObject.

expiry (optional): Specify the expiry time for the document.
If specified, the document will expire and no longer exist after the given number of seconds.
See Expiration Overview for more information.

cas (optional): The CAS value for the document.
If the CAS on the server does not match the CAS supplied to the method, the operation will fail with a CASMismatchException.
See Concurrent Document Mutations for more information on the usage of CAS values.
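As an illustrative sketch (the document ID and content are made up), creating or updating a full document looks like this:

JsonObject content = JsonObject.create()
    .put("name", "Michael")
    .put("age", 27);

// Create the document (or overwrite it if it exists), with a 60 second expiry.
bucket.upsert(JsonDocument.create("user::michael", 60, content));

// insert() would instead fail with a DocumentAlreadyExistsException if the ID
// already exists; replace() would fail with a DocumentDoesNotExistException
// if it does not.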

Retrieving full documents

You can retrieve documents using the Bucket#get(), Bucket#getAndLock(), Bucket#getAndTouch() and Bucket#getFromReplica() methods.
All of those serve different distinct purposes and accept different parameters.

Most of the time you will use the get() method.
It accepts one mandatory argument:

id: The document ID to retrieve
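A minimal sketch (the document ID is illustrative):

// Returns null if the document does not exist.
JsonDocument found = bucket.get("document_id");
if (found != null) {
    System.out.println(found.content());
}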

It is also possible to read from a replica if you want to explicitly trade availability for consistency during the timeframe when the active partition is not reachable (for example during a node failure or netsplit).

getFromReplica has one mandatory argument as well:

id: The document ID to retrieve

Since you can have zero to three replicas (and they can change while your application is running), getFromReplica returns Lists or Iterators.
It is recommended to use the Iterator APIs, since they provide more flexibility during error conditions (where only partial responses may be retrieved).

In general, always use the ReplicaMode.ALL option rather than ReplicaMode.FIRST and similar options that only query the first replica.
The reason is that ALL will also try the active node, leading to more reliable behavior during failover.
If you just need the first replica, use the iterator approach and break once you have enough data from the replicas.

Since a replica is updated asynchronously and eventually consistent, reading from it may return stale and/or outdated results!
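A sketch of the recommended iterator approach (the document ID is illustrative; the no-argument getFromReplica overload shown here is assumed to imply ReplicaMode.ALL, as in recent 2.x SDKs):

JsonDocument replicaDoc = null;
Iterator<JsonDocument> iter = bucket.getFromReplica("document_id");
while (iter.hasNext()) {
    replicaDoc = iter.next();
    // Take the first document that comes back and stop.
    break;
}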

If you need pessimistic write locking on a document, you can use getAndLock, which will retrieve the document if it exists and also return its CAS value.
You must provide the maximum time the document stays locked (the server will unlock it after that time if you don't update it with the valid CAS).
Also note that this is a pure write lock; reading is still allowed.

// Get and Lock for max of 10 seconds
JsonDocument ownedDoc = bucket.getAndLock("document_id", 10);
// Do something with your document
JsonDocument modifiedDoc = modifyDocument(ownedDoc);
// Write it back with the correct CAS
bucket.replace(modifiedDoc);

If the document is locked already and you are trying to lock it again you will receive a TemporaryLockFailureException.

It is also possible to fetch the document and reset its expiration value at the same time.
See Modifying Expiration for more information.

Removing full documents

You can remove documents using the Bucket.remove() method.
This method takes a single mandatory argument:

id: The ID of the document to remove
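For example (the document ID is illustrative):

// Throws a DocumentDoesNotExistException if the document is not found.
// The returned document contains the id and the new CAS value, but no content.
JsonDocument removed = bucket.remove("document_id");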

You may also use the Bucket#touch() method to modify expiration without fetching or modifying the document:

bucket.touch("expires", 2);

Atomic Document Modifications

Additional atomic document modifications can be performed using the Java SDK.
You can modify a counter document using the Bucket.counter() method.
You can also use the Bucket.append() and Bucket.prepend() methods to perform raw byte concatenation.
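As a sketch (document IDs and values are illustrative):

// Increment the counter by 1; seed it at 1 if it does not exist yet.
JsonLongDocument counter = bucket.counter("counter_id", 1, 1);
System.out.println(counter.content());

// Append raw bytes to an existing document (fails with a
// DocumentDoesNotExistException if the document is not found).
bucket.append(StringDocument.create("event_log", "...more data"));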

Batching Operations

Since the Java SDK uses RxJava as its asynchronous foundation, all operations can be batched in the SDK using the asynchronous API via bucket.async() (and optionally revert back to blocking).

For implicit batching, use these operators:

Observable.just() or Observable.from() to generate an observable that contains the data you want to batch on.

flatMap() to send those events against the Couchbase Java SDK and merge the results asynchronously.

last() if you want to wait until the last element of the batch is received.

toList() if you care about the responses and want to aggregate them easily.

cache() if you have more than one subscriber, to prevent accessing the network over and over again with every subscribe.

The following example creates an observable stream of 6 keys to load in a batch, asynchronously fires off get() requests against the SDK (notice the bucket.async().get(...)), waits until the last result has arrived, and then converts the result into a list and blocks at the very end.
This pattern can be reused for mutations like upsert (as shown further down):

Note that this always returns a list, but it may contain anywhere from 0 to 6 documents, depending on how many are actually found.
Also, at the very end the observable is converted into a blocking one, but everything before that, including the network calls and the aggregation, is happening completely asynchronously.

Inside the SDK, this provides much more efficient resource utilization because the requests are very quickly stored in the internal Request RingBuffer and the I/O threads are able to pick up batches as large as they can.
Afterward, whichever server returns a result first has that result stored in the list, so there is no serialization of responses going on.

Batching mutations: The previous Java SDK only provided bulk operations for get().
With the techniques shown above, you can perform any kind of operation as a batch operation.
The following code generates a number of fake documents and inserts them in one batch.
Note that you can decide to either collect the results with toList() as shown above or just use last() as shown here to wait until the last document is properly inserted:
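A sketch of such a batch insert (the number of documents and their content are made up):

// Generate a number of dummy JSON documents.
List<JsonDocument> documents = new ArrayList<JsonDocument>();
for (int i = 0; i < 100; i++) {
    JsonObject content = JsonObject.create()
        .put("counter", i)
        .put("name", "Foo Bar");
    documents.add(JsonDocument.create("doc-" + i, content));
}

// Insert them in one batch, waiting until the last one is done.
Observable
    .from(documents)
    .flatMap(doc -> bucket.async().insert(doc))
    .last()
    .toBlocking()
    .single();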

Operating with Sub-Documents

Sub-document operations save network bandwidth by allowing you to specify paths of a document to be retrieved or updated.
The document is parsed on the server and only the relevant sections (indicated by paths) are transferred between client and server.
You can execute sub-document operations in the Java SDK using the Bucket#lookupIn() and Bucket#mutateIn() methods.

Each of these methods accepts a key as its mandatory first argument and gives you a builder that you can use to chain several command specifications, each specifying the path to be impacted by the specified operation and a document field operand.
You may find all the operations in the LookupInBuilder and MutateInBuilder classes.

All sub-document operations return a special DocumentFragment object rather than a Document.
It shares the id(), cas() and mutationToken() fields of a document, but in contrast with a normal Document object, a DocumentFragment contains multiple results with multiple statuses, one result/status pair for every input operation.
So it exposes methods to get the content() and status() of each spec, either by index or by path.
It also allows you to check that a response for a particular spec exists():
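A sketch of a lookup with two specs (the document ID and paths are illustrative):

DocumentFragment<Lookup> result = bucket
    .lookupIn("user::1")
    .get("address.city")
    .exists("phones")
    .execute();

// Check that a response for a particular spec exists, by path or by index.
if (result.exists("address.city")) {
    String city = (String) result.content("address.city");
}
// status() returns a ResponseStatus instead of throwing on failure.
boolean phonesFound = result.status(1).isSuccess();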

Using the content(...) methods will raise an exception if the individual spec did not complete successfully.
You can also use the status(...) methods to return an error code (a ResponseStatus) rather than throw an exception.

LegacyDocument

Uses the Transcoder from the 1.x SDKs and can be used for full cross-compatibility between the old and new versions.

StringDocument

Can be used to store arbitrary strings.
They will not be quoted, but stored as-is and flagged as "String".

You can implement a custom document type and associated transcoder if none of the pre-configured options are suitable for your application.
A custom transcoder converts inputs to their serialized forms and deserializes encoded data based on the item flags.
There is an AbstractTranscoder that can serve as the basis for a custom implementation, and custom transcoders should be registered with a Bucket when calling Cluster#openBucket (a list of custom transcoders can be passed in one of the overloads).

Correctly Managing BinaryDocuments

The BinaryDocument can be used to store and read arbitrary bytes.
It is the only default codec that directly exposes the underlying low-level Netty ByteBuf objects.

Because the raw data is exposed, it is important to free it after it has been used.
Not freeing it results in increased garbage collection pressure and memory leaks, and must be avoided.
See Correctly Managing Buffers.

Because binary data is arbitrary, it is backward compatible with the old SDK with regard to flags, so it can be read and written back and forth between the two.
Make sure it is not compressed in the old SDK and that the same encoding and decoding process is used on the application side to avoid data corruption.

Here is some demo code that shows how to write and read raw data.
The example writes binary data, reads it back, and then frees the pooled resources:
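A sketch of that demo (the document ID and payload are illustrative; note that ByteBuf, Unpooled, and CharsetUtil come from the repackaged com.couchbase.client.deps.io.netty packages):

// Create a buffer out of a string.
ByteBuf toWrite = Unpooled.copiedBuffer("Hello World", CharsetUtil.UTF_8);

// Write it; the SDK releases the buffer once it is on the wire.
bucket.upsert(BinaryDocument.create("binaryDoc", toWrite));

// Read it back; the returned buffer must be released by the caller.
BinaryDocument read = bucket.get("binaryDoc", BinaryDocument.class);
System.out.println(read.content().toString(CharsetUtil.UTF_8));
read.content().release();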

Correctly Managing Buffers

BinaryDocument allows users to get the rawest form of data out of Couchbase.
It exposes Netty’s ByteBuf, byte buffers that can have various characteristics (on- or off-heap, pooled or unpooled).
In general, buffers created by the SDK are pooled and off heap.
You can disable the pooling in the CouchbaseEnvironment if you absolutely need that.

As a consequence, the memory associated with a ByteBuf must be managed a little more explicitly by the developer than is usual in Java.

Most notably, these byte buffers are reference counted, and you need to know three main methods associated to buffer management:

refCnt() gives you the current reference count.
When it hits 0, the buffer is released back to its original pool, and it cannot be used anymore.

release() will decrease the reference count by 1 (by default).

retain() is the inverse of release, allowing you to prepare for multiple consumptions by external methods that you know will each release the buffer.

You can also use ReferenceCountUtil.release(something) if you don't want to check whether something is actually a ByteBuf (it does nothing if the object is not ReferenceCounted).

The SDK bundles the Netty dependency into a different package so that it doesn’t clash with a dependency to another version of Netty you may have.
As such, you need to use the classes and packages provided by the SDK (com.couchbase.client.deps.io.netty) when interacting with the API.
For example, the ByteBuf for the content of a BinaryDocument is a com.couchbase.client.deps.io.netty.buffer.ByteBuf.

What happens if I don’t release?

Basically, you leak memory. Netty will by default inspect a small percentage of ByteBuf creations and usages to try to detect leaks (in which case it outputs a log message; look for the "LEAK" keyword).

You can tune that to be more eagerly monitoring all buffers by calling ResourceLeakDetector.setLevel(PARANOID).

Note that this incurs quite an overhead and should only be activated in tests.
In production, setting it to ADVANCED is not as heavy as PARANOID and can be a good middle ground.

What happens if I release twice (or the SDK releases once more after I do)?

Netty will throw an IllegalReferenceCountException.
A buffer whose refCnt() has reached 0 cannot be interacted with anymore, since it has been freed back into the pool.

When must I release?

When the SDK creates a BinaryDocument for you, that is, on GET-type operations, you must release the buffer.
Mutative operations, on the other hand, take care of the buffer you pass in: the SDK releases it at the time the buffer is written on the wire.

When must I usually retain?

When you do a write, the buffer will usually be released by the SDK calling release().
But if you implement a kind of fallback behavior (for instance, attempt to insert() a document, catch a DocumentAlreadyExistsException, and then fall back to a replace() instead), the SDK would attempt to release twice, which won't work.

In this case you can retain() the buffer before the first attempt and let the catch block do the extra release if something goes wrong.
You have to manage the extra release yourself if the first write succeeds, and think about catching other possible exceptions (there, too, an extra release is needed):
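A sketch of that pattern (the document ID and payload are illustrative; the reference counts in the comments assume the buffer starts at refCnt = 1):

ByteBuf content = Unpooled.copiedBuffer("Hello World", CharsetUtil.UTF_8);
BinaryDocument doc = BinaryDocument.create("binaryDoc", content);

// Retain once so a second write attempt can reuse the buffer.  refCnt = 2
content.retain();
try {
    bucket.insert(doc);    // the SDK releases once on the write, refCnt = 1
    content.release();     // first write succeeded: extra release, refCnt = 0
} catch (DocumentAlreadyExistsException e) {
    bucket.replace(doc);   // fallback write, the SDK releases, refCnt = 0
} catch (Exception e) {
    content.release();     // any other failure: free the extra reference
}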