Blog dedicated to Elasticsearch Server Books series

ElasticSearch 0.90.1: Updates in bulk API

Now that ElasticSearch 0.90.0 has been released, it is time to look at the features that will become part of the upcoming 0.90.1 and the big 1.0 release. The first thing that caught our attention is the possibility of including partial document updates in Bulk API requests. So, in addition to the standard index and delete commands, ElasticSearch 0.90.1 will introduce the update command.

Example Data

Let’s use a simplified version of the data that was used when we looked at the Rescore functionality (we store it in the bulk.json file).
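To make the shape of such a file concrete, the sketch below writes a small stand-in bulk.json. The index name library, the type book, and all field values are assumptions chosen for illustration; they are not the data from the Rescore post.

```shell
# Write a hypothetical bulk.json: every document takes two lines,
# an action line followed by the document source.
# Index "library", type "book" and all field values are assumed for illustration.
cat > bulk.json <<'EOF'
{ "index": {"_index": "library", "_type": "book", "_id": "1"} }
{ "title": "Book One", "available": true }
{ "index": {"_index": "library", "_type": "book", "_id": "2"} }
{ "title": "Book Two", "available": false }
{ "index": {"_index": "library", "_type": "book", "_id": "3"} }
{ "title": "Book Three", "available": false }
EOF
```

The here-document ends the last line with a newline, which the Bulk API requires.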

We’ll use the example data to illustrate how updates in bulk requests can be used, but for now let’s use the following command to index the above data:

$ curl -s -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json

Updating documents

Let’s assume that the book documents in our library change from time to time and that until now we handled this with the Update API. However, that meant updating the documents one by one, which took some time. With the new Bulk API extension in 0.90.1 we can prepare a single bulk request instead (we store it in the bulk_update.json file).
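A minimal sketch of what such a bulk_update.json file could contain is shown below. The index name library and the type book are assumptions; adjust them to match your own data.

```shell
# Sketch of bulk_update.json: two lines per update.
# Line 1: the operation (update) plus index, type and id of the target document.
# Line 2: the partial document to merge into the stored one.
# Index "library" and type "book" are assumed names.
cat > bulk_update.json <<'EOF'
{ "update": {"_index": "library", "_type": "book", "_id": "2"} }
{ "doc": {"updated": true} }
{ "update": {"_index": "library", "_type": "book", "_id": "3"} }
{ "doc": {"updated": true} }
EOF
```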

As with the standard Bulk API, each operation (an update in our case) consists of two lines. The first line tells ElasticSearch what type of operation will be performed (update in our case) and which document should be updated: which index it belongs to, which type it has, and its identifier.

The second line carries the fields we want to add to our documents. In our case we want to add a field named updated with the value of true to the documents with identifiers 2 and 3. In order to do that, we send the file to the _bulk endpoint with the same curl command as before, this time passing @bulk_update.json to --data-binary.

After the bulk request is executed, the _version of each updated document is increased and its _source contains the newly introduced field.

Updating document fields

What if we would like to change the value of an existing field? That is also possible, in the same way as with the Update API: by using a script. For example, let’s set the available field to true for the documents updated above. We can do that with a bulk request in which the second line of each update carries a script property instead of a partial document.
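Such a request might be sketched as follows. The file name bulk_script.json, the index name library, and the type book are assumptions, and the script presumes that dynamic scripting is enabled on the node.

```shell
# Sketch of a scripted bulk update (hypothetical file name bulk_script.json).
# The second line of each pair holds a script that modifies ctx._source.
# Index "library" and type "book" are assumed names.
cat > bulk_script.json <<'EOF'
{ "update": {"_index": "library", "_type": "book", "_id": "2"} }
{ "script": "ctx._source.available = true" }
{ "update": {"_index": "library", "_type": "book", "_id": "3"} }
{ "script": "ctx._source.available = true" }
EOF

# With a node listening on localhost:9200, the file could be sent with:
# curl -s -XPOST 'localhost:9200/_bulk' --data-binary @bulk_script.json
```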

Again, the _version of each document is increased and its _source contains the updated available field.

Upsert

Just like with the standard Update API, if a document we want to update doesn’t exist, we can use the upsert object to create it. If we would like ElasticSearch to set a title field with the value Mastering ElasticSearch on the document with the id of 5, creating that document if it is missing, we would send a bulk request whose update includes an upsert object with that field.
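One way such a request could look is sketched below. The file name bulk_upsert.json, the index name library, and the type book are assumptions; pairing doc with upsert is one option among several the Update API accepts.

```shell
# Sketch of a bulk upsert (hypothetical file name bulk_upsert.json).
# If document 5 exists, "doc" is merged into it;
# if it does not, the content of "upsert" is indexed as a new document.
# Index "library" and type "book" are assumed names.
cat > bulk_upsert.json <<'EOF'
{ "update": {"_index": "library", "_type": "book", "_id": "5"} }
{ "doc": {"title": "Mastering ElasticSearch"}, "upsert": {"title": "Mastering ElasticSearch"} }
EOF

# With a node listening on localhost:9200, the file could be sent with:
# curl -s -XPOST 'localhost:9200/_bulk' --data-binary @bulk_upsert.json
```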

Retrying on conflicts

In addition to all that, we can add the _retry_on_conflict property to the line of the bulk request that specifies the identifier, index, and type of the document. Its value tells ElasticSearch how many times to retry the update when it fails because of a version conflict.
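A request using this property might be sketched as follows. The file name bulk_retry.json, the index name library, the type book, and the retry count of 3 are all assumptions picked for illustration.

```shell
# Sketch of a bulk update with conflict retries (hypothetical file name bulk_retry.json).
# _retry_on_conflict sits on the action line, next to index, type and id.
# The value 3 is an arbitrary example; index "library" and type "book" are assumed names.
cat > bulk_retry.json <<'EOF'
{ "update": {"_index": "library", "_type": "book", "_id": "2", "_retry_on_conflict": 3} }
{ "doc": {"updated": true} }
EOF

# With a node listening on localhost:9200, the file could be sent with:
# curl -s -XPOST 'localhost:9200/_bulk' --data-binary @bulk_retry.json
```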