23 Aug

The ElasticSearch cat APIs

I like ElasticSearch: it’s a great piece of open source technology. Although it was built as a Lucene-based search engine, it can do more than just that. It’s an awesome analytics engine, and it’s also a pretty good NoSQL database.

Interacting with ElasticSearch happens through the REST API, and the output is JSON. JSON is cool, JSON is fun, but it’s not really made to be read by humans.

Cat? Are there animals involved?

Not really … the ElasticSearch cat APIs are not related to the feline creatures. The name refers to the cat binary in Unix. Instead of outputting JSON, the cat APIs send their output line by line. No parsing required: each item gets its own line, and the properties of an item are separated by whitespace.

Makes sense, right?
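To make that concrete, here is a small sketch of how such line-based output can be fed straight into standard Unix tools. It assumes the input has two columns, index name and document count, which is what I would expect from “/_cat/indices?h=index,docs.count”. The function name empty_indices is made up for this example.

```shell
# Hypothetical helper: print the names of indices that hold no documents.
# Reads "index docs.count" lines on stdin, one index per line.
# Lines with fewer than two columns are skipped.
empty_indices() {
  awk 'NF >= 2 && $2 == 0 { print $1 }'
}
```

Against a live cluster this could be used as curl -s "http://localhost:9200/_cat/indices?h=index,docs.count" | empty_indices, with no JSON parser in sight.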

Calling them

Calling them is quite easy: you just issue a GET request to the “_cat” resource of your ElasticSearch server, for example with curl.
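Here is a minimal sketch of what such a call could look like. It assumes a node listening on http://localhost:9200. The ES_URL variable and the cat_url and es_cat helpers are names made up for this example; they are not part of ElasticSearch. The “v” parameter asks the cat API to add a header row.

```shell
# Assumption: ElasticSearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the URL for a cat endpoint.
# Without an argument it returns the bare _cat resource, which lists the
# available cat endpoints. With an argument it adds "v" for a header row.
cat_url() {
  if [ -z "${1:-}" ]; then
    printf '%s/_cat\n' "$ES_URL"
  else
    printf '%s/_cat/%s?v\n' "$ES_URL" "$1"
  fi
}

# Hypothetical helper: issue the GET request with curl.
# -s hides the progress meter so only the cat output is printed.
es_cat() {
  curl -s "$(cat_url "${1:-}")"
}
```

With these in place, es_cat on its own lists the available cat endpoints, and es_cat nodes prints one line per node.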

Pick one and dig deeper?

The cat API documentation is pretty extensive, and I could just quote the docs line by line. That wouldn’t be too useful. Instead I’ll pick one and explain why and how I use it.

The “health” API is the most important one to me. If the cluster is not healthy, searches may not return complete or consistent results. Depending on the health status, the cluster is in one of three states:

Green: it’s all good man. Saul Goodman 😉 The nodes are up, the shards for each index are loaded and the replicas have been recovered on separate nodes

Yellow: something is wrong. All primary shards are available, but not all replicas have been recovered. If a node goes down, there can be data loss

Red: some primary data shards are missing. This means some data is unavailable right now. This is bad but not necessarily disastrous, as some nodes might still be rebooting.

You can actually make a specific health call from your monitoring system:

curl "http://localhost:9200/_cat/health?h=status"

If the output is not “green”, engineers should be alerted. Very convenient!
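A monitoring check along those lines could be sketched like this. The function names, the ES_URL variable and the five second timeout are assumptions made for this example. The exit codes follow the convention used by Nagios-style plugins (0 OK, 1 warning, 2 critical, 3 unknown), so adapt them to whatever your monitoring system expects.

```shell
# Hypothetical helper: map the output of _cat/health?h=status to an exit code.
# Whitespace is stripped first, because the cat output ends with a newline
# and columns may be padded.
health_exit_code() {
  cluster_status=$(printf '%s' "${1:-}" | tr -d '[:space:]')
  case "$cluster_status" in
    green)  return 0 ;;  # all primaries and replicas allocated
    yellow) return 1 ;;  # some replicas missing
    red)    return 2 ;;  # some primaries missing
    *)      return 3 ;;  # empty or unexpected output, e.g. node unreachable
  esac
}

# Hypothetical wrapper: query the cluster and classify the answer.
# Assumes a node on localhost:9200 unless ES_URL is set.
check_cluster() {
  raw=$(curl -s --max-time 5 "${ES_URL:-http://localhost:9200}/_cat/health?h=status")
  health_exit_code "$raw"
}
```

Calling check_cluster from a cron job or a monitoring agent then gives you a non-zero exit code whenever the cluster is anything other than green.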

Let’s look at some video footage

I recorded a short video in which I show a couple of cat APIs on a 3-node cluster. The cluster runs on my laptop.

What I’m doing in this video is showing random API calls that are focused on the cluster, the nodes in the cluster and the indices running on the cluster.

I’m creating an index called “myindex” with a type called “mytype”. At first the index is empty, then I’m adding a document, then another one. Using the API calls I’m checking the size of the index, the allocation of the shards in the cluster and the cluster state.

Have a look:

Why should you use the cat APIs?

Long story short: the cat APIs are the easiest way to manage an ElasticSearch cluster.

OK, you can’t really change anything using these APIs, but at least you get a very detailed view of the current status of “things”. And these things vary: the cluster itself, the nodes in it, the indices and the allocation of their shards.