Elasticsearch Interpreter for Apache Zeppelin

Overview

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine that powers applications with complex search features and requirements.

Configuration

| Property | Default | Description |
|---|---|---|
| elasticsearch.cluster.name | elasticsearch | Cluster name |
| elasticsearch.host | localhost | Host of a node in the cluster |
| elasticsearch.port | 9300 | Connection port (Important: it depends on the client type, transport or http) |
| elasticsearch.client.type | transport | The type of client for Elasticsearch (transport or http) (Important: the port depends on this value) |
| elasticsearch.basicauth.username | | Username for a basic authentication (http) |
| elasticsearch.basicauth.password | | Password for a basic authentication (http) |
| elasticsearch.result.size | 10 | The size of the result set of a search query |

Note #1: You can add more properties to configure the Elasticsearch client.

Note #2: If you use Shield, you can add a property named shield.user whose value contains the user name and the password (format: username:password). For more details about Shield configuration, consult the Shield reference guide. Do not forget to copy the Shield client jar into the interpreter directory (ZEPPELIN_HOME/interpreters/elasticsearch).
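As an illustration of how the port follows the client type, the values below sketch two possible setups. Port 9200 is the default HTTP port of Elasticsearch and 9300 the default transport port; the host name and all credentials are hypothetical placeholders, and the `name = value` layout is only a notation for this example, since the values themselves are entered as interpreter properties.

```text
# Sketch 1: http client with basic authentication (hypothetical host and credentials)
elasticsearch.client.type        = http
elasticsearch.host               = es-node-1.example.com
elasticsearch.port               = 9200
elasticsearch.basicauth.username = zeppelin_user
elasticsearch.basicauth.password = change_me

# Sketch 2: transport client with Shield (hypothetical credentials)
elasticsearch.client.type = transport
elasticsearch.port        = 9300
shield.user               = zeppelin_user:change_me
```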

Enabling the Elasticsearch Interpreter

In a notebook, to enable the Elasticsearch interpreter, click the Gear icon and select Elasticsearch.

Using the Elasticsearch Interpreter

In a paragraph, use %elasticsearch to select the Elasticsearch interpreter and then input all commands. To get the list of available commands, use help.

%elasticsearch
help
Elasticsearch interpreter:
General format: <command> /<indices>/<types>/<id> <option> <JSON>
  - indices: list of indices separated by commas (depends on the command)
  - types: list of document types separated by commas (depends on the command)
Commands:
  - search /indices/types <query>
    . indices and types can be omitted (at least, you have to provide '/')
    . a query is either a JSON-formatted query, nor a lucene query
  - size <value>
    . defines the size of the result set(default value is in the config)
    . if used, this command must be declared before a search command
  - count /indices/types <query>
    . same comments as for the search
  - get /index/type/id
  - delete /index/type/id
  - index /index/type/id <json-formatted document>
    . the id can be omitted, elasticsearch will generate one

Tip: Use (Ctrl + .) for autocompletion.
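The index and delete commands listed in the help output can be sketched as follows. The index name logs, the type http, the id 1, and the document fields are all hypothetical. The first paragraph omits the id, so Elasticsearch would be expected to generate one, as the help text states.

```text
%elasticsearch
index /logs/http { "date": "2015-12-01T12:00:00Z", "status": 200, "content_length": 1024 }
```

```text
%elasticsearch
delete /logs/http/1
```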

Get

With the get command, you can find a document by id. The result is a JSON document.

%elasticsearch
get /index/type/id

Example:
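A minimal sketch, assuming a hypothetical index named logs with a document type http and a document whose id is 1:

```text
%elasticsearch
get /logs/http/1
```

If such a document exists, the paragraph should display it as a JSON document.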

Search

With the search command, you can send a search query to Elasticsearch. There are two formats of query:

You can provide a JSON-formatted query, which is exactly what you would send when using the REST API of Elasticsearch.
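For illustration, the paragraphs below sketch two JSON-formatted searches. The first queries all indices and types, which is why only '/' is given. The second assumes a hypothetical logs index with request.method and status fields, and raises the result size first, since size must be declared before the search command.

```text
%elasticsearch
search / { "query": { "match_all": { } } }
```

```text
%elasticsearch
size 50
search /logs { "query": { "query_string": { "query": "request.method:GET AND status:200" } } }
```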

With a JSON query containing a fields parameter (for filtering the fields in the response): in this case, all the field values in the response are arrays, so after the result is flattened, every field name has the form field_name[x].
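A sketch of such a query, assuming a hypothetical logs index whose documents have date, request.method, and status fields:

```text
%elasticsearch
search /logs { "fields": [ "date", "request.method", "status" ], "query": { "match_all": { } } }
```

Following the naming rule described above, the resulting columns would be expected to look like date[0] or status[0].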

With a query string:
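A sketch using Lucene query string syntax instead of JSON, again assuming a hypothetical logs index with request.method and status fields:

```text
%elasticsearch
search /logs request.method:GET AND status:200
```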

With a query containing a multi-value metric aggregation:
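As one possible illustration, extended_stats is a multi-value metric aggregation in Elasticsearch. The index name logs, the field content_length, and the aggregation name content_length_stats are hypothetical:

```text
%elasticsearch
search /logs { "aggs": { "content_length_stats": { "extended_stats": { "field": "content_length" } } } }
```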

With a query containing a multi-bucket aggregation:
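As one possible illustration, terms is a multi-bucket aggregation in Elasticsearch. The index name logs, the field status, and the aggregation name status_count are hypothetical:

```text
%elasticsearch
search /logs { "aggs": { "status_count": { "terms": { "field": "status" } } } }
```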

Count

With the count command, you can count the documents in the given indices and types. You can also provide a query to restrict which documents are counted.
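Two sketches, assuming a hypothetical logs index with a status field. The first counts every document in the index; the second counts only the documents matching a JSON query.

```text
%elasticsearch
count /logs
```

```text
%elasticsearch
count /logs { "query": { "match": { "status": 500 } } }
```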