
Elasticsearch allows you to index file attachments using the mapper-attachments plugin. This plugin relies on Apache Tika, a text extraction library that supports many file formats (the full list is available in the Tika documentation).

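To install the plugin, run the following command on every node of your Elasticsearch cluster (each node must be restarted after installation). On Elasticsearch 2.x the plugin manager script is bin/plugin; bin/elasticsearch-plugin only exists from 5.x onwards:

sudo bin/plugin install mapper-attachments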

The plugin adds the attachment type, which can be used in mappings so that a document field holds the contents of a file attachment, encoded as base64.
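As a rough sketch of what "encoded as base64" means in practice, the shell functions below turn a local file into a JSON document body whose file field holds the base64 string. The function names are hypothetical; the only tools assumed are base64 and tr.

```shell
#!/bin/sh
# encode_attachment FILE: print FILE's bytes as one base64 line.
# base64 wraps its output at 76 columns by default, so tr removes the
# newlines to obtain a single value usable as a JSON string.
encode_attachment() {
  base64 < "$1" | tr -d '\n'
}

# book_doc FILE: print a JSON body { "file": "<base64>" } for FILE.
# The base64 alphabet (A-Z a-z 0-9 + / =) needs no JSON escaping.
book_doc() {
  printf '{ "file": "%s" }\n' "$(encode_attachment "$1")"
}
```

For example, assuming a local file named sample.html (hypothetical) and the library index and book mapping from this post, a document could be indexed with book_doc sample.html > doc.json followed by curl -XPUT '192.168.193.130:9200/library/book/1' -d @doc.json (the document id 1 is arbitrary).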

NB: The mapper-attachments plugin will be replaced by the ingest-attachment plugin in the 5.x version of Elasticsearch.

I am using Elasticsearch 2.3.3 on Ubuntu 14.04.4 LTS.

We are now going to create a new index called library, add a new type (book) and index some documents.
To create the index:

curl -XPUT '192.168.193.130:9200/library/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  }
}'

Then we add the mapping for the type book, using the attachment data type (the new data type added by the plugin). The attachment type not only indexes the content of the document in the content sub-field, but also automatically adds metadata about the attachment when available. The list of all the metadata fields is available in the plugin documentation. In this example we add a field called file with two metadata fields: content_type and language.
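The mapping described above could look like the following sketch. It assumes the 2.x mapper-attachments mapping syntax and reuses the host from the index-creation command, overridable through a hypothetical ES_HOST variable. The content_type and language sub-fields are marked as stored so that a search can return them through its fields parameter. According to the plugin documentation, language detection is disabled by default (index.mapping.attachment.detect_language), so file.language may stay empty unless it is enabled.

```shell
#!/bin/sh
# Assumption: same host and port as the index-creation command.
ES_HOST="${ES_HOST:-192.168.193.130:9200}"

# Mapping for the "book" type: "file" uses the attachment type added by
# the plugin; the content_type and language metadata sub-fields are stored
# so they can be returned by a search using the "fields" parameter.
BOOK_MAPPING='{
  "book": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "content_type": { "type": "string", "store": true },
          "language": { "type": "string", "store": true }
        }
      }
    }
  }
}'

# put_book_mapping (hypothetical helper): send the mapping to the library index.
put_book_mapping() {
  curl -XPUT "$ES_HOST/library/book/_mapping" -d "$BOOK_MAPPING"
}
```

Calling put_book_mapping once, after the library index exists, applies the mapping; Elasticsearch should answer with {"acknowledged":true} on success.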

After indexing a document, we can search for all the documents with the HTML content type; the result of the following query contains the indexed document.

POST library/book/_search
{
  "fields": [
    "file.content_type",
    "file.language"
  ],
  "query": {
    "match": {
      "file.content_type": "html"
    }
  }
}
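Since the index was created with curl, the same search can also be sent from a shell. This sketch assumes the same host as the index-creation command (again overridable through a hypothetical ES_HOST variable); the function name is hypothetical, and ?pretty only formats the JSON response for readability.

```shell
#!/bin/sh
# Assumption: same host and port as the index-creation command.
ES_HOST="${ES_HOST:-192.168.193.130:9200}"

# Request body: match documents whose detected content type contains
# "html" and return only the two stored metadata fields.
SEARCH_HTML='{
  "fields": ["file.content_type", "file.language"],
  "query": {
    "match": { "file.content_type": "html" }
  }
}'

# search_html_books (hypothetical helper): run the search on library/book.
search_html_books() {
  curl -XPOST "$ES_HOST/library/book/_search?pretty" -d "$SEARCH_HTML"
}
```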

The complete documentation of the mapper-attachments plugin is available on the Elastic website. It covers some topics not discussed in this post, such as how to limit the number of indexed characters and how to highlight attachment content.
