Allegro OpenSource: Elasticsearch reindex tool

At Allegro we use many open-source tools that support our work. Sometimes we cannot find a tool that does what we need,
and that is a perfect moment to fill the gap and share the result with the community. We are proud to announce the
initial release of Elasticsearch reindex tool, a tool that
provides an easy way to rebuild indices in Elasticsearch.

Background and motivation

I work as a software engineer on the team that delivers search engines at Allegro. We had to reindex Elasticsearch
indices many times, and we needed to shorten the time each rebuild took. After investigating the open-source tools
available for this job, we decided to build our own.

Our idea was to speed up index rebuilding. To decrease the reindexing time, our tool reads data from the old index
and writes it to the new one in parallel, using multiple threads. To make this possible, each thread reads its own slice
of the index, selected by the values of a chosen field. Currently the tool supports fields of type double and string.
For a double field, the queries are split into segments by a given list of thresholds; for a string field, by a given
list of prefixes. The time saved varies with the topology of the Elasticsearch cluster and with the index itself.
For our most frequently changed index, reindexing time dropped from 45 minutes to 17 minutes.
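
To illustrate the segmentation idea, here is a minimal Python sketch. It is not the tool's actual code: the function names are hypothetical, the handling of segment boundaries (half-open ranges, with the last one closed) is an assumption, and `copy_segment` is a stand-in for whatever reads one segment from the old index and writes it to the new one. Only the `range` and `prefix` query shapes come from the Elasticsearch Query DSL.

```python
from concurrent.futures import ThreadPoolExecutor


def range_segment_queries(field, thresholds):
    """Build one range query per pair of consecutive thresholds (double fields)."""
    pairs = list(zip(thresholds, thresholds[1:]))
    queries = []
    for i, (low, high) in enumerate(pairs):
        # Assumption: segments are half-open [low, high); the last one is
        # closed so that documents equal to the top threshold are not skipped.
        upper = "lte" if i == len(pairs) - 1 else "lt"
        queries.append({"range": {field: {"gte": low, upper: high}}})
    return queries


def prefix_segment_queries(field, prefixes):
    """Build one prefix query per given prefix (string fields)."""
    return [{"prefix": {field: prefix}} for prefix in prefixes]


def reindex_in_parallel(queries, copy_segment, max_workers=4):
    """Run copy_segment(query) for each segment on a thread pool.

    copy_segment is a hypothetical callable that copies the documents matching
    one query and returns how many it copied; the total count is returned.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(copy_segment, queries))
```

With thresholds `[0.0, 10.0, 20.0]` on a hypothetical `price` field, this yields two segments that two threads can copy independently. The segments must cover all values of the field without overlapping, otherwise documents would be missed or copied twice.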