ElasticSearch is a scalable and highly available distributed search engine built upon Lucene. The Hotels Backend squad has bet on it as one of the key pieces of the new backend architecture, named Bellboy. In this architecture, ElasticSearch is used as a data service that stores hotel offers temporarily and makes them searchable for the user.

Each time a user looks for hotels and their offers in a specific city, these offers are indexed into an ElasticSearch cluster. From then on, every further user request to search, filter and so on ends up as a query to this cluster.

Having ElasticSearch as a data service allows Bellboy to:

Scale horizontally, independently of the other pieces of the architecture.

Index any field to make it searchable.

Support almost any type of field, from strings, integers and lists to geo points.

The following sections explain how ElasticSearch is used by Bellboy to index hotel offers and make them searchable for the user.

Indexing offers to ElasticSearch

Each time a new user searches for hotel offers in a specific city, a group of services retrieves the hotels located in that city and the offers provided by each partner. Bellboy then indexes these hotels and offers in a denormalized way: the hotel fields are copied into each offer, so the index holds a single type of document. This denormalization is how the joins between hotels and offers are materialized in a non-relational store such as ElasticSearch.

An offer is a JSON document composed of a set of regular fields coming from the hotel and the offer, plus a set of internal fields, among them the search_id field. This field is a unique ID that identifies the group of documents related to a certain search, and it acts as a logical partition key. An ElasticSearch index is therefore composed of many logical partitions, each one belonging to a specific search.
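The denormalization described above can be illustrated with a minimal sketch. The field names (`name`, `stars`, `partner`, `price`), the `hotel_` prefix and the `denormalize` helper are hypothetical and only serve the example; the actual Bellboy document layout is not described in this article.

```python
from typing import Any


def denormalize(hotel: dict[str, Any], offer: dict[str, Any], search_id: str) -> dict[str, Any]:
    """Copy the hotel fields into an offer, producing one flat document."""
    # Hotel fields are prefixed (hypothetical convention) to avoid name clashes.
    doc = {f"hotel_{key}": value for key, value in hotel.items()}
    doc.update(offer)             # offer fields keep their own names
    doc["search_id"] = search_id  # internal field: logical partition key
    return doc


hotel = {"id": "h-42", "name": "Hotel Example", "stars": 4}
offers = [
    {"partner": "partner-a", "price": 120.0},
    {"partner": "partner-b", "price": 115.5},
]
documents = [denormalize(hotel, offer, "search-123") for offer in offers]
```

Every document carries the full hotel data, so a query never needs a join, at the cost of repeating the hotel fields in each offer.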

A new user search triggers many concurrent tasks, one per provider. The offers come packed in batches of N documents that are indexed into ElasticSearch. Each batch operation is routed with the value of the search_id field. This routing makes ElasticSearch store all documents belonging to the same search in the same shard, so the user's later queries involve only that shard.
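As a rough sketch of routed batch indexing, the helpers below split documents into batches and build the newline-delimited body expected by the ElasticSearch bulk endpoint. The function names and the index name are hypothetical. The routing value is placed in each action line under the `routing` key, which is the name used by recent versions (older versions used `_routing`); passing the routing once as a request parameter is an alternative.

```python
import json
from typing import Any, Iterator


def batches(docs: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_body(index: str, search_id: str, batch: list[dict[str, Any]]) -> str:
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in batch:
        # Routing by search_id keeps the whole search in a single shard.
        lines.append(json.dumps({"index": {"_index": index, "routing": search_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


docs = [{"search_id": "search-123", "price": 100 + i} for i in range(5)]
bodies = [bulk_body("offers-example", "search-123", b) for b in batches(docs, size=2)]
```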

A search_id partition contains from several hundred documents for small entities to several thousand for large entities, and different partitions never intersect. Instead of using multiple shards for a query that seeks a handful of documents among millions, Bellboy, and more specifically ElasticSearch, handles queries that fit in a single shard and the resources of its node, such as CPU and memory.

Querying a single shard also gives Bellboy exact results for the terms aggregation, which can be approximate when the buckets have to be merged from several shards. This aggregation is used several times within the query to ElasticSearch, and it is especially crucial for performing the hotel normalization.

However, document routing can produce hotspots: an unbalanced document distribution that leaves some shards much heavier than others.
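The effect can be explored with a small simulation. ElasticSearch assigns a document to a shard by hashing its routing value modulo the number of primary shards; the sketch below uses MD5 as a stand-in for the internal hash function, and the partition sizes are randomly generated, so the numbers it produces are illustrative only and are not Bellboy measurements.

```python
import hashlib
import random


def shard_for(routing: str, num_shards: int) -> int:
    """Map a routing value to a shard (MD5 is a stand-in for the real hash)."""
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def documents_per_shard(partitions: dict[str, int], num_shards: int) -> list[int]:
    """Sum partition sizes per shard; a whole partition lands on one shard."""
    totals = [0] * num_shards
    for search_id, size in partitions.items():
        totals[shard_for(search_id, num_shards)] += size
    return totals


rng = random.Random(7)
# Hypothetical workload: 2000 searches of a few hundred to a few thousand offers.
partitions = {f"search-{i}": rng.randint(200, 5000) for i in range(2000)}
totals = documents_per_shard(partitions, num_shards=3)
# Relative gap between the heaviest and the lightest shard.
imbalance = (max(totals) - min(totals)) / max(totals)
```

Because whole partitions of very different sizes land on a shard, the balance depends both on the hash and on the size distribution of the searches.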

The following picture shows the distribution of documents of each shard placed in a Bellboy ElasticSearch cluster using as many shards as nodes.

As the previous picture shows, the distribution is not perfect: one of the shards has roughly 30% fewer documents than the others. To mitigate this issue, Bellboy doubles the number of shards per node. The following picture shows the distribution using that configuration.

In this case, the shards hold an almost equal number of documents. The function used to calculate the search_id therefore behaves close to a uniform function, avoiding hotspots.

Adding more nodes, increasing the offers per second

The number of hotel offers indexed per second may vary for different reasons: expected traffic, number of offers handled by a search, etc. Bellboy takes this into account when sizing the ElasticSearch cluster so that it meets the throughput requirements.

In this scenario ElasticSearch allows Bellboy to increase the throughput by adding more nodes to the cluster without modifying the general architecture.

The following picture shows the maximum throughput, in offers per second, reached by an ElasticSearch cluster composed of two, three and four nodes.

As can be seen in the graphic, the number of offers per second grows almost linearly as the number of nodes increases. All performance tests were executed on c3.2xlarge instances with ephemeral SSD disks. The instances were distributed across different availability zones. Each hotel offer was a JSON document of around 1 kilobyte, with roughly 30 fields.

Evicting offers from ElasticSearch

Hotel offers have a limited lifespan of 15 minutes. Once this time has elapsed, they have to be refreshed by the pricing service. This minimizes the chance of publishing outdated prices, since the user gets the latest offers gathered from the providers once the 15 minutes have expired.

Delete and update operations are very inefficient in ElasticSearch; removing the outdated hotel offers from an index through delete operations would directly consume the resources of the machine.

Bellboy uses a technique called index per time frame: instead of removing documents one by one, it erases a whole set of documents at once by removing the entire index. The cost of this operation in terms of resource usage is almost negligible. Every new user request is routed to the index for the current 15-minute time frame.

The following picture shows the indices placed in time order with three different searches. The searches were routed to the proper index when they were created. Index-n-2 no longer exists, and neither do its hotel offers.

Because Bellboy has to make sure that the hotel offers belonging to a search stay alive for 15 minutes, the indices are kept for 30 minutes. Until the index is removed automatically, the outdated searches in it still exist but are no longer used.
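The timing can be sketched as follows, assuming that indices are named after the start of their 15-minute frame. The naming scheme, the `offers` prefix and the helper names are hypothetical; the article does not describe how Bellboy names or schedules its indices.

```python
FRAME_SECONDS = 15 * 60      # an index receives new searches for 15 minutes
RETENTION_SECONDS = 30 * 60  # an index is kept for 30 minutes in total


def frame_start(timestamp: int) -> int:
    """Round a Unix timestamp down to the start of its 15-minute frame."""
    return timestamp - timestamp % FRAME_SECONDS


def index_for(timestamp: int, prefix: str = "offers") -> str:
    """Name of the index that a search created at `timestamp` is routed to."""
    return f"{prefix}-{frame_start(timestamp)}"


def is_expired(index_start: int, now: int) -> bool:
    """True once the whole index can be dropped in a single operation."""
    return now >= index_start + RETENTION_SECONDS
```

A search created in the last second of a frame still needs 15 minutes of life, which is why the retention is the frame length plus the offer lifespan: 15 + 15 = 30 minutes.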

From near real time to real time

When a request is created, no hotel offers have been indexed for it yet. Before fetching the prices, Bellboy stores the minimum number of offers expected across all partners and hotels. This value is later used to check whether the offers expected for this request have been indexed.

Because ElasticSearch is near real-time, the minimum number of offers stored by Bellboy at the beginning has to be compared against the result of a query to ElasticSearch. When these values match, the indexing process has finished and the user retrieves a consistent output.

Instead of running a separate query continuously, Bellboy adds a specific aggregation to the user query. This ad-hoc aggregation stage counts the offers, so that the count can be matched against the expected value and the request marked as complete.
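One possible shape for this check is sketched below, assuming a `value_count` aggregation over the search_id field. The aggregation name `indexed_offers` and both helper functions are hypothetical, and the actual aggregation used by Bellboy is not shown in this article.

```python
from typing import Any


def with_offer_count(query: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the user query with an extra counting aggregation."""
    body = dict(query)
    aggs = dict(body.get("aggs", {}))
    # Counts the documents that carry a search_id and passed the query filter.
    aggs["indexed_offers"] = {"value_count": {"field": "search_id"}}
    body["aggs"] = aggs
    return body


def is_complete(response: dict[str, Any], expected_offers: int) -> bool:
    """Compare the indexed count with the minimum number of expected offers."""
    indexed = response["aggregations"]["indexed_offers"]["value"]
    # ">=" because the stored value is a minimum (assumption of this sketch).
    return indexed >= expected_offers


body = with_offer_count({"query": {"term": {"search_id": "search-123"}}})
```

Since the count travels with the regular user query, no additional round trip to the cluster is needed to know whether the indexing has finished.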

However, the number of offers per partner for each hotel can turn out lower than initially expected, due to accommodation restrictions or other issues. When this happens, the pricing service in charge of fetching the offers decreases the expected number of hotel offers.

Retrieving the offers and their aggregations

ElasticSearch implements a query interface designed for filtering and aggregating over the inverted index exposed by Lucene.

A user query becomes an ElasticSearch query composed of a filter stage and an aggregation stage. The filter stage prunes the hotel offers using the search_id field, so the aggregation stage only receives the offers belonging to a certain search. The aggregation stage is composed of many independent sub-aggregation stages that retrieve the list of hotel offers and count the hotels by their characteristics.
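A minimal sketch of such a query body is shown here. It uses the `bool`/`filter` syntax of recent ElasticSearch versions (older versions expressed the same idea with a `filtered` query), and the field names `price`, `hotel_id` and `hotel_stars` as well as the aggregation names are hypothetical.

```python
from typing import Any


def build_query(search_id: str, page_size: int = 25) -> dict[str, Any]:
    """Filter stage on search_id followed by independent aggregation stages."""
    return {
        "size": 0,  # results are read from the aggregations, not from the hits
        "query": {"bool": {"filter": [{"term": {"search_id": search_id}}]}},
        "aggs": {
            # Retrieves the list of offers, cheapest first.
            "offers": {
                "top_hits": {"size": page_size, "sort": [{"price": {"order": "asc"}}]}
            },
            # Counts the distinct hotels in this search (approximate by design).
            "hotel_count": {"cardinality": {"field": "hotel_id"}},
            # Groups the offers by a hotel characteristic.
            "by_stars": {"terms": {"field": "hotel_stars"}},
        },
    }


query = build_query("search-123")
```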

ElasticSearch allows Bellboy to build a complex and deep aggregation tree. Each aggregation stage can be seen as a map function and its nested aggregation stages as reduce functions, and the nesting can continue to any depth.
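To illustrate the nesting, the sketch below builds a three-level tree: offers are grouped by star rating, then by hotel, and each hotel bucket is reduced to its cheapest price. The helpers `agg` and `depth`, the aggregation names and the field names are all hypothetical.

```python
from typing import Any, Optional


def agg(kind: str, params: dict[str, Any],
        children: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build one aggregation node, optionally with nested sub-aggregations."""
    node: dict[str, Any] = {kind: params}
    if children:
        node["aggs"] = children
    return node


def depth(aggs: dict[str, Any]) -> int:
    """Number of nesting levels in an aggregation tree."""
    if not aggs:
        return 0
    return 1 + max(depth(node.get("aggs", {})) for node in aggs.values())


tree = {
    "by_stars": agg("terms", {"field": "hotel_stars"}, {
        "by_hotel": agg("terms", {"field": "hotel_id"}, {
            "cheapest": agg("min", {"field": "price"}),
        }),
    }),
}
```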

Query performance, keeping the latency predictable

The full query used by Bellboy for each user request is a long JSON document that can use almost 30 different aggregation stages. Despite the query size, ElasticSearch is able to execute it in less than 50 ms on modern hardware. It is worth mentioning that the logical partition identified by the search_id field ranges from a few hundred documents up to several thousand for large entities.

The write throughput requirements and the number of reads faced at one time are dependent variables that grow proportionally: for a write throughput of 10,000 offers per second, the expected number of requests per second might be 10. With twice this traffic, the expected values are therefore 20,000 writes per second and 20 requests per second.
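The proportion can be written down as a one-line helper. The ratio of 1,000 writes per read request is taken from the example figures above and is illustrative rather than a measured constant.

```python
WRITES_PER_READ = 10_000 // 10  # ratio taken from the example figures


def expected_reads_per_second(writes_per_second: int) -> float:
    """Read requests per second expected for a given write throughput."""
    return writes_per_second / WRITES_PER_READ
```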

The next table shows the read latency under different write loads, from 0% to 75%, with the reads per second increasing proportionally.

With 25% of the resources used by the indexing process, the read latency at all percentiles stays below 100 ms. When the resource usage is doubled, the average latencies increase proportionally. Even though the 50th, 75th and 90th percentiles behave acceptably during the tests, the 99th percentile spikes up to 1.5 seconds.

To keep the read latency below 0.5 seconds, Bellboy sets the number of bulk threads to half of the available CPUs. This throttles the CPU resources that each ElasticSearch node dedicates to indexing offers. At the same time, the bulk queue size is set to a number large enough to hold the index operations that are waiting for a CPU time slice.
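A sketch of how these two values could be derived is given below. The setting names `threadpool.bulk.size` and `threadpool.bulk.queue_size` are an assumption based on early ElasticSearch versions; later versions renamed the thread pool settings, so the names should be checked against the version in use. The queue size of 1000 is a made-up value, since the article does not state the one used by Bellboy.

```python
def bulk_thread_pool_settings(cpus: int, queue_size: int) -> dict[str, int]:
    """Dedicate half of the CPUs to bulk indexing and size the waiting queue."""
    return {
        # Setting names are version dependent (assumption, see the text above).
        "threadpool.bulk.size": max(1, cpus // 2),
        "threadpool.bulk.queue_size": queue_size,
    }


# A c3.2xlarge instance exposes 8 vCPUs; the queue size is hypothetical.
settings = bulk_thread_pool_settings(cpus=8, queue_size=1000)
```

The remaining CPUs stay available for search threads, which is what keeps the read latency predictable while bulk requests wait in the queue.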

Placing the ElasticSearch Cluster at AWS

Bellboy places the ElasticSearch Cluster into an Auto Scaling group with a constant number of machines, and with a Load Balancer in front of it as the entry point. The following image shows this configuration:

The number of nodes in an ElasticSearch cluster deployed by Bellboy is always odd, and the nodes are placed in different availability zones. This configuration increases the service availability and helps ElasticSearch with the consensus resolution in case one of the nodes is down.
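The reason for preferring an odd number can be seen with simple majority arithmetic, sketched below. It assumes that every node is master-eligible and that the cluster requires a strict majority to elect a master (in older ElasticSearch versions this quorum was set with `discovery.zen.minimum_master_nodes`); whether Bellboy configures it exactly this way is not stated in the article.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return nodes - quorum(nodes)


for n in (3, 4, 5):
    print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

Three and four nodes both tolerate a single failure, so an even-sized cluster pays for an extra machine without gaining resilience.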

The Auto Scaling group will keep the number of nodes constant. If one node is down, it will be replaced by a new one that will be launched with the proper ElasticSearch configuration to become a member of the cluster.

While the cluster is one node short, its state becomes Yellow (ElasticSearch has allocated all of the primary shards, but some or all of the replicas have not been allocated). ElasticSearch can still operate in this state, and Bellboy continues to rely on the cluster. The replicas of the primary shards that were placed on the failed node are promoted to primaries. When the Auto Scaling group replaces the node with a new one, the shards are rebalanced automatically, placing some of the replicas and primaries on the new node.

Even though this operation can be expensive in IO and CPU, the shards belonging to the Bellboy indices are only slightly affected. The following graphic shows the spike produced by a rebalancing process after a new node was added to the Auto Scaling group to replace a broken one.

This ElasticSearch cluster will be reliable until its state becomes Red, meaning that a primary shard cannot be found. When this happens, the ElasticSearch cluster is no longer used and the traffic is redirected to another Bellboy stack.
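The decision can be sketched as a small function over the `status` field returned by the ElasticSearch cluster health endpoint, whose values are green, yellow and red. The stack names and the `choose_stack` helper are hypothetical; how Bellboy actually redirects traffic is not described here.

```python
from typing import Any, Optional

USABLE_STATES = {"green", "yellow"}  # red means a primary shard is missing


def is_usable(health: dict[str, Any]) -> bool:
    """True while the cluster health status is green or yellow."""
    return health.get("status") in USABLE_STATES


def choose_stack(stacks: list[tuple[str, dict[str, Any]]]) -> Optional[str]:
    """Return the first stack, in order of preference, with a usable cluster."""
    for name, health in stacks:
        if is_usable(health):
            return name
    return None


target = choose_stack([
    ("stack-a", {"status": "red"}),
    ("stack-b", {"status": "yellow"}),
])
```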

Conclusion

ElasticSearch is a product that unlocks the full potential of Lucene by distributing it across several nodes. This allows horizontal scaling and a higher document throughput through a JSON API. However, it is important to know the ElasticSearch fundamentals and how they fit your product.