Securing Your Elasticsearch Cluster

February 6, 2019: Looking for the best way to secure your Elasticsearch data? Spin up a cluster on our Elasticsearch Service or check out our subscriptions for your existing deployment. Both options enable security features like encrypted communication, role-based access control, authentication realms (native, LDAP, Active Directory, etc.), single sign-on, and more. Please note that the following article was authored before this security functionality was available.

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as the Elasticsearch Service.

A Brief Overview of Running Elasticsearch Securely

Elasticsearch does not perform authentication or authorization, leaving that as an exercise for the developer. This article gives an overview of things to keep in mind when you configure the security settings for your Elasticsearch cluster, providing users with (limited) access to your cluster when you cannot necessarily (entirely) trust them.

As an Elasticsearch provider, Found considers security to be of paramount importance. We need to protect our users from others with possibly nefarious intents, in addition to protecting users from causing trouble for themselves.

This article builds upon Elasticsearch in production, where we introduced many of the security related topics discussed here. We expand on these in this article and elaborate on some areas to keep in mind when actually implementing them.

Essentially, you need to carefully scrutinize requests before sending them to Elasticsearch, just as with any other database. Unlike most other databases, however, Elasticsearch has a feature that allows arbitrary code execution. That poses some interesting challenges!

We'll look at nuances in the various levels of trust you can give your users, and the risks involved: from arbitrary requests with full access on one end, to parameterized, pre-defined requests on the other.


If you are used to systems like PostgreSQL, where you limit access to databases, tables, functions, etc. with high granularity, you might be trying to find a way to limit access to certain operations and/or certain indexes. At the moment, Elasticsearch does not consider that to be its job. Elasticsearch has no concept of a user: essentially, anyone who can send arbitrary requests to your cluster is a “super user”. It's a reasonable limitation to impose. There are many ways to implement various authentication and authorization schemes, and many of them are closely coupled to the application domain.

A lot of the advice here applies to search engines and databases other than Elasticsearch as well. We do not mean to imply that these are inherently insecure, or criticize their choices. It is a perfectly reasonable decision to leave security to the user. However, we want to raise awareness about Elasticsearch and security related aspects.

Prevent usage of the script feature for execution of arbitrary code. If we are unable to do so, anything else is merely “security through obscurity” that can be bypassed. Disabling dynamic scripts introduces some challenges on its own, though.

Limit who can access what: both for searching and for indexing. This can be achieved to some extent with a proxy layer.

Prevent requests that can overwhelm the cluster and cause a denial of service. This is hard to prevent completely if arbitrary search requests are allowed.

We'll also see how these things apply even when running Elasticsearch locally for development purposes.

As an example of different levels of trust, assume you have multiple installs of the same CMS for different customers. You somewhat trust the CMS not to do crazy things, but it's still a good idea to require authentication and have separate indexes, just in case. Therefore, you can permit the CMS access to do arbitrary requests to its allowed indexes. The CMS, however, is exposed to the world. It does not accept any arbitrary requests. It translates search parameters into the proper Elasticsearch request, and sends it to the appropriate indexes.

There is nothing preventing a script from sending a second request back to Elasticsearch - thus evading any URL-based access control - or from doing anything else the Elasticsearch process has access to. As a result, dynamic scripts must be disabled if you cannot entirely trust your users.

As such, we advise against enabling dynamic scripts and attempting to blacklist or sanitize them. It is very difficult to establish that a script does nothing bad when its execution is not sandboxed. The history of security problems in Flash and Java applets attests to how difficult it is to create a sandbox without vulnerabilities. Moreover, it is provably impossible to determine in general whether a script will terminate or spin forever and cause a denial of service.
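In the Elasticsearch versions current at the time of writing (the 0.90.x series), dynamic scripting is turned off with a single setting in elasticsearch.yml; note that later releases changed the scripting configuration considerably:

```yaml
# elasticsearch.yml
# Disallow dynamic scripts sent in request bodies.
# Preloaded scripts placed in config/scripts remain available.
script.disable_dynamic: true
```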

We have emphasized the importance of disabling dynamic scripts. Scripts are nevertheless important for many things, so we need to ensure we can still do those things. Therefore, we'll look at some examples of using preloaded scripts that can take parameters at search time.

Then, we can specify e.g. {"script": "scoring_recency_boost", "params": {"now": 1386176910000}}. The script parameter is the path to the script relative to config/scripts, with _ as the path separator. Note that the prefix is scoring_ and not scoring/.
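As a sketch of how a client might invoke such a preloaded script in a 0.90-era custom_score query, assuming a script stored under config/scripts/scoring/ (the query structure below, the title field, and the helper function are our own illustration, not code from this article's cluster):

```python
import json
import time

def recency_boosted_search(query_text, now_ms=None):
    """Build a 0.90-era custom_score search body that calls the
    preloaded scoring/recency_boost script with a "now" parameter."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in milliseconds
    return {
        "query": {
            "custom_score": {
                "query": {"match": {"title": query_text}},
                # Path relative to config/scripts, with _ as separator:
                "script": "scoring_recency_boost",
                "params": {"now": now_ms},
            }
        }
    }

body = recency_boosted_search("elasticsearch", now_ms=1386176910000)
print(json.dumps(body, indent=2))
```

Because the script itself lives on the server, the client only ever supplies parameters, never code.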

Using preloaded scripts offers another benefit: your script definitions are specified in one place, and not scattered around in the various applications that use your Elasticsearch cluster. This helps increase the maintainability of your search applications and profiles. You can change and improve your scripts without having to change every client using the scripts.

Note: This section assumes using the HTTP-API. Currently, there is no way to easily restrict what a transport client can do.

Elasticsearch has many ways of specifying what indexes to search across, or index to. If you have different users on the same shared cluster and let them send arbitrary search requests (though without scripts), you may also want to restrict what indexes they can access.

Typically, the indexes are specified in the URL of the request, i.e. index_pattern/type_pattern/_search. However, there are also APIs like multi-search, multi-get and bulk that can take the index as a parameter in the request body as well, thus overriding which indexes get searched or where documents get indexed. In 0.90.4, the configuration option allow_explicit_index was introduced, which lets you forbid these overrides.
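As documented for the 0.90.x series, the setting lives under the REST module in elasticsearch.yml:

```yaml
# elasticsearch.yml
# Forbid request bodies from overriding the index given in the URL
# for _mget, _msearch and _bulk (available since 0.90.4).
rest.action.multi.allow_explicit_index: false
```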

Note that the index is actually an index pattern and not necessarily an index name. Thus, if you are prefixing your indexes with something user specific, you must consider index patterns as well. For example, just doing index_name = "user123_" + user_specified_index would not work very well if user_specified_index = ",*". The request would end up as a request to user123_,*/_search, and the search would run on every index.
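A minimal sketch of the kind of validation a proxy layer could perform before prefixing, assuming user-chosen index names are restricted to a conservative character set (the function name and the policy are our own illustration, not an Elasticsearch API):

```python
import re

# Only allow conservative index names: lowercase alphanumerics,
# underscores and hyphens. This rejects ",", "*" and other characters
# that have meaning in Elasticsearch index patterns.
SAFE_INDEX_NAME = re.compile(r"^[a-z0-9_-]+$")

def scoped_index(user_id, user_specified_index):
    """Prefix a user-supplied index name, refusing anything that
    could be interpreted as an index pattern."""
    if not SAFE_INDEX_NAME.match(user_specified_index):
        raise ValueError("illegal index name: %r" % user_specified_index)
    return "%s_%s" % (user_id, user_specified_index)

print(scoped_index("user123", "orders"))  # user123_orders
```

With this in place, the ",*" example above is rejected before any request reaches the cluster.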

With dynamic scripts disabled and allow_explicit_index set to false, you can be certain that requests sent to the _search, _msearch and _mget endpoints can only touch the explicitly allowed indexes, and similarly for indexing requests to _bulk. This makes it possible to let a proxy layer limit which indexes the forwarded requests can touch.

To also restrict what documents can be operated on, you can use a filtered alias. Any search, count, more like this and delete by query requests will have the filter applied. If you rely on these, make sure the underlying indexes cannot be accessed.
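For example, a filtered alias limiting a user to their own documents could be set up with a body like the one below, POSTed to the cluster's _aliases endpoint (the index, alias and field names here are hypothetical):

```python
import json

def user_alias_action(index, alias, user_id):
    """Build an _aliases action adding a filtered alias, so requests
    through the alias only see documents owned by user_id."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"term": {"user_id": user_id}},
                }
            }
        ]
    }

# POST this body to /_aliases:
print(json.dumps(user_alias_action("documents", "user123_documents", 123)))
```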

Having restricted what indexes and endpoints your users can send requests to, you must also consider what methods to allow. You probably don't want to allow anyone access to DELETE an index. Since it's a good practice to have idempotent requests, it might be a good idea to disallow requests directly to the index anyway, just allowing POST-ing to endpoints like _search or _bulk or PUT-ing and DELETE-ing specific documents.
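The kind of method and path rules a proxy might enforce can be sketched as follows; the patterns and endpoint choices are our own illustration, and a real deployment would also have to validate the index segment against the user's allowed prefix:

```python
import re

# Allowed (method, path pattern) pairs: search and bulk via POST,
# plus PUT/DELETE of individual documents. Index-level DELETE is
# deliberately absent.
ALLOWED = [
    ("POST", re.compile(r"^/[\w-]+/_search$")),        # search an index
    ("POST", re.compile(r"^/[\w-]+/_bulk$")),          # bulk index
    ("PUT", re.compile(r"^/[\w-]+/[\w-]+/[\w-]+$")),   # put a document
    ("DELETE", re.compile(r"^/[\w-]+/[\w-]+/[\w-]+$")),# delete a document
]

def allow(method, path):
    """Return True if the proxy should forward this request."""
    return any(m == method and p.match(path) for m, p in ALLOWED)
```

Note that the character classes reject commas and wildcards, so multi-index patterns cannot sneak through the path.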

While not as harmful as exposing data, requests that can crash your cluster or severely impact its performance must be avoided as well. Unfortunately, avoiding them is not as easy as flipping a configuration variable.

There are many things that can consume a lot of memory in Elasticsearch. While not an exhaustive list, these are some examples:

Field caches for fields to facet, sort and script on.

Filter caches.

Segments pending flushing.

Index metadata.

Loading a field that is or has grown too large is probably the most common cause of running out of memory. Two important improvements are coming in Elasticsearch 1.0 to better deal with this:

The first is doc values. By enabling these in your mapping, Elasticsearch will write the documents' values in a way that makes it possible to rely on the operating system's page cache to use them efficiently. This can massively reduce the amount of heap space required, although this approach can be a bit slower as well.

The second improvement, while not committed to master at the time of writing, is a circuit breaker. Its purpose is to impose a limit on how much memory can be used to load a field, breaking the request if the limit is exceeded. It defaults to being disabled, but with a sensible limit, requests attempting to load too much will fail with a CircuitBreakingException, which is a lot safer than an OutOfMemoryError!

Both require a bit of tweaking and planning ahead. While these help a lot in terms of memory utilization, there is still a heavy performance impact when mistakenly loading a huge field. Fields that are actually needed can be evicted, forcing them to be loaded again.

If you allow indexing arbitrarily structured documents, you probably want to disable dynamic mapping. While Elasticsearch is often described as a schemaless database, Elasticsearch implicitly creates a schema. This works well when developing, but should probably be off in production. For example, it can cause problems if values appear as keys in your object:

Say you have an object like {"access": [ {"123": "read"}, {"124": "write"} ]}. While seemingly innocuous, this will cause an entry in the mapping for every ID. With thousands or millions of keys like this, the size of the mapping will explode, as there will be an entry per key. The mapping is also part of the cluster state, which is replicated to every node. Having values as keys can work well with document oriented databases that have no concept of a schema and treat documents as blobs. With Elasticsearch, however, you should never have values in your keys. Instead, this example could be {"access": [ {"user_id": 123, "level": "read"}, {"user_id": 124, "level": "write"} ]}, with access as a nested type.
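Tying the two points together, a mapping for the improved document shape could declare access as a nested type and set dynamic to strict, so documents containing unmapped fields are rejected instead of silently growing the schema (the type and field names follow the example above and are otherwise hypothetical):

```json
{
  "document": {
    "dynamic": "strict",
    "properties": {
      "access": {
        "type": "nested",
        "properties": {
          "user_id": {"type": "integer"},
          "level": {"type": "string", "index": "not_analyzed"}
        }
      }
    }
  }
}
```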

In short, while it is possible to restrict which indexes can be searched or indexed to with a simple proxy layer, it is not possible to pass through arbitrary requests without risking the stability and performance of the cluster. This is the case for just about any database or search engine and should not be much of a surprise. We see this approach a lot, though. Kibana does it with great success. For Kibana it makes a lot of sense, as it's largely an Elasticsearch dashboard. But if you copy its usage patterns and re-implement them in your end user facing applications, you also bring the risks mentioned here.

Elasticsearch is typically used through HTTP, binding to localhost. Intuitively, external hosts cannot connect to something listening on localhost or protected by a company firewall. However, your web browser can reach your localhost, and might be able to reach servers on your company's internal network.

Any website you visit can send requests to your local Elasticsearch node. Your browser will happily make an HTTP request to http://127.0.0.1:9200/_search. Consequently, any website can go spelunking in whatever data is in your locally running Elasticsearch, and could then POST its findings somewhere. Adjusting the settings for cross-origin resource sharing can help a bit, although it would still be possible to search using JSONP requests.

Our warnings about dynamic scripts apply here as well. You certainly don't want any website out there to be able to run code on your machine!

We recommend running Elasticsearch in a virtual machine while developing on the same machine you use to surf the web. Don't have sensitive data locally, and keep dynamic scripts disabled.

Restricting access to indexes and adding authentication and SSL can be done with numerous tools; implementing it is outside the scope of this article. Nginx is quite popular for this. Additionally, there are various Elasticsearch plugins that attempt to add things like basic auth. At Found we provide this as part of our proxy layer that routes requests. You can configure ACLs that implement HTTP basic auth, SSL, and restrict which methods and paths can be accessed. Additionally, these rules can be combined to fit nearly any situation. Most Elasticsearch clients now support HTTP basic auth, SSL, and other goodies, including the official clients; so there is no excuse not to use them.

Even though Elasticsearch is multi-tenant and can happily serve many different users and applications on the same cluster, at some point you might want to create multiple clusters to partition resources and provide for additional security. Essentially, you want to reduce the impact area of the problems and risks described in the section on preventing denial of service. For example, if you have a huge spike in traffic, the increase in logging throughput (which you use Logstash and Kibana to capture and analyze, right?) should not impact the performance of more important applications.

Today, it's easier than ever to use technologies like LXC and Docker to isolate processes and constrain resource usage like disk space, memory and CPU. That is exactly what we do at Found: customer clusters are completely isolated, and resources are dedicated, not overprovisioned. This is very important; without these practices, we would not be able to guarantee an acceptable level of security, and performance would be unreliable.

This article has covered a lot of ground, but be sure to keep in mind the following.

Disable dynamic scripts. They are dangerous.

Understand the sometimes tricky configuration required to limit access to indexes.

Consider the performance implications of multiple tenants: a weakness or a bad query in one can bring down an entire cluster!

Cool stuff is continuously trickling into Elasticsearch, which is being developed at a mind-blowing pace. Some of these improvements will make life easier when dealing with a few of the challenges mentioned here. Others may introduce new challenges.

You should always assume that security is left to you. Remember, security is an onion, and good strategies have multiple layers. Don't let Elasticsearch be a weak layer in your application's security onion!