Plugins Alongside the X-Pack Security Plugin for Elasticsearch

The Security plugin is designed to work alongside other plugins, offering protection in their presence so long as these plugins do not violate certain contracts. Currently, there is no “stable plugin API” in Elasticsearch, so you should carefully consider whether a custom plugin is really needed before implementing your own. Many features have been added to the Elastic Stack over the past several years, and much of what once required a plugin is now core Elastic Stack functionality.

This post is geared towards the plugin author and explains the best practices while indulging in code references. It outlines classes and functions which are accurate at the time of this writing, but if you’re reading this far into the future, there is a good chance things will have moved around and/or been renamed.

Also note that having the Security plugin installed and enabled does not mean invalid access patterns or incorrect behavior by other plugins will be detected and denied. For example, a badly designed or mischievously crafted plugin can wreck the whole cluster (e.g., a DiscoveryPlugin preventing cluster formation), regardless of whether the Security plugin is enabled or not.

Any plugin working alongside the Security plugin should use the established authentication and authorization framework when adding new custom actions. To learn about the security features, check the X-Pack security features guide. The gist is that users have roles that grant them privileges, and these privileges are matched against transport actions during authorization on each node. The transport action, running on a node, is the most fine-grained operation which the authorization framework grants or denies. User privileges are, in effect, a set of granted transport actions. Other features of the Security plugin, namely inter-node secure communication (TLS), HTTPS and auditing, should not be of any concern for the neighbouring plugins on the same node. Consequently, implementing most plugin interfaces, that is IngestPlugin, ClusterPlugin, DiscoveryPlugin, MapperPlugin, SearchPlugin, NetworkPlugin and ExtensiblePlugin, will not overlap with any of the security components that are needed to secure Elasticsearch. That is because these plugin types define callbacks which the Elasticsearch core uses to alter functionality that is already embedded in the Security framework.

One important thing to get right is the naming of transport actions and the delegation of their execution. Naming is important because action names are matched against privileges. Therefore, most of the following details are geared towards implementers of the ActionPlugin interface. Nevertheless, some advice herein applies to general plugin development.

Here are some general aspects to keep in mind while reading through:

TransportAction is the point where execution starts for custom requests. REST handlers should delegate to a transport action using the provided client (as a handler argument).

TransportAction should be simple. Complex actions should be divided by creating child actions or by submitting requests to other peer actions. Granular actions are easier to name and play well with authorization. Do one thing and do it well.

Do not splice authentication on top of your actions. Authentication is a distinct functionality. The authentication process can be extended using a CustomRealm.

Do not manage users and roles. This is an important functionality that the Security plugin handles.

Do not listen on raw sockets. This will open backdoors. To communicate between nodes, use the TransportService and the ClusterService. To open actions to clients, implement the ActionPlugin interface, then create and register your REST handlers and transport actions.

While reading through you can skim over code references. These are laid out to serve as guidance in the browsing of the actual code. Open source rocks!

The correct way to register new REST actions is to implement the RestHandler interface and return it in the ActionPlugin#getRestHandlers method of your plugin descriptor. If you do this, you get authentication without further ado: HTTP(S) requests with the standard Authorization HTTP header, or HTTPS requests using client certificates, will be authenticated by consulting the configured realms. The ThreadContext is a container for all thread local variables, including the authentication token. It is possible, but discouraged, to read the authenticated user (aka the principal). The earliest point in request handling where authentication is available is BaseRestHandler#prepareRequest. Moreover, the authentication is carried through, conveyed by the ThreadContext, across handlers executing on other threads or even on other cluster nodes. The ThreadContext is relayed across threads only if you use ThreadPool#executors to get an executor service to run your threaded code. It is also relayed across nodes, provided you use TransportService#sendRequest to submit request messages. In this case, the TransportRequestHandler, running on the receiving node, will have the ThreadContext of the submitting node automatically restored before the TransportRequestHandler#messageReceived handler is called.

Using ThreadPool#executors and TransportService#sendRequest is best practice for any plugin type, even when not running with security enabled.
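To see why the choice of executor matters, here is a toy model of a context-aware wrapper that carries a thread local value over to a worker thread. It is only a sketch: ToyContext and ContextRelay are hypothetical stand-ins invented for this illustration, not Elasticsearch classes, and the real ThreadContext handling is considerably more involved.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy stand-in for a thread local request context (hypothetical, not the Elasticsearch ThreadContext).
final class ToyContext {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    static void setUser(String user) { USER.set(user); }
    static String getUser() { return USER.get(); }
    static void clear() { USER.remove(); }
}

final class ContextRelay {
    // Captures the caller's context now and reinstates it on whichever thread runs the task.
    static Runnable preserve(Runnable task) {
        final String captured = ToyContext.getUser();
        return () -> {
            final String previous = ToyContext.getUser();
            ToyContext.setUser(captured);
            try {
                task.run();
            } finally {
                // Put back whatever the worker thread had before the task ran.
                if (previous == null) {
                    ToyContext.clear();
                } else {
                    ToyContext.setUser(previous);
                }
            }
        };
    }

    // Runs a task on a fresh worker thread and reports the user it observed there.
    static String userSeenOnWorker(boolean wrap) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            final String[] seen = new String[1];
            Runnable task = () -> seen[0] = ToyContext.getUser();
            Future<?> done = pool.submit(wrap ? preserve(task) : task);
            done.get(); // waits for completion and makes the write to seen[0] visible
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a user set on the calling thread, the wrapped task observes that user on the worker, while the unwrapped task observes nothing. A plain executor created outside the thread pool behaves like the unwrapped case, which is how the authentication gets lost.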

It should go without saying that the existing ThreadContext keys should not be touched. This requirement is imperative for keys starting with '_'. If you need to clear the ThreadContext, you also have to restore it. Use ThreadContext#stashContext inside a try-with-resources block.
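The stash-and-restore idea can be sketched with a toy context whose stash method returns a closeable handle. StashableContext, Stored and StashDemo are hypothetical names made up for this illustration. They mimic the pattern only, not the real ThreadContext API, and the "_user" key is an invented example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a stashable thread local context (hypothetical; it only mimics the
// stash-and-restore idea, not the real ThreadContext).
final class StashableContext {
    // Handle whose close() restores the stashed state; close() throws no checked exception.
    interface Stored extends AutoCloseable {
        @Override
        void close();
    }

    private final ThreadLocal<Map<String, String>> headers = ThreadLocal.withInitial(HashMap::new);

    void put(String key, String value) { headers.get().put(key, value); }
    String get(String key) { return headers.get().get(key); }

    // Swaps in an empty context and returns a handle that puts the original back.
    Stored stash() {
        final Map<String, String> original = headers.get();
        headers.set(new HashMap<>());
        return () -> headers.set(original);
    }
}

final class StashDemo {
    // Returns what the code saw inside the block, then what it saw after the block.
    static String[] run() {
        StashableContext ctx = new StashableContext();
        ctx.put("_user", "alice");
        String[] seen = new String[3];
        try (StashableContext.Stored ignored = ctx.stash()) {
            ctx.put("scratch", "temporary");
            seen[0] = ctx.get("_user");   // null: the stashed context starts empty
        }
        seen[1] = ctx.get("_user");       // the original entry is back
        seen[2] = ctx.get("scratch");     // null: changes made inside the block are gone
        return seen;
    }
}
```

Because the restore happens in close(), it runs even when the body of the block throws, which is the reason for insisting on the try-with-resources form.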

Authentication does not achieve much without authorization. The Security plugin defines privileges, bound to the principal via roles. Privileges restrict which transport actions are allowed to execute. Specifically, every TransportAction is registered to the TransportService by its name string. The name is used to locate the handler for a transport request. Usually this name is defined in the accompanying Action inside the ActionHandler returned from ActionModule#getActions (for more details check the code in ActionModule). The name has to be unique and descriptive of the true logic of the operation. This axiom is particularly important in the authorization context because privileges are translated to Lucene automatons which match on the names of the actions they grant. Consequently, the action name should have a specific hierarchical format, sharing a namespace with the names of similar actions. The naming convention is as follows.

An action operating against an index’s data or settings, but not on index templates, should be named starting with the ‘indices:’ prefix. For example, ‘indices:data/read/get’, ‘indices:data/write/index’, ‘indices:admin/get’ and ‘indices:admin/mapping/put’ to get documents, index documents, get index settings and put index mapping respectively. An action against index templates should start with ‘indices:admin/template/’. For example, ‘indices:admin/template/put’ to put an index template. All other actions should be named starting with ‘cluster:’. Examples include: ‘cluster:monitor/allocation/explain’, ‘cluster:admin/repository/put’, ‘cluster:admin/reroute’ and ‘indices:admin/shards/search_shards’ for … see, you already know from the name what the action does! Action names should not start with ‘internal:’ as these actions are reserved for internal use by Elasticsearch.

The prefix should be followed by alphanumeric characters separated by forward slashes. Forward slashes create a hierarchical namespace. Lastly, child actions should have a tag name, affixed to the action name of the parent, inside square brackets (e.g., ‘s’, ‘p’, ‘r’, ‘n’ for shard, primary, replica or node child action names respectively). You can follow this naming convention as a best practice. The recommended syntax for action naming, [indices|cluster]:[admin|monitor|data]/<your_plugin_namespace>/[read|write|mappings|nodes]/.../..., is just a collation of existing names. For concrete examples check any subclass of Action. You can also take a look at the ActionNamesIT integration test that validates all the registered action names for a bare cluster node.
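As a rough illustration, the convention can be expressed as a small check. ActionNameConvention is a hypothetical helper written for this post, not something that ships with Elasticsearch. It assumes that segments may also contain underscores (as in ‘search_shards’) and that one or more bracketed child tags may follow the name.

```java
import java.util.regex.Pattern;

// Illustrative check of the naming convention described above (hypothetical helper,
// not part of Elasticsearch). Assumption: segments may contain letters, digits and underscores.
final class ActionNameConvention {
    private static final Pattern NAME = Pattern.compile(
        "(indices|cluster):"                    // required top-level prefix; 'internal:' is rejected
        + "[A-Za-z0-9_]+(/[A-Za-z0-9_]+)*"      // slash-separated hierarchical segments
        + "(\\[[a-z]+\\])*"                     // optional child tags such as [s], [p], [r], [n]
    );

    // True when the name follows the prefix, segment and child tag rules.
    static boolean isWellFormed(String actionName) {
        return NAME.matcher(actionName).matches();
    }

    // True for names that operate on index templates.
    static boolean isTemplateAction(String actionName) {
        return actionName.startsWith("indices:admin/template/");
    }
}
```

For instance, ‘indices:data/read/get’ and a made-up ‘cluster:monitor/my_plugin/stats[n]’ pass the check, while anything starting with ‘internal:’ or containing an empty segment does not. A unit test in your plugin could run such a check over every action name the plugin registers.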

Role definitions contain privileges. A privilege is just a string prefix of all its encompassing action names. This takes advantage of the functional hierarchy of actions. In other words, a privilege grants all actions whose registered name starts with the privilege string. Therefore, to authorize actions contributed by your plugin, you should specify prefixes (e.g., 'cluster:monitor/plugin_name') in the role definition. All actions with a name starting with the prefix will be authorized for any authenticated user that has the role.
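The prefix semantics described above can be modelled in a few lines. This is a deliberate simplification: PrefixPrivileges is a hypothetical class that uses plain string prefixes, whereas the Security plugin, as noted earlier, translates privileges into Lucene automatons.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of privilege matching (hypothetical; the real implementation
// compiles privileges into Lucene automatons rather than comparing strings).
final class PrefixPrivileges {
    private final List<String> grantedPrefixes;

    PrefixPrivileges(List<String> grantedPrefixes) {
        this.grantedPrefixes = new ArrayList<>(grantedPrefixes);
    }

    // An action is allowed when its name starts with any granted prefix.
    boolean allows(String actionName) {
        for (String prefix : grantedPrefixes) {
            if (actionName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, a role holding 'cluster:monitor/plugin_name' allows a made-up action named 'cluster:monitor/plugin_name/stats' but not 'cluster:admin/plugin_name/reset'. Note that a bare prefix would also match a sibling namespace such as 'cluster:monitor/plugin_name_other', so it may be worth ending prefixes on a forward slash.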

The authorization process is conducted automagically for each transport action (child or parent) on each node. The only requirement, which again is a general best practice, is to keep all your logic inside the TransportActions and to use the NodeClient parameter (or any of its decorators) that you receive in BaseRestHandler#prepareRequest when calling TransportActions. Don’t put logic in prepareRequest or, more generally, in handleRequest, and don’t invoke TransportAction#doExecute directly. Doing this will subvert the authorization framework. The ElasticsearchClient#execute implementation in NodeClient makes sure you are invoking a registered action that the authenticated user (principal) has privileges to execute (is authorized).
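The following toy model shows why the delegation matters: the registry lookup and the authorization check live in the client, so anything that bypasses the client also bypasses the check. ToyClient and ToyRestHandler are hypothetical names, the action name is made up, and the single prefix check stands in for the far richer logic of the real NodeClient and Security plugin.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model (all names hypothetical) of why handlers should go through the client:
// the client is where the registry lookup and the authorization check happen.
final class ToyClient {
    private final Map<String, Function<String, String>> registry = new HashMap<>();

    void register(String actionName, Function<String, String> action) {
        registry.put(actionName, action);
    }

    // Looks up the action by name, checks the caller's privilege, then runs the action.
    String execute(String actionName, String request, String grantedPrefix) {
        Function<String, String> action = registry.get(actionName);
        if (action == null) {
            throw new IllegalArgumentException("unknown action [" + actionName + "]");
        }
        if (!actionName.startsWith(grantedPrefix)) {
            throw new SecurityException("action [" + actionName + "] is unauthorized");
        }
        return action.apply(request);
    }
}

final class ToyRestHandler {
    static final String ACTION = "cluster:monitor/plugin_name/echo";

    // The handler only translates the request and delegates; it holds no logic of its own.
    static String handle(ToyClient client, String body, String grantedPrefix) {
        return client.execute(ACTION, body, grantedPrefix);
    }
}
```

A caller whose privilege is 'cluster:monitor/' gets the action's result, while a caller holding only 'indices:' gets a SecurityException. Had the handler called the registered function directly, both callers would have succeeded, which is the subversion the paragraph above warns about.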

Lastly, here are some technical bits to take home:

Do not override ActionPlugin#getRestHandlerWrapper. Doing this will conflict with the way security protects the REST layer. Elasticsearch will complain and refuse to start.

The ThreadContext encompasses all thread local variables, including the authentication token. It is handled automatically.

If you need to tamper with the ThreadContext, use ThreadContext#newStoredContext (or ThreadContext#newRestorableContext) inside a try-with-resources block. This will make sure your changes are contained and visible only to the code inside the block. Don’t use ThreadContext#stashContext outside of a try-with-resources block as it will wipe out the context.

Do not touch the content or the settings of the .security-* index or the .security alias. This is used to store all the Security plugin’s entities, such as users, roles, role mappings, etc.

REST handlers should delegate all requests to NodeClient#execute. The request will then be handled locally by the TransportAction#doExecute method after the Security framework has done its job.