Introducing: Elasticsearch with Azure File storage

The Azure Quickstart Templates are a great resource for getting started with template-based deployments for many different technologies, including Elasticsearch. We recently improved the Elasticsearch template so that you can create a pre-configured Elasticsearch cluster that stores its data on Azure File storage, with the option of installing plugins like Sense, Marvel and Kibana, all in just a few minutes.

If you are not familiar with Azure Files, this service offers shared storage using the SMB protocol via mounted shares. A share is accessible to any number of virtual machines or roles in the same region, and is supported on both Windows and Linux. For more information, take a look at this introduction to Azure File storage.

Since mounting and unmounting shares is quick and can be done while the system is live, Azure Files gives us a way to easily decouple compute and storage, which opens up a lot of exciting possibilities with a technology like Elasticsearch. In particular, shadow replica indices allow us to make use of this shared filesystem with several attractive optimizations:

Only a single copy of the data is indexed; replicas are replaced with shadow replicas

Recovery consists of another node taking ownership of a shard, rather than creating another copy

Rebalancing or redistributing data across more or fewer nodes becomes a lightweight operation

Trying out this new functionality is trivial with the improved Azure Resource Manager template. In this article we will go through a simple example to get you up and running with Elasticsearch on Azure File storage.

Deployment

After selecting Deploy to Azure in the Elasticsearch template, you will need to provide some parameters to the deployment. If unsure, use the default values to get started.

Some of the parameters are explained in more detail below.

Operating system: The template currently supports Azure File storage only on Ubuntu; Windows support is coming soon

Resource group: A logical grouping within your subscription for a collection of related resources; typically, you provide a memorable name here

Jumpbox: Provides an entry point into the cluster via SSH (Ubuntu) or RDP (Windows); use this unless you are deploying to an ExpressRoute subscription

Node sizes: The default sizes are good options for getting started, but make sure you deploy enough cores for your workload

Version: It is recommended you use the latest version available (currently 2.2.0) for Azure File support

AFS: Select this option for Azure File storage; note that this option is currently only valid for Ubuntu and Elasticsearch 2.x

Template base: When deploying from the Azure repo, this does not need to be changed; if deploying from your own fork, update this to your repo URI

Kibana and Sense: Selecting both of these provides an easy way to access and interact with your cluster via a public IP
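
The constraint stated in the list above (the AFS option is only valid for Ubuntu and Elasticsearch 2.x) can be checked before starting a deployment. The following is a minimal sketch of such a check. The function name and the argument names `os_name`, `es_version` and `use_afs` are hypothetical and are not taken from the template.

```python
def afs_selection_is_valid(os_name: str, es_version: str, use_afs: bool) -> bool:
    """Check a parameter combination against the rules described in the article.

    Hypothetical helper: the names here are illustrative only and do not
    correspond to actual template parameters.
    """
    if not use_afs:
        # The article states no restriction when Azure File storage is not selected.
        return True
    major_version = es_version.strip().split(".")[0]
    # AFS is described as valid only for Ubuntu together with Elasticsearch 2.x.
    return os_name.strip().lower() == "ubuntu" and major_version == "2"
```

A check like this could run in a wrapper script before the parameters are submitted, so that an unsupported combination is caught early.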

Accessing your cluster

Once the deployment is complete, you can find the Kibana URL and Jumpbox IP in the deployment outputs for the specified resource group in the Azure portal. Select the resource group, then the Last deployment link under Essentials to find the Kibana and Jumpbox deployment outputs.

Simply use the right-hand buttons to copy the Kibana URL to the clipboard and paste it into a browser window to bring up Kibana. Provided the Sense option was selected at deployment time, you will be able to switch to the Sense app via the Kibana interface.

To execute Sense commands, you first need to set the correct server IP. The master nodes use static IPs, so replace localhost with 10.0.0.10 or any other private IP from the cluster; these IPs are shown in the list of resources in the portal.
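
The same private address can also be used outside Sense, for example from the jumpbox. The sketch below builds the request URL and fetches the cluster health. It assumes that Elasticsearch listens on its default HTTP port 9200 and that the machine running the script can reach the private IP; the helper names are hypothetical.

```python
import json
import urllib.request

ES_HTTP_PORT = 9200  # assumption: the default Elasticsearch HTTP port


def es_url(host: str, path: str = "", port: int = ES_HTTP_PORT) -> str:
    """Build an HTTP URL for an Elasticsearch node reachable at `host`."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def cluster_health(host: str = "10.0.0.10", timeout: float = 5.0) -> dict:
    """Fetch _cluster/health from a node (requires network access to the cluster)."""
    with urllib.request.urlopen(es_url(host, "_cluster/health"), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `cluster_health()` from a machine inside the virtual network should return the health document as a dictionary, provided the assumptions above hold.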

Creating a shadow replica index

When using the AFS option in the template, the elasticsearch.yml settings contain the following:

This means that data on local storage is stored in the default location, and shared filesystem data should be stored under the path /datadisks/esdata00. Therefore, when creating a shadow replica index, you should use index settings like this:

With these settings, the index will be created in the correct location (under the shared data path), and the index will have shadow replicas. The setting to recover a shard on any node is important, as it instructs Elasticsearch to not wait for a node to rejoin the cluster before recovering its shards from the shared filesystem. You can find more information about shadow replicas in the Elasticsearch documentation.
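
The index settings described above can be sketched in Python as follows. The setting names come from the Elasticsearch 2.x shadow replica documentation. The index name, the shard and replica counts, and the choice of a per-index subdirectory under /datadisks/esdata00 are assumptions made for illustration, as are the helper names and the default host and port.

```python
import json
import urllib.request

SHARED_DATA_PATH = "/datadisks/esdata00"  # shared filesystem path named in the article


def shadow_replica_settings(index_name: str, shards: int = 1, replicas: int = 1,
                            shared_path: str = SHARED_DATA_PATH) -> dict:
    """Build a create-index body for a shadow replica index.

    Assumption: each index gets its own subdirectory under the shared path.
    """
    return {
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            # Place the index data on the shared filesystem.
            "index.data_path": f"{shared_path}/{index_name}",
            # Replicas become shadow replicas that read the shared data.
            "index.shadow_replicas": True,
            # Allow primaries to recover on any node instead of waiting
            # for the original node to rejoin the cluster.
            "index.shared_filesystem.recover_on_any_node": True,
        }
    }


def create_index(host: str, index_name: str, port: int = 9200) -> dict:
    """Send the create-index request (requires network access to the cluster)."""
    body = json.dumps(shadow_replica_settings(index_name)).encode("utf-8")
    request = urllib.request.Request(f"http://{host}:{port}/{index_name}",
                                     data=body, method="PUT")
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The body produced by `shadow_replica_settings("my_index")` can also be pasted into Sense as the payload of a PUT request for the hypothetical index my_index.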

Next steps

In this article you saw how to quickly and easily deploy a fully pre-configured Elasticsearch cluster, including optional tools like Marvel, Sense and Kibana. You also learned how to access this cluster and create an index based on shadow replicas on the shared storage provided by Azure Files. To move from development to production deployments, you might be interested in the following topics.

In addition, we are working closely with the Elastic team on providing the tools and guidance for Elasticsearch at scale on Azure File storage. We will be publishing more information about this in the near future, so stay tuned!