Indexing content from Drupal 8 using Elasticsearch

Last week, a client asked me to investigate the state of Elasticsearch support in Drupal 8. They're using a decoupled architecture and wanted to know how Drupal data could be exposed to Elasticsearch using only core and contrib modules. Elasticsearch would then index that data and make it available to the site's presentation layer via the Elasticsearch Search API.

During my research, I was impressed by the results. Thanks to the Typed Data API plus a couple of contributed modules, an administrator can browse the structure of the content in Drupal and select what should be indexed by Elasticsearch and how. All of this can be done using Drupal's admin interface.

In this article, we will take a vanilla Drupal 8 installation and configure it so that Elasticsearch receives any content changes. Let’s get started!

Downloading and starting Elasticsearch

We will begin by downloading and starting Elasticsearch 5, which is the latest stable release at the time of this writing. Open https://www.elastic.co/downloads/elasticsearch and follow the installation instructions. Once Elasticsearch is running, open your browser and enter http://127.0.0.1:9200. You should see something like the following screenshot:
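If you would rather check from a script than from a browser, the same request can be made with Python's standard library. This is only a minimal sketch: it assumes Elasticsearch is listening on the default local address used in this article, and the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch listens on the default local address used in this article.
ES_URL = "http://127.0.0.1:9200"


def fetch_cluster_info(base_url=ES_URL):
    """Fetch the JSON document served at the root URL (raises URLError if the server is down)."""
    with urlopen(base_url, timeout=5) as response:
        return json.load(response)


def describe_cluster(info):
    """Build a one-line summary from the root document's cluster name and version."""
    return "{} is running Elasticsearch {}".format(
        info["cluster_name"], info["version"]["number"]
    )
```

With the server running, `print(describe_cluster(fetch_cluster_info()))` should report the cluster name and a 5.x version number.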

Now let’s set up our Drupal site so it can talk to Elasticsearch.

Setting up Search API

We need two contributed modules: Search API and Elasticsearch Connector. At the time of this writing there is no available release for Elasticsearch Connector, so you will have to clone the repository, check out the 8.x-5.x branch, and follow the installation instructions. As for Search API, just download and install the latest stable version.

Connecting Drupal to Elasticsearch

Next, let’s connect Drupal to the Elasticsearch server that we configured in the previous section. Navigate to Configuration > Search and Metadata > Elasticsearch Connector and then fill out the form to add a cluster:

Click 'Save' and check that the connection to the server was successful:

That’s it for Elasticsearch Connector. The rest of the configuration will be done using the Search API module.

Configuring a search index

Search API provides an abstraction layer that allows Drupal to push content changes to different servers, whether that's Elasticsearch, Apache Solr, or any other provider that has a Search API-compatible module. Within each server, Search API can create indexes, which are like buckets where you can push data that can be searched in different ways. Here is a drawing to illustrate the setup:

Now navigate to Configuration > Search and Metadata > Search API and click on Add server:

Fill out the form to let Search API manage the Elasticsearch server:

Click Save, then check that the connection was successful:

Next, we will create an index in the Elasticsearch server where we will specify that we want to push all of the content in Drupal. Go back to Configuration > Search and Metadata > Search API and click on Add index:

Fill out the form to create an index where content will be pushed by Drupal:

Click Save and verify that the index creation was successful:

Verify the index creation at the Elasticsearch server by opening http://127.0.0.1:9200/_cat/indices?v in a new browser tab:
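The same check can be scripted. The sketch below asks the `_cat/indices` endpoint for JSON instead of the tabular text (via the `format=json` parameter) and maps each index name to its document count. The helper names are hypothetical, and the address is the local default assumed throughout this article.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch listens on the default local address used in this article.
ES_URL = "http://127.0.0.1:9200"


def fetch_indices(base_url=ES_URL):
    """Return the rows of _cat/indices as a list of dicts, one per index."""
    with urlopen(base_url + "/_cat/indices?format=json", timeout=5) as response:
        return json.load(response)


def index_doc_counts(rows):
    """Map each index name to its document count (cat values arrive as strings)."""
    return {row["index"]: int(row.get("docs.count") or 0) for row in rows}
```

Calling `index_doc_counts(fetch_indices())` right after creating the index should list it with a count of zero, since nothing has been indexed yet.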

That’s it! We will now test whether Drupal properly updates Elasticsearch when content changes.

Indexing content

Create a node and then run cron. Verify that the node has been pushed to Elasticsearch by opening the URL http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, where elasticsearch_index_draco_elastic_index is the index name listed by Elasticsearch in the previous step:
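For a scripted version of this check, the sketch below calls the `_search` endpoint of an index and lists, for each hit, its identifier and the names of the fields stored in `_source`. The function names are illustrative assumptions; pass whatever index name your own installation reports.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch listens on the default local address used in this article.
ES_URL = "http://127.0.0.1:9200"


def fetch_hits(index, base_url=ES_URL):
    """Run an empty search against the index and return the list of hits."""
    url = "{}/{}/_search".format(base_url, index)
    with urlopen(url, timeout=5) as response:
        return json.load(response)["hits"]["hits"]


def summarize_hits(hits):
    """Return (document id, sorted field names) for every hit."""
    return [(hit["_id"], sorted(hit.get("_source", {}))) for hit in hits]
```

For example, `summarize_hits(fetch_hits("elasticsearch_index_draco_elastic_index"))` shows at a glance which documents reached Elasticsearch and which fields each one carries.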

Success! The node has been pushed, but only its identifier is there. We need to select which fields we want to push to Elasticsearch via the Search API interface at Configuration > Search and Metadata > Search API > Our Elasticsearch index > Fields:

Click on Add fields and select the fields that you want to push to Elasticsearch:

Add the fields and click Save. This time we will use Drush to reset the index and index the content again:

After reloading http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, we can see the added field(s):

Processing the data prior to indexing it

Here is a bonus: Search API provides a list of processors that alter the data before it is indexed in Elasticsearch. Things like transliteration, filtering out unpublished content, or case-insensitive searching are available via the web interface. Here is the list, which you can find by clicking Processors when you are viewing the index at Search API:
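To make the idea concrete, here is a purely conceptual sketch of what such processors do to items before they are sent. It is not Search API's implementation (its processors are plugins configured through the admin interface); the item shape and function names are assumptions made up for this illustration.

```python
import unicodedata


def transliterate(text):
    """Approximate transliteration: strip accents and drop non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def ignore_case(text):
    """Lowercase the text so that matching no longer depends on case."""
    return text.lower()


def preprocess(items):
    """Drop unpublished items, then normalize the title of the remaining ones."""
    return [
        dict(item, title=ignore_case(transliterate(item["title"])))
        for item in items
        if item["published"]
    ]
```

Each step is a small, independent transformation, which is why processors can be switched on and off individually.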

When you need more, extend from the APIs

Now that you have an Elasticsearch engine, it’s time to start hooking it up with your front-end applications. We have seen that the web interface of the Search API module saves a ton of development time, but if you ever need to go the extra mile, there are hooks, events, and plugins that you can use to fit your requirements. A good place to start is the Search API project homepage. Happy searching!