Azure Search: the Hard Truths of Indexing Limits with Sitecore

Sitecore has been all-in on native Azure PaaS support since Sitecore 9.0 was released in October 2017. Unfortunately, one major piece doesn't fully come together with the rest: Azure Search.

When it comes to search providers, Sitecore recommends Solr out of the box. Sitecore also supports Azure Search as a provider, but it kind of dances around the challenges and limitations that come with it.

Hard Truths and Hard Limits

While there are a few smaller quirks, the biggest issue by far with Azure Search is the 1,000 field limit per index. This is an Azure Search limitation - not a Sitecore issue - and it hits the master and web database indexes hardest (since they have the largest number of items).

Out of the box, Sitecore's master index already contains roughly 550 fields. This severely limits the number of new fields that can be added. A moderately-sized website can probably consume the remaining 450 fields, thus hitting the Azure Search limit. A true multi-tenant and/or multi-lingual site would almost certainly be DOA for Azure Search.
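To make the arithmetic concrete, here is a minimal Python sketch that estimates the remaining budget by counting the entries in an index definition. It assumes the definition has already been loaded as a dictionary with a top-level "fields" list; the index name, the placeholder field names, and the helper remaining_field_budget are all made up for illustration.

```python
FIELD_LIMIT = 1000  # per-index field limit cited in this article


def remaining_field_budget(index_definition, limit=FIELD_LIMIT):
    """Return how many more fields fit before the index reaches the limit.

    Assumes index_definition is a dict with a top-level "fields" list.
    """
    used = len(index_definition.get("fields", []))
    return limit - used


# Hypothetical definition with 550 placeholder fields, standing in for
# an out-of-the-box master index.
sample_index = {
    "name": "example-master-index",  # illustrative name only
    "fields": [{"name": f"field_{i}", "type": "Edm.String"} for i in range(550)],
}

print(remaining_field_budget(sample_index))  # 450
```

Running a check like this against a real definition before adding new templates gives an early warning that the index is getting close to the ceiling.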

What Sitecore Recommends

There is documentation on Azure Search limitations, and it should be required reading for all Sitecore developers, administrators, and architects (and quite frankly, sales folks). The 1,000 field limit is not a trivial issue and should be understood prior to selecting a search provider.

There is no silver-bullet solution for getting Azure Search to work for all scenarios, but there are a couple of ways to ease the pain.

Splitting Indexes

First, Sitecore recommends splitting large indexes into smaller, custom indexes. For example, in a multi-tenant environment, each tenant should have its own search index. This works fine for the Content Delivery side of things (where you will likely have separate web databases anyhow), but doesn't address the master index, which will always contain the aggregate of all fields in a Sitecore instance.
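The difference between per-tenant indexes and the aggregate master index is easy to see with a little set arithmetic. The Python sketch below is illustrative only: the tenant names, field names, and field counts are invented, and the helper field_counts is hypothetical. It simply treats each index as a set of field names.

```python
FIELD_LIMIT = 1000  # per-index field limit cited in this article


def field_counts(base_fields, tenant_fields):
    """Return (fields per tenant index, fields in the aggregate index).

    base_fields: set of field names every index carries.
    tenant_fields: dict mapping tenant name to its own set of field names.
    """
    per_tenant = {
        tenant: len(base_fields | fields)
        for tenant, fields in tenant_fields.items()
    }
    aggregate = len(base_fields.union(*tenant_fields.values()))
    return per_tenant, aggregate


# Invented numbers: 550 shared fields plus 300 custom fields per tenant.
base = {f"base_{i}" for i in range(550)}
tenants = {
    "tenant_a": {f"a_{i}" for i in range(300)},
    "tenant_b": {f"b_{i}" for i in range(300)},
}

per_tenant, aggregate = field_counts(base, tenants)
for tenant, count in per_tenant.items():
    print(tenant, count, "over limit" if count > FIELD_LIMIT else "ok")
print("aggregate", aggregate, "over limit" if aggregate > FIELD_LIMIT else "ok")
```

With these made-up numbers, each tenant index stays under the limit at 850 fields, while the aggregate lands at 1,150 and goes over, which is exactly the master index problem described above.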

Excluding Fields

Second, Sitecore recommends excluding common fields from the master search indexes, and has even provided a config file that does this for you:

This is a step in the right direction, especially since we can lean on Sitecore to provide a vetted list of fields that are safe to leave out of the index. Unfortunately, it's just a band-aid over the larger problem.

Disable Indexing of All Fields

Finally, Sitecore also recommends setting indexAllFields to false. By doing this, Sitecore greatly reduces the number of fields that get indexed by default. This setting can be found in the following configuration file:

Here's an interesting tidbit: changing a setting called "index all fields" to false doesn't actually disable the indexing of all search fields. Sitecore still indexes and uses a minimum number of required fields that keep the CMS operating.

For example, content authors will always be able to use search in the Sitecore Client to find items by item name, but not necessarily by content in those items.
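As a rough mental model of how exclusion and indexAllFields interact, the Python sketch below selects which fields end up in an index. It is a simplified illustration, not Sitecore's actual implementation: the function fields_to_index, its parameters, and the sample field names are all assumptions made for the example.

```python
def fields_to_index(item_fields, index_all_fields, required,
                    included=frozenset(), excluded=frozenset()):
    """Model which field names get indexed (simplified, hypothetical).

    index_all_fields=True:  index everything except the excluded fields.
    index_all_fields=False: index only the explicitly included fields.
    The required fields are always kept in either mode.
    """
    if index_all_fields:
        selected = set(item_fields) - set(excluded)
    else:
        selected = set(item_fields) & set(included)
    return selected | set(required)


# Illustrative field names only.
item_fields = {"title", "body", "meta_description"}
required = {"_name", "_template", "_language"}

# Exclusion mode: everything minus a list of fields to skip.
print(sorted(fields_to_index(item_fields, True, required,
                             excluded={"meta_description"})))

# Opt-in mode: only what is explicitly included, plus the required fields.
print(sorted(fields_to_index(item_fields, False, required,
                             included={"title"})))
```

In the opt-in case with nothing included, only the required fields remain, which matches the behavior described above: authors can still find items by name, but not by the content in those items.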