Using Luke to Understand Sitecore 7 Search

Sitecore 7 depends heavily on search, so the more you know about how your search engine works, the better. Some of the most useful tools for building that understanding are the ones made for your particular search engine.

The tools available to help you develop and debug differ from one search engine to another. Even closely related search engines, such as Lucene and Solr, have very different tools available.

In this post I cover how to use Luke, a popular tool for Lucene, in order to perform some basic investigation and troubleshooting.

Luke is a free, open source product available at https://code.google.com/p/luke/. Luke can be used with previous versions of Sitecore, but this post specifically covers Sitecore 7.

Luke is a Java application, so you need to be sure the Java runtime (JRE) is installed on your machine. If the JRE is installed, all you should have to do is double-click the jar file and Luke will start up.

Selecting the Index

The first thing Luke needs to know is the location of the Lucene index files. Each index has its own folder inside the index folder, which is specified by the IndexFolder setting in web.config. By default, this is a folder named "indexes" in the Sitecore data folder.
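If you are not sure which folder to point Luke at, one quick check is whether the folder contains a Lucene segments file. The helper below is a hypothetical sketch, not part of Sitecore or Luke. It assumes the default layout described above (an "indexes" folder under the data folder, one subfolder per index) and relies on the Lucene convention that every index folder holds a segments_N file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the Lucene index folders under a Sitecore data folder. */
final class IndexLocator {

    /** Assumes the default layout: <dataFolder>/indexes/<indexName>. */
    static Path indexFolder(Path dataFolder, String indexName) {
        return dataFolder.resolve("indexes").resolve(indexName);
    }

    /** A folder looks like a Lucene index if it contains at least one segments_N file. */
    static boolean looksLikeLuceneIndex(Path folder) throws IOException {
        if (!Files.isDirectory(folder)) {
            return false;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "segments_*")) {
            return files.iterator().hasNext();
        }
    }

    /** Returns every subfolder of <dataFolder>/indexes that looks like a Lucene index. */
    static List<Path> findIndexes(Path dataFolder) throws IOException {
        List<Path> result = new ArrayList<>();
        Path root = dataFolder.resolve("indexes");
        if (!Files.isDirectory(root)) {
            return result;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            for (Path child : children) {
                if (looksLikeLuceneIndex(child)) {
                    result.add(child);
                }
            }
        }
        result.sort(null); // natural (alphabetical) order of paths
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Data folder from the example environment; pass your own as the first argument.
        Path dataFolder = Paths.get(args.length > 0 ? args[0] : "Z:\\Sitecore\\Data");
        for (Path index : findIndexes(dataFolder)) {
            System.out.println(index); // each path is a candidate to enter in Luke
        }
    }
}
```

Any path this prints, such as the sitecore_master_index folder, is a candidate to paste into Luke's open-index dialog.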

For this example, I want to use the index for the master database. In my environment, this index is located at Z:\Sitecore\Data\indexes\sitecore_master_index.

Enter the path to the index files.

Click OK.

Luke will display information about the index. You might need to enlarge the window in order to see all of the information properly.

Viewing Fields

In the bottom-left corner of the Overview tab, Luke displays a list of fields. These are the fields that can be searched using the Sitecore content search API. Which fields are available depends on your Sitecore configuration.

You can see the top terms that have been indexed for a specific field.

Click the field on the left.

Click the button "Show top terms >>".
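"Show top terms" ranks the terms of a field by document frequency, meaning the number of documents that contain each term. The sketch below reproduces that idea over a few made-up documents. It uses a simple lowercase-and-split-on-whitespace tokenizer as a stand-in for whatever analyzer your Sitecore configuration uses; it illustrates the ranking and is not Luke's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TopTerms {

    /** Stand-in analyzer (an assumption): lowercase the value and split it on whitespace. */
    static List<String> tokenize(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Counts, for one field, how many documents contain each term. */
    static Map<String, Integer> docFreq(List<Map<String, String>> docs, String field) {
        Map<String, Integer> freq = new HashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value == null) {
                continue;
            }
            // A term counts once per document, however often it repeats in that document.
            for (String term : new HashSet<>(tokenize(value))) {
                freq.merge(term, 1, Integer::sum);
            }
        }
        return freq;
    }

    /** Returns the n terms with the highest document frequency (ties broken alphabetically). */
    static List<String> top(List<Map<String, String>> docs, String field, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq(docs, field).entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up documents; real ones come from the Sitecore index.
        List<Map<String, String>> docs = List.of(
                Map.of("_templatename", "Sample Item"),
                Map.of("_templatename", "Sample Item"),
                Map.of("_templatename", "Folder"));
        System.out.println(top(docs, "_templatename", 2)); // [item, sample]
    }
}
```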

Viewing Documents

If you right-click a term, you can open a couple of screens that let you view the matching documents. "Browse term documents" takes you to the Documents tab, with the "Browse by term" section already populated with the field and term you selected.

Click "First Doc" to see all of the fields for the first document that contains that term.

Performing Searches

On the Documents tab, if you click "Show All Docs" you will be taken to the Search tab. A search expression is already populated with the appropriate Lucene search syntax, and the search results appear below.

Here are some tips that might be helpful for first-time users:

You can double-click any row in the results in order to see the fields for the selected document.

You can change the search expression and click the Search button to test Lucene searches. This is useful because the Sitecore content search API automatically converts LINQ expressions into Lucene search syntax, so you can try an equivalent query by hand in Luke.
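When you type a query into Luke by hand, values must be written in Lucene query syntax: field:value for a single term, field:"..." for a phrase, and a backslash in front of characters the parser treats as syntax. The helper below is a hypothetical sketch of that escaping for the classic Lucene QueryParser; it is not how Sitecore's LINQ provider builds its queries.

```java
import java.util.StringJoiner;

/** Hypothetical helper for writing Lucene query text to paste into Luke. */
final class LuceneQueryText {

    // Characters the classic Lucene QueryParser treats as syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Escapes special characters so a single term is matched literally. */
    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds field:term, or field:"a phrase" when the value contains whitespace. */
    static String clause(String field, String value) {
        String v = value.trim();
        if (v.chars().anyMatch(Character::isWhitespace)) {
            // Inside quotes only the backslash and the quote itself need escaping.
            String inner = v.replace("\\", "\\\\").replace("\"", "\\\"");
            return field + ":\"" + inner + "\"";
        }
        return field + ":" + escapeTerm(v);
    }

    /** Joins clauses so that every one of them is required (+clause). */
    static String allOf(String... clauses) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String clause : clauses) {
            joiner.add("+" + clause);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(allOf(clause("_templatename", "sample item"), clause("_language", "en")));
        // +_templatename:"sample item" +_language:en
    }
}
```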

Using Luke to Understand Sitecore

The main reason I wrote this post is that Luke can help you better understand how the Sitecore 7 content search API works, especially how some of the different indexing-related settings work.

I can use Luke to see what happens when new fields are added to a template. Before I add the field, however, I want to use Luke to see what is already being indexed.

Navigate to the Search tab.

Enter "sample item" for the search expression. Be sure to include the double quotes so the expression is treated as a single phrase rather than split into separate terms at the space.

Select _templatename from the field list.

Click the Search button.
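The double quotes matter because, without them, the query parser splits the expression at whitespace and searches for each word separately in the field you selected. The toy parser below only handles plain words and one level of quotes; it is a hypothetical sketch meant to show the difference, not a replacement for Lucene's QueryParser.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseVsTerms {

    /** Splits an expression into clauses for defaultField, keeping "quoted text" together. */
    static List<String> clauses(String expression, String defaultField) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : expression.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, defaultField, result); // whitespace outside quotes ends a clause
            } else {
                current.append(c);
            }
        }
        flush(current, defaultField, result);
        return result;
    }

    private static void flush(StringBuilder current, String field, List<String> out) {
        if (current.length() > 0) {
            out.add(field + ":" + current);
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(clauses("sample item", "_templatename"));
        // [_templatename:sample, _templatename:item]
        System.out.println(clauses("\"sample item\"", "_templatename"));
        // [_templatename:"sample item"]
    }
}
```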

Double-click the row for the English-language version of the Home item, which has the _group value 110d559fdea542ea9c1c8a5df7e70ef9 (the _group field holds the item ID, {110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}, without braces or dashes).

I can see the fields that are stored for the Home item:

__smallcreateddate

__smallupdateddate

__thumbnail

_creator

_database

_datasource

_fullpath

_group

_id

_indexname

_language

_latestversion

_name

_parent

_path
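The _group value used to pick out the Home item is the item ID written without braces or dashes, in lowercase. Assuming the index stores IDs in that form, a small hypothetical helper can turn an ID copied from the Content Editor into the value to search for in Luke.

```java
import java.util.Locale;
import java.util.UUID;

final class GroupValue {

    /** Converts "{110D559F-DEA5-...}" into the assumed index form "110d559fdea5...". */
    static String fromItemId(String itemId) {
        String trimmed = itemId.trim();
        if (trimmed.startsWith("{") && trimmed.endsWith("}")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        // UUID.fromString rejects text that is not a GUID at all.
        UUID id = UUID.fromString(trimmed);
        return id.toString().replace("-", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String group = fromItemId("{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}");
        System.out.println("_group:" + group); // _group:110d559fdea542ea9c1c8a5df7e70ef9
    }
}
```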

In Sitecore, add the following field to the "Sample Item" template:

Name: AAA

Type: Single-Line Text

Add the following field to the "Sample Item" template:

Name: BBB

Type: Single-Line Text

In the Home item, set the following value:

Field: AAA

Value: valuea

In the Home item, set the following value:

Field: BBB

Value: valueb

In Luke, navigate to the Overview tab.

Click the "Re-open" button.

Scroll down in the list of fields and select "aaa" (field names are added to the index in lowercase).

Click "Show top terms >>".

Double-click the row in the Top ranking terms section.

Click First Doc.

The fields displayed here are the values available to the "hydrate" process, which automatically populates POCOs (plain old CLR objects) from search results. The Sitecore 7 dev team has posted an article about using POCOs.

But if you look at this list, you'll notice that the fields AAA and BBB are missing. This may seem counterintuitive: the AAA field and its value are clearly in the index (you just used them to find this document), but the value is not displayed.
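The hydrate process can only copy values that the index stored. The sketch below is a hypothetical Java analogue of that idea (Sitecore's real implementation is .NET code working with POCO properties): it copies stored values into a plain object by index field name, so a field that was indexed but not stored simply stays null. The document values are made up.

```java
import java.util.Map;

final class Hydrate {

    /** Hypothetical POCO-like class with properties for two index fields. */
    static final class SampleItem {
        String name; // populated from the _name field
        String aaa;  // populated from the aaa field
    }

    /** Copies stored values into a new SampleItem; fields that were not stored stay null. */
    static SampleItem hydrate(Map<String, String> storedFields) {
        SampleItem item = new SampleItem();
        item.name = storedFields.get("_name");
        item.aaa = storedFields.get("aaa");
        return item;
    }

    public static void main(String[] args) {
        // Stored fields of a document: aaa was indexed but not stored, so it is absent.
        Map<String, String> stored = Map.of("_name", "Home", "_language", "en");
        SampleItem home = hydrate(stored);
        System.out.println(home.name + " / " + home.aaa); // Home / null
    }
}
```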

Before I explain how you can fix this, I need to explain how the AAA and BBB fields are indexed in the first place.

If you look in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file, you will find an entry under fieldMap/fieldTypes for the Single-Line Text field type. This entry is responsible for the fields AAA and BBB being indexed.

The same entry is also responsible for the AAA and BBB fields not being displayed in Luke, because of its storageType setting. With storageType set to NO, Lucene indexes the value, so it can be searched, but does not store it, so the value is not available to the hydrate process.
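To see why a value can be searchable yet invisible, here is a toy in-memory index (a hypothetical sketch, not Lucene's implementation). Every field is always added to the inverted index, but its original value is kept only when the field is marked as stored, which mirrors the storage setting in the config file. For simplicity each value is indexed as a single term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyIndex {

    // field -> term -> document numbers: the searchable part of the index
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // document number -> stored field values: what Luke and the hydrate process see
    private final List<Map<String, String>> storedDocs = new ArrayList<>();

    /** Adds a document; storedFields names the fields whose original value is kept. */
    int add(Map<String, String> fields, Set<String> storedFields) {
        int docId = storedDocs.size();
        Map<String, String> stored = new HashMap<>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            // Every field is indexed, whether or not it is stored.
            postings.computeIfAbsent(f.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(f.getValue(), k -> new TreeSet<>())
                    .add(docId);
            if (storedFields.contains(f.getKey())) {
                stored.put(f.getKey(), f.getValue());
            }
        }
        storedDocs.add(stored);
        return docId;
    }

    /** Returns the documents whose field contains the term. */
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    /** Returns only the stored values of a document. */
    Map<String, String> document(int docId) {
        return storedDocs.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int home = index.add(Map.of("_name", "home", "aaa", "valuea", "bbb", "valueb"),
                Set.of("_name")); // aaa and bbb are indexed but not stored
        System.out.println(index.search("aaa", "valuea")); // [0]
        System.out.println(index.document(home));          // {_name=home}
    }
}
```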

The reasons for and against storing values are a topic for another post. I don't want to change the settings so that every Single-Line Text field is stored, but I do want the AAA field to be stored. To do that, I need to make a change to the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file.

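Add an entry for the aaa field, with storageType set to YES, to /configuration/sitecore/contentSearch/configuration/DefaultIndexConfiguration/fieldMap/fieldNames. Then rebuild the index so the value is actually stored, and click "Re-open" in Luke.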
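This change affects only AAA, and not every Single-Line Text field, because a setting for a specific field name takes precedence over the setting for the field's type. The resolver below is a hypothetical model of that lookup order, assuming index field names are the Sitecore field names in lowercase; it only models the storage flag and is not Sitecore's code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class FieldMapModel {

    /** Hypothetical stand-in for a fieldMap entry: only the storage flag is modelled. */
    static final class FieldSettings {
        final boolean stored;

        FieldSettings(boolean stored) {
            this.stored = stored;
        }
    }

    private final Map<String, FieldSettings> byFieldName = new HashMap<>();
    private final Map<String, FieldSettings> byFieldType = new HashMap<>();

    void addFieldName(String fieldName, FieldSettings settings) {
        byFieldName.put(fieldName.toLowerCase(Locale.ROOT), settings);
    }

    void addFieldType(String fieldTypeName, FieldSettings settings) {
        byFieldType.put(fieldTypeName.toLowerCase(Locale.ROOT), settings);
    }

    /** Name-specific settings win; otherwise fall back to the field type's settings. */
    FieldSettings resolve(String fieldName, String fieldTypeName) {
        FieldSettings byName = byFieldName.get(fieldName.toLowerCase(Locale.ROOT));
        return byName != null ? byName : byFieldType.get(fieldTypeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        FieldMapModel map = new FieldMapModel();
        map.addFieldType("Single-Line Text", new FieldSettings(false)); // type default: not stored
        map.addFieldName("aaa", new FieldSettings(true));               // per-field override: stored
        System.out.println(map.resolve("AAA", "Single-Line Text").stored); // true
        System.out.println(map.resolve("BBB", "Single-Line Text").stored); // false
    }
}
```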

Conclusion

Hopefully this post gets you interested in using Luke, not only to understand what is in a Lucene index, but also to troubleshoot the hydrate process. And be sure to add any other Luke-related tips in the comments!

Comments

You can also use Luke to examine Solr indexes (look in <solr root>/<corename>/data).
The problem is that if you are using a newer version (Solr 4.*), the standard Luke is not compatible with the Lucene 4.1, 4.2 and 4.3 index versions. There is a fork of Luke at https://github.com/tarzanek/luke that fixes these issues, but you need to compile it yourself. A binary that supports Solr 4.2 can be found here: http://www.semanticmetadata.net/2013/04/11/luke-4-2-binaries/
Luke is actually built into Solr and can be used to analyse fields directly from the Solr admin pages using the schema browser.

-
Stephen Pope
June 25, 2013 at 10:25 PM

Is it possible to check computed index fields with Luke? I couldn't find them in Luke.

-
Ivan Buzyka
July 05, 2013 at 4:42 AM

I just figured out that it is possible (see computed fields in Luke). So, disregard my question :)

-
Ivan Buzyka
July 09, 2013 at 1:50 AM

Can you configure a fieldMap using the field ID instead of the name?
Also, where are these configuration values documented?

-
Mark Kidd
October 27, 2013 at 2:05 PM

My Sitecore 7 (130424) search page has been throwing a
"Could not create instance of type: Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer. No matching constructor was found."
exception since I ran Luke 3.5.0 (https://code.google.com/p/luke/downloads/detail?name=lukeall-3.5.0.jar) on the same machine the site is running on.
I have not solved the issue yet; I'd appreciate any suggestions.
Thanks,
M