Archive for January 10th, 2008

As I previously wrote, in my work on enterprise search, I have found there to be 3 Principles of Enterprise Search: Coverage, Identity and Relevance. My previous posts have discussed the principles of Coverage and Identity.

Here, I will cover the principle of Relevance.

So, in your efforts to improve search for your users, you have addressed the principle of Coverage and you have thousands of potential search candidates in your enterprise search tool. You have addressed the principle of Identity and all of those search results display well in a search results page, clearly identifying what they are so a searcher can confidently know what an item is. Now for the hardest of the three principles to address: Relevance.

The principle of Relevance is all about search result candidates showing up high in search results for appropriate search terms. Relating this back to the original driving question – “Why doesn’t X show up when I search on the search terms Y?” – the principle of Relevance addresses the situation where X is there and may even be listed as X, but it is on the second page of results (or even farther down).

This principle is in some ways both the hardest and simplest to address. It is hard because it practically requires that you anticipate every searcher’s expectations and that you can practically read their minds (no mean feat!). It’s simple (at least given a search engine) because relevance is also a primary focus for the search engine itself – many search engines differentiate themselves from competitors based on how well the engine can estimate relevance for content objects based on a searcher’s criteria; so your search engine is likely going to help you a lot with regard to relevance.

However, there are still a lot of issues to consider and areas you need to address to help your search engine as well as your users.

One of the first things you should consider is the set of keywords associated with your content. There are several different ways search engines will encounter keywords:

First and foremost, the content of your search items presents a set of keywords to most search engines; this is going to be the content visible in a web page or the words in the body of documents.

The keywords accessible in the form of “keywords” <meta> tags in HTML pages, or “keywords” fields in the File Properties of documents in various formats.

The keywords might even be terms in a database that is related in some way to the content that your search engine can use. This is very common for tightly constrained environments that integrate both a content management (or collaboration) environment with a search experience. If the tool controls both the content and the search, it can take advantage of a lot of “insight” that might not be directly available to an enterprise search solution.

Some search engines will even use the text of links pointing to a content item as keywords describing the item. So content managers can influence the relevance of content they don’t manage themselves by how they refer to it.

Lastly, you also need to understand how your search engine will use and interpret these various sources of keywords and focus on those that provide the most impact. Some search engines might ignore the “keywords” <meta> tag for example, so you may not need to be concerned with that at all.
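To make these keyword sources a bit more concrete, here is a minimal Python sketch of how an indexer might pull three of them out of a single HTML page: the “keywords” <meta> tag, the visible text, and the text of outgoing links. The class name KeywordSourceParser and its attributes are my own illustration, not how any particular search engine works; a real engine will tokenize, weight and filter these sources in its own way.

```python
from html.parser import HTMLParser


class KeywordSourceParser(HTMLParser):
    """Collects three keyword sources from one HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []    # terms from <meta name="keywords" content="...">
        self.body_words = []       # words from text outside <script>/<style>
        self.link_texts = {}       # href -> list of link texts pointing at it
        self._skip_depth = 0       # > 0 while inside <script> or <style>
        self._current_href = None  # href of the <a> element we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords += [
                k.strip().lower() for k in content.split(",") if k.strip()
            ]
        elif tag == "a" and attrs.get("href"):
            self._current_href = attrs["href"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a":
            self._current_href = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # script and style content is not visible text
        words = data.lower().split()
        self.body_words += words
        if self._current_href and words:
            # Link text describes the *target* page, not the page being parsed.
            self.link_texts.setdefault(self._current_href, []).append(" ".join(words))
```

Feeding a page to the parser with parser.feed(html) fills in the three attributes. Which of them actually matter depends entirely on your engine, which is why it pays to check how yours treats each source before investing effort in any one of them.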

One detail to highlight with regard to the content of your search items is that, just like the navigation challenges discussed in the post on Coverage, if you have web sites that depend on JavaScript to display content, then that content likely will be invisible to your search engine, so it will not contribute to the keywords users can use to find the pages. I see this issue as becoming more of a problem in the future as applications are built that take advantage of AJAX to present dynamic user interfaces.

Once you have a strategy for how you will present keywords to your search engine, you need to determine how best to manage the set of keywords that will be most useful to your content managers and to the users of your search tool. A principal tool for this is to have a taxonomy that helps inform your audience about preferred terms. I’ll write more about taxonomies in the future – for now, you should know that a very effective way to improve search is to simply constrain the terms used to tag content to a well-managed set.

A taxonomy can also be used to provide guided navigation or constrained search pick lists. Instead of a simple keyword box for search, you can offer your users lists of values to select. The utility of this will depend on your users’ needs, and you need to ensure you pay attention to usability.

Related to taxonomies, you should also consider how best to manage synonyms. This will likely require some work with your taxonomy (to associate synonyms with “preferred” terms). It may also require you to manage synonyms for your search engine (to define the synonym rings, the sets of equivalent terms, that the engine uses – hopefully, these rings are pulled from your taxonomy!). You might also need to institute some means to tag your content with both the preferred terms and their synonyms (especially if your content is exposed to search engines other than your own, such as internet search engines).
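As a rough illustration of the taxonomy and synonym ideas, here is a small Python sketch in which each preferred term owns a ring of synonyms. The same data is used two ways: to constrain tags to the managed set, and to expand a searcher’s term into the whole ring. The terms, the SYNONYM_RINGS structure and the function names are all invented for this example; your search engine will have its own format for synonym configuration.

```python
# Hypothetical taxonomy extract: preferred term -> its synonyms.
SYNONYM_RINGS = {
    "expense report": ["expense claim", "reimbursement form"],
    "paid time off": ["pto", "vacation", "annual leave"],
}


def build_lookup(rings):
    """Map every known term (preferred or synonym) to its preferred term."""
    lookup = {}
    for preferred, synonyms in rings.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def normalize_tag(tag, lookup):
    """Return the preferred term for a tag, or None if it is not in the taxonomy."""
    return lookup.get(tag.strip().lower())


def expand_query(term, rings, lookup):
    """Return the full synonym ring for a search term, preferred term first."""
    preferred = normalize_tag(term, lookup)
    if preferred is None:
        return [term.strip().lower()]  # unknown term: search on it as typed
    return [preferred] + rings[preferred]
```

With these definitions, a content manager who tags a page “PTO” would have the tag stored as “paid time off”, a tag of “holiday” would be rejected (or flagged for taxonomy review) because normalize_tag returns None, and a search for “vacation” would be expanded to every term in the ring.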

A third issue related to relevancy is the security of content. I relate this to relevance in the sense that if a user does not have access to a particular piece of content exposed by your search, that content effectively has zero relevance for that user. Many web tools (especially collaboration applications) provide users with very powerful management tools to control visibility of content – even differentiating between who can know that a piece of content exists and who can download it. However, interpreting granular security controls on content is a very hard problem for an enterprise search tool to solve efficiently. In my experience, the most common “solution” for this type of problem is to not index such secure areas in your enterprise search, but to ensure that the tool provides a “local search” and that your enterprise search experience points users to this local search function when appropriate.

Lastly for now, another area you should consider in terms of relevance is to monitor your search engine’s log files. Ultimately, I think this effort will transform into one of:

Input to help you manage your taxonomy (by discovering the terms your search users are actually using and understanding how they differ from your taxonomy)

Identification of holes in your content by understanding “not found” results (helping to identify and then solve Coverage issues)

Identification of relevancy issues by understanding when some terms require more page scrolling than others.
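The three uses of the logs listed above can be sketched in a few lines of Python. This assumes a made-up log format in which each record is a (query, result count, rank of the clicked result or None) tuple; real search engine logs will look different and will need parsing first. The function name summarize_search_log and the page_size default of 10 are likewise my own assumptions.

```python
from collections import Counter, defaultdict


def summarize_search_log(records, taxonomy_terms, page_size=10):
    """Summarize (query, result_count, clicked_rank) records from a search log.

    clicked_rank is the 1-based position of the result the user clicked,
    or None if nothing was clicked.
    """
    query_counts = Counter()
    zero_result = Counter()
    click_ranks = defaultdict(list)

    for query, result_count, clicked_rank in records:
        q = query.strip().lower()
        query_counts[q] += 1
        if result_count == 0:
            zero_result[q] += 1  # possible Coverage hole
        elif clicked_rank is not None:
            click_ranks[q].append(clicked_rank)

    # Terms users actually type that the taxonomy does not know about.
    off_taxonomy = sorted(q for q in query_counts if q not in taxonomy_terms)

    # Queries whose average clicked rank falls beyond the first page of results.
    deep_clicks = {}
    for q, ranks in click_ranks.items():
        average = sum(ranks) / len(ranks)
        if average > page_size:
            deep_clicks[q] = average

    return {
        "off_taxonomy": off_taxonomy,
        "zero_result": dict(zero_result),
        "deep_clicks": deep_clicks,
    }
```

Each key in the returned dictionary lines up with one of the three items above: off_taxonomy feeds taxonomy management, zero_result points at possible Coverage holes, and deep_clicks flags queries where users regularly have to go past the first page to find what they want.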

Summary

To look back at the 3 Principles: 1) you need to make sure your search engine will find and index the necessary content; 2) you need to make sure your content will properly be identified in search results; and, 3) you need to ensure that your content will show as highly relevant for searches your users expect to show that content.

Addressing most of these issues does not require any magic or rocket science, just an awareness of the issues and the time and resources (these latter two being scarce for many!) to work on resolving them.