Archive for January 26th, 2009

In my previous two posts, I’ve written about some basic search analytics and then some more advanced analysis you can also apply. In this post, I’ll write about the types of analysis you can and should be doing on data captured about the usage of search results from your search solution. This could arguably sit under the “advanced” analytics topic, but our search solution does not provide it out of the box, and we implemented it only in the last year through some custom work. It feels different enough (to me), and has enough detail of its own, that I decided to break it out.

Background

When I first started working on our search solution and dug into the reports and data we had available about search behavior, I found we had things like:

Top searches per reporting period

Top indexes used and the top templates used

Searches per hour (or day) for the reporting period (primarily useful to know how much hardware your solution needs)

A breakdown of the page of results on which a user (allegedly) found the desired item

and much more. However, I was frustrated because this did not give me a very complete picture. We could see the searches people were using (at least the top searches), but we could not get any indication of “success”, or even of what people found useful in search. The closest we got from the reports was the last item listed above, which in a typical report might look something like:

Search Results Pages

95% of hits found on results page 1

4% of hits found on results page 2

1% of hits found on results page 3

0% of hits found on results page 4

Users performed searches up to results page 21

However, all this really reflects is how far into the results searchers paged: 95% of users never go beyond page 1, and the engine assumes that means they found what they wanted there. That’s a very bad assumption, obviously.

A Solution to Capture Search Results Usage

I wanted to be able to understand what people were actually clicking on (if anything) when they performed a search! I ended up solving this with a very simple solution (simple once I thought of it). I believe this emulates what Google (and probably many other search engines) does. I built a simple servlet that takes a number of parameters, including the (encoded) URL of a search result target and the various pieces of data about that result. It stores an event in a database from those parameters and then forwards the user to the desired URL. The search results page was then updated to link to that servlet instead of directly to the target. That’s been in place for a while now and the data is extremely useful!

By way of explanation, the following are the data elements being captured for each “click” on a search result:

URL of the target

Search criteria used for the search

Location of the result (which page of results, which result number)

The relevance of the result

The index that contained the result and whether it was in the ‘best bets’ section

The date / time of the click
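To make the redirect approach concrete, here is a minimal sketch of the same idea in Python rather than as a servlet. The endpoint path, the parameter names, and the table layout are all hypothetical (the post does not specify them); only the captured fields come from the list above.

```python
import sqlite3
from datetime import datetime, timezone
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint path; the real servlet's path is not given in the post.
TRACK_ENDPOINT = "/search/click"

def tracking_url(target, query, page, position, relevance, index, best_bet):
    """Build the link the results page shows instead of the direct target URL."""
    params = {
        "url": target, "q": query, "page": page, "pos": position,
        "rel": relevance, "idx": index, "bb": int(best_bet),
    }
    return TRACK_ENDPOINT + "?" + urlencode(params)  # urlencode escapes the target URL

def init_db(conn):
    """Create the (assumed) event table with one row per click."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS click_events (
               url TEXT, query TEXT, page INTEGER, position INTEGER,
               relevance REAL, idx TEXT, best_bet INTEGER, clicked_at TEXT)"""
    )

def handle_click(conn, request_url):
    """Store one click event, then return the URL the user should be sent to."""
    p = {k: v[0] for k, v in parse_qs(urlparse(request_url).query).items()}
    conn.execute(
        "INSERT INTO click_events VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (p["url"], p["q"], int(p["page"]), int(p["pos"]), float(p["rel"]),
         p["idx"], int(p["bb"]), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    # A real handler would answer with an HTTP redirect to this URL, and should
    # check that the target is a known site to avoid acting as an open redirect.
    return p["url"]

conn = sqlite3.connect(":memory:")
init_db(conn)
link = tracking_url("http://site.domain.com/benefits/dental.pdf",
                    "dental plan", 1, 3, 0.87, "intranet", False)
redirect_to = handle_click(conn, link)
rows = conn.execute(
    "SELECT query, page, position, best_bet FROM click_events").fetchall()
```

Each click produces one row, so every analysis discussed later in this post reduces to counting and grouping rows in a table like this one.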

This data provides a lot of insight into behavior. You can guess what someone might be looking for based on the searches they are performing, but you can come a lot closer to understanding what they’re really looking for by understanding what they actually accessed. Of course, it’s important to remember that a click does not necessarily mean the user found what they were looking for; it may only indicate which result looked most attractive to them, so there is still some uncertainty in understanding this.

While I ended up having to do some custom development to achieve this, some search engines will capture this type of data, so you might have access to all of this without any special effort on your part!

Also, I assume that it would be possible to capture a lot of this using a standard web analytics tool as well. I had several discussions with our web analytics vendor about this, but resource constraints kept it from getting implemented. It also seemed that it would depend in part on the target of the click being instrumented in the right way (having JavaScript in it to capture the event). So any page that did not have that (say, a web application whose template could not be modified) or any document (such as a PDF) would likely not be captured correctly.

Understanding Search Usage

Given the type of data described above, here are some of the questions you can ask and actions you can take as a search analyst:

You know the most common searches being performed (reported by your search engine) – what are the most common searches that lead to a click on a search result?

If you do not end up with basically the same list, that would indicate a problem, for sure!

Action: Understanding any significant differences, though, would be very useful – perhaps there is key content missing in your search (so users don’t have anything useful to click on).
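As a rough sketch of that comparison, assuming you can export the raw search terms from your engine and the search terms recorded with each click (the function names and sample data here are made up for illustration):

```python
from collections import Counter

def top_n(items, n):
    """Return the n most frequent items, most frequent first."""
    return [item for item, _ in Counter(items).most_common(n)]

def missing_from_clicks(searches, click_queries, n=10):
    """Top searches that do not appear among the top searches leading to a click."""
    top_clicked = set(top_n(click_queries, n))
    return [q for q in top_n(searches, n) if q not in top_clicked]

# Illustrative data only: one entry per search, and one entry per click.
searches = (["payroll"] * 5 + ["benefits"] * 4
            + ["holiday calendar"] * 3 + ["travel policy"] * 1)
click_queries = ["payroll"] * 4 + ["benefits"] * 3 + ["travel policy"] * 1

gaps = missing_from_clicks(searches, click_queries, n=3)
```

Here `gaps` would contain "holiday calendar": a common search that almost never leads to a click, which is exactly the kind of difference worth investigating.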

For common searches (really, for whatever subset you want to examine but I’m assuming you have a limited amount of time so I would generally recommend focusing on the most common searches), what are the most commonly clicked on results (by URL)?

Do these match your expectations? Are there URLs you would expect to see but don’t?

Action: As mentioned in the basic analytics article, you can identify items that perhaps are not showing properly in search that should and work on getting them included (or improved if your content is having an identity issue).
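One way to check your expectations against the data is sketched below. It assumes you keep a hand-maintained list of the URLs you expect to see for each common search; that list, and all names in the code, are hypothetical.

```python
from collections import Counter, defaultdict

def top_clicked_urls(clicks, k=3):
    """For each search, the k most clicked URLs. clicks: (query, url) pairs."""
    per_query = defaultdict(Counter)
    for query, url in clicks:
        per_query[query][url] += 1
    return {q: [u for u, _ in c.most_common(k)] for q, c in per_query.items()}

def missing_expected(clicks, expected, k=3):
    """Expected URLs that are absent from each search's top-k clicked URLs."""
    top = top_clicked_urls(clicks, k)
    return {q: [u for u in urls if u not in top.get(q, [])]
            for q, urls in expected.items()}

# Illustrative data only.
clicks = ([("benefits", "/hr/benefits")] * 3
          + [("benefits", "/hr/forms")] * 1
          + [("payroll", "/hr/payroll")] * 2)
expected = {
    "benefits": ["/hr/benefits", "/hr/benefits/dental.pdf"],
    "payroll": ["/hr/payroll"],
}

report = missing_expected(clicks, expected, k=3)
```

Any URL left in `report` for a search is a candidate for the "not showing properly" investigation described above.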

Independent of the search terms used, what are the most commonly accessed URLs from search?

For each of the most commonly used URLs, what keywords do users use to find them?

Does the most common URL clicked on change over time? Seasonally? As mentioned in the basic analytics article, you can use this insight to more proactively help users through updates to your navigation.

Action: Items that are common targets but which have a very broad spectrum of keywords leading users to them might indicate a landing page that could be split out into more refined targets. That being said, it is very possible that users prefer the common landing page and follow the navigation from there instead of diving deeper into the site directly from search. Some usability testing would be appropriate for this type of change.

A very important metric – What is the percentage of “fall outs” (my own term – is there a common one)? Meaning, what percentage of searches that are performed do not result in the user selecting any result? For me, this statistic provides one of the best pieces of insight you can automatically gather on the quality of results.
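A small sketch of how the keyword spread per URL might be computed from the click data. The threshold for "very broad" is an arbitrary assumption, as are the names and sample data.

```python
from collections import Counter, defaultdict

def keywords_by_url(clicks):
    """Count the (normalized) search terms that led to each URL."""
    result = defaultdict(Counter)
    for query, url in clicks:
        result[url][query.strip().lower()] += 1
    return result

def broad_targets(clicks, min_distinct=3):
    """URLs reached from at least min_distinct different search terms."""
    by_url = keywords_by_url(clicks)
    return sorted(url for url, kws in by_url.items() if len(kws) >= min_distinct)

# Illustrative data only: (search terms, clicked URL).
clicks = [
    ("benefits", "/hr"), ("Payroll", "/hr"), ("vacation policy", "/hr"),
    ("payroll ", "/hr"), ("vpn", "/it/vpn"), ("VPN", "/it/vpn"),
]

broad = broad_targets(clicks, min_distinct=3)
hr_keywords = keywords_by_url(clicks)["/hr"]
```

Normalizing case and whitespace before counting keeps "Payroll" and "payroll " from looking like two different keywords.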

More specifically, measure the percentage fall out for specific searches and monitor that. Focus on the most common searches or searches that show up as common over longer durations of time.

Action: Searches that have high fall out would definitely indicate poor-performing searches and you should work to identify the content that should be showing and why it doesn’t. Is the content missing? Does it show poorly?
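Here is a minimal sketch of the fall-out calculation per search. It assumes each search can be given an identifier that is also recorded with any click it produces; that identifier is not one of the data elements listed earlier, so treat it as an extra assumption.

```python
from collections import defaultdict

def fall_out_rates(searches, clicked_search_ids):
    """Fraction of searches, per search term, that led to no click at all.

    searches: (search_id, query) pairs, one per search performed.
    clicked_search_ids: the search_id recorded with each click (repeats allowed).
    """
    clicked = set(clicked_search_ids)
    totals = defaultdict(int)
    no_click = defaultdict(int)
    for search_id, query in searches:
        totals[query] += 1
        if search_id not in clicked:
            no_click[query] += 1
    return {q: no_click[q] / totals[q] for q in totals}

# Illustrative data only.
searches = [
    (1, "payroll"), (2, "payroll"), (3, "holiday calendar"),
    (4, "holiday calendar"), (5, "holiday calendar"),
    (6, "payroll"), (7, "payroll"),
]
clicked_search_ids = [1, 2, 2, 6, 4]  # search 2 produced two clicks

rates = fall_out_rates(searches, clicked_search_ids)
```

Counting distinct searches rather than clicks matters: simply dividing clicks by searches would understate fall out whenever one search produces several clicks.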

What percentage of results come from best bets?

Looking at this both as an overall average and also for individual searches or URLs can be useful to track over time.

Action: At the high level (overall average) a move down in this percentage over time would indicate that the Best Bets are likely not being maintained.

Look for items that are commonly clicked on that are not coming from Best Bets and consider if they should be added!

Are the keywords associated with the best bets items kept up to date?

Action: Review the best bets and confirm if there are items that should be added. Also, does your search results UI present the best bets in an obvious way?
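Both the overall percentage and the list of candidates for new Best Bets fall out of the click data fairly directly. A sketch, with a made-up click threshold and sample data:

```python
from collections import Counter

def best_bet_share(clicks):
    """Fraction of all result clicks that came from the best bets section."""
    clicks = list(clicks)
    if not clicks:
        return 0.0
    return sum(1 for c in clicks if c["best_bet"]) / len(clicks)

def best_bet_candidates(clicks, min_clicks=3):
    """Frequently clicked URLs that were never clicked as a best bet."""
    counts = Counter(c["url"] for c in clicks)
    already = {c["url"] for c in clicks if c["best_bet"]}
    return sorted(u for u, n in counts.items()
                  if n >= min_clicks and u not in already)

# Illustrative data only.
clicks = ([{"url": "/hr/payroll", "best_bet": True}] * 2
          + [{"url": "/it/vpn", "best_bet": False}] * 3
          + [{"url": "/hr/forms", "best_bet": False}] * 1)

share = best_bet_share(clicks)
candidates = best_bet_candidates(clicks, min_clicks=3)
```

Running `best_bet_share` over the clicks from each reporting period gives the trend line to watch for a downward move.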

What is the percentage of search results usage that comes from each page of results (how many people really click on an item on page 2, page 3, etc.)?

Are there search terms or search targets that show up most commonly not on page 1 of the results?

Action: If there are searches where the percentage of results clicked is higher on pages after page 1, you should review what is showing up on the first page. It would seem that the desired target is not showing up on the first page (at least at a higher rate than for other searches).

Action: If there are URLs where the percentage of times they are clicked on in pages beyond the first page of results is higher than for other URLs, look at those URLs – why are they not showing up higher in the results?
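The same small calculation covers both of these actions, since the key can be either the search terms or the clicked URL. This is a sketch only; the minimum click count used to ignore rare items is an arbitrary assumption.

```python
from collections import Counter

def page_share(clicks):
    """Fraction of all result clicks that happened on each results page."""
    counts = Counter(page for _, page in clicks)
    total = sum(counts.values())
    return {page: n / total for page, n in sorted(counts.items())}

def deep_click_outliers(clicks, min_clicks=3):
    """Keys whose share of clicks beyond page 1 is above the overall share.

    clicks: (key, page) pairs, where key is either the search terms or the URL.
    """
    total = Counter()
    deep = Counter()
    for key, page in clicks:
        total[key] += 1
        if page > 1:
            deep[key] += 1
    if not total:
        return []
    overall = sum(deep.values()) / sum(total.values())
    return sorted(k for k, n in total.items()
                  if n >= min_clicks and deep[k] / n > overall)

# Illustrative data only: (search terms, results page of the click).
clicks = ([("payroll", 1)] * 4
          + [("travel policy", 2)] * 2 + [("travel policy", 1)] * 1
          + [("benefits", 1)] * 2 + [("benefits", 3)] * 1)

shares = page_share(clicks)
outliers = deep_click_outliers(clicks)
```

In this sample, 30% of all clicks happen beyond page 1, so both "travel policy" (two of three clicks) and "benefits" (one of three) are flagged for review.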

Depending on the structure of the URLs in use within your content, it might also be possible to do some aggregation across URLs to provide insight on search results usage across larger pieces of your site. For example, if you use paths in your URLs, you could aggregate this data on patterns in the URLs: how many search result clicks go to an item whose URL looks like “http://site.domain.com/path1/path2”?

Assuming you can do this with your data, you can then analyze common keywords used to access a whole area instead of focusing on specific URLs.

If your site is dynamic (using query strings) it might be possible to do some aggregation based on the patterns in the query strings of the URLs instead to achieve the same results.

This type of analysis can actually be very useful to find cases where a user is “getting close” to a desired item but they’re not getting the most desirable target because the most desirable target does not show up well in search. (So a user might make their way to the benefits area but might not be directly accessing the particular PDF describing a particular benefit.)

Action: You can then identify items for improvement.

All of the above detailed questions about URLs can be asked about aggregations of URLs, so keep that in mind.
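A sketch of both kinds of aggregation, by leading path segments and by a query string parameter. The depth of two segments and the `section` parameter name are assumptions that depend entirely on how your own URLs are structured.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def path_prefix(url, depth=2):
    """Reduce a URL to its host plus its first `depth` path segments."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s][:depth]
    return parts.netloc + "/" + "/".join(segments)

def query_param_key(url, param):
    """For dynamic sites: the value of one query string parameter, or None."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

def clicks_by_area(urls, depth=2):
    """Count search result clicks per site area instead of per URL."""
    return Counter(path_prefix(u, depth) for u in urls)

# Illustrative data only: the target URL of each click.
urls = [
    "http://site.domain.com/hr/benefits/dental.pdf",
    "http://site.domain.com/hr/benefits/vision.pdf",
    "http://site.domain.com/hr/payroll/calendar.html",
    "http://site.domain.com/it/vpn",
]

areas = clicks_by_area(urls, depth=2)
section = query_param_key("http://site.domain.com/view?section=benefits&id=12",
                          "section")
```

Once each click is labeled with an area, the earlier per-URL questions (top keywords, fall out, best bets share) can be re-run with the area as the grouping key.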

You can also combine data from this source with data from your web analytics solution to do some additional analysis. If you capture the search usage data in your web analytics tool (as I mention above should be possible), doing this type of analysis should be much easier, too!

For URLs commonly clicked on from search results, what percentage of their access is through search?

Action: If a page has a high percentage of its access via search, this identifies a navigation issue to address.

One case I have not yet worked out is a page that is very commonly accessed from search results (high compared to other results) but for which those accesses represent a low percentage of use of that page – do you care? What action (if any) might be driven from this? It seems like from the perspective of search, it’s important but there does not seem to be a navigational issue (users are getting to the target OK for the most part). Any thoughts?

Turning around the above, for commonly accessed pages (as reported by your web analytics tool), what percentage of their access comes via search? In my experience, it’s likely that the percentage via search would be low if the pages themselves are highly used already, but this is good to validate for those pages.

Action: As above, a high percentage of accesses via search would seem to indicate a navigation issue.
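Combining the two data sources can be as simple as dividing one count by the other for each URL. A sketch, assuming you can export clicks per URL from the search usage data and total page views per URL from your web analytics tool; the 50% threshold is an arbitrary assumption.

```python
def search_share(search_clicks, page_views):
    """Fraction of each page's views that arrived via a search result click.

    Both arguments map URL -> count for the same reporting period.
    """
    return {url: search_clicks[url] / views
            for url, views in page_views.items()
            if views > 0 and url in search_clicks}

def navigation_suspects(search_clicks, page_views, threshold=0.5):
    """Pages reached mostly through search, which may point to a navigation gap."""
    shares = search_share(search_clicks, page_views)
    return sorted(u for u, s in shares.items() if s >= threshold)

# Illustrative data only.
search_clicks = {"/hr/benefits/dental.pdf": 80, "/hr/payroll": 50}
page_views = {"/hr/benefits/dental.pdf": 100, "/hr/payroll": 1000, "/home": 5000}

shares = search_share(search_clicks, page_views)
suspects = navigation_suspects(search_clicks, page_views)
```

In this sample, the dental PDF gets 80% of its traffic from search and is flagged, while the payroll page is clicked often from search but that is only 5% of its use, which is the unresolved case described above.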

You can also use your web analytics package to get a sense of the “fall outs” mentioned above at a high level. Using the path functionality of your web analytics package, what percentage of accesses to your search results page have a “next page” where the user leaves the site? What percentage lead to a page that is known not to be a relevant target? (In our data, I see a large percentage of users return to the home page, for example. It is possible the user clicked on a result that is the home page, but it seems unlikely.)

However, you will likely not have any insight about what the searches were that led to this and not know what the variance is across different searches.

Summing Up

That’s a wrap (for now) on the types of actionable metrics you might consider for your search program. I’ve covered some basic metrics that just about any search engine should be able to support, then some more complex metrics (requiring combining data from other sources or some kind of processing on the data used for the basic metrics), and in this post, I’ve covered some data and analysis that provide a more comprehensive picture of the overall flow of a user through your search solution.

There are a lot more interesting questions I’ve come up with in the time I’ve had access to the data described above and also with the data that I discussed in my previous two posts, but many of them seem a bit academic and I have not been able to identify possible actions to take based on the insights from them.

Please share your thoughts or, if you would, point me to any other resources you might know of in this area!