Dan McCormick

May 15, 2018


Getting your customers to the right search results within your site does not happen by chance. If you are looking to improve your website search performance, the key to success is combining behavioral data with machine learning to automatically increase the relevance of search results.

Understanding the importance of precision and recall metrics

When a customer searches your website for information, the results they get back should be as relevant to what they are looking for as possible. This relevance can be measured using two primary metrics: precision and recall.

Precision is the share of returned results that are actually relevant to the customer’s query. For example, suppose a customer searches for “black boots” on an e-commerce site and the results show ten products: six black boots, two brown boots, one pair of black shoes, and one pair of brown socks. The precision is 6 out of 10, or 60%.

Recall is the share of all relevant items on your website that a search actually returns. For example, if there are twenty products relevant to the “black boots” search query but the system returns only sixteen of them, the recall is 16 out of 20, or 80%.
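The two definitions above can be written as a short sketch. The function names and product IDs below are illustrative assumptions, not part of any particular search product; the numbers mirror the two “black boots” examples.

```python
def precision(returned, relevant):
    """Fraction of the returned results that are relevant."""
    returned = list(returned)
    if not returned:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for item in returned if item in relevant)
    return hits / len(returned)


def recall(returned, relevant):
    """Fraction of all relevant items that were returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(relevant & set(returned))
    return hits / len(relevant)


# Precision example: ten results, six of which are black boots (hypothetical IDs).
black_boots_shown = [f"black-boot-{i}" for i in range(1, 7)]
results = black_boots_shown + ["brown-boot-1", "brown-boot-2", "black-shoe-1", "brown-sock-1"]
print(f"precision: {precision(results, black_boots_shown):.0%}")  # precision: 60%

# Recall example: twenty relevant products, sixteen of them returned.
all_black_boots = [f"black-boot-{i}" for i in range(1, 21)]
returned_boots = all_black_boots[:16]
print(f"recall: {recall(returned_boots, all_black_boots):.0%}")  # recall: 80%
```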

At a small scale, a little precision and recall “noise” in the search results may not seem so bad. Scaled up to hundreds of results, however, it can mean dozens or even hundreds of irrelevant results that your customers have to wade through, along with relevant products that never appear at all. This can cost you sales.

To get a baseline estimate of precision and recall, consider manually running 50-100 searches on your site and measuring both values from the results you get.
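One way to turn those manual searches into a single baseline is to record, for each query, which items came back and which items you judged relevant, then average the per-query values. The sketch below assumes a simple hand-built list of judgments; the data structure and the sample queries are hypothetical.

```python
from statistics import mean


def evaluate(judged_queries):
    """Return (mean precision, mean recall) over hand-judged queries.

    Each entry is a dict with 'returned' (IDs the search showed) and
    'relevant' (IDs a human judged relevant for that query).
    """
    precisions, recalls = [], []
    for q in judged_queries:
        returned, relevant = set(q["returned"]), set(q["relevant"])
        hits = len(returned & relevant)
        precisions.append(hits / len(returned) if returned else 0.0)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    return mean(precisions), mean(recalls)


# Two hypothetical judged queries; a real baseline would use 50-100.
judged = [
    {"query": "black boots", "returned": ["a", "b", "c", "d"], "relevant": ["a", "b", "e"]},
    {"query": "red scarf", "returned": ["x", "y"], "relevant": ["x", "y", "z", "w"]},
]
avg_precision, avg_recall = evaluate(judged)
print(f"mean precision {avg_precision:.2f}, mean recall {avg_recall:.2f}")
```

Re-running the same judged set after each change to the search configuration gives a like-for-like comparison.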

Improving precision and recall

There are a few ways to improve precision. The easiest is to stop indexing fields that contain a lot of “noise.” For instance, in the example above, brown socks might be showing up in a search for “black boots” because their description includes a statement like, “These pair well with many kinds of shoes, from white flats to black boots.” Excluding the “description” field from your search index will prevent this problem from occurring.
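The effect of dropping a noisy field can be seen in a toy index. This is a minimal sketch of a keyword index built only for illustration; the catalog, field names, and functions are assumptions and do not reflect how any specific search engine is configured.

```python
import re

# Hypothetical two-item catalog mirroring the socks example.
catalog = [
    {"id": "boot-1", "title": "Black leather boots",
     "description": "Classic ankle boots."},
    {"id": "sock-1", "title": "Brown socks",
     "description": "These pair well with many kinds of shoes, "
                    "from white flats to black boots."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(products, fields):
    """Map each word to the IDs of products containing it in the given fields."""
    index = {}
    for product in products:
        for field in fields:
            for token in tokenize(product.get(field, "")):
                index.setdefault(token, set()).add(product["id"])
    return index


def search(index, query):
    """Return IDs of products that contain every word of the query."""
    term_sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()


with_description = build_index(catalog, ["title", "description"])
title_only = build_index(catalog, ["title"])

print(sorted(search(with_description, "black boots")))  # ['boot-1', 'sock-1']
print(sorted(search(title_only, "black boots")))        # ['boot-1']
```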

Often, however, description fields contain many useful and relevant keywords for an item. In that case, it can be useful to create a separate field that holds the relevant keywords from the description without the irrelevant terms. This usually has to be done manually, which is quite time-consuming.

To improve recall, you can take the opposite approach: add more keywords to your search fields. One simple way to do this is to find synonym lists for common keywords and add those to your search engine so that, for instance, the word “shoe” is added to any item containing the word “sneaker.”
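As a rough sketch of the synonym approach, the function below adds synonyms to an item's keyword list at indexing time. The synonym table and function name are hypothetical; real search engines typically offer their own synonym configuration.

```python
# Hypothetical synonym table: each keyword maps to the words to add alongside it.
SYNONYMS = {
    "sneaker": ["shoe"],
    "sneakers": ["shoes"],
}


def expand_keywords(keywords, synonyms=SYNONYMS):
    """Return the keywords plus any synonyms, preserving order and avoiding duplicates."""
    expanded = list(keywords)
    for word in keywords:
        for synonym in synonyms.get(word, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_keywords(["white", "sneaker"]))  # ['white', 'sneaker', 'shoe']
```

An item tagged only with “sneaker” would then also match a search for “shoe,” which raises recall at the risk of admitting less precise matches.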

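As you can see, improving precision often hurts recall, and vice versa: removing indexed fields filters out noise but can also hide relevant items, while adding keywords surfaces more relevant items but can also let in irrelevant ones. For this reason, it’s important to measure each change quantitatively using the precision and recall definitions above. That way, you can make sure your changes are having a positive overall effect.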
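The article does not name a single combined measure, but one common choice is the F1 score, the harmonic mean of precision and recall. The sketch below uses it as an assumption to compare a “before” and “after” configuration; the “after” numbers are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "Before" uses the article's example values; "after" is a hypothetical change
# that raised precision but cost some recall.
before = f1_score(0.60, 0.80)
after = f1_score(0.75, 0.70)

print(f"F1 before: {before:.3f}, after: {after:.3f}")
print("net improvement" if after > before else "net regression")
```

Because F1 weights precision and recall equally, a site that cares more about one than the other may prefer to track both numbers separately.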

Use machine learning to automatically improve precision and recall

Rather than manually optimizing your website search, machine learning offers a way for your search system to optimize itself automatically over time based on real-time usage. This is done by having the system feed data back into its algorithms based on the relevant results that are surfaced and used by the customer.
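To make the feedback idea concrete, here is a deliberately simple sketch in which clicks on a result raise its rank for the same query next time. It is far simpler than a real learned ranking model and is not a description of any vendor's algorithm; the class name, the `boost` parameter, and the scores are all assumptions.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Toy ranker: base text-match score plus a bonus per recorded click."""

    def __init__(self, boost=0.1):
        self.boost = boost
        self.clicks = defaultdict(int)  # (query, product_id) -> click count

    def record_click(self, query, product_id):
        """Feed a customer's click back into the ranker."""
        self.clicks[(query.lower(), product_id)] += 1

    def rank(self, query, candidates):
        """Order product IDs by score; candidates maps product_id -> base score."""
        q = query.lower()

        def score(product_id):
            return candidates[product_id] + self.boost * self.clicks.get((q, product_id), 0)

        return sorted(candidates, key=score, reverse=True)


ranker = ClickFeedbackRanker()
candidates = {"boot-1": 1.0, "sock-1": 0.95, "boot-2": 0.9}  # hypothetical base scores

ranking_before = ranker.rank("black boots", candidates)
for _ in range(3):
    ranker.record_click("black boots", "boot-2")
ranking_after = ranker.rank("black boots", candidates)

print(ranking_before)  # ['boot-1', 'sock-1', 'boot-2']
print(ranking_after)   # ['boot-2', 'boot-1', 'sock-1']
```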

Optimizing search results using machine learning has been shown to increase conversion rates over time by constantly refining search result relevance and improving precision and recall metrics.

Constructor.io search as a service brings behavioral data to machine learning

To further improve search results, Constructor.io collects behavioral data from users—what they are searching for, clicking on, and buying—that our data scientists fold into the machine learning feedback. This provides improved search result relevance and reduces precision and recall noise. The result is that you continuously build the best search possible for your customers and grow your business with less effort.

Dan McCormick is the co-founder and COO of Constructor. A skilled business leader and product architect, Dan served as CTO of stock photo pioneer Shutterstock.