Category Archives: Relevancy

This post is the fourth in a series of four articles providing several best practices on how to implement and customise the search experience in SharePoint 2013. The previous posts listed the differences between the cloud and on-premise SharePoint, provided considerations when upgrading to SharePoint 2013, and dealt with the practicalities of configuring search in SharePoint Online. This fourth post handles the more advanced topic of ranking results and the future of search in SharePoint.

Managing ranking

We’ve previously mentioned query rules as a way to change the ranking of search results based on your requirements. These allow you to promote certain search results or result blocks above the ranked search results, and more advanced query rules even allow changing the ranking depending on what the query terms are.

By using query rules, customising the search results web part, and a few content by search web parts, you can change the behaviour of the search depending on which user is accessing it. You also need good metadata to make this work, but a complete user profile (including job title, department, and interests) is a good start. Based on such user information, you can define what the search experience for that user will look like.

Changing ranking using query rules, however, requires a query rule condition, which describes the prerequisites that the query must fulfil for the query rule to fire. To change the results for all queries, you can use the following approach.

If the default ranking does not satisfy your search requirements and you want to change the order of the ranked search results, SharePoint provides the possibility of changing the ranking models. It is a feature available in SharePoint Online as well, as described in the TechNet documentation: “SharePoint Online customers need to download and install the free Rank Model Tuning App in order to create and customize ranking models.”

A ranking model contains the features and corresponding weights that are used in calculating a score for each search result. Changing the ranking models might require a deeper, more theoretical knowledge of how search works, and those who take on the challenge of changing the ranking model are often dedicated search administrators or external specialised consultants.

The Rank Model Tuning App provides a user interface for creating custom ranking models, and can be used for both SharePoint Online and SharePoint Server, though in SharePoint Server 2013 there is also the possibility to use PowerShell to customise ranking models. New models are based on existing ranking models, to which you can add new rank features, remove existing ones, and tune the weight of each rank feature. The app also allows evaluating the new ranking model using a test set of queries, which can be constructed, for example, from real queries gathered from previous search logs. How to use the tuning app is explained step by step in the documentation on the Office site.

Changing the weight of certain file types (say for example for PowerPoint documents compared to Excel documents) might be enough for many search implementations, but depending on the content, the features that influence the ranking of the search results can become more elaborate. For example, a property defining whether documents are either official or work-in-progress might become an important factor in determining the ranking of search results. SharePoint provides the liberty to create new properties, so it makes sense that these can be used in search to improve the relevance.
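A ranking model of this kind can be pictured as a weighted combination of rank features. The sketch below is purely illustrative and not SharePoint's actual model format: the property names, weights, and the way they are combined are all assumptions made up for the example. It shows how a file-type boost and an official/work-in-progress property could influence the final score:

```python
# Illustrative sketch of rank features, NOT SharePoint's ranking model format.
# All feature names and weights here are invented for the example.

FILE_TYPE_BOOST = {"pptx": 1.2, "xlsx": 0.8, "docx": 1.0}

def score(doc, text_relevance):
    """Combine a base text-relevance score with two made-up rank features:
    a per-file-type boost and an official/work-in-progress factor."""
    boost = FILE_TYPE_BOOST.get(doc["file_type"], 1.0)
    status_factor = 1.0 if doc.get("status") == "official" else 0.5
    return text_relevance * boost * status_factor

doc = {"file_type": "pptx", "status": "official"}
print(score(doc, 2.0))  # 2.0 * 1.2 * 1.0 = 2.4
```

Tuning a model then amounts to adjusting weights like these and checking, against a set of test queries, whether the overall ranking improved.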

It should be pointed out, however, that changing the ranking model influences all searches that are run using that ranking model. Though the main idea of changing the ranking model is to improve the ranking, it can become much too easy to make changes that can have an undesirable effect on the ranking. This is why a proper evaluation of ranking changes needs to be part of your plan for improving search relevance.

The Office Graph and the future of social

The social features introduced in SharePoint 2013 provide a rich social experience, which is interconnected with the search experience. Many social features are driven by search (such as the recommendations for which people or documents to follow), and social factors also affect the search (such as finding the right expertise from conversations in your network).

In June 2012, Microsoft acquired the enterprise social platform Yammer. The SharePoint Server 2013 Preview was made available for download in July 2012, and it reached Release to Manufacturing (RTM) in October the same year. The new SharePoint 2013 implements new social features (for example the newsfeed, the new My Sites and the tagging system), many of which overlap with those available in Yammer. This brings us to the question on everyone’s mind since the acquisition: what is the future of social in SharePoint? Should you use SharePoint’s social features, or Yammer?

In March 2014, Microsoft announced that they will not add new features to SharePoint’s social capabilities but will rather invest in the integration between Yammer and Office 365. The guidance is thus to go for Yammer.

“Go Yammer! While we’re committed to another on-premises release of SharePoint Server—and we’ll maintain its social capabilities—we don’t plan on adding new social features. Our investments in social will be focused on Yammer and Office 365” – Jared Spataro, Microsoft Office blog

Also at the SharePoint Conference in March 2014, Microsoft introduced the Office Graph, with Oslo as the first demo app using it. During the keynote, Microsoft called the Office Graph “perhaps the biggest idea we’ve had since the beginning of SharePoint”. The Office Graph maps relationships between people, the documents they authored, the likes and posts they made, and the emails they received; it is in fact an extension of Yammer’s enterprise graph. The Oslo application leverages the graph in a way that looks familiar from Facebook’s Graph Search.

The new Office Graph provides exciting opportunities, and has consequences for how the search will be used. Findwise started exploring the area of enterprise graph search before Microsoft announced the Office Graph – see our post about the Enterprise Graph Search from January 2013.

Reluctant to go for the cloud?

Microsoft hinted during the SharePoint Conference keynote in March that they will add new functionality to the cloud version first. Although they are still committed to another version of SharePoint Server, new updates might come at a slower pace for the on-premise version. However, Microsoft also announced that SharePoint SP1 brings new functionality to the administrative interface: a hybrid setting that lets you specify whether you want the social component in the cloud/Yammer, or your documents on OneDrive, so that you don’t need to move everything to the cloud overnight.

Let us know how far you’ve come with your SharePoint implementation! Contact us if you need help in deciding which version of SharePoint to choose, need help with tuning search relevance, have questions about improving search, or would like to work with us to reach the next level of findability.

The European Conference on Information Retrieval (ECIR) 2011 took place in Dublin last week, 18-21 April. In this blog post I will try to highlight some of the papers and talks from the conference that caught my attention, backed up with what other attendees said about them.

First, I was intrigued by the session on evaluation for IR, and especially the topic of crowdsourcing. In my opinion, the paper A Methodology for Evaluating Aggregated Search Results, which also won the prize for best student paper, was among the most pedagogically presented ones. It deals with the task of incorporating search results from a number of different sources, called verticals, into Web search results. Using only a small number of human judgements for a given query, the authors present a way to evaluate any possible permutation of verticals in the result presentation. I think this methodology should be adopted in the world of enterprise search, since it is exactly there that we crawl, index and present information from a number of different sources – Web, databases, file shares, etc. The prerequisites are minimal and low-cost, but the return value, the user experience, seems quite high.

Amazon Mechanical Turk, or the Artificial Artificial Intelligence, which is the marketplace for crowdsourcing, provides a way to perform, for a ridiculously small sum of money, evaluation, relevance assessment or any other task for which you need humans to give you judgements. Leaving ethical issues aside, two papers at the conference presented ways of utilizing this service for IR tasks.

Evgeniy Gabrilovich from Yahoo! Research, who won the Karen Sparck Jones award for 2010, gave a very interesting keynote talk on computational advertising. Until now, it had never struck me how hard advertising in information retrieval systems actually is. I liked one of his points on the future of ads: by using product feeds, one can automatically create product descriptions via text summarization and natural language generation and index them, thus avoiding bid words.

Another interesting and very pedagogically presented paper was about the gensim package by Radim Řehůřek. I definitely think we can use it in some of our projects. In general, text categorization and IR for social networks were the dominant tracks. In one of the social network tracks, Oscar Täckström presented a neat way of discovering fine-grained sentiment when only coarse-grained supervision is available. It really hooked me on trying it for any of our customers where sentiment analysis is required.

Thorsten Joachims, the last of the keynote speakers, gave a very inspiring talk on The Value of User Feedback. He put forward the idea of designing retrieval systems for feedback. Instead of just looking at the click logs post factum, one can think of a system which uses click feedback to learn, thus creating a better ranker for a given query and a given user need. Within a single session, we can use click feedback to disambiguate the query and deliver results on the fly which are of immediate benefit to the users.

Unfortunately, I have surely missed other interesting presentations; with two parallel sessions and several workshops there was a limit to what I could devour. What surprised me, though, was that there were very few papers from industry. We try to solve exactly the same problems and tackle the same issues as academia. We at Findwise have constantly flagged the huge benefit of good, relevant metadata for achieving better search performance, which was also touched upon in the paper “Topic Classification in Social Media using Metadata from Hyperlinked Objects”.

It was really great to visit Dublin and attend ECIR 2011. It was an inspiring conference, and I do believe that at the next ECIR we from Findwise can be on the podium, sharing our knowledge and hands-on experience of enterprise search and IR.

In almost every findability project we work on, users ask us why finding information on their intranet is not as easy as finding information on Google. One of my team members told me he was once asked:

”If Google can search the whole internet in less than a second, how come you can’t search our internal information which is only a few million documents?”

I don’t remember his answer but I do remember what he said he would have wanted to answer:

”Google doesn’t have to handle rigorous security. We do. Google has got millions of servers all around the world. We have got one.”

The truth is, you get the search experience you deserve. Google delivers an excellent user experience to millions of users because they have thousands of employees working hard to achieve this. So do the other players in the search market; all the search engines are continuously working on improving the user experience. It is possible to achieve good things without a huge budget, but I can guarantee you that just installing any of the search platforms on the market and then doing nothing will not result in a good experience for your users. So the question is: what is your company doing to achieve good findability, a good search experience?

Jeff Carr from Earley & Associates recently published a two-part article about this desire to duplicate the Google experience, and why it won’t succeed. I recommend that you read it. Hopefully it will not only help you meet the questions and expectations of your users; it will also help you improve the search experience for them.

Perfect relevance is the holy grail of Search. If possible we would like to give every user the document or piece of information they are looking for. Unfortunately, our chances of doing so are slim. Not even Google, the great librarian of our age, manages to do so. Google is good but not perfect.

Nevertheless, as IT professionals, search experts and information architects we try. We construct complicated document processing pipelines in order to tidy up our data and to extract new metadata. We experiment endlessly with stop words, synonym expansion, best bets and different ways to weigh sources and fields. Are we getting any closer? Well, probably. But how can we know?

There are a myriad of knobs and dials to tune in an enterprise search engine. This fact alone should convince us that we need a systematic approach to dealing with relevance; with so many parameters to work with, the risk of breaking relevance seems at least as great as the chance of improving on it. Another reason is that relevance doesn’t age gracefully: even if we do manage to find a configuration that we feel is decent, it will probably need to be reworked in a few months’ time. At Lucene Eurocon, Grant Ingersoll also said:

“I urge you to be empirical when working with relevance”

I favor the trial-and-error approach to most things in life, relevance tuning included. Borrowing concepts from information retrieval, one usually starts off by creating a gold standard. A gold standard is a depiction of the world as it should be: a list of queries, preferably popular or otherwise important, and the documents that should be present in the result list for each of those queries. If the search engine were capable of perfect relevance, its results would be 100% accurate when compared to the gold standard.

The process of creating such a gold standard is an art in itself. I suggest choosing 50 or so queries. You may already have an idea of which ones are interesting to your system; otherwise search analytics can provide this information for you. Furthermore, you need to decide which documents should be shown for each of the queries. Since users are usually only content if their document is among the top 3 or 5 hits in the result list, you should have up to this amount of documents for each query in your gold standard. You can select these documents yourself if you like. However, arguably the best way is to sit down with a focus group selected from among your target audience and have them decide which documents to include. Ideally you want a gold standard that is representative for the queries that your users are issuing. Any improvements achieved through tuning should boost the overall relevance of the search engine and not just for the queries we picked out.

The next step is to determine a baseline. The baseline is our starting point, that is, how well the search engine compares out of the box to the gold standard. In most cases this will be significantly below 100%. As we proceed to tune the search engine its accuracy, as compared to the gold standard, should move from the baseline toward 100%. Should we end up with accuracy below that of the baseline then our work has probably had little effect. Either relevance was as good as it gets using the default settings of the search engine, or, more likely, we haven’t been turning the right knobs.
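The gold standard and baseline described above can be sketched in a few lines of code. The queries, document IDs, and the choice of precision-at-k as the accuracy measure are illustrative assumptions for the example, not a prescribed method:

```python
# Sketch of measuring a baseline against a gold standard.
# `gold` maps each test query to the documents that SHOULD appear near the
# top; `results` stands in for what the search engine actually returned.
# All queries and document IDs below are made up.

def precision_at_k(gold, results, k=3):
    """Average fraction of the top-k results that are in the gold standard."""
    total = 0.0
    for query, relevant in gold.items():
        top_k = results.get(query, [])[:k]
        hits = sum(1 for doc in top_k if doc in relevant)
        total += hits / k
    return total / len(gold)

gold = {"vacation policy": {"hr-001", "hr-017"},
        "expense report": {"fin-003"}}
results = {"vacation policy": ["hr-001", "intranet-faq", "hr-017"],
           "expense report": ["fin-009", "fin-003", "fin-012"]}

baseline = precision_at_k(gold, results)
print(baseline)  # (2/3 + 1/3) / 2, i.e. roughly 0.5 for this toy data
```

After each tuning change, re-running the same measurement shows whether accuracy moved from the baseline toward 100% or away from it.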

Using a systematic approach like the one above greatly simplifies the process of working with relevance. It allows us to determine which tweaks are helpful and keeps us on track toward our ultimate goal: perfect relevance. A goal that, although unattainable, is well worth striving toward.

If you have 6 minutes to spare, I recommend you watch this interview with Gabriel Olsson from Tetra Pak. Over the last few years, Tetra Pak has been working strategically on turning their intranet into something truly end user-centric. Tetra Pak has also put effort into search and content quality.

By actually asking the employees what they expect to find and what sort of information would make their everyday work (tasks) more efficient, Tetra Pak has managed to create a navigation structure based on facts reflecting these needs. The method used is Gerry McGovern’s task-based Customer Carewords… and the result? The ones that scream the loudest are not the most important – the needs of the employees are.

Gabriel also talks about the importance of following up on search with key matches and synonyms. This, together with content quality initiatives, helps create a solid foundation for search, for the following simple reasons:

Use metadata to filter search results (note, not a Tetra Pak picture)

If the quality of the information is good (clear headings, good metadata, frequent keywords), the information found through search will be good as well. If you have a lot of old content and duplicates, this will be just as visible, making it hard for users to determine what is qualitative and trustworthy. Good quality also makes it possible to group and categorize information.

Synonyms make it easy to adjust the corporate language to the one used by the employees. Let people search for “report” when they want to find a “bulletin”. A simple synonym list, based on search statistics, will let users find what they want without thinking about how to phrase the query. The synonyms can be used in the background (without the user’s knowledge) or as ‘did you mean’ suggestions:

Synonyms used for ‘did you mean’ functionality (note, not a Tetra Pak picture)
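Background synonym expansion of the kind described above can be sketched as follows. The synonym list and the OR-group query syntax are illustrative assumptions, not any particular engine's format:

```python
# Sketch of query-side synonym expansion: rewrite the user's query in the
# background so that "report" also matches "bulletin". The synonym list and
# the OR syntax are made up; a real list would be built from search statistics.

SYNONYMS = {"report": ["bulletin"], "vacation": ["holiday", "leave"]}

def expand_query(query):
    """Expand each query term with its synonyms as an OR group."""
    parts = []
    for term in query.lower().split():
        alts = [term] + SYNONYMS.get(term, [])
        parts.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else term)
    return " ".join(parts)

print(expand_query("annual report"))  # annual (report OR bulletin)
```

The same mapping could instead drive a ‘did you mean’ suggestion by showing the rewritten query to the user rather than applying it silently.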

Key matches (also referred to as sponsored links, best bets or editor’s picks) are used to manually force the first hit in the search result list to refer to a specific page or document. By following up on search statistics and knowing what sort of information is most frequently asked for, it is easy to adjust the search result list. However, this takes time and effort to follow up on.
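Conceptually, a key match is just an editor-maintained lookup consulted before the ranked results are returned. The sketch below is a generic illustration with a made-up query-to-URL mapping, not any specific engine's feature:

```python
# Sketch of key matches (best bets): an editor-maintained mapping from
# frequent queries to a hand-picked top result. The mapping is invented.

BEST_BETS = {"travel policy": "https://intranet.example.com/hr/travel-policy"}

def search(query, engine_results):
    """Prepend the key match, if any, to the engine's ranked results."""
    promoted = BEST_BETS.get(query.lower().strip())
    if promoted and promoted not in engine_results:
        return [promoted] + engine_results
    return engine_results

print(search("Travel Policy", ["doc-a", "doc-b"]))
```

The follow-up effort mentioned above is keeping a mapping like this in sync with search statistics and with content that moves or goes stale.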

Tetra Pak is not alone when it comes to adjusting their intranets to true end-user needs. During the spring there will be a number of conferences where our customers will be sharing experiences from their initiatives, among others Ability Partner and the recently completed IntraTeam.

Apart from this, our own breakfast seminars are, as always, announced on our homepage and on Twitter. Looking forward to seeing you!

A couple of weeks ago I read an interesting blog post comparing the relevance of three different search engines. It made me start thinking about relevance and how it is sometimes overlooked when choosing or implementing a search engine in a findability solution. A big misconception is that if we just install a search engine, we will get splendid search results out of the box. While it is true that the results will be better than those of an existing database-based search solution, the amount of configuration needed to get splendid results depends on how good the relevance is from the start. And as the blog post shows, it can differ quite a bit between search engines – and relevance is important.

So what is relevance, and why does it differ between search engines? Computing relevance is the core of a search engine: the target is to deliver the most relevant set of results with regard to your search query. When you submit a query, the search engine uses a number of algorithms to find, within all indexed content, the documents or pages that best correspond to the query. Each search engine uses its own set of algorithms, and that is why we get different results.
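As a toy illustration of such an algorithm, the classic TF-IDF weighting scores a document higher the more often it contains a query term and the rarer that term is across the whole index. This is only one simplified signal among the many that a real engine combines, and the documents below are made up:

```python
# Toy TF-IDF relevance scoring over a made-up three-document index.
# Real engines combine many more signals; this shows why different
# algorithms (and different content) produce different rankings.
import math

docs = {"d1": "enterprise search relevance tuning",
        "d2": "search engine installation guide",
        "d3": "holiday travel guide"}

def tfidf_score(query, doc_id):
    terms = docs[doc_id].split()
    score = 0.0
    for q in query.split():
        tf = terms.count(q) / len(terms)            # term frequency in doc
        df = sum(1 for d in docs.values() if q in d.split())
        if df:
            score += tf * math.log(len(docs) / df)  # weight rare terms higher
    return score

ranked = sorted(docs, key=lambda d: tfidf_score("search relevance", d),
                reverse=True)
print(ranked)  # d1 first: it matches both terms, including the rarer one
```

Because the inverse document frequency depends on the whole collection, the same algorithm ranks differently on different content, which is exactly why relevance has to be judged on your own data.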

Since relevance is based on the content, it will also differ from company to company. That is why we can’t say that one search engine has better relevance than another; we can only say that it differs. To know which performs best, you have to try them out on your own content. The best way to choose a search engine for your findability solution is thus to compare a couple and see which yields the best results. After comparing the results, the next step is to look at how easy it is to tune the relevance algorithms, to what extent it is possible, and how much you need to tune. Depending on how good the relevance is from the start, you might not need much relevance tuning, and thus you might not need the “advanced relevance tuning functionality” that could cost extra money.

In the end, the best search engine is not the one with the most functionality. The best one is the one that gives you the most relevant results, and by choosing a search engine with good relevance for your content, some initial requirements might become obsolete, which will save you time and money.

During the autumn we have been trying to keep our customers and others up to date with the search world by hosting breakfast seminars. By benchmarking enterprise search, sharing experiences and discussing with others, the participants have taken giant leaps in understanding what true value search can deliver. The same goes for sharing experiences between companies, where you often find yourself struggling with the same problems, regardless of business or company size.

We have been discussing how enterprise search can help intranets, extranets, web sites and support centers capitalize on their knowledge. These are some of the things that have been discussed with regard to benchmarking enterprise search.