A web page that definitively satisfies a searcher’s intent is “Perfect,” and should appear at the top of Bing’s search results. On the other end of the scale, spammy web pages and pages that almost no searcher would find useful are deemed “Bad.”

That’s a bit of how Bing instructs the people in its Human Relevance System (HRS) project to grade web pages. It’s explained in a 52-page document that Bing calls the “HRS Judging Guidelines.”

The HRS project is similar to the Quality Rater program that Google uses. Microsoft’s version has been around in some form since shortly after MSN Search began generating its own search results in late 2004. Like Google, Microsoft uses testing services (like Lionbridge and others) to hire human search evaluators and administer the program. (Microsoft often refers to the evaluators as “judges,” and I’ll do the same in this article.)

Very little, if anything, has been written about Microsoft’s HRS project, and the company’s communications team was understandably reluctant to discuss it with Search Engine Land when we contacted them recently. But, when we shared a copy of the guidelines document that was given to us by a former judge, a Bing spokesperson did confirm that it’s the current version of the HRS guidelines. The document is dated March 15, 2012.

What’s inside? How does Bing ask its human search quality judges to grade web pages? Read on for details.

Searcher Intent & Landing Pages

The document goes into detail about the three primary query intents (Navigational, Informational and Transactional) and offers suggestions for how to determine user intent based on the search query. Human judges are instructed to consider these four questions related to intent when they judge landing pages (the “LP” referred to below):

1. Intent: Does the LP content address a possible intent for the query?
2. Scope: Does the range and depth of the LP content match what the user wants?
3. Authority: Is the trustworthiness of the content on the LP appropriate to the expectations of the user?
4. Quality: Do the appearance and organization of the LP provide a satisfying experience?

Ultimately, judges are told to identify if a landing page satisfies searcher intent on a scale from “strongly” to “poorly,” with additional categories for obscene and inaccessible content.

The guidelines document explains that “A strongly satisfying page will closely match the user’s intent and requirements in scope and authority, while a poorly satisfying result will be useful to almost no users.”

The Rating Matrix

The HRS Judging Guidelines asks judges to rely on a Rating Matrix to grade web documents. The matrix combines A) likely searcher intent with B) how well the document satisfies that intent. A document that “strongly” satisfies the “most likely” intent is graded Excellent/Perfect, while a document that “poorly” matches the most likely intent is graded Bad.
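Reading off the rating descriptions later in the document, the matrix can be sketched as a simple lookup table. This is an illustrative reconstruction, not Bing's actual tooling; the intent-likelihood and satisfaction labels are taken from the guidelines' wording, and the cell values follow the five rating definitions quoted below.

```python
# Rating Matrix sketch: (intent likelihood, how well the page satisfies it)
# -> grade. Cells are reconstructed from the rating descriptions in the
# guidelines; whether a top cell is "Excellent" or "Perfect" depends on
# whether the page is the single definitive/official result.
RATING_MATRIX = {
    ("most likely", "strongly"):   "Excellent/Perfect",
    ("most likely", "moderately"): "Good",
    ("most likely", "weakly"):     "Fair",
    ("most likely", "poorly"):     "Bad",
    ("very likely", "strongly"):   "Excellent",
    ("very likely", "moderately"): "Good",
    ("very likely", "weakly"):     "Fair",
    ("very likely", "poorly"):     "Bad",
    ("likely", "strongly"):        "Good",
    ("likely", "moderately"):      "Fair",
    ("likely", "poorly"):          "Bad",
    ("unlikely", "strongly"):      "Fair",
    ("unlikely", "poorly"):        "Bad",
}

def rate(intent_likelihood: str, satisfaction: str) -> str:
    """Look up the grade for an (intent likelihood, satisfaction) pair."""
    return RATING_MATRIX.get((intent_likelihood, satisfaction), "Bad")

print(rate("most likely", "strongly"))  # Excellent/Perfect
print(rate("likely", "strongly"))       # Good
```

The key point the matrix captures is that the grade is a joint function: strongly satisfying an unlikely intent earns only a "Fair," while even a moderate match to the most likely intent earns a "Good."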

Rating Options

The five rating options that judges can use are shown in the matrix above, but the guidelines offer a more detailed explanation. This is really the heart of the document — the section that reveals what Bing looks for in grading (and likely in ranking) web pages/documents.

Here’s how Bing explains the five possible ratings:

1.) Perfect

“The LP is the definitive or official page that strongly satisfies the most likely intent.”

The document says that a Perfect landing page “should appear as the top search result.” It also says that only one landing page will typically deserve a Perfect rating, but for some generic queries (such as “loans” or “insurance”) there will not be a Perfect landing page. A Perfect page should address the intent of at least 50 percent of searchers.

2.) Excellent

Bing describes this as a landing page that “strongly satisfies a very likely or most likely intent” and “closely matches the requirements of the query in scope, freshness, authority, market and language.” Users finding an Excellent landing page “could end their search here and move on.” An Excellent page should address the intent of at least 25 percent of searchers.

An example in the document is that Barnes & Noble’s home page is an “Excellent” result for the search query “buy books.”

3.) Good

A Good landing page “moderately satisfies a very likely or most likely intent, or strongly satisfies a likely intent.” Bing says most searchers wouldn’t be completely satisfied with one of these pages and would continue searching. A Good page should address the intent of at least 10 percent of searchers.

4.) Fair

This rating applies to pages that are only useful to some searchers. A Fair page “weakly satisfies a very likely or most likely intent, moderately satisfies a likely intent, or strongly satisfies an unlikely intent.” A Fair page addresses the intent of at least one percent of searchers.

5.) Bad

In addition to being useful to almost no one and not satisfying user intent, this rating applies to a web page that “uses spam techniques” or “misleadingly provides content from other sites,” as well as to parked domains and pages that attempt to install malware. A Bad page addresses the intent of less than one percent of searchers.
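The percentage thresholds attached to the five ratings form a clean ladder. The cutoffs below come straight from the document; the function itself is only an illustration of how they nest, not anything Bing publishes.

```python
def rating_for_share(percent_satisfied: float) -> str:
    """Map the share of searchers whose intent a page addresses to the
    minimum rating tier the guidelines associate with that share."""
    if percent_satisfied >= 50:   # Perfect: at least 50% of searchers
        return "Perfect"
    if percent_satisfied >= 25:   # Excellent: at least 25%
        return "Excellent"
    if percent_satisfied >= 10:   # Good: at least 10%
        return "Good"
    if percent_satisfied >= 1:    # Fair: at least 1%
        return "Fair"
    return "Bad"                  # Bad: less than 1%
```

Note the thresholds are necessary but not sufficient: a page addressing 60 percent of searchers still isn't "Perfect" unless it is also the definitive or official result.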

The document goes into some detail on additional ratings like “Detrimental,” which applies (in part) to web documents that display adult-only content, and “No Judgment” for pages that can’t be accessed for a variety of reasons.

Freshness

There’s a fairly detailed section on freshness. It explains why judges should take freshness into account when reviewing web documents and suggests situations when fresh content is more valuable and others when it’s not as important. The document explains that there are “essentially” three categories of freshness-related queries — Fresh Not Important, Very Likely Fresh and Most Likely Fresh — and offers this chart with example search queries to distinguish them.

Additional Considerations

There are also sections addressing queries where the search term is a URL, how to judge misspelled queries and how to judge local queries. For example, the home page of the Arizona Hispanic Chamber of Commerce is considered “Perfect” for the query hispanic chamber of comerce glendale az because Glendale is a suburb of Phoenix and there’s no Hispanic Chamber of Commerce office in Glendale.

As I said above, very little has ever been written about Microsoft’s Human Relevance System project for rating search results. From reading through the guidelines doc, I’d say it’s not all that different from Google’s handbook for its raters, which we first wrote about in 2008.


About The Author

Matt McGee is the Editor-In-Chief of Search Engine Land. His news career includes time spent in TV, radio, and print journalism. After leaving traditional media in the mid-1990s, he began developing and marketing websites and continued to provide consulting services for more than 15 years. His SEO and social media clients ranged from mom-and-pop small businesses to one of the Top 5 online retailers. Matt is a longtime speaker at marketing events around the U.S., including keynote and panelist roles. He can be found on Twitter at @MattMcGee and/or on Google Plus. You can read Matt's disclosures on his personal blog. You can reach Matt via email using our Contact page.


They could instead use min freelance and ask the workers to rate a search page result for satisfaction level.

http://www.brickmarketing.com/ Nick Stamoulis

” A Perfect page should address the intent of at least 50 percent of searchers.”

I find it interesting that a “perfect” page could still be the wrong information for 1/2 of the searchers. I would have thought that percentage would have been higher but at least Bing recognizes that it’s hard to nail user intent perfectly 100% of the time.

ChristianKunz

I really would like to know the percentage of websites that are rated this way. If the approach is to check a representative share, then good luck, Bing!

http://www.facebook.com/john.beagle John Beagle

And how do you go about looking at a billion search result pages and rating them individually for each set of keywords?

One thing that might help is if you had a reporting tool for bad serps.

Touseef Hussain

I think there should be no page 2, page 3 feature. Bing should be different. Instead of browsing through pages, there should be some jQuery kind of thing which slides the page; it would be a better experience for the user. We are still using the same method that we were using in 1999.

totnuckers

So basically they are just following Google’s lead

http://twitter.com/fiend4house internet boss

They need to focus more on strategic partnerships and effective marketing before this. Their search share continues to tumble against Google. What is a good set of search results worth if nobody ever sees them?

http://twitter.com/WesleyLeFebvre Wesley LeFebvre

Some great insight into how Bing’s search team thinks. Thanks for sharing it with us, Matt!

http://twitter.com/goodwinfamily goodwinfamily

Ad Center has a similar system for Ad Landing pages, actually, documented here:

A 25% chance of the page matching my intent doesn’t sound very excellent to me… (Ok, 25-50%, but at best, that’s still a failure rate of 1 in 2…)

Are people such sloppy searchers?

http://www.facebook.com/the.nathaniel.bailey Nathaniel Bailey

I get what you're saying, but I would have thought there are some good reasons behind that "at least" 50% being lower than you might think at first.

Maybe it says “at least 50 percent” for a reason, such as searches that may have two or more meanings, meaning that not all people would find the result helpful or what they are searching for.

http://twitter.com/nondisclosure1 non disclosure

I worked on both Google’s and Bing’s search quality ratings teams. All the stuff you get from these self-proclaimed professional raters you interviewed is beyond basic.

The stuff you described above is the most basic core principles shared by both teams, and I dare say all similar systems. You don’t even need to be officially hired to get to this information and the chart you showed (copyright?). At both Bing and Google, once you are in the interview process, you’d be sent a handbook that includes all this, but before that, you have to sign a non-disclosure.

The Google rater you interviewed earlier sounds so incompetent, and misunderstood or failed to explain things correctly so often, that the whole article is just misleading.

Don’t waste your time (and money) interviewing these people in the future. Anyone good enough to understand a non-disclosure agreement would not be dumb enough to talk to you about this.

http://blog.clayburngriffin.com/ Clayburn Griffin

Sites that unnecessarily delay content loading on purpose, usually to have totally awesome navigation animations, should be marked as bad.

DanHigson

Great post! It amazes me how many people are using Bing. At least it’s not Aol!

seoword

This makes sense. Thank you.
My dumb question of the day is what the raters are measuring. Are they rating individual sites or the accuracy of the algorithm for producing quality search results?