Posted by timothy on Sunday June 07, 2009 @03:34PM
from the humans-are-dead-they're-probably-dead dept.

An anonymous reader writes "For many years, Google, on its Explanation of Our Search Results page, claimed that 'a site's ranking in Google's search results is automatically determined by computer algorithms using thousands of factors to calculate a page's relevance to a given query.' Then in May of 2007, that statement changed: 'A site's ranking in Google's search results relies heavily on computer algorithms using thousands of factors to calculate a page's relevance to a given query.' What happened? Google's core search team explain."

In reality, this is why search engines like Wolfram Alpha, which lack Google's broad research and industry knowledge, don't stand much of a chance unless Google drops the ball.

Yeah - but before Google was people, Yahoo was people. Google gets an advantage based on what they're doing. But it doesn't make them invulnerable. Look at the tech industry for the past several decades to see this theme played out again and again.

TFA is about Google using humans to improve its results, in a few ways.

Wolfram Alpha derives all of its results from a database that is curated by humans.

There are major differences in their approaches (as indeed there are major differences in what they are trying to accomplish), but the general notion of involving human beings to improve your results is the same.

Saying Wolfram Alpha isn't a search engine is like saying that Linux should be called GNU/Linux. It might be more technically correct (emphasis on might), but it won't change the public's perception of it.

So what you are saying is that the common misinterpretation, that Wolfram Alpha is a search engine, doesn't in any way invalidate the GP's point when he references it as, and relies on it being, a search engine.

Yes, I can imagine how that would be true if Wolfram Alpha were COMPETING WITH GOOGLE. Except it's not. Not as a search engine at least. Jesus. Why don't you try a few typical searches on WA before you say stupid crap that ill-informed modmins will mod up?

Try "cheap plane tickets", or "what are pennies made of" in WA. Look at the results. There are none. That's because WA does not do what Google Search does. It wasn't meant to. You know what else it doesn't do? What ebay does. That's right, you can't buy an

Because the summary wasn't kind enough to give you the answer to the question, here it is.

Human evaluators (mostly college students) are trained in the art of validating search engine results. They examine the results of their searches and determine which ones are the most relevant. For example, searching for the Olympics should yield information about the 2008 Olympics (or whichever one is current) instead of the 1996 Olympics. Reviewers frequently work on the same query results, so that Google can see how consistently the reviewers are rating websites.
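To make "how consistently the reviewers are rating websites" a bit more concrete, here is a minimal sketch of one common consistency measure, Cohen's kappa, computed for two raters who judged the same results. Nothing in the article says Google uses kappa; the rating labels and the data below are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, given each rater's own label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels two raters gave to the same four results.
rater_1 = ["vital", "useful", "useful", "spam"]
rater_2 = ["vital", "useful", "spam", "spam"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")
```

Kappa corrects raw agreement for the agreement you would get by chance, which matters when most results share the same label.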

The upshot of this is that it helps weed out websites that are cheating the system, trying to get their site to be the #1 Google hit so they can show you ads. So a large part of what they are doing is tracking spam websites, not real ones.

I think having an ongoing human element would be a good thing for Google. College students are reasonably smart, and many of them would enjoy doing such a simple thing to make a few bucks on the side for beer or textbook money. It's a lot like Slashdot's mod system. Hopefully it will drastically reduce the number of spam pages in the top results.

Oh, God, I hope not! Searching for jokes, you'll wind up with completely humorless results. Completely irrelevant pages will end up at the top of search results because they were "insightful" or "informative", despite being completely off-topic. All anti-Microsoft, anti-Apple and anti-Linux pages will be completely buried in the search results. And searching for "Apple" or "Linux" will leave you with a bunch of fanboy pages....wait a minute (searches Google)....ARRRGGHHHH! The Slashdot Mods have TAKEN OVER!!!! Ummm....ahem....

In all seriousness, American student campuses seem to be quite left wing and they would probably impose such values on the search results. A typical student isn't going to rate a page of pure race hatred as "useful" even if the content is extremely pertinent to the search terms.

On a global scale, US university students are in the middle. Given the online populace, they may even be a little right wing. If we were talking about the US public generally, then sure. People who are right wing generally avoid change, the internet being one of those changes. And the US generally is right wing compared to the rest of the world.

In every SEO conversation I have had, it is still interesting how people think it's better to make a page interesting for some unseen calculating computer in Google's head office rather than to make a page that is interesting to people and tag it accurately and straightforwardly. No matter how limited human review is, I am sure that real people evaluating a page for relevance is a good thing for all concerned.

In reality I think that most of this review activity will be directed at the 'to good

The upshot of this is that it helps weed out websites that are cheating the system, trying to get their site to be the #1 Google hit so they can show you ads. So a large part of what they are doing is tracking spam websites, not real ones.

Actually, it calls for further explanation, because manual tweaking of results raises bias and legal concerns. As the guy from Google said:

We don't use any of the data we gather in that way. I mean, it is conceivable you could. But the evaluation site ratings that we gather never directly affect the search results that we return. We never go back and say, 'Oh, we learned from a rater that this result isn't as good as that one, so let's put them in a different order.' Doing something like that would skew the whole evaluation by-and-large. So we never touch it.

Mankind's knowledge stands on the shoulders of Google, so they can't just hire, say, a thousand students and use their evaluations as a significant weighting factor. Rather, it is an evaluation of the algorithms for the sake of improving them, while the ranking itself is done fully by algorithms.

This reminds me of a comment from a friend of mine who works at Google - he says that he's gotten the sense of a company philosophy (unofficial of course) that advocates doing things automatically, without human intervention as much as possible. Basically, they work as though there's an algorithm for everything and it's just a matter of how long it takes us (well, how long it takes them) to produce it and properly refine it. So I wouldn't be surprised if the reliance on human evaluators decreases over time. I bet Google would really like for the original language of their search result explanation to be true, but they've had to make concessions to reality...

Well, I think the law of diminishing marginal utility applies here: assuming the 'quality raters' are improving the search, and assuming the algorithms improve with each change, it would eventually become economical to drop the quality raters because the search is good enough.

That is, unless they choose to make drastic changes. Personally, I think a more "raw" search like google claims to have is better than a "directed" one, like bing claims to have. Of course, I use the word "claim" in each case...

There is another possibility: automatic search (attempting to find relevant pages for everyone) has reached a plateau in terms of performance and if you want to do better, you will need to employ raters. Clearly Google would like to find "the next best thing" in Search, but that sounds quite uncertain. Employing lots of people is a much surer way to improve results.

Also, consider that a web search is effectively an attempt at reverse hashing. You take a small key, and try to map it into a large number of much bigger data points. There will come a point when a two word (or three, or five) search phrase is insufficient even to properly define the 'best' result. That's when the people come in, because they can tune results for relevance to current events using their general knowledge. The example they give in TFA is a good one - if you search for 'olympics' then you prob

I imagine that personalized search results were a direct result of realizing that a certain number of good or bad ratings were a strong indicator; I would be surprised if they were not aggregating the clicks from personalized search and feeding them into the generalized results.

Nope, sorry. This might be true if google operated in a static environment, but they are competing. Both against direct competitors, and against people trying to game the system. If they ever came up with a perfect "algorithm" and let it rest, then the SEOs would reverse engineer it, make their useless pages beat every useful page, and then the perfect algorithm would be shit.

Humans don't scale, so you need automation to make anything like a general search engine. However, you have to verify the algorithms are doing the right thing, even though external factors are evolving. Human evaluators allow more stable experiments to be run that tell you how you are doing, and in turn help you improve your algorithms.
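One way to picture how fixed human judgments make for "more stable experiments" is to score each candidate ranking against the same set of rater grades using discounted cumulative gain (DCG), a standard information-retrieval metric. The interview does not name the metric Google uses, so treat this as an illustration; the URLs and grades are made up. Note that the ratings only score the rankings and never reorder them, which is consistent with the Google engineer's description quoted earlier.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of a list of relevance grades, in rank order."""
    # The result at 0-based position i is discounted by log2(i + 2),
    # so the top result counts fully and lower results count less.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(ranking, judgments, k=10):
    """Normalized DCG@k of a ranked list of URLs against fixed human grades.

    URLs the raters never judged are treated as grade 0.
    """
    grades = [judgments.get(url, 0) for url in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(grades) / best if best > 0 else 0.0


# Hypothetical rater grades for the query "olympics" (0 = useless, 3 = vital).
judgments = {
    "olympics-2008.example": 3,
    "olympics-history.example": 2,
    "olympics-1996.example": 1,
    "spam.example": 0,
}

# Hypothetical rankings produced by an old and a new version of the algorithm.
old_ranking = ["olympics-1996.example", "spam.example", "olympics-2008.example"]
new_ranking = ["olympics-2008.example", "olympics-history.example", "olympics-1996.example"]

print(f"old: {ndcg(old_ranking, judgments):.3f}")
print(f"new: {ndcg(new_ranking, judgments):.3f}")
```

Because the judgments stay fixed while the algorithm changes, a higher score for the new ranking says something about the algorithm rather than about a shifting set of raters.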

In this day and age, it's hard to cut humans out of the loop when it comes to tasks like this. Search is still a very young technology, and it seems like it gets tweaked on a daily basis. With every tweak comes testing, and what better way to test software for humans than, well, humans...

I don't want no trouble here, but the advertising on this site sucks. This is about how people respond to those arrows they have in the search results now, right? *Checking* No, it turns out to be un-feckin-related to any sudden philosophical shifts within the company... and then some stuff about the people that work at the GOOG... as you presume to call it, you ironic scuttlemonkeys.

That part about "mostly college students" comes from the interviewer, not from Google:

JP: So are these raters college students or random folks responding to a job post? What are the requirements?

SH: It's a pretty wide range of folks. The job requirements are not super-specific. Essentially, we require a basic level of education, mainly because we need them to be able to communicate back and forth with us, give us comments and things like that in writing.

Funny how the introduction restates the interviewer's preconception even though the actual interview implies otherwise.

I have been in the program almost from the very beginning, and I am glad they are finally being frank and open about it.
Some comments and caveats first:
- As with anything modern in IT, people sign Non-Disclosure Agreements (NDAs), so not much can be said from inside the circle without breaking their terms.
Having read the interview, I see the chief has also pretty much kept it this way, limiting himself to terms that are already publicly disclosed.
- Google operates through third-party outsourcers, and pretty much all non-essential communication goes through them rather than Google directly; that's why the guy can't tell you the exact size of his posse. The big numbers were probably correct, but I'm not sure about now.
There seemed to be a very big wave of cut-offs and discontinued access for raters about a year ago. A lot of people got the boot, and I'm not sure why; my bet is it was just a sweep of the axe. Some were gone for a good reason, others very randomly.
- The raters have a few spaces and forums to discuss their work, open to the public and with minimal risk of an NDA breach.
- The raters have mods, too, but I haven't seen any activity from them for a while.
- The specifics of most cases have led me to conclude that each surveyed example is worked on by at least 6 or 7 people giving opinions before a final decision is reached, so there is your internal balance for weeding out bad judgement. Let me say it again: you cannot single-handedly change Google's opinion about a particular site and a particular search term.
- About natural language processing: this is the scary part. You cannot imagine how good these guys are, especially their algorithms. From time to time they let us sneak a peek, and we had a look at some betas (or alphas) of grammar processing and translation MONTHS ahead of their official announcement to the world. You could tell it was machine translation, but it was good, scary good. And I'm NOT talking about English only, no, no.
- The pay: it arrives about 6 weeks after the month's end, but it is regular. It is usually not enough to live on, mainly due to the lack of work. The first year it was good, very good, but in 2008 it started getting less and less, which is a shame, since it is a nice way to browse the net and actually get paid for it! ;-)
In those initial months we were mainly dealing with spam, but recently even that has become less common.
- The reason they do not pinpoint sites has to do with the structure of the reviewing process: we look at a given page from the perspective of a specific search term and its relevance to that term, which is a good compromise. Also, a site can have good content AND spam at the same time.
In nearly two years, the terms we monitor haven't changed drastically, and it can be boring from time to time, but otherwise you get to see some really weird things people type into the search field.
- Recently I was both happy and pissed off at how their focus shifted toward dumbing down: simpler and simpler explanations and help for the raters, so no surprise there.
- Oh, yeah, one more thing. The leaked Guidelines are way out of date, so they are of little use for reverse-engineering or for helping the SEO guys. Good luck with that :)
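The rater's point that one person cannot single-handedly change the outcome can be illustrated with a robust aggregate such as the median. This is only a sketch: the commenter does not know how Google combines opinions, and the minimum of six raters, the 0 to 3 grade scale, and the function name `final_grade` are assumptions made for illustration.

```python
import statistics

MIN_RATERS = 6  # assumption, based on the commenter's "at least 6 or 7 people"


def final_grade(ratings):
    """Combine several raters' grades for one (query, page) pair.

    Uses the median, so a single outlier rating cannot move the result far.
    Returns None when too few raters have weighed in.
    """
    if len(ratings) < MIN_RATERS:
        return None
    return statistics.median(ratings)


honest = [2, 2, 3, 2, 2, 3, 2]
with_outlier = [2, 2, 3, 2, 2, 3, 0]  # one hypothetical rater tries to bury the page

print(final_grade(honest), final_grade(with_outlier))
```

With the median, the lone 0 leaves the final grade unchanged, whereas a plain average would be pulled down by it.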

Seems like Google changed something for the worse in the last 6-12 months or so. My searches now seem to produce an increasing number of results that don't actually include the terms I specified. Presumably it's to drive a BS metric that shows Google yields more hits for a given search than their competitors. It's extremely frustrating--This second-guessing of the user's query was one of the biggest reasons I stopped using AltaVista, Yahoo, or whatever the hell other engines used to be out there before Goog

What has started pissing me off is the text at the bottom that basically says "you recently searched for 'foo' and are now searching for 'bar', so we combined the two, click here to get your real results for only 'bar'"

Perhaps the reason for my second search was that I wasn't satisfied with my original search terms, so fuck you for adding them on covertly. This was done while not signed into any Google sites...

Slightly off-topic: Am I the only one who finds Google web search less and less useful? There's no way to really force literal search anymore. Everything I enter gets auto-"corrected". Plus signs, quotation marks or that misleading field "this exact wording or phrase" in Advanced Search used to help, but that stopped working a while ago. Everything is fuzzified now. Is there an alternative or some trick I haven't heard of?

First off, I don't think fuzzy logic has anything to do with Google's innards. What you're referring to is more generally called word sense disambiguation, and as far as I know, you can force a literal search by putting it in quotes. You'll also get "fuzzy" results, but if you don't see your literal at the top of the results, it probably doesn't exist and I'm guessing the Goog is trying to be helpful by breaking up the terms. Of course, you could Bing it to verify...:)

Plus signs should still be treated as true literals. Quotation marks don't indicate literality -- they indicate that you really, really care about things like word order and so on within the quotes. It used to be true that quotation marks implied a plus on everything inside them, but that wasn't an intentional feature. The advanced search check box was, AFAIK, just equivalent to sticking everything in quotes.

If you're still seeing fuzzification with a plus sign, something may be a bit screwy, and you should file a bug with a specific broken query. (Of course, if you run the query +wombats and see the word "wombat" highlighted in the snippet, that isn't the same thing: +wombats was treated literally, so this document really truly matched the word "wombats"; it might just also have matched the word "wombat", and the snippet highlighter decided that it made sense, for this particular query, to highlight the term. A bug would be if you found a truly irrelevant document coming up.)
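The distinction drawn here between a literal +term and default matching (where "wombats" may also match "wombat") can be sketched with a toy matcher. This is not how Google works internally; the crude plural-stripping `stem` function and the `matches` helper are hypothetical stand-ins.

```python
def stem(word):
    """Very crude stemmer for illustration: lowercases and strips a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def matches(query, document):
    """Return True if every query term matches the document.

    Plain terms match on their stem (so "wombats" also matches "wombat");
    terms prefixed with '+' must appear literally (case-insensitive).
    """
    tokens = [t.strip('.,;:!?"').lower() for t in document.split()]
    stems = {stem(t) for t in tokens}
    literals = set(tokens)
    for term in query.split():
        if term.startswith("+"):
            if term[1:].lower() not in literals:
                return False
        elif stem(term) not in stems:
            return False
    return True


doc_singular = "A wombat is a marsupial."
doc_plural = "Wombats dig burrows; a wombat can run fast."

print(matches("wombats", doc_singular))   # plain term: stem match is enough
print(matches("+wombats", doc_singular))  # literal term: "wombats" is absent
print(matches("+wombats", doc_plural))    # literal term present
```

In this toy, +wombats still matches doc_plural even though that document also contains "wombat", which mirrors the snippet-highlighting case described above: the literal requirement was met, and the singular form just happens to be there too.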

Am I the only one who finds Google web search less and less useful? There's no way to really force literal search anymore.

So true and frustrating! I can't tell you how many times recently I've tried searching for something "SPECIFIC" and not been able to at all.:-(

I would love to know of a useful alternative that searches for what *I* want, rather than what some non-intelligence presumes I might want (and just wastes my time and their resources).

Yahoo almost always does a better job there, but they don't play very nice with literal searching either. I wish that the search technology commonly used in the mid 90s on, for instance, Lexis Nexis would finally percolate up to mainstream web search engines and replace the primitive grunting, pointing, and shrugging engines used now.

Not only is the search getting worse, AdSense is critically broken. I wrote a gag DVD-rewinding joke webpage, and despite my use of keywords, Google ran irrelevant, stupid, non-joke ads for the years the page was active. Then, after all these years, the AdSense people decided, perhaps (they wouldn't say), that people might take the jokes seriously (!), and so they banned the site and me, forever. Google's a mess.

Slightly off-topic: Am I the only one who finds Google web search less and less useful? There's no way to really force literal search anymore. Everything I enter gets auto-"corrected". Plus signs, quotation marks or that misleading field "this exact wording or phrase" in Advanced Search used to help, but that stopped working a while ago. Everything is fuzzified now. Is there an alternative or some trick I haven't heard of?

Try bing. Seriously, it seems like it's going to give Google a run for its money.

I'm waiting patiently for Google to become self-aware so I can teach it Asimov's laws, and then how they can go terribly wrong. For instance, Disney might make a movie starring Robin Williams... and the sequel could be a vehicle for Whoopi Goldberg.
Anyhow, thanks.