The efreedom answer at the 5th position is actually the most relevant - the stackoverflow question from which it was copied doesn't even show up on the first page. There is one stackoverflow result on the first page, but it deals with a more complex related issue, not the simple question I was looking for.

In Google's most recent cache, the efreedom result has the word "pass" on the page due to some related-links content near the bottom, whereas the stackoverflow page does not. If you modify your query to [parse json body to spring mvc], stackoverflow is at position #1 and efreedom is at position #4. There's still room for improvement, but the simplest explanation seems to be just the better match on your query terms.

Didn't notice that - that's good to know. That's actually exactly how I'd expect a good search engine to behave. As annoyed as I am when I get a junk result, I'd be even more pissed if Google dropped terms from my query just so it can return a more popular site.

Of course, then all the content-copy farms will respond by copying valid content plus word lists - hopefully Google knows how to detect that.

True, it does... I just noticed this because I've gotten in the habit of scanning for stackoverflow results first - they're almost always right on the money, and it's less cognitive overhead to read a site format I'm familiar with, with extraneous discussion well tucked away.

It almost feels like a cache miss when I have to drop down to the official site/documentation, since that typically requires a greater time investment to read through to find the relevant sections.

I guess that's a tribute to how well stackoverflow works, most of the time. And also to how lazy I am.

Thanks for the concrete query - I'm happy to ping the indexing team to make sure it's not tickling any unusual bugs or coverage issues. Jeff's original blog post helped us uncover a few things like that to improve.

Assuming we are looking at the same results, the pages at position #6 and #10 are not copies of the stackoverflow content at position #8. They are copies of http://stackoverflow.com/questions/1399293/test-priorities-o.... Unfortunately, the only place that the word "delay" (which is in your query) sometimes appears on that stackoverflow page is in the "related" links in the right column. At the time Google last crawled that stackoverflow page (see the cache), "delay" wasn't on the page, only "delayed". Whereas, the last time Google crawled the other two pages you mentioned, they did have "delay" on the page. Google should still be able to do better, but this little complication certainly makes things more difficult.
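The "delay" vs. "delayed" complication above comes down to exact token matching: unless the engine stems query and page terms down to a common root, a page containing only "delayed" doesn't satisfy a query containing "delay". A minimal sketch of the idea (a hypothetical toy matcher and toy stemmer, nothing to do with Google's actual pipeline):

```python
# Toy illustration (hypothetical, not Google's matcher): exact token
# matching vs. a crude suffix-stripping stemmer, showing why a page with
# only "delayed" can miss a query for "delay".
import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def stem(word):
    # Toy stemmer: strip a few common English suffixes.
    for suffix in ("ed", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def matches(query, page, use_stemming=False):
    # Require every query token to appear on the page.
    q, p = tokens(query), tokens(page)
    if use_stemming:
        q = {stem(w) for w in q}
        p = {stem(w) for w in p}
    return q <= p

page = "TestNG lets you set delayed priorities on test methods"
print(matches("test priorities delay", page))                     # False
print(matches("test priorities delay", page, use_stemming=True))  # True
```

With exact tokens the query misses; once both sides are stemmed, "delayed" and "delay" collapse to the same root and the page matches.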

One UI issue we've struggled with is how to tell the user that there isn't a good result for their query. This comes up all the time when we evaluate changes that remove crap pages. For nearly any search you do, something will come up, just because our index is enormous. If the only thing in the result set that remotely matches the query intent is a nearly empty page on a scummy site, is that better or worse than having no remotely relevant results at all? I definitely lean towards it being worse, but many people disagree.

That ideal search engine would quickly find itself the target of people trying to gain an advantage by figuring out how it works.

And then another SEO cycle would start. Don't forget that before google came along, nobody was trying to 'game the system' with backlinks and other trickery; the fact that google is successful is what caused people to start gaming google.

Any real-world search engine is going to be analyzed until enough of its internal mechanisms are laid bare to allow gaming to some extent.

Typically you pretend the search engine is a black box: you observe what goes into it (web pages, links between them, and queries) and you try to infer its internal operations based on what comes out (results ranked in order of what the engine considers to be important).

Careful analysis will then reveal, to a greater or lesser extent, which elements matter most to it, and then the gaming will commence. Only by changing the algorithm faster than the gamers can reverse-engineer its inner workings could a search engine keep ahead, but realistically there are only so many ways you can build a search engine with present technology.
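The probing described above can be sketched in a few lines: vary one input feature at a time, hold the rest fixed, and watch how the ranking score moves. Everything here is invented for illustration - the hidden scoring formula, the feature names, the weights:

```python
# Hypothetical sketch of how SEOs probe a ranking "black box".
# hidden_score stands in for the engine's secret formula; the prober
# only sees its outputs, never its coefficients.

def hidden_score(page):
    # The engine's secret formula (unknown to the prober).
    return 3.0 * page["backlinks"] + 1.0 * page["keyword_hits"]

def probe(feature, baseline, lo=1, hi=10):
    # Vary one feature, hold the rest fixed; return score delta per unit.
    low = dict(baseline, **{feature: lo})
    high = dict(baseline, **{feature: hi})
    return (hidden_score(high) - hidden_score(low)) / (hi - lo)

baseline = {"backlinks": 5, "keyword_hits": 5}
weights = {f: probe(f, baseline) for f in baseline}
print(weights)  # backlinks dominate, so the gaming targets backlinks
```

A handful of controlled probes like this recovers the effective weights exactly when the scoring is linear, which is why an engine that stands still gets reverse-engineered.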

Your ideal, I'm afraid, is not going to be built any time soon; if you have any ideas on how to go about this, then I'm all ears.

I think the solution is a diversity of search engines - maybe even vertical search engines. These days I get such shitty results from google for programming-related searches that I've started going straight to SO and searching there. If I don't find it there, I try google, then google groups search.