Eric misses the point of Google indexing weblogs, and sometimes ranking them rather high for some searches:

most people would consider google to be a better service if i, and a relatively small number of other people, didn’t get in the way of the information they really want.

He’s referring to the way he tops the list for galaxie 500 window crank, first by writing about trying to fix his, and then by writing about finding his own entry while searching for information. What he’s missing, though, is two things. First, that’s an okay first-shot search phrase, but as soon as he saw that it didn’t work, he should have revised his search rather than expecting Google to guess right every time. Second, searching for something like galaxie 500 “window crank” (note that many of the original results are talking about the window on one car and the crank(shaft) on another) would show that he’s number one because Google doesn’t have a damn thing useful for that search.

However, someone else (if they are at least half-bright), searching for that phrase because they have a broken window crank on their Galaxie, could go to Eric’s weblog, track down the entry, see the magic words “i’m fortunate to have the original shop manual”, track down his email address, and ask him if he’d be willing to trade copies of a few pages for some extra parts. That’s quite a bit more useful than the other results, offering to sell one car with a missing window, a Galaxie, and yet another car with a particular crankshaft.

I feel bad about the fact that at this moment I’m Google’s top result for http error 500, not because I’m “clogging up” the results and keeping people from finding useful information, but because I’ve been shown that there isn’t any useful information, and I’m still too lazy to fix the situation. There are a very few half-decent pages listing a bit of information about every HTTP response code, a million or so pages just repeating the explanations from the spec (“The server encountered an unexpected condition which prevented it from fulfilling the request.” – gee, thanks), and none that really understand how the web works today, and provide useful information in a useful way.

In the old days of directories, you would dig through Yahoo to find a link to a single page listing HTTP error codes, and find the one you were looking for, and maybe get lucky and get a helpful explanation as well. That’s not how it goes anymore: people search for something far too specific, and then give up. What the web needs for this particular class of query are separate pages, with the error code and name in the URL and in an <h1>, with an explanation of what it means, why you are seeing it, and what to do next (“something’s screwed up in your script, or your .htaccess, or your server: rename .htaccess and see if it works, then look in your error.log or ask your host to look, or try to remember what you just did to your server”).
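
A minimal sketch of what generating one of those per-code pages might look like, in Python. The URL scheme, the function names, and the page layout are all assumptions made up for illustration; only the code-to-name lookup comes from the standard library's `http.HTTPStatus`.

```python
import re
from html import escape
from http import HTTPStatus
from typing import Tuple


def slug_for(status: HTTPStatus) -> str:
    """Build a URL slug that contains both the error code and its name."""
    # Collapse anything that isn't a letter or digit into a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", status.phrase.lower()).strip("-")
    return f"http-error-{status.value}-{name}"


def render_error_page(code: int, why: str, what_next: str) -> Tuple[str, str]:
    """Return (url_path, html) for a single-error page.

    `why` and `what_next` are the human-written explanations; this
    function only wraps them in a page whose URL and <h1> match what
    people actually type into a search box.
    """
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    title = f"HTTP Error {status.value}: {status.phrase}"
    html = "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>{escape(title)}</title></head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
        "<h2>Why you are seeing it</h2>",
        f"<p>{escape(why)}</p>",
        "<h2>What to do next</h2>",
        f"<p>{escape(what_next)}</p>",
        "</body>",
        "</html>",
    ])
    return f"/{slug_for(status)}.html", html
```

Calling `render_error_page(500, "Something's screwed up in your script, your .htaccess, or your server.", "Rename .htaccess and see if it works, then look in your error.log.")` would give you a path with the code and name in it and a page whose `<h1>` says the same thing, which is the whole point.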

Google isn’t saying “sorry, this sucks for what you searched on, but I’m so confused by all this incestuous linking that I just have to give it to you anyway.” What it’s saying is “I don’t have anything useful for that query, but people seem to think this guy knows his way around, and he’s at least used the words in your search, so maybe he linked to something that will prove useful.” So far, I haven’t, but that’s my failure, not Google’s. If you actually look at the sorts of things where Google ranks you “too high”, you’re likely to find that either nobody has anything useful to say, or there’s no way for Google to tell what’s authoritative yet. If three hundred pages include the term Googlewash, but Google hasn’t done a monthly reindex yet, all it can do is show you those, and hope that some highly ranked one linked to the right place. Once it reindexes, it can see that a couple hundred used the term in a link to one page, and that’s probably an authoritative source about it. To cut down on Googlebombs, though, a page that also includes the term in the page title will probably come first, even with only fifty inbound “Googlewash” links. Your weblog entry isn’t ranked high for some searches because Google’s confused; it’s ranked high because Google needs your help, and expects you to link to things it hasn’t had a chance to sort out and fully index, or to link to the most useful thing about the keywords that your entry is highlighting. So get to it; it’s damn sure not going to be “content/6/30195.html” providing searchers with useful information and links.
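
To make the anchor-text guess above concrete, here is a toy model in Python. It is emphatically not Google's algorithm, just the heuristic as described in this entry: count the links whose text contains the term, and let a page that also has the term in its title jump ahead. The function name, the data shapes, and the example URLs in the note below are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_for_term(
    term: str,
    titles: Dict[str, str],
    links: List[Tuple[str, str]],
) -> List[str]:
    """Order candidate URLs for `term` using a toy anchor-text heuristic.

    titles: maps URL -> page title
    links:  (anchor_text, target_url) pairs found during a crawl
    """
    term = term.lower()

    # One "vote" per inbound link whose anchor text mentions the term.
    votes = Counter(
        target for anchor, target in links if term in anchor.lower()
    )

    def in_title(url: str) -> bool:
        return term in titles.get(url, "").lower()

    # Candidates: anything linked with the term, or titled with it.
    candidates = set(votes) | {url for url in titles if in_title(url)}

    # Title matches sort first, then more votes, then URL as a tie-break.
    return sorted(
        candidates,
        key=lambda url: (not in_title(url), -votes[url], url),
    )
```

Feed it two hundred “Googlewash” links to `b.example/post` (title “Monday links”) and fifty to `a.example/essay` (title “Googlewash explained”), and the essay comes out first in spite of the smaller link count, which is the behavior guessed at above.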

This entry was posted on Monday, May 19th, 2003 at 8:41 pm and is filed under blogging tech.

16 Comments

I came across something in webmasterworld.com about Google last night. I can’t find the particular URL at the moment, but it’s somewhere in their Google forum (forum3?). What’s happening is that Google is giving ranking/listing boosts to new pages for 72 hours.

Of course, with blogging being an ongoing activity, it only encourages blog-type publishing if it’s true.

I feel a bit bad too: I’m number one for Southwest Airlines Accessibility. The page itself isn’t great (looking at the cache; I updated the actual page on my site this morning, out of embarrassment really), and the page it did index was only published on Friday night as part of uploading my own blogging tool, so that defeats the “page ranking boost by inbound links” argument.

I’ve noticed blog clog also tends to piss off random searchers who don’t understand exactly why a given blog comes up before a usual site. I can’t count the number of comments I’ve gotten on posts from a year or more ago that say, “WHERE IS THE INFO ABOUT (X)? I GOT HERE THRU GOOGLE NOW WHERE IS IT?”

great post phil! i’ve mustered a meager and likely incomprehensible response. i’m teetering on the fence, but i still suspect that most of the people that google needs to care about to continue to grow are confused by the noise. i know this is a huge, steaming pile of unsubstantiated hooey. but hey, that’s what blogs are good for :-)

I’m really starting to think that the time when you can talk about where something appears in the Google results has passed. Certainly the time when you can talk about anything but “at this precise minute” has passed, but, at this precise minute, I’m number one on one of nine data centers, and you’re number four on five of nine data centers (see google-dance.com and change the drop-down to “Check: All 9 Datacenters”). So assuming that all nine are currently involved in load balancing, what you get depends on when you search, and how busy they are, and …

The one good thing I ever used it for was when I was installing The Browser Formerly Known As Phoenix for someone, and showed them how it defaults to treating keywords entered in the location bar as a Google “I’m Feeling Lucky” search, by typing “phil” in the address bar of their browser and going straight here. I’ll take any opportunity I can get to look like a wizard.

And so this is the part where I announce a new! and exciting! (and probably ongoing) contest, with prizes to be named later (maybe some of these, maybe something else) for the best response to Jay’s question. I figure as long as I keep getting these re…

Jesus! What’s wrong with you people at the moment? I don’t have time to talk about all the cool things out there, let alone all the things that just tweak my interest. Damn you for making me linklog. Damn you

In one of those occasionally odd instances of synchronicity, on Friday I installed AWstats on my machine so I could look at the access logs for my site a bit more easily, and on Saturday I found this survey…

Well, no, it doesn’t seem to be. But then, the world’s turned several times since I wrote this: I’m now number 43 for that search, and a fair number of the sites above and below me exist only to cause the ads they carry to be shown to confused searchers. Hard though it is to believe, just two and a half years ago that was only true of relatively few very competitive terms, not absolutely every single phrase that anyone might ever search for, and without the massively widespread and very lucrative advertising programs sponsored by the search engines themselves, the whole ecology of search was very different, or at least seemed so to me.

Today, I can’t imagine myself writing about doing things to altruistically help out search engines with a straight face: then, it seemed quite reasonable to me.