Dr. Peter Jacso, of the University of Hawaii, has a good analysis of the lapses in logic in Google Book Search. Among other things, the dear beast has trouble with the Boolean OR operation: a keyword search on arrogant OR arrogance produces more results than the counts for each term alone added together. Some of these results, in his ongoing analysis of GBS, stand in bleak contrast to the sophistication of the main search engine. This is perhaps because Google is attempting to use its web index algorithms to retrieve the data, as Battelle suggests, against the grain of typical information retrieval and structured search systems. Perhaps these are bugs in the design of that system?
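
For anyone wondering why that OR count is a red flag, here is a minimal sketch of the sanity check, assuming each search returns a set of matching documents. The function name and the toy document IDs are made up for illustration; none of this is Jacso's data or Google's code. An OR search can never return fewer hits than the larger of the two single-term searches, and never more than the two added together.

```python
def or_count_is_plausible(count_a: int, count_b: int, count_or: int) -> bool:
    """Check a reported OR hit count against the bounds set theory allows.

    The union of two result sets is at least as large as the bigger set
    and at most as large as both sets added together.
    """
    return max(count_a, count_b) <= count_or <= count_a + count_b


# Toy result sets (hypothetical document IDs, not real search results).
hits_arrogant = {1, 2, 3}
hits_arrogance = {3, 4, 5, 6}
hits_either = hits_arrogant | hits_arrogance  # what a correct OR would return

# A correct OR always passes the check.
print(or_count_is_plausible(len(hits_arrogant), len(hits_arrogance), len(hits_either)))

# A reported OR count above the sum of the parts fails it.
print(or_count_is_plausible(len(hits_arrogant), len(hits_arrogance), 9))
```

The check only needs the three reported counts, so it can be run against any search engine's result pages without access to the index.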

Jacso also compares GBS to Amazon’s book search, with devastating results. Bottom line:

“Beyond simple keyword searching, Google’s software seems to be cognitively challenged, to put it nicely, and hinders access to the content, which would deserve at least a functional and half as smart software as Amazon has.”

Gary Price, who always appreciates the proper acknowledgment that book digitization began before Google, has more.

Turn, a well-backed search service run by the former CEO of Alta Vista, is set to launch at the Web 2 conference Tuesday. But AdWeek has the story a bit early…

San Mateo, Calif.-based Turn in recent months has attracted $18 million in venture backing from Norwest Venture Partners, Trident Capital and Shasta Ventures. Turn has about 1,000 advertisers in its system, which displays ads on approximately 30 sites.

Unlike Google, which charges advertisers on a per-click basis, Turn relies on a cost-per-action scheme. It charges advertisers only if users take desired actions, such as filling out registration forms or closing on sales.

You all know I was bullish on Google’s ads for magazines. Philosophically, I feel the same way about this newspaper test. The Times coverage does nail the big issue – “it might make Google stronger.” The newspapers worry that Google treats them as just so much unsold backfill inventory. And that’s not what “content” really should be, eh?

I think the issue here for Google really comes down to scale and context. Print advertising is a maddeningly “human” business, driven by passion, emotion, and gut feeling. I’m not sure that’s ever going to go away. Ads for a specific, community-driven audience need to be part of a conversation, not an algorithm. However, I can see this working well for remnant/backfill, as well as classifieds, where I’m guessing the system will really excel.

Yahoo! has announced added sophistication for Slurp, its crawler, that gives webmasters greater control in directing a bot’s path with the robots.txt file.

Product manager Priyank Garg writes, “I was going through my notes from Danny Sullivan’s Open Feedback sessions during the ‘Meet the Crawlers’ panels at Search Engine Strategies conferences. One of the items on my list was a request for enhanced syntax in robots.txt to make it easier for webmasters to manage how search crawlers, including Slurp, access your content.”

The two new wildcards are ‘*’, which matches any sequence of characters, and ‘$’, which anchors a pattern to the end of a URL. These can be used to tell the Slurp bot, for example, to follow (or ignore) any pages in the directory with the phrase ‘_print’ in the file name (‘/*_print*.html’). The two can also be combined: for example, to disallow ‘.gif’ files (‘/*.gif$’).
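
To make the matching concrete, here is a rough sketch of how patterns like these could be evaluated. It is not Yahoo!’s implementation; it assumes the usual robots.txt convention that a rule matches from the start of the URL path, and the helper names `pattern_to_regex` and `is_blocked` are made up for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using '*' and '$' into a regex.

    '*' matches any run of characters; a trailing '$' pins the match to
    the end of the path. Without '$', the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if the path matches any of the Disallow patterns."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# The two example patterns from the announcement, treated as Disallow rules.
rules = ["/*_print*.html", "/*.gif$"]

for path in [
    "/articles/story_print.html",  # contains '_print', ends in '.html'
    "/images/logo.gif",            # ends in '.gif'
    "/images/logo.gif?size=2",     # '.gif' is not at the very end
    "/articles/story.html",        # matches neither rule
]:
    print(path, is_blocked(path, rules))
```

Note the third path: because ‘$’ anchors the pattern to the end of the URL, a ‘.gif’ followed by a query string slips past the second rule in this sketch.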

A great dialog has been occurring in the comments of this follow-up post on Google and the CIA (Original post here). Google’s official response was brief (along the lines of ‘oh puhhhlleeease!’), and one of the commenters asked if I would shoot Google a follow-up question, to wit:

“Has Google released search-related information to any branch of the government, or government subcontractors?”

I sent this query to the fellow who works on these issues for Google communications. I expected his reply to be predictable – Google has to follow the law after all – but here it is for your benefit:

As you know, we comply with valid legal processes such as subpoenas but do not discuss the details of these requests publicly. Related to Steele’s assertions, we provide our enterprise systems to government agencies but there is no data sharing. These relationships are the same as with our customers in the private sector – they plug the appliance in and it provides search for their Intranet or website. The point of the statement we provided was to convey that neither responding to valid legal processes or selling search appliances are anything that could even be remotely interpreted as being “in bed with the CIA.”

What can we take away from all this? Well, I think it’s a good bet that Google is getting plenty of “valid legal requests,” but it won’t verify that it does. In fact, under Patriot, it would break the law to do so. This is the shit I have been on about for a very, very long time.