Programmers use the idiom of “doc or die” when a procedure needs a document and fails if it can’t find it. Documents are also essential in web services, supply chains, and information-intensive applications in every domain. This blog discusses documents and information designs “in the wild” - especially those that are exceptionally good or exceptionally bad.
Bob Glushko (along with Tim McGrath) is the author of DOCUMENT ENGINEERING (MIT Press, 2005).

Friday, July 11, 2008

Is Google Making Us Stupid, And What to Do About It

It is summertime, and I'm busy rethinking and revising the reading list for my fall course at UC Berkeley ("Information Organization & Retrieval"). Even though the intellectual foundations and themes of this course -- conceptual modeling, semantic representation, classification, vocabulary and metadata design, and so on -- are timeless, technology and business practices continue to evolve. Besides, if I don't revise the syllabus I'll be bored and my teaching will show it, and I can't let that happen.

One article that might make it into my fall syllabus is Nicholas Carr's "Is Google Making Us Stupid?" in the July 2008 Atlantic Monthly. Carr suggests that the downside of the nearly effortless and immediate information access that the web affords us is diminished capacity to read and focus on printed works, especially books. In Carr's view, the style of reading encouraged or even mandated by the web, in which information is organized in hyperlinked fragments, is "chipping away at my capacity for concentration and contemplation."

The title of Carr's article comes from the argument that this fragmentation of reading and thinking is essential to Google's business model, because it and other firms that monetize web use need "the crumbs of data we leave behind as we flit from link to link – the more crumbs, the better… It's in their economic interest to drive us to distraction."

Carr is notorious for provocation (remember the debate he started about whether information technology matters?), and of course his article was meant to bait the defenders and disciples of the web into counterattacks. Sure enough, John Battelle and others took the bait; Battelle lashed back with an even more provocative title ("Google: Making Nick Carr Stupid, But It's Made This Guy Smarter"). A less rabid reaction came from Jon Udell, who suggested that it is up to each of us to find the right balance of big and little information chunks to consume.

I tend to agree more with Carr than Battelle. Of course the web makes it incredibly easy to find satisficing information -- something that minimally meets an information need -- and that's great if I want to check a fact, the temperature, or a stock price, where any source with the information will do. But the web makes it much harder to meet the more intellectually important goal of getting your head around some issue, which you can often do most easily by reading a tightly integrated analysis in a book or scholarly article -- and these are very difficult to locate using web search.

And this IS partly Google's fault, because Google fundamentally determines relevance by the words that appear on individual web pages. What you get in results lists are pages that contain the search terms, so the listings are cluttered with blog rants and less comprehensively researched information. Maybe I'm just "old school," but when I need more than facts or news stories I use the California Digital Library to search with the Library of Congress subject headings and other more sophisticated search resources, which lets me find the long and authoritative chunks of information I'm looking for. Subject-level metadata is vastly better at identifying relevant content than mere word occurrences, but not much on the web has it, because most stuff on the web doesn't justify the additional effort to create it.
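To make the contrast concrete, here is a toy sketch -- with made-up documents and a deliberately simplistic scoring scheme, not Google's actual algorithm -- of word-occurrence relevance versus subject-heading lookup:

```python
# A toy illustration of two retrieval styles: ranking pages by how often
# the query word appears in their text, versus looking works up by an
# assigned subject heading. The documents below are hypothetical.

docs = [
    {"title": "Blog rant about tagging",
     "text": "metadata metadata metadata is everywhere on my blog",
     "subjects": []},
    {"title": "Scholarly survey of classification",
     "text": "a systematic treatment of organizing recorded knowledge",
     "subjects": ["Metadata", "Classification"]},
]

def word_occurrence_search(query, docs):
    # Rank pages by how often the query term appears in their text.
    return sorted(docs,
                  key=lambda d: d["text"].lower().split().count(query.lower()),
                  reverse=True)

def subject_search(query, docs):
    # Return only works cataloged under the given subject heading.
    return [d for d in docs if query in d["subjects"]]

# The rant wins on raw word counts, even though the survey is the more
# authoritative treatment of the topic...
top = word_occurrence_search("metadata", docs)[0]["title"]
# ...while a subject search returns only the cataloged survey.
cataloged = [d["title"] for d in subject_search("Metadata", docs)]
```

The point of the toy is that repeating a word is cheap, while a subject heading records a cataloger's judgment that the work is actually *about* the topic.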

Isn't Google's fundamental relevance algorithm PageRank? It gives higher weight to inbound links from higher-ranked sources. Ideally this would create a virtuous cycle of relevance weighting based on "authoritativeness". In practice, of course, there's a constant battle with those trying to game the system. Metadata and agreed-upon semantics would allow search systems to be smarter. (I'm talking to you, Semantic Web.) However, as with the gaming of PageRank, metadata and semantics in the wild cannot always be trusted (or at least cannot make up the entirety of your relevance calculations).
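For readers who haven't seen it, the PageRank idea can be sketched in a few lines: each page distributes its score to the pages it links to, plus a uniform "damping" term, iterated until the scores settle. This is a toy version over a made-up three-page link graph, not Google's production implementation:

```python
# Minimal PageRank sketch. links maps each page to the list of
# pages it links to; damping is the usual 0.85 teleportation factor.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with uniform scores
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                    # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                               # share rank across outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# "c" comes out highest: it gets inbound links from both "a" and "b".
```

Note that the scores always sum to 1, so a page can only gain rank at the expense of others -- which is exactly why link farms and other gaming schemes are worth someone's while.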

Yes Justin, PageRank is fundamental for Google, and I didn't express myself clearly enough. My point is that any relevance ranking technique that looks at page-size fragments rather than whole works -- like books -- is going to underweight books in a ranked results list.
