Monday, May 07, 2007

What It's About 1: Links and Bias

In my last post I suggested that digital history isn't about computers, although it may have seemed more reasonable to think so in the mid-1980s. In fact, around that time the influential computer scientist Edsger Dijkstra made the provocative argument that even computer science isn't about computers. Referring to the subject as computer science, he wrote, "is like referring to surgery as 'knife science'..."

We now know that electronic technology has no more to contribute to computing than the physical equipment. We now know that a programmable computer is no more and no less than an extremely handy device for realizing any conceivable mechanism without changing a single wire, and that the core challenge for computing science is hence a conceptual one, viz. what (abstract) mechanisms we can conceive without getting lost in the complexities of our own making. ... This discipline, which became known as Computing Science, emerged only when people started to look for what would be common to the use of any computer in any application. By this abstraction, computing science immediately and clearly divorced itself from electronic engineering: the computing scientist could not care less about the specific technology that might be used to realize machines, be it electronics, optics, pneumatics, or magic.

To some extent, digital history inherits this indifference to underlying mechanism. We're better off focusing our attention on what the technology allows us to do.

Links, it is said, are the currency of the web. They make it possible to navigate from one context to another with a single click. For human users, this greatly lowers the transaction costs of juxtaposing two representations. The link is abstract enough to serve as a means of navigation while subsuming traditional scholarly activities like footnoting, citation, glossing and so on. Furthermore, extensive hyperlinking allows readers to follow nonlinear and branching paths through texts. So much is well known to humanists. Fewer seem to realize that links are constantly being navigated by a host of artificial users, colorfully known as spiders, bots or crawlers. A computer program downloads a webpage, extracts all of the links on it, and follows each in turn, downloading the new pages that it encounters along the way. Using tools like this, students of the internet can map the topology of subnetworks. Some pages serve as hubs, with millions of inbound links. Some are bridges that connect two network regions that are otherwise very sparsely interconnected. Done ceaselessly on a large enough scale, spidering yields a dynamic and partial map of the internet, and that map serves as the basis for search engines.
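The crawling process described above can be sketched in a few lines of Python. This is a toy version: a small in-memory dictionary of invented page names stands in for the web, so there is no actual downloading or HTML parsing, but the logic — visit a page, extract its links, follow each unvisited one in turn — is the same breadth-first traversal a real spider performs.

```python
from collections import deque

# A toy "web": each page maps to the pages it links to.
# The page names here are invented for illustration.
PAGES = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post1", "post2"],
    "post1": ["blog", "post2"],
    "post2": ["blog"],
}

def crawl(start):
    """Breadth-first crawl: visit a page, extract its links,
    and follow each link that hasn't been visited yet."""
    visited = set()
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        if page in visited:
            continue          # already downloaded this page
        visited.add(page)
        order.append(page)
        for link in PAGES.get(page, []):
            if link not in visited:
                queue.append(link)
    return order

print(crawl("home"))  # ['home', 'about', 'blog', 'post1', 'post2']
```

A real crawler would replace the dictionary lookup with an HTTP request and a link extractor, and would record inbound links as it went, which is exactly the data needed to spot the hubs and bridges mentioned above.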

Stop for a moment and think about search engines. Google handles more than ninety million search requests per day. For the vast majority of those searches, there will be far too many hits for the user to look at more than a tiny fraction of the results. Instead, he or she will concentrate on the top 10 or 20 hits. Google, and a few other companies like Yahoo! and MSN, introduce biases into search results by ranking the hits. That's unavoidable, and historians, at least, take bias for granted. It is something to be thought about, not something that can be eliminated. I would argue, however, that search engine result ranking is the single most pervasive form of bias that has ever existed. When Google says that their mission "is to organize the world's information and make it universally accessible and useful," they're not kidding. Do you know how search engines work? Can you afford not to?
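To make the ranking question concrete, here is a minimal sketch of the idea behind Google's original PageRank algorithm: a page is important if important pages link to it. The graph below is invented for illustration, and real search engines combine link analysis with many other signals, but even this toy version shows how a heavily linked-to hub floats to the top of the results.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively redistribute each page's rank across its outbound
    links; damping models a surfer who sometimes jumps to a random page."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}      # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                share = rank[page] / len(links)
                for link in links:
                    new_rank[link] += damping * share
            else:
                # a page with no outbound links spreads its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all link to one hub.
graph = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # hub
```

Whatever the details of a production ranking system, the point stands: the order of the hits is computed, and every choice in that computation is a form of bias.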

The Programming Historian

Are you interested in learning how to program? Check out The Programming Historian, an open-access introduction to Python programming for working historians (and other humanists) with little previous experience.