Monday, August 27, 2007

I've been invited to join the crack team of bloggers at Cliopatria, so I will be cross-posting there and at Digital History Hacks from time to time. I'm excited by the opportunity to develop a series of posts on a topic of general interest to historians, while keeping enough technical content to satisfy my regular readers. So... let's build a time machine!

At some point in the early nineties I copied down a quote by Loren Eiseley in a commonplace book:

A man who has once looked with the archaeological eye will never quite see normally again. He will be wounded by what other men call trifles. It is possible to refine the sense of time until an old shoe in the bunch of grass or a pile of nineteenth-century beer bottles in an abandoned mining town tolls in one's head like a hall clock. This is the price one pays for learning to read time from surfaces other than an illuminated dial. It is the melancholy secret of the artifact, the humanly touched thing. The Night Country 1971:81.

I made a note of the source, but not how I came upon it. I know I wasn't reading Eiseley's work because I used to keep lists of the books that I read. At the time I was studying linguistics and cognitive science, and in the early summer of 1994 I dipped into ecological anthropology. I assume that I came across the quote then. Now I don't really remember the context as clearly as it sounds. I'm making inferences from my old notebooks and from Usenet posts that have been archived online for 15 years. Reading through those old posts reminds me of what I was doing at the time, although I remember being quite a bit cooler than some of my posts make me sound. I wish that that were my own melancholy secret, but at some point in the 1990s I realized that everything that I had ever typed into a computer was going to be saved forever and eventually made available to everyone.

The Eiseley quote stuck with me, and occasionally I would imagine what it would be like to have an 'archaeological eye.' Being given more to science fiction than fantasy, I tended to imagine a mechanism or instrument or device of some sort, rather than a magical object like a crystal ball. Now at this point I should probably stop and reassure you that I know that it may well be impossible to build a time machine in general, and that it is certainly impossible for me to build one. But I think it can sometimes be quite productive to start with something that you know is impossible, and think through some of the implications anyway. As a genre, fiction is ideally suited to this kind of gedankenexperiment; academic monographs less so. Blogs lie somewhere in between. As my fellow Cliopatrian Timothy Burke once wrote, a blog is an ideal "place to publish small writings, odd writings, leftover writings, lazy speculations, half-formed hypotheses." Plus, time machines are a heck of a lot of fun.

When most people think of a time machine, I suspect they probably imagine something like the H. G. Wells version: jump in, set the dial to whenever, hit a button and you are there. This kind of time machine allows (or requires) you to alter the course of events. Sometimes the results are tragic. In the classic Ray Bradbury story "A Sound of Thunder," one of the characters steps on a prehistoric butterfly and changes the future decidedly for the worse. Sometimes the results are comic, as in Connie Willis's re-take of Jerome K. Jerome. A skeptic might point out that if this kind of time travel were ever going to be possible, we'd already be surrounded by people whizzing back from the future to take our fresh water or oxygen, or buy stock in Google, or exhort their younger selves to study harder, or whatever. For historians, the real problem with being able to alter the past is that it would seem to allow for Bill & Ted-style rewriting on a grand scale, and thus make history utterly pointless. The mutability of history, after all, crucially depends on the immutability of the past.

In fact, physicists are split on the possibility of time travel. Some of those who think time travel might be possible suggest that there could be some law of physics that prevents the creation of weird causal loops--you know, the kind where you go back in time to become your own great-great-grandfather or -mother. Stephen Hawking, for example, postulates a "chronology protection conjecture." (For more, see the article by Paul Davies in Scientific American or his subsequent book.) So when I think of an 'archaeological eye' I usually imagine something more voyeuristic: the ability to see or hear or in some way measure the events of the past without affecting the outcome.

Years later, let's say around Y2K, I was studying history. Reading Carlo Ginzburg's essay "Clues" reminded me of the Eiseley quote once again. Wouldn't it be cool to write a history based on virtuoso readings of material evidence? (Like Ginzburg, I read a lot of Sherlock Holmes as a kid.) Unfortunately, the only thing that I was arguably a virtuoso at reading was books, and even that was a stretch. Fortunately I was also reading the work of New Institutional Economists at the time. My head was full of ideas of information costs and transaction costs. Since it costs something to learn something, we can never know very much. I had about the same chance of learning to read old shoes or nineteenth-century beer bottles as I did of learning to read sheet music: fairly low. Choosing to specialize in reading one kind of material evidence would preclude learning to read an almost infinite number of other kinds of traces.

What to do? The key word is 'specialize'. As with other kinds of work, there is a division of interpretive labor. In order to make use of material trace evidence, you don't necessarily need to be able to read it yourself, you simply need to be able to find someone who can. With the traditional tools of scholarship it would have been very difficult to assemble a synoptic view of other people's reconstructions of the past from physical evidence. The emergence of search engines like Google drastically lowered those information costs, however. If you type the query interpret "wear marks" into Google, you will find a reference to a 1958 paper in the British Chiropody Journal on using shoe wear marks to diagnose foot troubles. You'll find a white paper on how to use scattered light to assess surface and bulk defects in various materials, a paper on the use-wear of stone tools, and so on. You'll find, in other words, a world of chiropodists, materials scientists, forensic scientists, engineers, archaeologists and thousands of other kinds of specialists busy reconstructing the past from its material traces. These are people in search of a usable past. They care about past events because they have consequences in the present, and the only way they can access that past is by looking for its indexical signs. These experts don't always agree with one another; the mutability of history also depends on the fact that learning is costly. But since our environment is composed entirely of survivals from the past, it is a kind of time machine, constantly transporting everything from some past into the present. It is one kind of time machine that is worth having... even if it does seem to work in one direction only and is remarkably difficult to use. (For more on the idea of the environment as an archive of material traces see my new book The Archive of Place.)

Saturday, August 18, 2007

Perpetual analytics is the process of comparing each new item of incoming information to the whole collection at the moment that it is received. IBM scientist Jeff Jonas writes, "there is an ocean of historical data and it is raining, which is to say new data keeps being introduced ... Think of [perpetual analytics] like 'directing the rain drops' as they fall into the ocean – placing each drop in the right place and measuring the ripples (i.e., finding relationships and relevance to the historical knowledge). Discovery is made during ingestion and relevant insight is published at that magical moment." Jonas contrasts this approach with the more traditional process of creating isolated, specialized databases to hold different kinds of information. Over time, these databases tend to become 'silos': many interesting things might be discovered if the information within them could be integrated, but the information costs are too high to do so.

The most powerful implementation of this idea (not to mention the most difficult) would be general-purpose mining at the scale of the internet. I'll leave that for Google or IBM. Instead, I'm going to describe a special-purpose system that operates in a very restricted and small domain.

Imagine browsing through a collection of online primary sources that may be relevant for your research. They could be diary entries, historic newspaper articles or parliamentary records. As you navigate to each new page, a set of links appears in the right sidebar, the way that sponsored advertisements appear in Google search results. Instead of being ads, however, these are links to related primary and secondary sources. If you are reading a letter, for example, there may be links in the sidebar to biographies of the author, recipient or people mentioned in the text. There may be links to other letters written by these people, or to other letters written at the same time and place. If some known event is being described, there may be links to historical accounts of that event. And so on. If you click on one of these sidebar links, a new tab opens in your browser with that source displayed in it, and with links to other sources that are related to it. The sidebar provides ambient information that may be useful without distracting you from the task at hand.

This recommendation system has two very useful features: it is generated automatically and it gets smarter as you use it. Here's what is going on behind the scenes. When you browse to a page, the system stores a copy of the text in a database. If it is the first page you've ever looked at, nothing else happens. When you go to the second page, however, it stores a copy of the text, then uses the normalized compression distance (NCD) to determine how similar the two pages are. (For more on the NCD, see my earlier posts.) As you browse to each new page, a copy is added to the database, and the NCD is calculated between that page and every other one that you've already visited. The sidebar displays links to the closest ones already in the database.
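Here is a minimal sketch of that loop in Python, offered as an illustration rather than a finished tool. It assumes zlib as the compressor and an in-memory dictionary standing in for the database; the names ReadingHistory, visit and ncd are made up for this example. The NCD of two texts x and y is computed as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size in bytes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for similar texts, near 1 for unrelated ones."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


class ReadingHistory:
    """Hypothetical stand-in for the database of pages visited so far."""

    def __init__(self):
        self.pages = {}  # url -> page text

    def visit(self, url: str, text: str, k: int = 3) -> list:
        """Compare the new page with every stored page, store it, return the k closest urls."""
        scored = sorted(
            (ncd(text, old_text), old_url)
            for old_url, old_text in self.pages.items()
            if old_url != url
        )
        self.pages[url] = text
        return [old_url for _, old_url in scored[:k]]
```

The first call to visit returns an empty list, since there is nothing yet to compare against. Each later call returns the sidebar links for that page. One caveat: zlib only looks back about 32 KB, so for long documents a compressor such as bz2 or lzma may give more meaningful distances.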

As described so far, this system is able to cluster your own reading, always showing you links to the most relevant stuff that you've already seen. To make the system really useful, you can seed the database with source collections that are likely to be relevant but are too large to be read systematically. For example, if you are working in a particular national and temporal context, you might add all of the entries from a dictionary of historical biography. If you are working in a particular place, you might add complete runs of local newspapers. For specific fields you could add runs of scholarly journals. For groups of people you could add correspondence and diaries.
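Seeding might look something like the following sketch. Again this assumes zlib and an in-memory store; SeededStore, seed and recommend are hypothetical names. The compressed size of each seeded document is computed once at load time, so that recommending for a new page needs only one extra compression per stored document.

```python
import zlib


def csize(text: str) -> int:
    """Compressed size of a text in bytes (zlib is an assumption, not a requirement)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


class SeededStore:
    """Hypothetical store holding seeded sources alongside their compressed sizes."""

    def __init__(self):
        self.docs = {}  # identifier -> (text, compressed size)

    def seed(self, collection) -> int:
        """Load an iterable of (identifier, text) pairs; return how many were added."""
        added = 0
        for doc_id, text in collection:
            self.docs[doc_id] = (text, csize(text))
            added += 1
        return added

    def recommend(self, text: str, k: int = 3) -> list:
        """Return identifiers of the k stored documents with the smallest NCD to text."""
        cx = csize(text)
        scored = []
        for doc_id, (other, cy) in self.docs.items():
            cxy = csize(text + other)
            scored.append(((cxy - min(cx, cy)) / max(cx, cy), doc_id))
        scored.sort()
        return [doc_id for _, doc_id in scored[:k]]
```

With a large seed collection, comparing every new page against every stored document would get slow, so a real system would probably need some way of narrowing the candidates first. That is left out here.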

Furthermore, the system scales up powerfully for collaborative research if the database is shared by everyone working on a particular subject. As each person finds something of interest, it immediately becomes available for recommendation to any of the others, depending on what they are looking at. Built on top of a server-backed version of Zotero, this tool provides one path to leveraging the power of collective intelligences.
