This blog is a space for me to rant in that most seventeenth-century sense of the word; and to cut and paste the ideas and comments that don't seem to fit in more traditional forms of academic publication.

Friday, 29 May 2015

The following post is derived from a short talk I gave at a doctoral training event at the British Library in May 2015, focused on using the UK Web Archive. It was written with PhD students in mind, but really forms a meditation on the opportunities created when we are working with web sites rather than print. While lightly edited, the text retains the tics and repetitions of public presentation.

My office c.1984

I normally work on properly dead
people of the sort that do not really appear in the UK Web Archive – most of
them eighteenth-century beggars and criminals.
And in many respects the object of study for people like me –
interlocutors of the long dead - has not
changed that much in the last twenty years.
For most of us, the ‘object of study’ remains text. Of course the ‘digital’ and the online have
changed the nature of that text. How we
find things – the conundrums of search – shape the questions we
ask.
And a series of new conundrums have been added to all the old ones –
do, for instance, ‘big data’ and new forms of visualisation imply a new
‘open eyed’ interrogation of data? Are
we being subtly encouraged to abandon older social science ‘models’, for
something new? And if we are, should
these new approaches take the form of ‘scientific’ interrogation, looking for
‘natural’ patterns – following the lead of the Culturomics movement; or perhaps
take the form of a re-engagement with the longue durée – in
answer to the pleas of the History Manifesto? Or perhaps we should be seeking a return to
‘close reading’ combined with a radical contextualisation - looking at the
individual word, person, and thing – in its wider context, preserving
focus across the spectrum.

And of course, the online and the
digital also raise issues about history writing as a genre and form of
publication. Open access, linked data, open data, the
'crisis' of the monograph, and the opportunities of multi-modal forms of
publication, all challenge us to think again about the kind of writing we do,
as a literary form. Why not do your PhD as a graphic novel? Why
not insist on publishing the research data with your literary overlay? Why not do something different? Why not self-publish?

These are conundrums all – but
conundrums largely of the ‘textual humanities’.

Ironically, all these conundrums
have not had much effect on the academy and the kind of scholarship the academy
values. The world of academic writing is
largely, and boringly, the same as it was thirty years ago. How we do it has changed, but what it looks
like feels very familiar.

But the born digital is
different. Arguably, the sort of thing
I do, history writing focused on the properly dead, looks ‘conservative’ because it
necessarily engages with the categories of knowing that dominated the nineteenth
and twentieth centuries – these were centuries of text, organised into
libraries of books, and commentated on by cadres of increasingly professional
historians. The born digital – and most
importantly the UK Web Archive – is just different. It sings to a different tune, and demands
different questions – and if anywhere is going to change practice, it should be
here.

Somewhat to my frustration, I
don’t work on the web as an ‘object of study’ – and therefore feel uncertain about what it can
answer and how its form is shaping the conversation; but I did want to suggest
that the web itself and more particularly the UK Web Archive provides an
opportunity to re-think what is possible, and to rethink what it is we are
asking; how we might ask it, and to what purpose.

And I suppose the way I want to frame
this is to suggest that the web itself brings on to a single screen, a series
of forms of data that can be subject to lots of different forms of analysis. A few years ago, when APIs were first
being advocated as a component of web design, the comment that really struck
me, was that the web itself is a form of API, and that by extension the Web Archive
is subject to the same kind of ‘re-imagination’ and re-purposing that an API
allows for a single site or source.

As a result, you can – if you want –
treat a web page as simple text – and apply all the tools of distant reading of text - that
wonderful sense that millions of words can be consumed in a single gulp. You can apply ‘topic modelling’, and Latent
Semantic Analysis; or Term Frequency/Inverse Document Frequency measures. Or, even more simply, you can count words,
and look for outliers – stare hard at the word on the web!
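
The simplest of these measures can be sketched in a few lines. What follows is a minimal illustration, assuming the text of each archived page has already been extracted into a plain string. The `pages` list and the `tokenize` and `tf_idf` names are invented for the example, and the weighting is the textbook one (term frequency multiplied by the log of inverse document frequency), not any particular library's variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {word: score} dict per document.

    tf  = occurrences of the word in the document / words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    token_lists = [tokenize(doc) for doc in documents]

    # Count, for each word, how many documents it appears in.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical stand-ins for the text of three archived pages.
pages = [
    "the court heard the prisoner speak",
    "the archive holds the pages of the web",
    "the web page is code and text and image",
]
scores = tf_idf(pages)

# Words that appear on every page (here "the") score zero;
# words confined to one page are the outliers worth staring at.
for word, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:.3f}")
```

On a real crawl the same idea would more sensibly be handed to an established library, but the arithmetic is no more complicated than this.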

But you can also go well beyond
this. In performance art, in geography
and archaeology, in music and linguistics, new forms of reading are emerging
with each passing year that seem to me to significantly challenge our sense of
the ‘object of study’ – both traditional text and web page.
In part, this is simply a reflection of the fact that all our senses and
measures are suddenly open to new forms of analysis and representation. When
everything is digital – when all forms of stuff come to us down a single
pipeline - everything can be read in a
new way.

Consider for a moment the ‘LIVE’ project
from the Royal Veterinary College in London, and their ‘haptic simulator’. In this instance they have developed a full
scale ‘haptic’ representation of a cow in labour, facing a difficult birth,
which allows students to physically engage and experience the process of
manipulating a calf in situ. I haven’t
had a chance to try this, but I am told that it is a mind-altering
experience. It suggests that reading can
be different; and should include the haptic - the feel and heft of a thing in
your hand. This is being coded for
millions of objects through 3D scanning; but we do not yet have an effective
way of incorporating that 3D text into how we read the past.

The same could be said of the aural - that
weird world of sound on which we continually impose the order of language,
music and meaning; but which is in fact a stream of sensations filtered through
place and culture.

Projects like the Virtual St Paul's Cross, which allows you to ‘hear’ John Donne’s sermons from the 1620s, from
different vantage points around the yard, change how we imagine them, and
move from ‘text’ to something much more complex and powerful. And begin to navigate that normally
unbridgeable space between text and the material world. And if you think about this in relation to
music and speech online – you end up with something different on a massive
scale.

One
of my current projects is to create a soundscape of the courtroom at the Old
Bailey - to re-create the aural experience of the defendant - what it felt like
to speak to power, and what it felt like to have power spoken at you from the
bench. And in turn, to use that knowledge to assess who was more effective in
their dealings with the court, and whether, having a bit of shirt to you, for
instance, affected your experience of transportation or imprisonment. And the point of the project is to simply add
a few more variables to the ones we can securely derive from text.

It is an attempt to add just a couple
more columns to a spreadsheet of almost infinite categories of knowing. And you could keep going – weather, sunlight,
temperature, the presence of the smells and reeks of other bodies. Ever more layers to the sense of place. In part, this is what the gaming industries
have been doing from the beginning, but it also becomes possible to turn that
creativity on its head, and make it serve a different purpose.

In the work of people such as Ian Gregory, we can see the beginnings of new ways of reading both the landscape,
and the textual leavings of the dead. Bob
Shoemaker, Matthew Davies and I (with a lot of other people) tried to do
something similar with Old Bailey material, and the geography of London in the
Locating London’s Past project.

This map is simply colours blue, red and
yellow mapped against brown and green. I
have absolutely no idea what this mapping actually means, but it did force me
to think differently about the feel and experience of the city. And I want to be able to do the same for all
the text captured in the UK domain name.

All of which is to state the obvious. There are lots of new readings that change
how we connect with historical evidence – whether that is text, or something
more interesting. In creating new digital forms of inherited
culture - the stuff of the dead - we naturally innovate, and naturally enough,
discover ever changing readings. But the
Web Archive challenges us to do a lot more; and to begin to unpick what
you might start pulling together from this near infinite archive.

In other
words, the tools of text are there, and arguably moving in the right direction,
but there are several more dimensions we can exploit when the object of study
is itself an encoding.

Each web
page, for instance, embodies a dozen different forms. Text is obvious, but it is important to
remember that each component of the text – each word and letter, on a web page -
is itself a complex composite. What
happens when you divide text by font or font size; weight, colour, kerning,
formatting etc.? By location - in the
header, or the body, or wherever the CSS sends it; or more subtly by where it
appears to a user’s eye - in the middle of a line – or at the end?

Suddenly, to
all the forms of analysis we have associated with ‘distant reading’ there are
five or six further columns in the spreadsheet – five or six new variables to
investigate in that ‘big data’ open-eyed sort of way.
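
One of those extra columns, where in the markup a word sits, can be sketched as follows. This is a minimal sketch, assuming the raw HTML of a captured page is available as a string. The `WordsByTag` class and the sample page are invented for the example, and it records only the enclosing HTML element. Where the word finally lands on a reader's screen depends on CSS and a rendering browser, which this does not attempt.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so must not be treated as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class WordsByTag(HTMLParser):
    """Count words separately for each enclosing HTML element."""

    def __init__(self):
        super().__init__()
        self.stack = []                     # currently open tags
        self.counts = defaultdict(Counter)  # tag name -> word counts

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        where = self.stack[-1] if self.stack else "(no tag)"
        if where in ("script", "style"):
            return  # code, not prose
        self.counts[where].update(data.lower().split())


# A hypothetical captured page.
sample = (
    "<html><head><title>Old Bailey</title></head>"
    "<body><h1>Trials</h1><p>Trials at the Old Bailey</p></body></html>"
)
parser = WordsByTag()
parser.feed(sample)

for tag, words in parser.counts.items():
    print(tag, dict(words))
```

Each tag name becomes a further column: the same word can now be counted once as a heading and once as body text.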

And that is
just the text. The page itself is both a
single image, and a collection of them – each with their own properties. And one of the great things that is coming
out of image research is that we can begin to automate the process of analysing
those screens as ‘images’. Colour,
layout, face recognition etc. Each page
is suddenly ten images in one – all available as a new variable; a new column
in the spreadsheet of analysis. And, of
course, the same could be said of embedded audio and video.

And all of
that is before we even look under the bonnet.
The code, the links, the metadata for each page – in part we can think
of these as just another iteration of the text; but more imaginatively, we can
think about them as more variables in the mix.
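
Pulling those variables out from under the bonnet is not difficult. The following is a minimal sketch, again assuming the raw HTML of a captured page is to hand. The `LinksAndMeta` class, the sample markup, and the `example.org` address are all hypothetical, and real archived pages will be considerably messier.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinksAndMeta(HTMLParser):
    """Collect outgoing links and named <meta> values from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # absolute URLs, in page order
        self.meta = {}   # meta name -> content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own address.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]


# A hypothetical captured page and the address it was captured from.
sample = (
    '<html><head><meta name="author" content="A. N. Author"></head>'
    '<body><a href="post.html">A post</a> <a href="/about">About</a></body></html>'
)
parser = LinksAndMeta("http://example.org/blog/")
parser.feed(sample)

print(parser.links)
print(parser.meta)
```

The list of links is one new column, and each metadata field is another.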

But, of
course, that in itself misunderstands the web and the Web Archive. The commonplace metaphor I have been using up
till now is of a ‘page’ – and is the intellectual equivalent of skeuomorphism - relying
on material world metaphors to understand the online.

But these
aren’t pages at all, they are collections of code and data that generate
an experience in real time. They do not
exist until they are used - if a website in the forest is never accessed, it does
not exist. The web archive therefore is
not an archive of ‘objects’ in the traditional sense, but a snapshot from a
moving film of possibilities. At its
most abstract, what the UK Web Archive has done is spirit into being the very
object it seeks to capture – and of course, we all know that in doing so, the
capturing itself changes the object. Schrödinger's cat may be alive or dead, but its box is
definitely open, and we have visited our observations upon its content.

So to add
to all the layers of stuff that can fill your spreadsheet, there also needs to
be columns for time and use; re-use and republication. And all this is before we seek to change the
metaphor and talk about networks of connections, instead of pages on a website.

Where I end
up is seriously jealous of the possibilities; and seriously wondering what the
‘object of study’ might be. In the
nature of an archive, the UK Web Archive imagines itself as an ‘object of
study’; created in the service of an imaginary scholar. The question it raises is how do we turn
something we really can’t understand, cannot really capture as an object of
study, to serious purpose? How do we
think at one and the same time of the web as alive and dead, as code, text, and
image – all in dynamic conversation one with the other? And even if we can hold all that at once,
what is it we are asking?