The decaying web

For UK readers, here’s my most recent BBC Future column, exploring the vanishing legacy of social media – and what it might mean for history. I got a number of interesting reactions to this column, including some fascinating feedback, sent via Facebook on 1st October, from the Twitter user mentioned in the first paragraph.

On January 28 2011, three days into the fierce protests that would eventually oust the Egyptian president Hosni Mubarak, a Twitter user called Farrah posted a link to a picture that supposedly showed an armed man as he ran on a “rooftop during clashes between police and protesters in Suez”. I say supposedly, because both the tweet and the picture it linked to no longer exist. Instead they have been replaced with error messages that claim the message – and its contents – “doesn’t exist”.

Few things are more explicitly ephemeral than a tweet. Yet it’s precisely this kind of ephemeral communication – a comment, a status update, sharing or disseminating a piece of media – that lies at the heart of much of modern history as it unfolds. It’s also a vital contemporary historical record that, unless we’re careful, we risk losing almost before we’ve been able to gauge its importance.

Consider a study published this September by Hany M SalahEldeen and Michael L Nelson, two computer scientists at Old Dominion University. Snappily titled “Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”, the paper took six seminal news events from the last few years – the H1N1 virus outbreak, Michael Jackson’s death, the Iranian elections and protests, Barack Obama’s Nobel Peace Prize, the Egyptian revolution, and the Syrian uprising – and, for each one, drew from Twitter’s entire corpus a representative sample of tweets specifically discussing that event.

It then analysed the resources being linked to by these tweets, and whether these resources were still accessible, had been preserved in a digital archive, or had ceased to exist. The findings were striking: one year after an event, on average, about 11% of the online content referenced by social media had been lost and just 20% archived. What’s equally striking, moreover, is the steady continuation of this trend over time. After two and a half years, 27% had been lost and 41% archived.

This is just one investigation, and a preliminary one at that. The figures, though, suggest a clear linear trend: even when archiving is taken into account, just over 10% of the resources shared via social media are lost each year, or around 0.02% of this content every day.
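For anyone who wants to check the arithmetic, here is a minimal back-of-envelope sketch in Python. It is not the study’s own method: it simply assumes a straight line through the two data points quoted above (11% lost after one year, 27% after two and a half), and the function name is my own invention for illustration.

```python
def linear_loss_rate(t1_years, lost1_pct, t2_years, lost2_pct):
    """Slope of a straight line through two (time, percent lost) points.

    Hypothetical helper for illustration; assumes loss grows linearly
    between the two observations.
    """
    return (lost2_pct - lost1_pct) / (t2_years - t1_years)


# Figures quoted in the column: 11% lost after 1 year, 27% after 2.5 years.
annual = linear_loss_rate(1.0, 11.0, 2.5, 27.0)

# Spread the annual figure evenly over a 365-day year.
daily = annual / 365

print(f"{annual:.1f} percentage points lost per year")
print(f"{daily:.3f} percentage points lost per day")
```

On these two points alone the line works out at a little under 11 percentage points a year, or a few hundredths of a percentage point a day, which is the same order as the figures quoted above.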

This isn’t the same thing as tweets themselves vanishing. For those wishing to analyse trends within social media utterances exhaustively, services like Gnip – which, for a fee, promises “complete and comprehensive access to every publicly available Tweet dating back to the very first Tweet from March 21, 2006” – offer an unprecedented “fire hose” of data, from which marketing and research firms are already gratefully guzzling.

What’s most vulnerable, rather, is the network of living connections into which social media is a window: the nexus of sources, resources, sounds, images and updates that together constitute the stuff of many millions of people’s daily experience. One commercial firm may well be able to sell you every extant public tweet ever sent – and another may do the same for other social media services. As work like SalahEldeen and Nelson’s study suggests, however, preserving these individual threads does little by itself to stop the tapestry of present history unravelling.

It’s a phenomenon that, in a different form, has been much on my mind recently, thanks to my work on a new book delving into the history of many digital developments since the end of the Second World War.

As you might expect, the internet itself is an endless treasure trove for such research. At the same time, though – and especially when it comes to the pre-web contents of early internet technologies such as Usenet or Bulletin Board Systems – all that remains of many seminal exchanges or ideas is often the copy-paste of a copy-paste of a copy-paste. Less than three decades after many discussions took place, both the “original” source and the technological platform on which it existed are not only impossible to find, but literally non-existent.

Online, in fact, it’s far easier for me to trace the development of many key ideas from the 1700s than it is from the last half-century. When it comes to the coining of 18th-century words, for example, copies of most paper books have simply sat in libraries ever since publication, waiting to be scanned and released into new digital life. Today, by contrast, much of the key digital data that authors and historians need if they are fully to unpick present intricacies – from the origins of words and ideas to political debates or even revolutions – is either locked away or lost within a few years of its creation.

At the heart of this lies what you might call the paradox of ephemeral communications. Their instantaneous, insubstantial ease is perfect for sharing and debating the most important questions of our time. But it also breeds a newly knotty historical problem – because all this sharing and debate mean precious little, in the long term, if you don’t also know what people are talking about.

With not only diaries and letters but even the relative permanence of email starting to look like something from the last century, it’s a problem that is only going to get more acute. There’s much to celebrate in the power and inclusiveness of new media. Historians researching early 21st-century life from the year 2312, however, will have their work cut out for them – and find that their chances of success depend disproportionately upon those private companies who own so much contemporary social history.

Our descendants will surely be grateful for a record that reflects more than marketable data and consumer preferences. As to preservation, though, the problem may be intractable. Between private profits, the privacy of personal histories and our hunger for perpetual renewal, “history” itself may be a concept ripe for rethinking: not so much the objective sifting of sources as a living thing, perpetually remade across networks for which there’s no time but the present.