Twitter’s historical content archive is now open to the public: here’s all you need to know

Last week, Twitter began to allow its users to search through every tweet sent publicly since the popular social platform was launched in 2006.

Described as ‘the SMS of the internet,’ the network has had an impressive rise and a significant impact since its inception. It has become the platform of choice for celebrities and has, as Rupert Murdoch put it, ‘helped change the news biz and the world.’

The growth of the platform since the first tweet by its co-founder, Jack Dorsey (see below), has been constant, and the sheer number of messages fired off into the Twittosphere is almost unfathomable.

Despite complications and persistent obstacles, the company has always aimed to index its massive archive of user-generated content and make it searchable for its users.

‘Our long-standing goal has been to let people search through every Tweet ever published.’

This amounts to roughly half a trillion messages, many of which contain links and rich media.

What is going to change?

Before this update, Twitter allowed discovery of ‘relatively fresh user-generated content via an inverted index containing about a week’s worth of recent tweets.’

Some users were dissatisfied that they could not access older tweets.

On top of this, the search function itself was quite messy and difficult to use effectively. Although we can’t know the exact workings of Twitter’s algorithms, it was clear that a lot of emphasis was placed on ‘nearness in time’ and on topics that were currently ‘hot’ and ‘trending.’ As project leader Yi Zhuang put it:

This was, of course, to consolidate Twitter’s authority as the world’s go-to real-time news source. It is in this area that Twitter dominates and has always dominated, despite continual pressure from its larger and more powerful competitors.

Having access to this wealth of international opinion and discussion is huge! As a simple demonstration, here are the tweets made by Sohaib Athar on 01 May – 02 May 2011. For anyone who is unaware, Athar is the unsuspecting IT technician who provided accidental digital coverage of US forces storming Bin Laden’s hideout over those dates. Now, we can revisit the action.

How can we use this?

There is a range of complex but very useful options for searching through historic tweets. The snapshot below provides an indication. I have highlighted some of the most useful functions.

The options allow you to focus on location and/or time period. You can filter by sentiment. You can even recover every tweet from a particular set of accounts that mentions a particular brand or person. The opportunities for marketing, debate and study are endless.
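For readers who prefer typing queries to filling in the form, the filters described above correspond to text operators that can be entered straight into the search box. The sketch below shows one way to assemble such a query in Python. The function names `build_query` and `search_url` are hypothetical helpers written for this post, not part of any Twitter library. The operators themselves (`from:`, `near:`, `within:`, `since:`, `until:` and the `:)` / `:(` emoticon filters) are standard Twitter search operators, but how they are combined here is an assumption based on the strings the advanced search form produces.

```python
from datetime import date
from urllib.parse import urlencode

# Emoticon operators that Twitter search treats as a rough sentiment filter.
SENTIMENT_OPERATORS = {"positive": ":)", "negative": ":("}


def build_query(text="", from_accounts=(), since=None, until=None,
                near=None, within=None, sentiment=None):
    """Assemble a Twitter search query string from optional filters.

    Hypothetical helper: the parameter names are illustrative only.
    """
    parts = []
    if text:
        parts.append(text)
    if from_accounts:
        # Several accounts are joined with OR, as the advanced search form does.
        parts.append(" OR ".join(f"from:{name.lstrip('@')}" for name in from_accounts))
    if near:
        parts.append(f'near:"{near}"')
        if within:
            parts.append(f"within:{within}")  # e.g. "15mi"
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    if sentiment:
        parts.append(SENTIMENT_OPERATORS[sentiment])
    return " ".join(parts)


def search_url(query):
    """Return a twitter.com search URL with the query percent-encoded."""
    return "https://twitter.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query("@Butlins", since=date(2014, 6, 1),
                        until=date(2014, 8, 31), sentiment="negative")
    print(query)
    print(search_url(query))
```

Keeping the query builder separate from the URL helper means the same string can be pasted into the search box by hand or passed to any tool that accepts the same operators.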

A primary use for this new search function, highlighted by Twitter and many third-party commentators, is exploring public discussion and analysing how the public responded to events.

As we know, the 2014 World Cup was one of the most social events of recent times. You can now revisit the public’s coverage with ease.

It would also be easy to examine trends across entire television and sport seasons, conferences, industry discussions, places and businesses.

For a holiday company, for instance, it may be useful to go back and compare the public’s attitudes over different years or in different locations.

Embedded below is a scroll of Tweets mentioning @Butlins this summer, including negative sentiment.

These can be downloaded in a spreadsheet where they can be analysed coherently and compared to other results.
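As a rough illustration of that kind of comparison, the sketch below tallies tweets per year and sentiment from a CSV export. The column names `created_at` and `sentiment` are assumptions made for this example, as is the ISO-style date format. The article does not specify what the exported spreadsheet looks like, so adjust both to match the real file. The sample rows are made-up placeholders, not real tweets.

```python
import csv
import io
from collections import Counter


def count_by_year_and_sentiment(csv_file):
    """Count rows per (year, sentiment) pair in an exported tweet spreadsheet.

    Assumes hypothetical columns 'created_at' (e.g. 2014-07-21) and 'sentiment'.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        year = row["created_at"][:4]  # first four characters of an ISO date
        counts[(year, row["sentiment"])] += 1
    return counts


if __name__ == "__main__":
    # Made-up placeholder rows standing in for a real export.
    sample = io.StringIO(
        "created_at,sentiment,text\n"
        "2013-07-02,positive,placeholder tweet A\n"
        "2014-07-21,negative,placeholder tweet B\n"
        "2014-08-03,negative,placeholder tweet C\n"
    )
    for (year, sentiment), total in sorted(count_by_year_and_sentiment(sample).items()):
        print(year, sentiment, total)
```

With a real file, pass `open("export.csv", newline="", encoding="utf-8")` in place of the in-memory sample.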

Another huge benefit of this function is the ability to revive old hashtag conversations. Scrolling through old exchanges on topics such as #Ferguson (although this discussion has surely been vehemently reignited), #ScotlandDecides or #Election2012 is likely to unearth some interesting and valuable opinions and information.

Moving forward

While the search is already visibly more succinct, accurate and easy to use, Twitter has explained that this process is ongoing and its search engine will be continuously refined over the coming months.

It is worth pointing out that there were already a few other ways to explore historical tweets. As well as the archives at the Library of Congress and MIT’s new Laboratory for Social Machines (which received a $10 million investment from Twitter in October), a few analytics tools provided this function.

The most popular of these were Gnip and Topsy. Towards the end of 2013, Topsy was acquired by Apple. Earlier this year, Twitter acquired Gnip. Perhaps the need to provide a better service than its titan rival in tech sped up Twitter’s search-development process, an investment the company previously didn’t seem overly concerned about.

In any case, Twitter’s updated search engine is a much-needed improvement on these third-party tools. It is more streamlined, extensive and efficient and is only set to improve.

We will keep an eye on the progress of the rollout over the coming months and update this post with any relevant information.