Thursday, 28 March 2013

The academic community I’m most heavily
involved in – Digital Humanities – are fairly invested in Twitter. At all times
of the day there are major figures, students, and newbies in the field on there, just hanging out,
debating topics, forwarding links to events, job postings, interesting research
and cool things they have stumbled upon. People have studied this – graphing and charting the discussions, especially around the DH conference,
and heck, even I have co-authored a paper on the subject.

I’m currently working on a book/project
called Defining Digital Humanities and I
thought, wouldn’t it be fun to get all – and I mean all – the tweets that
contain the hashtag #Digitalhumanities – what fun could be had charting the
growth of the discipline, the geolocation of tweets, the networks that exist,
the sentiments surrounding it – etc etc. Now, hindsight is a grand thing – I should have thought to start scraping
these back in 2006 – but surely it must be possible to get access to this for
research? So I asked.

The first approach was to Gnip – who have “full historical access to the twitter firehose available
exclusively”. They were really very
helpful, and we got into a conversation about my needs, their licensing, and –
of course – costs. The upshot is that if you want a hashtag, you can get it for
a price, with the text delivered in JSON format. I was quoted between $15,000
and $25,000 for the full historical set (depending on the exact volume of the
data; they are now looking into it to give me the final figure – neither they nor I yet know how many tweets contain this hashtag).

The second place I asked was Datasift – “the leading platform for building
applications with insights derived from the most popular social networks and
news sources”. They do have access to the historical Twitter firehose, but they don’t
do one-off searches, and licensing will start at $3000 per month to get access
to it (on a yearly contract). They will be launching a pay-as-you-go service at some point, they tell
me. By the way, you can get $10 worth of
free credit for processing if you sign up and play around with some current
searches: I set up a search for #digitalhumanities and I had run out of credit within
a few hours. (I find the user interface very obfuscating – I’m still wrangling with it to see what
that data actually is!).

Now, these costs are very little compared
to the cost of accessing the full firehose, and let’s face it – a free service like Twitter has to make its money somewhere.
These were not vexatious enquiries: I’d really like to do this study. But now I
have to find $25k down the back of the sofa to get access to this data (and
incidentally, if I do, I won’t be allowed to quote it, only to show the stats
that emerge from the analysis). $25k is
a fair whack of money in academia-land. It will also take around 6 months (at
least) to write it into a grant proposal to raise the money – and how to persuade
academic funders that buying this dataset is a good use of their money? Frankly,
I’m not sure that will fly in the arts and humanities, where complete grant
costings can come under £100k for a one-year project.

Thinking caps are now on to see how we can
get funding put together to get access to the data of the community I –
goddamit – helped (in some small way) to create. I love Twitter with a passion
and it continues to inform and aid my teaching and research. But when we invest
so much in a free service, we are selling ourselves. It’s interesting to see
how much #digitalhumanities is “worth” to others. Anyone got a free $25k?

About Melissa Terras

I'm the Professor of Digital Humanities in the Department of Information Studies, University College London, and Director of UCL Centre for Digital Humanities. I teach Digitisation, and my research focuses on the use of computational techniques to enable research in the humanities that would otherwise be impossible. My UCL webpage contains more information about publications and research projects. I also hang out at @melissaterras. This was my personal blog, and everything I said here was in a personal capacity - you can find my new blog over at melissaterras.org. I'm preserving this content to prevent bit rot, but it's all replicated over the road, too - hope to see you there.