Posts about our text and social media analysis work and latest news on GATE (http://gate.ac.uk) - our open source text and social media analysis platform. Also posts about the PHEME project (http://pheme.eu) and our work on automatic detection of rumours in social media. Lately also general musings about fake news, misinformation, and online propaganda.

Tuesday, 7 August 2012

GATE is Getting Sentimental about Social Media

Over the past two years, Diana Maynard, other colleagues in the GATE team, and I have been working on a number of GATE-based sentiment analysis and opinion mining tools, optimised specifically for Twitter, blogs, comments, and other kinds of social media posts. The work has been part of the EC-funded Arcomem and TrendMiner projects, as well as my EPSRC fellowship on mining and summarisation of social media (grant EP/I004327/1).

Speaking from experience, opinion mining on social media is nothing if not challenging. In this paper, Diana, Dominic, and I have tried to explain why. In a nutshell:

Most NLP tools do not come with a swear word plugin. As part of her work on the Arcomem project, Diana had fun collecting a suitable training corpus and a swear word list for sentiment detection.

"It's all Greek to me": less than 50% of all tweets are in English. Thanks to the plethora of GATE multilingual plugins, building a basic NLP pipeline wasn't as bad as it could have been.

Identifying relevant posts: there's more chaff than wheat out there, especially on Twitter.

Twts r noizy: Normalisation and spelling correction are essential. It turns out that the perfect way to collect a training corpus of tweets for normalisation purposes is to search for Justin Bieber.

Opinion target identification in tweets is...ahem...even more challenging than in longer texts (not that we have fully solved it there either).
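
To make the normalisation point a bit more concrete, here is a toy Python sketch of lexicon-based tweet normalisation. It is not how the GATE tools do it: the `SLANG` table, the `normalise` function, and the regular expressions are all made up for illustration, and a real system would need a far larger lexicon (or a trained model) plus proper spelling correction.

```python
import re

# Hypothetical lookup table of noisy spellings (an assumption for this
# sketch); a real normaliser would use a much larger lexicon.
SLANG = {
    "twts": "tweets",
    "r": "are",
    "noizy": "noisy",
    "gr8": "great",
    "soo": "so",
}

# URLs, @mentions and #hashtags are matched first so they stay intact;
# then plain words; then single punctuation characters.
TOKEN = re.compile(r"https?://\S+|[@#]\w+|\w+|[^\w\s]")

# Three or more repeats of the same character, e.g. the "oooo" in "soooo".
ELONGATED = re.compile(r"(.)\1{2,}")


def normalise(tweet: str) -> str:
    """Return the tweet with known noisy tokens replaced.

    Tokens are re-joined with single spaces, so the original spacing
    around punctuation is not preserved.
    """
    out = []
    for tok in TOKEN.findall(tweet):
        # Leave URLs, mentions and hashtags untouched.
        if tok.startswith(("http://", "https://", "@", "#")):
            out.append(tok)
            continue
        # Squeeze character runs down to two, then try the lookup table.
        squeezed = ELONGATED.sub(r"\1\1", tok)
        out.append(SLANG.get(squeezed.lower(), squeezed))
    return " ".join(out)


print(normalise("Twts r noizy"))                     # tweets are noisy
print(normalise("soooo gr8 @bieber http://t.co/x"))  # so great @bieber http://t.co/x
```

Even this tiny example shows why the problem is hard: the lookup loses capitalisation, says nothing about words outside the table, and cannot tell a misspelling from a name.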