Abstract

In 2013, the murder of Drummer Lee Rigby in Woolwich, UK led to an extensive public reaction on social media. Given the extreme terrorist motive and the public nature of the actions, it was feasible that the public response could include written expressions of hateful and antagonistic sentiment towards a particular race, ethnicity and religion, which can be interpreted as ‘hate speech’. This provided motivation to study the spread of hate speech on Twitter following such a widespread and emotive event. In this paper we present a supervised machine learning text classifier, trained and tested to distinguish hateful and/or antagonistic responses, with a focus on race, ethnicity or religion, from more general responses. We used human-annotated data collected from Twitter in the immediate aftermath of Lee Rigby’s murder to train and test the classifier. As “Big Data” is a growing topic of study, and its use in policy and decision making is currently the subject of constant debate, we discuss the use of supervised machine learning tools to classify a sample of “Big Data”, and how the results can be interpreted for use in policy and decision making. The classifier performs best when probabilistic, rule-based and spatial-based classifiers are combined with a voted ensemble meta-classifier. We achieve an overall F-measure of 0.95 using features derived from the content of each tweet, including syntactic dependencies between terms to recognise “othering” terms, incitement to respond with antagonistic action, and claims of well-founded or justified discrimination against social groups. We then demonstrate how the results of the classifier can be used robustly in a statistical model to forecast the likely spread of hate speech in a sample of Twitter data.