Abstract

The rapid and widespread uptake of social media platforms such as Twitter, Facebook and YouTube has created new ways for people to interact and to share information. This brings both benefits and risks for civil society and new challenges for agencies responsible for ensuring the boundaries of acceptable behaviour are not crossed and, if they are, that perpetrators are brought to justice. The proliferation of so-called 'hate speech' in social media is an area of growing concern, as recent high profile events confirm. The most senior prosecutor in England and Wales recently acknowledged the harm that can be caused by hate speech on social media and explained that "banter, jokes and offensive comment are commonplace and often spontaneous" and "communications intended for a few may reach millions" (BBC 2012). For the social sciences, the migration of hate speech to social media platforms affords new opportunities to study hateful and antagonistic behaviours, to understand the impact of social media and to identify ways in which agencies can respond more effectively to its threats and consequences. This project aims to study the 'social media ecosystem' to better understand how the complex combination of user behaviours, global communication networks, and flows of information interact to promote hateful and socially disruptive content. The main deliverable of the project will be a computational tool, informed by social science knowledge, that will allow users to forecast the spread of hateful content over digital networks, providing an opportunity for intervention before such content 'goes viral' potentially causing harm to individuals, minority groups and communities.

The project includes three inter-related workpackages that will deliver the metrics that will inform the development of the computational tool. Firstly, content-based metrics will be derived from an analysis of hateful and antagonistic social media communications, in terms of linguistic categories, sentiment and tension. This will enable the construction of a typology of hateful and antagonistic social media communication. Secondly, network-based metrics will be derived from a profiling of the characteristics of digital networks through which hateful and antagonistic content is propagated across boundaries. Lastly, search-term metrics will be derived from an exploration of other sources of open data (such as the new Google Trends) in tracing the spread of hateful and antagonistic content online. These search-term metrics will be used to add an additional layer of information onto the first two metrics in an attempt to triangulate results. Each of these workpackages will build on the previous in an iterative fashion to inform the main deliverable of a probabilistic model-based methodology and subsequent computational tool to forecast the spread of hateful and antagonistic content via social media networks. This tool will be integrated into an online platform for use by academics and statutory agencies (see letters of support).

Planned Impact

This work is intended to be of benefit to the social science community in the first instance. It will provide a probabilistic model-based methodology and subsequent computational tool to facilitate the interpretation of the propagation and reaction to hateful content in social media. The project will provide a case study that can help validate a probabilistic model and provide a basis for assessing its application to other social science research questions. This work will also be of interest to publicly and commercially supported computer scientists involved in the development of algorithms, analysis and visualisation in a social media context. We have already made contacts with international academic partners (University of Queensland, Australian National University, Princeton University, and University of Illinois, Urbana-Champaign) and have secured travel bursaries to visit and demonstrate COSMOS in early 2013. These relations will benefit the existing proposal in terms of disseminating findings and promoting the use of the probabilistic model-based methodology in relation to a variety of social science problems.

The main deliverable of a probabilistic model-based methodology and subsequent computational tool will also be of use to non-academic users in statutory agencies such as the police, and third sector organisations such as Stonewall. These agencies currently have limited capacity to understand and interpret how hateful and antagonistic content emerges, propagates and is reacted to in social media networks. Many of these organisations are involved in establishing local, regional and national practices and policies to combat the spread of hate and there is currently a need to shed light upon this under-researched area. We have been in contact with several government, statutory and voluntary agencies and all have responded with enthusiasm to the proposed project's aims and objectives (see letters of support). Several of them have also agreed to sit on our steering group for the project if funded (Welsh Government, Scottish Government, Home Office, Equality and Human Rights Commission, the Association of Chief Police Officers, Greater Manchester Police, National Policing Improvement Agency, and Office for National Statistics).

How will they benefit?

Academic users will benefit in two ways. Firstly, researchers involved in the area of hate crime will benefit from a greater understanding of the nature, manifestation and spread of hateful information flows online. Secondly, once validated the probabilistic model can be tailored to examine and interrogate alternative social science questions in areas such as health, economics, media studies and education.

The tool will allow non-academic users to further their understanding of this growing phenomenon and to forecast the spread of hateful content through digital networks, providing an opportunity for intervention before such content 'goes viral', potentially causing harm to individuals, minority groups and communities.

To achieve this, the project will entail:

- Further development and merging of COSMOS and ASMC platforms, hosting the probabilistic tools produced from this project;
- Further development of an online guide to include guidance on the social scientific interpretation of the outputs of the probabilistic model-based methodology for researchers across disciplines and non-academic users. This guide also highlights the methodological and ethical concerns with using such data in various contexts;
- Publications produced for both the social sciences (focusing on how the tool can be used to support social science research) and computer science research (focusing on data analysis and visualisation of the data) communities;
- Workshops for academic and non-academic audiences, organised at Cardiff University, to demonstrate how the tool can be used alongside existing platform tools and methods.

Background

The rapid and widespread uptake of social media platforms such as Twitter, Facebook and YouTube has created new ways for people to interact and to share information. This brings both benefits and risks for civil society and new challenges for agencies responsible for ensuring the boundaries of acceptable and legal behaviour are not crossed and, if they are, that appropriate action is taken. In this respect, the proliferation of so-called 'hate speech' in social media is an area of growing concern, as recent high profile examples confirm. The most senior prosecutor in England and Wales recently acknowledged the harm that can be caused by hate speech on social media and explained that "banter, jokes and offensive comment are commonplace and often spontaneous" and "communications intended for a few may reach millions" (BBC, 2012). For the social sciences, the migration of hate speech to social media platforms affords new opportunities to study hateful and antagonistic behaviours, to understand the impact of social media and to identify ways in which agencies can respond more effectively to its threats and consequences.

The key substantive research questions of the project are:

a. Can we identify hateful and antagonistic social media content, as well as attempts to counter it, in terms of key events, linguistic characteristics, sentiment and tension?
b. Can we profile hateful and antagonistic social media networks in relation to user behaviour and interaction, building on the previous question to develop a typology of users (e.g. antagonists, influencers, propagators, reactors etc.)?
c. Can we triangulate the above analysis with other forms of open data, such as new Google Trends metrics, to validate the propagation of hateful content into online environments beyond social networks?
d. Can we utilise the data derived from the above questions to build a probabilistic modeling methodology using Bayesian Belief Networks that could forecast the emergence and evolution of information flows (Procter et al., 2013) within social media networks through which hate-related content is transmitted?
e. Can the model and methodology inform the social scientific interpretation of how hateful content travels and is impeded online, drawing on social scientific concepts such as responsibilisation (Garland, 2001) and nodal governance (Shearing & Wood, 2007) as framing devices?

Underpinning the aims and research questions is a particular approach that the interdisciplinary team (Social Science and Computer Science) of applicants have developed over the past two years, which has been termed Collaborative Algorithmic Design (Edwards et al. 2013, Housley et al., 2014). This involves the combination of measurement; construct validation; and interpretation through an iterative process. For example, operationalising social science theoretical propositions through the practical design and codification of computational tools and methods, which in turn produce results that are subjected to further critical interpretation and refinement.

Significant new knowledge generated

• Identified five equality strands for inclusion in study (race, religion, disability, sexual orientation and gender)
• Identified five corresponding 'hate-speech' antecedent trigger events that generated a significant amount of Twitter traffic.
• Built an ensemble machine classifier to identify tweets containing hateful content in each equality strand (see IPP paper: http://ipp.oii.ox.ac.uk/2014/programme-2014/track-a-harnessing-the-crowd/modelling-and-prediction/pete-burnap-matthew-l-williams-hate)
• Built models to predict the size and survival of hateful information flows
• Found that online hate speech did not propagate beyond 48 hours following the Woolwich terror attack
• Social factors, rather than content or temporal factors of the tweet, explained the most variance in both size and survival dependent measures
• An increase of 100 Google searches for the term 'Woolwich' increased the rate of retweets by a factor of 1.50 (50 per cent)
• An increase of 100 news headlines about the event increased the rate of retweets by a factor of 1.05 (5 per cent)
• Tweets containing positive sentiment were statistically more likely to propagate in terms of size and survival
• Tweets containing hashtags and URLs were also more likely to be retweeted
• Tweets containing hateful content were less likely to contain URLs but more likely to contain hashtags
• In the size model Far Right Political Agents were the least likely to be retweeted in volume (besides Other Agents)
• Tweets emanating from Far Right Political Agents (e.g. BNP) were the most likely to survive longest 36-42 hours after the event, at which point they lost ground to political agents, news agents and other agents, whose information flows lasted the longest in the study window (14 days)
• Information flows emanating from Police Agents outlast all other agents but the Far Right in the 36-42 hour window
• In summary, information flows emanating from Police Agents following the terrorist event were most likely to be large and to be long-lasting (bar the Far Right) in the impact and inventory periods (Cohen 1972) following the terrorist attack, while information flows emanating from Far Right Political Agents were likely to be small in size, but the most long lasting in the same periods
• The five corpora harvested contained evidence of counter-speech that has been analyzed using relevant Discourse Analytic methods in order to generate relevant 'actor' and 'activity' typologies for counter-hate speech.
• The qualitatively generated typologies of counter-hate speech have been used to configure the human coding exercise that has informed the building of a counter-hate speech classifier for each equality strand.
• Advanced empirically informed theoretical understanding of social media and hate speech e.g. the role of issue attention cycles and interactional chains on social media.
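The two retweet-rate findings above can be read multiplicatively. The following is a minimal sketch, assuming the reported factors are incidence rate ratios from a count regression with a log link (a common reading for negative binomial models; the exact specification is not stated here). The function names and the example increments are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Factors reported in the findings above, per 100-unit increase.
IRR_PER_100_SEARCHES = 1.50   # Google searches for 'Woolwich'
IRR_PER_100_HEADLINES = 1.05  # news headlines about the event


def rate_multiplier(irr_per_100: float, increase: float) -> float:
    """Multiplier on the expected retweet rate for a given increase.

    Assumption: a log-link count model, so an increase of k * 100 units
    multiplies the rate by irr_per_100 ** k.
    """
    return irr_per_100 ** (increase / 100.0)


def log_coefficient_per_unit(irr_per_100: float) -> float:
    """Log-scale coefficient per single unit implied by the rate ratio."""
    return math.log(irr_per_100) / 100.0


if __name__ == "__main__":
    # Hypothetical scenario: 200 extra searches and 100 extra headlines.
    combined = (rate_multiplier(IRR_PER_100_SEARCHES, 200)
                * rate_multiplier(IRR_PER_100_HEADLINES, 100))
    print(f"combined multiplier on retweet rate: {combined:.4f}")
```

Under these assumptions the effects compound rather than add: two successive increases of 100 searches multiply the rate by 1.50 twice, not by 2.00.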

New or improved research methods or skills developed

• Developed a hate speech classifier using an ensemble of machine learning techniques (see IPP paper: http://ipp.oii.ox.ac.uk/2014/programme-2014/track-a-harnessing-the-crowd/modelling-and-prediction/pete-burnap-matthew-l-williams-hate).
• Applied zero-truncated and zero-inflated negative binomial regression models to predict the size of information flow propagation following trigger antecedent events to hate speech on Twitter.
• Applied Cox proportional hazards and Kaplan Meier estimation to determine the survival of information flows following trigger antecedent events to hate speech on Twitter.
• A Twitter 'thread' capture digital tool has been developed through this project and incorporated on the COSMOS platform. This was developed in order to capture Twitter threads and exchanges that concerned hateful and antagonistic content. The COSMOS platform was released in September, 2014.
• Applied computer assisted Discourse Analytic techniques to counter-hate speech Twitter threads.
• The project has developed a mixed methods (Qualitative and Quantitative) approach to social media research.
• We have pioneered and explored the use of crowdsourced human coding techniques for feature identification in ways that inform machine learning techniques and classifier development for sociologically informed analyses of social media information flows.
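To make the survival analysis named above more concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the project's implementation, which is not described here. The function name, the treatment of an information flow's lifetime as hours until its last retweet, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(durations: List[float],
                 observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    durations: hours each information flow was followed (hypothetical unit).
    observed:  True if the flow was seen to end inside the study window,
               False if it was still active when observation stopped
               (right-censored).
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings leaving the risk set

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: five flows, two of them censored.
    hours = [6, 12, 12, 24, 36]
    ended = [True, True, False, True, False]
    for t, s in kaplan_meier(hours, ended):
        print(f"t={t:>3}h  S(t)={s:.3f}")
```

Censored flows reduce the number at risk without lowering the survival estimate, which is why the estimator suits flows that are still active at the end of a study window.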

Important new research resources identified

• Identified Twitter data archive as a key source of information that can be re-purposed to serve the needs of the social science community in the pursuit of understanding contemporary social problems as part of our COSMOS programme.

Important new research questions opened up

One of the most important new research questions opened up is the extent to which social media is self-regulating in the context of hate speech and other forms of 'digital wildfire'.

Particularly noteworthy new research networks, collaborations or partnerships, or combinations of these.

This project has directly led to a follow on project with partners at Oxford University entitled Digital Wildfire: (Mis)information flows, propagation and responsible governance? (ES/L013398/1). This project is funded by the RCUK Global Uncertainties Programme. Specifically we will be applying the techniques and the coded and collated data sets generated during the course of this project for the follow on digital wildfire study.

Increased research capability generated from training delivered in specialist skills

In addition to the new methods identified above this project has been central to the COSMOS programme for observing and analyzing social media communications and repurposing these for social research (Housley et al, 2014). The project has been central to driving the development and testing of the COSMOS platform and its suite of tools, helping prepare the way for the release last month (September, 2014) of COSMOS desktop. At the time of writing COSMOS Desktop has been downloaded by the social research community two hundred times and rising.

Exploitation Route

The work produced by this project has been communicated to key steering committee partners, including the Welsh Government and Association of Chief Police Officers. The project has also produced a number of papers in top social and computer science journals, including Social Network Analysis and Mining and Big Data and Society (see publications). A number of papers have also been delivered at academic institutions that include Oxford (Oxford Internet Institute) and Warwick Universities (Centre for Interdisciplinary Methodologies). One of the main ways in which impact from this project will be taken forward is through the development of the Twitter capture function and refinement of the Tension Analysis tools hosted on the recently released desktop COSMOS platform (September, 2014). The platform has been released to the UK social science community on a not-for-profit licence. It is being used to inform training in digital social research methods and social media analytics as well as to support research. This project was critical in supporting and refining the tools offered through the COSMOS platform. In addition, a key set of outputs from this project included the securing of a two year SAGE fellowship based at Cardiff University, School of Social Sciences and a number of follow on funded projects, in particular 'Detecting Tension and Cohesion in Local Communities with Social Media', funded by Airbus Group, and 'Digital Wildfire: (Mis)information flows, propagation and responsible governance', funded by ESRC (Global Uncertainties). This has ensured that the key findings, research methods and skills developed can be sustained and inform future work in this critical area of privately and publicly funded research. This project has been successful in making links between private and public funding.
In conclusion, the project contributed to the development of algorithms and a version of the COSMOS platform being used for a new BBC Radio 5 Live programme where trending social media topics and items inform discussion and public engagement (http://www.bbc.co.uk/blogs/5live/posts/5-live-Hit-List).

Sectors

Aerospace, Defence and Marine,Agriculture, Food and Drink,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Environment,Government, Democracy and Justice,Security and Diplomacy

The project has initiated a range of impact activities over the last twelve months alongside publication and 'follow on' grant capture. It is envisaged that these activities will be developed through the connected project funding that has been secured since the initiation of the research that builds on our findings, digital instruments, engagement and methodological innovation. The first wave of impact has included the following activities.
We have engaged with various governmental and civic agencies that include local government, law enforcement agencies, schools, national media and the commercial sector. This has been realised through attendance at a number of policy impact focused conferences (e.g. the WISERD conference on Civil Society held in Cardiff, 2015) and by reaching out directly to and engaging with key stakeholders. For example, members of the project team presented at a conference organised by the Wales Hate Crime Criminal Justice Board in Cardiff, during July, 2014. The presentation covered the development of digital tools hosted on the COSMOS platform that have been developed and refined in order to monitor community tension via social media streams; in this particular case, social media streams generated by the micro-blogging site Twitter. This engagement with the Board is being followed by a joint national conference on 'Hate Crime and Bullying in the Age of Social Media: Current Perspectives, Practices and Solutions', organised with team members at Cardiff University, that will include a set of workshops on social media and hate speech, attended by a range of stakeholders in October, 2015. It will involve presenting findings, demonstrating relevant digital tools developed via this project and engaging with delegates from across the public and voluntary sector as well as keynote representatives that will include Lesley Griffiths AM, Minister for Communities and Tackling Poverty, Nick Pickles, Head of Policy at Twitter UK, Paul Giannasi, UK Government Hate Crime Lead, Ministry of Justice, Claire Lilly, Head of Child Safety Online, NSPCC and Alun Michael, Police and Crime Commissioner for South Wales (https://www.eventbrite.co.uk/e/cynhadledd-genedlaethol-ar-droseddau-casineb-a-bwlio-national-conference-on-hate-crime-bullying-registration-17934935876).
A related form of impact activity that has already been conducted involved the presentation of findings from our project to the UK Government Hate Crime Seminar, London, during the course of March, 2015. As a consequence we have engaged with both devolved and UK administrations in relation to project findings and tools for scoping and understanding hate speech and social media.
In addition to this, the project findings and tools supported further grant capture (including RCUK funding). Of significance, in terms of impact, was the funding secured from EADS UK and Welsh Government (£51,040) that developed our established tools and analytics for detecting tension and cohesion in local communities with social media, and funding from HPC Wales and Fujitsu for the development of high performance computing, scalability and big 'social' data (£25,000). This public/private funding supported a range of activities including postgraduate research projects that focused on the concept of sensing social and community tension through social media. It also helped to export our findings into an innovative policy and governance domain where the Welsh Government and commercial organisations are supporting the development of social and community policy instruments for the digital age.
In terms of research capacity building at postgraduate level the project has also helped support and secure 3 aligned Wales ESRC DTC studentships and a SAGE publications postdoctoral fellow. The PhD projects include research on Social Media and Freedom of Speech in the Digital Age, Re-imagining National Identity in the Digital Age and a study on Social Media and Misogyny. The SAGE postdoctoral fellow has helped translate methodological innovation supported by the project onto a wider online methods resource for the social sciences. This will generate research capacity, interdisciplinary skills and methods training for the study of social cohesion and tension in the digital era in next generation UK social science researchers.
Finally, the development and refinement of the COSMOS platform that this project has directly supported has delivered significant impact. During the Autumn of 2014 the enhanced COSMOS platform (http://www.cs.cf.ac.uk/cosmos/) was launched for the UK social science and policy community. The project directly supported the refinement of the digital tools hosted on the platform and served to test run the analytics generated by these tools on a range of Hate Speech social media corpora that had been harvested and collected in relation to signal public events (e.g. the Woolwich Terrorist attack and the Paralympics) where 'offline' identity matters and social tension had been apparent and reported in traditional media outlets.
We have received over 500 requests thus far for use of the COSMOS platform software and this has included academic, government, law enforcement and related agencies. Of note was the use of the COSMOS platform for a weekly BBC Radio 5 Live 'social media chart show' (http://www.bbc.co.uk/programmes/b04pvh8v) where the stories behind trending topics on Twitter were ranked and used to configure an innovative news delivery platform that disrupted traditional headlines and news generation. It also served to highlight the work of the project and COSMOS platform to a UK-wide national audience. In August, 2015, members of the team were awarded ESRC institutional impact funding (Cardiff University) in order to embed the cyberhate analytics of open source social media into the Metropolitan Police Service; this impact funding will be used to translate the work of the project into a relevant policy and practice context and inform open source intelligence that can assist the police in their real-time decision making practices (http://www.cardiff.ac.uk/news/view/131898-detecting-crime-using-social-media). In addition to this, members of the team were also awarded funding from the British Academy in July, 2015, in order to co-produce a post-16+ module for young people with selected schools, students and teachers that will focus on social media, provocative/antagonistic content and responsible online citizenship. This module will be used within selected schools for the enhancement of responsible citizenship skills in the digital age.
Looking to the future, we will build on these engagement and impact activities. In particular, we will apply our work and findings from the project (and its legacy) in ways that support the policy priorities and requirements of law enforcement agencies, schools and local government in relation to online hate speech, identity, community relations, regulation and digital citizenship. We will also build on the project's legacy in order to inform, sustain and support capacity in innovative digital social research methods training and development for academic and policy communities in the UK.

First Year Of Impact

2014

Sector

Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Government, Democracy and Justice,Security and Diplomacy

Detecting Tension and Cohesion in Local Communities with Social Media, Funded by Airbus Group, £51,040

Amount

£51,040 (GBP)

Organisation

Airbus Group

Department

Airbus Operations

Sector

Private

Country

United Kingdom

Start

09/2014

End

08/2015

Title

The Collaborative Online Social Media Observatory (Platform for Twitter Analytics)

Description

http://www.cs.cf.ac.uk/cosmos/
The Collaborative Online Social Media Observatory (COSMOS) is an Economic and Social Research Council (ESRC) strategic "Big Data" investment that brings together social, computer, political, health, statistical and mathematical scientists to study the methodological, theoretical, empirical and technical dimensions of social media data in social and policy contexts. This empirical data science programme is complemented by a focus on the ethical impact of big social data and the development of new methodological tools and technical/data solutions for the UK academic and public sectors. Our £1.5M research programme has been funded by the Economic and Social Research Council, Engineering and Physical Sciences Research Council, Partnership for Conflict, Crime and Security Research Programme (Global Uncertainties Programme), Joint Information Systems Committee, Department of Health, Food Standards Agency, High Performance Computing Wales/Fujitsu, Welsh Government and Airbus Group.
As part of our programme of research we have developed the COSMOS software platform that reduces the technical and methodological barriers to accessing and analysing social media and other forms of open digital data. The COSMOS platform is set apart from all other existing social media analysis software due to its novelty in five areas: i) it is supported and informed by rigorous methodological and technical research conducted by an interdisciplinary team (computer and social scientists) that informs users in their analysis; ii) the platform allows for the linking of multiple digital data sources (social media, other digital, curated and administrative); iii) it integrates a number of data analysis tools using a workflow model (e.g. sentiment analysis can be followed by a social network analysis, which can then be geo-located); iv) its analysis algorithms are open, transparent, inspectable and refreshable/adaptable by users; and v) users do not need any knowledge of programming. The platform is free to academic and public sector users conducting not-for-profit research.

Type Of Material

Improvements to research infrastructure

Year Produced

2014

Provided To Others?

Yes

Impact

We have received over 500 requests thus far for use of the COSMOS platform software and this has included academic, government, law enforcement and related agencies. Of note was the use of the COSMOS platform for a weekly BBC Radio 5 Live 'social media chart show' (http://www.bbc.co.uk/programmes/b04pvh8v) where the stories behind trending topics on Twitter were ranked and used to configure an innovative news delivery platform that disrupted traditional headlines and news generation. It also served to highlight the work of the project and COSMOS platform to a UK-wide national audience.

The talk connected the work on hate speech being carried out by COSMOS with the Oxford Internet Institute.

The talk stimulated debate and informed thinking for the forthcoming digital wildfires research project.

Year(s) Of Engagement Activity

2014

Description

Digital Responsibility Event

Form Of Engagement Activity

Participation in an activity, workshop or similar

Part Of Official Scheme?

No

Geographic Reach

National

Primary Audience

Schools

Results and Impact

The event involved a series of workshops with academics and secondary school teachers from across Wales. The event included both research and professional practice based presentations. The workshop explored issues regarding the fostering and promotion of the responsible use of new digital technology inclusive of social media. It was also an opportunity to feed in findings from our research regarding antagonism and discriminatory interaction on social media platforms and report on co-produced 'lesson materials' (designed in part to promote responsible and safe use of social media) that we have designed and co-produced with teachers as part of our aligned British Academy small social science project and follow on work.

Year(s) Of Engagement Activity

2016

Description

Cyber-hate on social media in the aftermath of Woolwich

Form Of Engagement Activity

A talk or presentation

Part Of Official Scheme?

Yes

Geographic Reach

National

Primary Audience

Policymakers/parliamentarians

Results and Impact

The talk sparked questions and discussion afterwards; it also provided an opportunity to present findings and link with key policy decision makers.

The talk stimulated interest in the use of the COSMOS platform and community tension monitoring via social media analytics and how this might be embedded into decision making.

Seminar talk/lecture summarizing methodological approach and initial findings, followed by discussion and engagement with political scientists and allied social researchers working in the area of hate speech and community relations in east Africa. This led to further contact and identification of possible points for future collaboration in these and related areas with the international relations research group based at Wolfson College, Oxford University.

This talk stimulated interest in the project from international relations scholars.

Year(s) Of Engagement Activity

2013

Description

New BBC Radio 5 Live programme incorporating COSMOS

Form Of Engagement Activity

A magazine, newsletter or online publication

Part Of Official Scheme?

No

Geographic Reach

International

Primary Audience

Public/other audiences

Results and Impact

The project contributed to the development of algorithms and a version of the COSMOS platform being used for a new BBC Radio 5 Live programme where trending social media topics and items inform discussion and public engagement (http://www.bbc.co.uk/blogs/5live/posts/5-live-Hit-List).

Informed major BBC 5 Live programme that will be a major window for the COSMOS research programme and insights generated by the project.

The 'Collaborative Online Social Media Observatory', British Academy, London, 2013

Form Of Engagement Activity

Participation in an activity, workshop or similar

Part Of Official Scheme?

No

Geographic Reach

International

Primary Audience

Public/other audiences

Results and Impact

An overview and presentation of the COSMOS platform to a number of end users, policy makers, members of the social science research community, statutory bodies, government and commercial organizations. This included a detailed breakdown of our current projects including 'Hate Speech and Social Media: Understanding Users, Networks and Information Flows'.

This talk prepared the ground for beta-testing of the COSMOS platform and for release of the desktop version of COSMOS in September, 2014.

Year(s) Of Engagement Activity

2013

Description

'The Collaborative Online Social Media Observatory: Past, Present and Future', School of Journalism and Communication, University of Queensland, Australia. 2013

Form Of Engagement Activity

A talk or presentation

Part Of Official Scheme?

No

Geographic Reach

International

Primary Audience

Other academic audiences (collaborators, peers etc.)

Results and Impact

Lecture on the Collaborative Online Social Media Observatory; theory, method and data. Specific reference made to 'Hate Speech and Social Media: Understanding Users, Networks and Information Flows' as exemplar case study.

After the talk members of the University of Queensland agreed to work as associate members of COSMOS and share engineering and social science expertise where appropriate.

Year(s) Of Engagement Activity

2014
