Posts about our text and social media analysis work and latest news on GATE (http://gate.ac.uk) - our open source text and social media analysis platform. Also posts about the PHEME project (http://pheme.eu) and our work on automatic detection of rumours in social media. Lately also general musings about fake news, misinformation, and online propaganda.

Thursday, 21 December 2017

Discerning Truth in the Age of Ubiquitous Disinformation

Initial Reflection on My Evidence to the DCMS Enquiry on Fake News

The past few years have heralded the age of ubiquitous disinformation, which is posing serious questions over the role of social media and the Internet in modern democratic societies. Topics and examples abound, ranging from the Brexit referendum and the US presidential election to medical misinformation (e.g. miraculous cures for cancer). Social media are now routinely reinforcing their users’ confirmation bias, so often, little to no attention is paid to opposing views or critical reflections. Blatant lies often make the rounds, re-posted and shared thousands of times, and even jumping successfully sometimes in mainstream media. Debunks and corrections, on the other hand, receive comparatively little attention.

I often get asked: “So why is this happening?”

My short answer is - the 4Ps of the modern disinformation age: post-truth politics, online propaganda, polarised crowds, and partisan media.

Post-truth politics: The first societal and political challenge comes from the emergence of post-truthpolitics, where politicians, parties, and governments tend to frame key political issues in propaganda, instead of facts. Misleading claims are continuously repeated, even when proven untrue through fact-checking by media or independent experts (e.g. the VoteLeave claim that Britain was paying the EU £350 million a week). This has a highly corrosive effect on public trust.

Online propaganda and fake news: State-backed (e.g. Russia Today), ideology-driven (e.g. misogynistic or Islamophobic), and clickbaitwebsites and social media accounts are all engaged in spreading misinformation, often with the intent to deepen social division and/or influence key political outcomes (e.g. the 2016 US presidential election).

Polarised crowds: As more and more citizens turn to online sources as their primary source of news, the social media platforms and their advertising and content recommendation algorithms have enabled the creation of partisan camps and polarised crowds, characterised by flame wars and biased content sharing, which in turn, reinforces their prior beliefs (typically referred to as confirmation bias).

On Tuesday (19 December 2017) I gave evidence in front of the Digital, Culture, Media, and Sports Committee (DCMS) as part of their enquiry into fake news (although I prefer the term disinformation) and automation (ako bots) - their ubiquity, impact on society and democracy, the role of platforms and technology in creating the problem, and briefly also - can we use existing technology to detect and neutralise the effect of bots and disinformation.

The list of questions was not given to us in advance, which, coupled with the need for short answers, left me with a number of additional points I would like to make. So this is the first of several blog posts where I will revisit some of these questions in more detail.

Let's get started with the first four questions (Q1 to Q4 in the transcript), which were about the availability and accuracy of technology for automatic detection of disinformation on social media platforms. In particular:

can such technology identify disinformation in real time (part of Q3) and should it be adopted by the social media platforms themselves (Q4).

TD;LR: Yes, in principle, but we are still far from having solved key socio-technical issues, so, when it comes to containing the spread of disinformation, we should not use this as yet another stick to beat the social media platforms with.

And here is why this is the case:

Non-trivial scalability: While some of our algorithms work in near real time on specific datasets (e.g. tweets about the Brexit referendum), applying them across all posts on all topics as Twitter would need to do, for example, is very far from trivial. Just to give a sense of the scale here - prior to 23 June 2016 (referendum day) we had to process fewer than 50 Brexit-related tweets per second, which was doable. Twitter, however, would need to process more than 6,000 tweets per second which is a serious software engineering, computational, and algorithmic challenge.

Algorithms make mistakes, so while 90% accuracy intuitively sounds very promising, we must not forget the errors - 10% in this case, or double that at 80% algorithm accuracy. On 6,000 tweets per second this 10% amounts to 600 wrongly labeled tweets per second rising to 1,200 for the lower accuracy algorithm. To make matters worse, automatic disinformation analysis often combines more than one algorithm - first to determine which story a post refers to and second - whether this is likely true, false, or uncertain. Unfortunately, when algorithms are executed in a sequence, errors have a cumulative effect.

These mistakes can be very costly: broadly speaking algorithms make two kinds of errors - false negatives (e.g.. disinformation is wrongly labelled as true or bot accounts wrongly identified as human) and false positives (e.g. correct information is wrongly labelled as disinformation or genuine users being wrongly identified as bots). False negatives are a problem on social platforms, because the high volume and velocity of social posts (e.g. 6,000 tweets per second on average) still leaves us with a lot of disinformation “in the wild”. If we draw an analogy with email spam - even though most of it is filtered out automatically, we are still receiving a significant proportion of spam messages. False positives, on the other hand, pose an even more significant problem, as they could be regarded as censorship. Facebook, for example, has a growing problem with some users having their accounts wrongly suspended.