Did We Predict the Hung Parliament?

In two of the most recent major democratic votes, Brexit and the U.S. elections, two things have happened.

1. The result has been a massive surprise.

2. Social media has been an accurate indicator of the outcome.

So when the snap election was announced in April this year, it got me thinking. Could we use Twitter and our Arrow reference architecture for big data to predict the outcome?

I decided it was worth a try.

The first thing I needed to decide was what I was actually going to measure. I settled on the percentage share of positive voice on Twitter for each of the main three parties.

The next task was to figure out how to measure this. I tried to understand the various metrics required to quantify it on Twitter, and it quickly became clear that the task was much more complex than I had imagined, due to the dimensionality of opinions and political biases.

Classifying each user with a single political bias and a single sentiment class doesn’t work, as they may talk positively about the Conservative party and then negatively about Labour.

fig.1 Twitter users' 10 dimensions of political opinion

Therefore, if I classified them only once, I would miss out on impactful negative commentary.

But to keep pace with the flow of Twitter data we were detecting (at peak, around 1,000 messages a second), I couldn’t classify every tweet that came in, because I was relying on IBM Watson Cognitive REST APIs to provide unstructured analytics.

I therefore implemented a number of ETL processes using the Arrow Reference Architecture for Big Data to refine the data pre-Watson and then speed up the processing once classification had occurred.

I first used the keyword search API from Twitter to filter the returned results and then waited for 10 occurrences of a particular Twitter user ID. This helped me to focus on and analyse only users who were tweeting regularly about the election.

Once I had determined that someone was regularly tweeting about the election, it was then appropriate to further investigate them.
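To make the threshold step concrete, here is a minimal Python sketch of buffering keyword-matched tweets per user and releasing them once a user reaches 10. It is an illustration under my own assumptions, not the pipeline we ran: the class name, the in-memory dictionary and the choice to ignore further tweets from an already released user are all hypothetical.

```python
from collections import defaultdict

# Threshold taken from the text: 10 keyword-matching tweets per user.
THRESHOLD = 10


class RegularTweeterFilter:
    """Hypothetical helper: buffers tweets per user until a threshold is hit."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self._buffer = defaultdict(list)  # user_id -> tweets seen so far
        self._released = set()            # users already handed on

    def add(self, user_id, text):
        """Store one tweet.

        Returns the buffered tweets the first time a user reaches the
        threshold, otherwise None. Tweets from users who were already
        released are ignored here (an assumption made for brevity).
        """
        if user_id in self._released:
            return None
        tweets = self._buffer[user_id]
        tweets.append(text)
        if len(tweets) >= self.threshold:
            self._released.add(user_id)
            return self._buffer.pop(user_id)
        return None
```

Each tweet returned by the keyword filter would be passed to `add`; a non-None return value is the batch of tweets that is ready for further investigation.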

I sent the 10 tweets we had stored previously to a custom IBM Watson Natural Language Processing (NLP) Artificial Neural Network (ANN) to classify the user’s political sentiment and political bias.

At the same time it was important to determine the user’s influencer score. This is a metric I have calculated to determine the influence a particular user has on the Twitter audience.

I also wanted to determine each individual tweet’s influence score, which meant understanding how many times it was retweeted and favorited, and by whom.

We then took the positive commentary, plus the neutral commentary, minus the negative commentary - which gave us an arbitrary number for the size of voice.

We then represented this as a percentage share across the main three parties.
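The arithmetic above can be sketched in a few lines of Python. This is a simplified illustration, not our production code: the function names and the example counts are made up, and clamping a net-negative voice to zero before taking shares is my own assumption, since the calculation described here does not say how that case is handled.

```python
def size_of_voice(positive, neutral, negative):
    """Positive plus neutral minus negative commentary, as described above."""
    return positive + neutral - negative


def percentage_share(voice_by_party):
    """Convert each party's size of voice into a percentage of the total.

    Assumption: a net-negative voice is clamped to zero so that the
    shares stay between 0 and 100.
    """
    clamped = {party: max(v, 0) for party, v in voice_by_party.items()}
    total = sum(clamped.values())
    if total == 0:
        raise ValueError("no positive size of voice to share out")
    return {party: 100.0 * v / total for party, v in clamped.items()}


# Illustrative counts only, not measured data.
example = {
    "Party A": size_of_voice(positive=60, neutral=30, negative=10),
    "Party B": size_of_voice(positive=20, neutral=10, negative=10),
}
shares = percentage_share(example)
```

In practice each count could be weighted by the user and tweet influence scores before summing, but that weighting is not shown here.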

So how did we do?

During the election we detected ~90 million tweets with political content. Of these, 3.3 million tweets came from 73,200 users who were detected tweeting more than 10 times with political keywords.

Our size of voice calculations showed Labour with almost double the size of voice of its competition in all our measurements. The Conservatives and Liberal Democrats had on average 1/15th the size of voice of Labour. See the graphic below.

fig.2 showing the size of positive, negative and neutral voice of each party

Correlating the data

Twitter describes its demographic as 37% aged 18 to 29 and 25% aged 30 to 49. In the UK there are an estimated 13 million users.

This shows once again that Twitter was an accurate indicator of the outcome of the election, which again was not what the polls predicted.

fig.4 YouGov study showing the age demographics and how they voted.

To get a better idea of how all of this data looks visually, below is a photo of our big data reference architecture in action, displayed on the big screens in our Dowgate office in London.
