Big Data and Politics: How the Internet sees the US Election

Nothing is a hotter topic than the US Election, especially if you’re a statistician at heart. Legions of us have been mesmerized by the idea of predicting who gets to be the most powerful President on the planet.

This year, however, it’s far more fun to kick back and watch the Internet collectively explode over each and every one of the candidates in the limelight. What with Clinton’s emailgate, Bernie's economics, Ted Cruz’s household issues and Donald Trump’s existence...

WSO2 is a technology company. We looked around and realized that we had the tools to observe this theater on an unprecedented scale. We’d like you to join us.

At its heart, the Election Monitor is the WSO2 Enterprise Service Bus (ESB), Data Analytics Server (DAS) and Complex Event Processor (CEP). The ESB scans Twitter, pulling conversations about the US Election every second. DAS and CEP go to work on these tweets.

The first thing we’ve built is this real-time counter (shown above) of the number of unique Twitter accounts talking about each camp. In a 24-hour time window, as of the time of writing, the Republicans seem to be dominating the Twittersphere.

That’s a huge margin, isn’t it? Let’s find out why as we go along.
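For the curious, the counter described above boils down to counting distinct accounts per camp inside a sliding 24-hour window. The live version runs on DAS and CEP; the Python below is only a minimal sketch of the idea, not the queries that power the Election Monitor. The function name, the tuple layout, the timestamp and the sample accounts are all made up for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # the 24-hour window mentioned in the post


def unique_accounts_per_camp(tweets, now, window=WINDOW):
    """Count distinct accounts per camp among tweets inside the window ending at `now`.

    `tweets` is an iterable of (account, camp, posted_at) tuples (hypothetical layout).
    """
    cutoff = now - window
    seen = {}
    for account, camp, posted_at in tweets:
        if cutoff < posted_at <= now:
            # A set keeps each account once, however often it tweets.
            seen.setdefault(camp, set()).add(account)
    return {camp: len(accounts) for camp, accounts in seen.items()}


# Illustrative data only; these are not real accounts or real counts.
now = datetime(2016, 3, 18, 12, 0)
sample = [
    ("@alice", "Republicans", now - timedelta(hours=1)),
    ("@alice", "Republicans", now - timedelta(hours=2)),  # same account, counted once
    ("@bob", "Republicans", now - timedelta(hours=5)),
    ("@carol", "Democrats", now - timedelta(hours=3)),
    ("@dave", "Democrats", now - timedelta(hours=30)),  # older than 24 hours, ignored
]
counts = unique_accounts_per_camp(sample, now)
print(counts)
```

Counting accounts rather than tweets matters here: one very chatty account should not make a camp look bigger than it is.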

First up is a live feed of what we’re getting from Twitter. The gray columns are the interesting ones: they display the most popular recent tweets - recent meaning no more than 12 hours old. Donald Trump often dominates both fields. Occasionally, Bernie seems to break through. As of the time of writing, in the “Popular from candidates” column, Donald Trump has three tweets, one of them about a reporter touching him. The other two are a tweet from Clinton (“Enough is enough”) and one from Bernie talking about deficits.

This is consistent with what we’ve seen so far; ever since the site went live, Trump’s snazzy one-liners have consistently gotten more retweets and favourites than Bernie and Clinton’s policy-centric tweets. It would appear that one man / tweep from the Republican party is more popular than every other candidate put together… are we really surprised that there are more people talking about the Republicans than the Democrats?

But what about their followers? Using candidates’ hashtags, we can peek into the conversation by sifting through tweets and finding the most used words in that space.
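The sifting described above can be approximated with a plain word count over tweets that carry a given hashtag. This is a rough Python sketch under our own assumptions, not the Election Monitor’s actual pipeline: the hashtags, the tiny stopword list, the tokenizing rule and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "on", "at", "for", "about"}


def top_terms(tweets, hashtag, n=3):
    """Return the n most frequent words in tweets that contain `hashtag`."""
    tag = hashtag.lower()
    counts = Counter()
    for text in tweets:
        # Split into words, keeping a leading '#' so hashtags stay recognizable.
        tokens = re.findall(r"#?\w+", text.lower())
        if tag not in tokens:
            continue  # tweet belongs to another conversation
        counts.update(
            t for t in tokens
            if not t.startswith("#") and t not in STOPWORDS and len(t) > 2
        )
    return counts.most_common(n)


# Made-up tweets, only to exercise the function.
sample = [
    "Secure the border now #Trump2016",
    "The border is the issue #trump2016",
    "Rally in New York #Trump2016",
    "Debate night #FeelTheBern",
]
print(top_terms(sample, "#Trump2016", 1))
```

The resulting counts are what a wordcloud would then draw, with the most frequent words shown largest.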

Trump’s people are talking about the border. No surprise there. They’re also talking about New York. That corresponds with the fact that Hillary Clinton just took aim at Trump in a N.Y. ad. It shows a white Trump supporter sucker-punching an African American protester.

Bernie’s community, too, is talking about the debate. There are few other clues in his wordcloud at the moment.

Ted Cruz’s community is talking about his wife. That’s because he’s mired in a bit of controversy now: the family man is being dodgy about questions regarding his marriage. There are a lot of questions about his principles.

There’s one man missing from this: John Kasich. As of the time of writing, he’s got 143 delegates. Cruz has 463. Trump has 736. Each of them needs to hit 1,237 for the nomination.
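To put those numbers in perspective, here is the arithmetic as a few lines of Python. The figures are the ones quoted above as of the time of writing; the variable names are ours.

```python
NEEDED = 1237  # threshold for the nomination, as quoted above

# Totals quoted in this post at the time of writing.
current = {"Trump": 736, "Cruz": 463, "Kasich": 143}

# How far each candidate still is from the threshold.
remaining = {name: NEEDED - won for name, won in current.items()}

for name, gap in remaining.items():
    print(f"{name} still needs {gap}")
```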

Kasich’s chances look remote in the polls, and he barely exists on Twitter. For now, we must exclude him.

Step three of the site is the community graph - or, as we call it, the attention graph. Here we map out the most popular accounts talking about the US election. The larger an account’s bubble is, the more popular it is.

What do we see? Donald Trump has gathered more attention to himself than any other tweep. It’s not even a small margin. Dan Scavino comes in at a distant second. Everyone else is minuscule, like little asteroids orbiting Planet Trump. And yet even those tiny accounts get over 2,000 likes and retweets. These are the people who are essentially driving opinion on Twitter.
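How might popularity turn into bubble size? One common charting choice, which we assume here purely for illustration, is to make a bubble’s area (not its radius) proportional to engagement, so that big accounts are not visually exaggerated. The sketch below is not necessarily how the attention graph is drawn; the function name, the `max_radius` parameter and the sample accounts are hypothetical.

```python
import math


def bubble_radii(engagement, max_radius=50.0):
    """Map each account's engagement (likes + retweets) to a bubble radius.

    The most engaged account gets `max_radius`; the others are scaled so that
    bubble area is proportional to engagement, hence the square root.
    """
    if not engagement:
        return {}
    top = max(engagement.values())
    if top <= 0:
        return {account: 0.0 for account in engagement}
    return {
        account: max_radius * math.sqrt(value / top)
        for account, value in engagement.items()
    }


# Made-up accounts and numbers, for illustration only.
radii = bubble_radii({"@big_account": 40000, "@small_account": 2500})
print(radii)
```

With these sample numbers the smaller account has one sixteenth of the engagement and therefore a quarter of the radius.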

The fourth and final part is how the media’s opinion of a candidate changes over time. By analyzing news articles published online, we can determine shifts as campaigns unfold.

Consider how attitudes have changed towards Hillary. Here’s her standing on the 15th of March:

Here’s her standing on the 17th:

Opinion has swung her way. Examine the titles of the news articles on those days. On the 15th of March: “Was Hillary Clinton Bribed For Her Iraq War Vote?” and “The Cure to Hillary Clinton's Problem With Millennials? Donald Trump.” Not that good.

On the 17th? “How Hillary Clinton Triumphed on Tuesday” and “Hillary Clinton Becomes Kween of Broad City”. Hot on the heels of a victory comes better press.

It’s fascinating to see how the American media react to candidates as they take on world events. Opinion on Trump, for example, hit rock bottom over his views on China and implications that supporters could go haywire.

Our collection of insights has just gotten started, of course. As the election unfolds, all of this will keep running. While we can’t say that the Internet is going to predict who wins, we think it’s a pretty interesting gauge of what the people and the press of America are thinking.