
Conventionally, a predictive program like this would scan Twitter traffic and try to match what it sees to a certain model. You might program it to look for a certain "step", as one topic becomes more prominent against the general background chatter. Shah explains: "This is a very simplistic model. Now, based on the data, you try to train for when the jump happens, and how much of a jump happens. The problem with this is, I don't know that things that trend have a step function. There are a thousand things that could happen."

As a result, their algorithm doesn't look for a predetermined pattern in samples of Twitter traffic. Instead, it looks at how the number of tweets about each new topic changes over time and compares that to how every sample in the training set changed over time. If a new topic statistically resembles one of the samples, that sample is given a weight in predicting whether the new topic will end up trending. Effectively, each of the training samples "votes" on the likelihood of a new topic trending or not, with some samples' votes weighted more heavily than others.


The combined votes deliver an indication of the likelihood that a new topic will trend.
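To make the voting idea concrete, here is a minimal sketch in Python. It is not Shah and Nikolov's actual code. It assumes that each topic is represented as a series of tweet counts per time interval, that similarity is measured by Euclidean distance, and that each training sample's vote is weighted by exp(-gamma * distance). The function names, the `gamma` parameter and the toy data are all hypothetical.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series of tweet counts."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trend_score(
    observed: Sequence[float],
    trending: Sequence[Sequence[float]],
    non_trending: Sequence[Sequence[float]],
    gamma: float = 1.0,  # hypothetical parameter: how fast a vote fades with distance
) -> float:
    """Share of the total vote weight cast by training samples that trended."""
    # Assumed weighting: each training sample votes with weight
    # exp(-gamma * distance), so samples that closely resemble the
    # new topic count for more than distant ones.
    yes = sum(math.exp(-gamma * distance(observed, s)) for s in trending)
    no = sum(math.exp(-gamma * distance(observed, s)) for s in non_trending)
    total = yes + no
    # If every weight underflows to zero, no sample is informative.
    return 0.5 if total == 0 else yes / total


# Hypothetical toy data: tweets per time interval for each topic.
trending = [[1, 2, 4, 8], [1, 3, 6, 12]]
non_trending = [[5, 5, 5, 5], [3, 4, 3, 4]]
new_topic = [1, 2, 5, 9]
print(round(trend_score(new_topic, trending, non_trending), 2))  # prints 0.99
```

A score near 1 means the closest-matching samples mostly trended; applying a threshold such as 0.5 would turn it into a yes/no prediction. Because every new topic is compared against every training sample, the work grows in step with the size of the training data.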

Shah and Nikolov trained the algorithm on a set containing 200 topics that did trend on Twitter and 200 that didn't. Once they let it get to work, it separated the topics that went on to trend from those that didn't with 95 percent accuracy and a false-positive rate of just four percent (that is, topics that were predicted to trend but then didn't).
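The article reports the two figures without the underlying counts. The short sketch below assumes the standard definitions: accuracy is the share of all topics classified correctly, and the false-positive rate is the share of non-trending topics that were wrongly predicted to trend. The counts in it are invented for illustration and are not the study's results.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all topics that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-trending topics that were wrongly predicted to trend."""
    return fp / (fp + tn)


# Made-up counts for illustration only; they are not the study's results.
tp, fn = 18, 2  # trending topics correctly flagged / missed
fp, tn = 1, 19  # non-trending topics wrongly flagged / correctly ignored
print(accuracy(tp, tn, fp, fn))     # prints 0.925 (37 of 40 correct)
print(false_positive_rate(fp, tn))  # prints 0.05 (1 of 20 non-trending flagged)
```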

The feature that makes this model work, though, is also its main drawback, and the reason most models take a different approach: because it doesn't filter the traffic it's interested in, it requires more computational power than conventional models. Since the algorithm "scales proportionately with the data", Shah says, it might not be usable over very large data sets except by companies such as Google, Facebook, Amazon or others with the largest cloud computing capabilities.


And while it has commercial implications for Twitter itself (the company might be able to use it to charge for ads linked to popular topics), the algorithm could also be trained for a variety of other situations, which could include predicting stock prices.