Over the course of the 2016 election and beyond, the Russian government conducted a massive disinformation campaign to divide the American public, sow distrust, and influence electoral outcomes. In October 2018, Twitter released a comprehensive dataset of more than 9 million Russian troll Tweets. Today, we’re proud to release an interactive machine learning model that makes the language of the Russian disinformation campaign slightly more accessible.

Given any English Tweet, our model determines whether that Tweet uses language specifically popular among Russian trolls—or whether the Tweet more closely resembles organic content. And while the model can distinguish between a representative sample of organic Tweets and Russian Twitter disinformation with more than 90% accuracy, it isn’t meant to uncover hidden Russian agents online; instead, it’s an educational tool designed to shed light on the type of content disproportionately conveyed by the Russian trolls. To try out the model yourself, tag @TrackTheTrolls in response to a Tweet, or use the tool below.

[Interactive tool: text box for entering a Tweet to analyze]

Every percentage shown in the tool above is the model’s scaled confidence, from -100% to +100%, that a text is a Russian IRA (Internet Research Agency) troll Tweet (positive values) or an organic Tweet (negative values), assuming a dataset split evenly between the two classes.
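As a rough sketch of how such a scale can be read: assuming the classifier outputs a probability p that a text is a troll Tweet, a linear rescaling of 2p - 1 maps it onto the range from -100% to +100%. The function name and the linear mapping are assumptions for illustration; the tool's actual scaling may differ.

```python
def scaled_confidence(p_troll: float) -> float:
    """Map a troll-class probability in [0, 1] to a score in [-100, +100].

    Assumption: a simple linear rescaling; the real tool may scale differently.
    """
    if not 0.0 <= p_troll <= 1.0:
        raise ValueError("p_troll must be between 0 and 1")
    return (2.0 * p_troll - 1.0) * 100.0


# A probability of 0.5 (no preference either way) sits at the middle of the scale.
for p in (0.0, 0.5, 0.95):
    print(f"p={p:.2f} -> {scaled_confidence(p):+.0f}%")
```

Under this reading, a score near zero means the model sees the text as about equally typical of both classes.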

Limitations and caveats—

Our scikit-learn-powered model uses an explainable machine learning algorithm to weigh the similarities of any given text to Russian troll content versus organic posts. The model helps to expose language that is specifically popular among trolls in an accessible and interactive manner. Our model does not detect yet-undiscovered trolls online, nor would it be able to. Instead, it’s an educational tool designed to detect patterns in troll language—nothing more, nothing less.

Here are a few caveats to keep in mind when playing with our model:

Just because the model identifies a word as being specifically popular among Russian IRA trolls doesn’t mean that the word is inherently ‘troll-like.’ Some terms, like “#BlackLivesMatter,” aren’t troll-like at all, but because they appear more frequently in IRA text than in recent organic discourse on Twitter, our model considers them to be a signal of especially troll-like content.
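To illustrate how a term can become a troll signal purely through relative frequency, here is a minimal sketch that compares a token's smoothed frequency in two tiny made-up corpora. The corpora, the function name, and the add-one smoothing are assumptions for illustration, not the project's data or code.

```python
import math
from collections import Counter


def log_frequency_ratio(token, troll_tokens, organic_tokens):
    """Log of (smoothed troll frequency / smoothed organic frequency).

    Positive: the token is relatively more common in the troll corpus.
    Negative: it is relatively more common in the organic corpus.
    """
    troll_counts = Counter(troll_tokens)
    organic_counts = Counter(organic_tokens)
    vocab_size = len(set(troll_counts) | set(organic_counts))
    # Add-one (Laplace) smoothing so unseen tokens never give a zero frequency.
    troll_freq = (troll_counts[token] + 1) / (len(troll_tokens) + vocab_size)
    organic_freq = (organic_counts[token] + 1) / (len(organic_tokens) + vocab_size)
    return math.log(troll_freq / organic_freq)


# Made-up toy corpora, already split into lowercase tokens.
troll = "news #blacklivesmatter politics #blacklivesmatter vote".split()
organic = "coffee news weekend #blacklivesmatter game".split()

for word in ("#blacklivesmatter", "coffee", "news"):
    print(word, round(log_frequency_ratio(word, troll, organic), 3))
```

The sign of the ratio says nothing about what a term means, only about which corpus uses it more often.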

Our model is tuned for 2018 organic Twitter content. In 2020, our model may be far less accurate—language changes over time, especially on platforms such as Twitter, and our model may soon be obsolete.

We look at language, not context. Our model isn’t smart enough to understand the nuance of online discourse. In fact, despite being relatively accurate, our model is primitive by human standards: it looks at words individually, and doesn’t have any conception of context. Its intelligence emerges at scale, when it identifies patterns across millions of words. In a single 280-character Tweet, there’s only so much it can do.

How it works—

To analyze a Tweet, our model passes the Tweet’s text through a complex but explainable process. Following the tried-and-true method of using a scikit-learn Naive Bayes classifier for text analysis in Python, we assembled our model using a CountVectorizer, a TfidfTransformer, and finally a multinomial Naive Bayes classifier. For the more technically inclined, here’s the core architecture of our model:
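A minimal sketch of a pipeline built from these three components, assuming scikit-learn's default parameters. The step names and the toy training texts are illustrative assumptions, not the project's actual code or data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# The three-step skeleton: token counts -> TF-IDF weights -> Naive Bayes.
model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])

# Made-up toy data purely to show the fit/predict flow (1 = troll, 0 = organic).
texts = [
    "breaking news share this now",
    "share this breaking story now",
    "had a great coffee this morning",
    "great game this weekend with friends",
]
labels = [1, 1, 0, 0]
model.fit(texts, labels)

# predict_proba returns one row per text, with columns ordered as model.classes_.
probabilities = model.predict_proba(["breaking story share now"])
print(model.classes_, probabilities)
```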

That’s right—with scikit-learn, it only takes three lines of code to define the model’s skeleton! While we won’t detail the ‘glue code’ that holds everything together here, the entire project is open source on GitHub. (For information about the data on which we trained our model, see ‘Data sourcing’ below.)

Here’s how the machine learning model works, in plain English:

First, the model splits the text into its parts (words), called tokens. It counts the number of times each token appears in the text, and then returns these counts as a vector. In our model, this is performed by the industry-standard scikit-learn CountVectorizer.

Then, the model compares the relative frequency of each token to its relative frequency in the training datasets. This helps identify the important words in a sentence while filtering out unimportant words (such as “and,” “to be,” and “a”). This process, called TF-IDF (term frequency-inverse document frequency), is performed by scikit-learn’s TfidfTransformer.

Finally, our model analyzes the weighted token frequencies produced by the CountVectorizer and the TfidfTransformer to determine whether the text uses language that is specifically popular in organic content or in Russian IRA troll content. This statistical analysis is performed by scikit-learn’s MultinomialNB classifier.
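The three stages can also be run one at a time to inspect what each one produces. This is a sketch on a made-up three-document corpus with scikit-learn defaults; the corpus, labels, and variable names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy corpus (1 = troll, 0 = organic).
corpus = [
    "share the news and share the story",
    "the game and the coffee",
    "share the breaking news",
]
labels = [1, 0, 1]

# Stage 1: split each text into tokens and count them.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)
# Order the vocabulary by column index so it lines up with the count matrix.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocabulary)
print(counts.toarray())

# Stage 2: reweight counts so tokens found in every document (like "the")
# contribute less than tokens that are rarer across the corpus.
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(counts)
first_doc = dict(zip(vocabulary, weights.toarray()[0].round(3)))
print(first_doc)

# Stage 3: fit the Naive Bayes classifier on the weighted vectors, then
# score a new text by pushing it through the same fitted stages.
classifier = MultinomialNB()
classifier.fit(weights, labels)
new_text = ["breaking news story"]
new_weights = tfidf.transform(vectorizer.transform(new_text))
print(classifier.predict(new_weights), classifier.predict_proba(new_weights))
```

In the first document, "share" and "the" both appear twice, but "the" occurs in every document and so ends up with the lower weight.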

Data sourcing and software libraries—

Our model is built from a corpus of 4.6 million English Tweets, split equally between a random sample of Russian IRA troll Tweets released by Twitter and a representative collection of English Tweets collected over a two-week period in October 2018. Because Twitter’s Terms of Service prohibit the distribution of datasets that include Tweet content, we cannot open-source the entirety of our training data. (If you’re a researcher looking to improve on or reproduce our model, we’re happy to share the data with you individually. Contact us here.)

As with nearly all data science projects, we relied on existing libraries to assemble and train our model. Our work would not have been possible without the following two software libraries.

scikit-learn. scikit-learn is a Python library that provides researchers with open-source tools for building machine learning models. It is among the most popular machine learning libraries available, and supports our model's core functionality.

NLTK. The Natural Language Toolkit is a popular open-source Python library that provides a number of useful text processing utilities.

Are you interested in using our model in your own software? We built an API to allow you to do just that! Send an HTTP GET request to https://ru.dpccdn.net/analyze/<your URL-encoded text> and you’ll receive a full JSON response. No authentication is required.
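A minimal sketch of calling the endpoint from Python using only the standard library. The helper names are hypothetical, and because the shape of the JSON response is not described here, the result is returned as parsed JSON without assuming any field names. The service may also no longer be online.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://ru.dpccdn.net/analyze/"


def build_analyze_url(text: str) -> str:
    """URL-encode the text and append it to the endpoint path."""
    # safe="" also encodes "/" so the text stays a single path segment.
    return API_BASE + quote(text, safe="")


def analyze(text: str, timeout: float = 10.0):
    """Send the GET request and return the parsed JSON response."""
    with urlopen(build_analyze_url(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access; analyze() performs the request.
print(build_analyze_url("Hello world! #news"))
```

Encoding the text first matters because Tweets often contain characters such as "#" and "/" that would otherwise change how the URL is parsed.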

What’s next—

To conclude our inquiry into Russian digital propaganda, we will apply our machine learning model ‘in the wild.’ We’ll look at whether Donald Trump’s Tweets incorporate language specifically popular among Russian IRA trolls, and whether 2018 election-related discourse on Twitter has, by this same metric, changed since the 2016 election.

Our ultimate objective is to help spread awareness of the Russian government’s ongoing disinformation campaign in the United States. To this end, we invite you to experiment with the analyzer provided above, or to tag @TrackTheTrolls in reply to a Tweet.