Summary of Key Findings

The Twitter accounts managed by the IRA show a general, coherent strategy, which relies on disseminating both political and non-political content. This finding holds even during the peaks of activity triggered by external events. A substantial part of the activity is lifestyle-related, creating a fog of "neutral" but engaging content, whether through hashtag games, posts about holiday topics or shared local news. This "stealth" strategy makes it harder to identify the accounts as trolls.

1. Introduction

Recently, the trolls of the Russian ‘Internet Research Agency’ (IRA) have gained much attention, mostly for the agency's international focus and its activity around the 2016 US presidential election. Though the agency’s divisive strategies might be nothing new, the scale and scope of its operations seem unprecedented compared to other cyber armies. The wide geographic span of the IRA is also exceptional. The agency actively targeted German, Dutch, Arabic, French, Finnish and American audiences. The subjects included not only issues within the scope of Russian interests, such as the war in Syria, but also purely domestic controversies in different countries. Those working in the agency were given different targets, including how many posts and comments they were to deliver each day, and specific subjects to write about or persons to smear or attack. This inside knowledge became public through the investigative journalism of Novaya Gazeta and Moy Rayon, who placed an undercover journalist in the agency, and through leaked emails from its management (Seddon, 2014; Nimmo and Toller, 2018).

The general agreement is that the IRA was part of a classic information war strategy: to sow discord within countries, polarize debates and contribute to a general feeling of anxiety and mistrust. Many reports have zoomed in on these polarizing, divisive trolling patterns (DiResta et al., 2018; Howard et al., 2018; Boyd et al., 2018). Researchers have concluded that the IRA was very well informed about American online trolling culture. Indeed, the trolling tactics included hashtag hijacks, personal attacks, memes and even doxing (Lacour, 2018). As the NewKnowledge Disinformation report concludes: "It was designed to exploit societal fractures, blur the lines between reality and fiction, erode our trust in media entities and the information environment, in government, in each other, and in democracy itself." (DiResta et al., 2018, p. 99). During the Digital Methods Winter School 2019, we wanted to explore what this strategy looked like in practice, in order to understand whether such an operation shows patterns that could function as ‘red flags’ for future discovery.

2. Initial Data Sets

In October 2018, Twitter released information on accounts found to be associated with disinformation operations, together with their content. The dataset includes 3,841 Russian and 770 Iranian accounts, for a total of ten million tweets, and it goes back to 2009. In this project, only the Russian trolls were analyzed. The initial data set was available in the DMI-TCAT tool, but each sub-project worked with a different timeline.

3. Research Questions

The main objective of the project was to zoom in on the Russian misinformation strategy and the types of narrative it deployed. Is it possible to identify a general, coherent strategy of the IRA accounts? Are the narratives persistent over time, or do they change opportunistically?

4. Methodology

4.1 - Peaks and activity timeline

To gain insight into the intensity of the activity, we plotted a timeline of the number of tweets per day from 2009 to 2018 using DMI-TCAT. The data visualization showed peaks, which we matched with possible triggering events. We also ran a query on DMI-TCAT for the most used hashtags (mentioned at least 1,000 times) on those days.
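The project produced this timeline with DMI-TCAT itself. As a rough illustration of the underlying computation only, the following minimal sketch counts tweets per calendar day and lists the busiest days. The record layout (a dict with a `created_at` string in `YYYY-MM-DD HH:MM:SS` form), the function names and the sample rows are assumptions made for this example, not the DMI-TCAT export format or the project's data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: each tweet is a dict with a "created_at"
# timestamp string. A real export may use different column names/formats.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def tweets_per_day(tweets):
    """Count tweets per calendar day."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], TIMESTAMP_FORMAT).date()
        counts[day] += 1
    return counts


def busiest_days(daily_counts, n=3):
    """Return the n days with the most tweets as (date, count) pairs."""
    return daily_counts.most_common(n)


# Invented toy rows, only to show the call pattern.
sample_tweets = [
    {"created_at": "2015-03-01 08:00:00"},
    {"created_at": "2015-03-01 09:30:00"},
    {"created_at": "2015-03-01 21:15:00"},
    {"created_at": "2015-03-02 10:00:00"},
]

for day, count in busiest_days(tweets_per_day(sample_tweets), n=2):
    print(day.isoformat(), count)
```

Days returned by `busiest_days` would still need to be matched manually against possible triggering events, as described above.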

4.2 - Hashtag analysis

To get an idea of the main topics on which the IRA accounts concentrated, all English hashtags tweeted at least 1,000 times per day were selected through DMI-TCAT, analyzed qualitatively, and grouped by topic.
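The selection step can be sketched as follows, assuming tweets are available as `(day, text)` pairs. The function names and the choice to count a hashtag once per tweet are assumptions of this sketch, and it does not filter by language; the actual selection was made with DMI-TCAT, and the grouping into topics was qualitative and is not automated here.

```python
import re
from collections import Counter

# "\w" matches Unicode word characters in Python 3, so Cyrillic or
# Arabic hashtags are captured as well as English ones.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


def frequent_hashtags(tweets, min_count):
    """Map (day, hashtag) to its count, keeping counts >= min_count.

    `tweets` is an iterable of (day, text) pairs. A hashtag repeated
    inside one tweet is counted once for that tweet (an assumption).
    """
    counts = Counter()
    for day, text in tweets:
        for tag in set(extract_hashtags(text)):
            counts[(day, tag)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Invented toy rows; a threshold of 2 stands in for the 1,000 used above.
sample = [
    ("2016-01-01", "Good morning #news #NY"),
    ("2016-01-01", "Traffic update #news"),
    ("2016-01-02", "#sports roundup"),
]
print(frequent_hashtags(sample, min_count=2))
```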

4.3 - Co-Hashtag Network Graph

As the initial data set is massive, we first defined a shorter period, from 01/01/2014 to 12/12/2015, and captured the data with DMI-TCAT. This subset then served as input for the Gephi network visualisation tool, in which the filters “Giant Component” and “Degree Range” were applied. Next, the “Force Atlas” layout algorithm was applied. Finally, “Modularity” (a community detection algorithm) was used to separate the clusters of hashtags formed according to their relevance and connection to each other.
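To make the graph structure concrete: in a co-hashtag network, two hashtags are linked when they appear in the same tweet, and the edge weight is the number of tweets in which they co-occur. The sketch below builds such edges and approximates the two filters with plain Python. It is not how Gephi implements them; the function names are hypothetical, the degree filter is one possible reading of “Degree Range” (a minimum degree only), and layout and modularity are left to Gephi.

```python
import re
from collections import Counter, defaultdict, deque
from itertools import combinations

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def co_hashtag_edges(texts):
    """Count, for each unordered hashtag pair, the tweets containing both."""
    edges = Counter()
    for text in texts:
        tags = sorted({t.lower() for t in HASHTAG_PATTERN.findall(text)})
        for pair in combinations(tags, 2):
            edges[pair] += 1
    return edges


def adjacency(edges):
    """Turn weighted edges into an undirected adjacency mapping."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def giant_component(graph):
    """Return the node set of the largest connected component (BFS)."""
    seen, largest = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest


def filter_by_degree(graph, nodes, min_degree):
    """Keep nodes with at least min_degree neighbours inside `nodes`."""
    return {n for n in nodes if len(graph[n] & nodes) >= min_degree}


# Invented toy tweets, only to show the call pattern.
texts = ["#a #b", "#b #c", "#A #b", "#x #y", "no hashtags here"]
graph = adjacency(co_hashtag_edges(texts))
core = giant_component(graph)
print(sorted(core), sorted(filter_by_degree(graph, core, min_degree=2)))
```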

4.4 - Main topics

Since a substantial part of the posts has no hashtags, we used a topic modeling approach to take the tweet text into account as well. The statistical model was built in the R programming language and covered data from 2014 to 2018, comprising 3,283,238 tweets and 3,257 accounts.

4.5 - User specific narratives

Looking at the account metrics, it was clear that the set was very diverse in number of tweets, followers, engagement, friends and hashtag use. To zoom in, we listed the most active users who were not anonymized by Twitter and who posted in English. In this list of top users, two major types were immediately identifiable: accounts posing as regular users and accounts posing as local news outlets. From there, three accounts for each "profile" were selected based on the highest number of tweets, mentions, and friends. Besides the qualitative evaluation of their posts, we used DMI-TCAT once again to visualize their activity and hashtag use.

5. Findings

5.1 - Peaks and activity timeline

The trolls’ activity was particularly dense between 2014 and mid-2017. During this period, there were some fluctuations, with peaks of up to 176,000 tweets per day. The highest overall activity was registered on 14 July 2014 and is associated with the downing of flight MH17 over eastern Ukraine that month. Although this peak shows a clear connection with a politically relevant incident, not all peaks relate to important events. A closer analysis of the most tweeted hashtags during the peaks showed that the trolls were spreading both political topics (e.g. #SaveDonBassPeople, #maga) and everyday topics (e.g. #Rulesforeverydayliving, #My2017Resolution).

5.2 - Hashtag analysis

Ukraine Conflict Disinformation Campaign (June 2014 – May 2015)

Primary Target: Rallying Russian Audience behind the Government

Secondary (enemy) Target: Sowing doubt among Western Audience

Hashtag Categories: pro-Russian, Ukraine, Hashtag Games, News

Overall, this campaign was overwhelmingly conducted in Russian, thus primarily targeting a Russian audience. Use of popular English-language hashtags is rather scattered and picks up only after the first third of the campaign.

The US Election Disinformation Campaign differs significantly from the Ukraine campaign described above in targets and means. It was conducted almost exclusively in English and made systematic use of Hashtag Games and Political Hashtag Games, while by and large refraining from the use of news-related hashtags. Probably the most interesting finding is the large number (55) of non-political hashtags grouped in the category Hashtag Games and Random, which is somewhat of a default category. It includes both explicit hashtag games such as #IfIWereYourMom and other hashtags such as #Ihate or #love. Remarkably, the use of explicit hashtag games in English only starts during the US election campaign; before that, this category is much less represented and also less consistent. However, if one considers the Russian-language hashtags as well, one sees that during the peak of the Ukraine conflict the trolls engaged in a large number of Russian-language hashtag games. This leads to the conclusion that during the Ukraine Disinformation Campaign, the primary targets to be deceived into believing that the trolls were genuine users were Russians, while during the US Election Disinformation Campaign, the primary targets for mobilization were conservative Americans.

Political Hashtag Games are a somewhat transitional category: they are hashtag games of an explicitly political nature, such as #IAmNotThePresidentBecause, #IfIWereTedCruz or #ThingsMoreTrustedThanHillary. These playfully mock their subjects while mobilizing political sentiment, and thus function as political propaganda, mostly in the service of conservative sentiment. The category ‘News’, on the other hand, is a broad container for reactions to news from all over the world, without clear indications of the political position taken towards it. Most likely these hashtags are used similarly to hashtag games and political hashtag games, both to mimic genuine Twitter users and to spread news content that suits the campaigns.

5.3 - Co-Hashtag Network Graph

Seven clusters of hashtags can be observed: one consisting of broadly “conservative” hashtags, one on “black lives matter” issues, one focusing on “America” and “American power”, one about emotional or personal states, which can be regarded as broadly “apolitical”, one focusing mainly on the “Koch farms”/“food poisoning” event, and two others in “Russian” and “Arabic”. Although the conservative cluster appears to be larger than the others, a considerable apolitical hashtag cluster can also be observed.

5.4 - Main topics

5.5 - User specific narratives

Fake partisan users selected:

@gloed_up

@Jenn_Abrams

@TEN_GOP

Looking at the user accounts @Jenn_Abrams, @TEN_GOP, and @gloed_up in more detail, it was clear that each had a very distinctive way of operating. They had their own characteristics, which they maintained throughout their lifespan. For example, the style of @Jenn_Abrams was personal, sharing information and events from her purported daily life. Her profile was conservative, but most of her tweets were not political at all; they were lifestyle and personal tweets. As a result, the account displays a great diversity of hashtags.

@TEN_GOP was a very active account for almost 2 years, with 147,767 followers and over 10,000 tweets. The account's bio read: ‘Unofficial Twitter of Tennessee Republicans. Covering breaking news, national politics, foreign policy and more. #MAGA #2A’. This is indeed the profile the account followed. What is interesting about this account is the struggle it put up to remain online after being removed by Twitter. In July 2017, the @TEN_GOP account was suspended for the first time. Different accounts were then created to ‘defend’ the Tennessee account. In other tweets, the newly created accounts accused Twitter of censoring conservative voices. These accounts quickly gained traction and support from legitimate users. This troll theater thus also involved the staging of other fake accounts.

The account @gloed_up consistently followed the profile of a Black Lives Matter activist. This included critique of police violence and racism, but also more positive messages around black identity and some ‘humorous’ tweets. The tweets of the selected accounts included mentions of, support for, and retweets from other IRA accounts. In some cases, two accounts on opposite sides of the political spectrum would start an argument.

Local news accounts selected:

@TodayNYCity

@TodayPittsburgh

@WashingtOnline

These accounts appear to make up a considerable part of the IRA’s campaign in terms of activity. However, the main strategy here is not only spreading “fake news”, as some would expect, but also creating a pile of real “junk” news, or what we could call “neutral content”. The group appears responsible for keeping a substantial part of the trolls’ activity distant from the partisan political debate, concentrating its efforts on lifestyle and local news. In the first year and a half of activity, most of the posts contain headlines about ordinary topics, like sports, economy, and showbiz, which are not immediately identified with a particular political view. The messages had no images or links to the source of information. In this first part of the strategy, output was high but the content was simple, which could be considered an effort to build an audience and some credibility.

Then, there is a gap in activity in 2016, before the presidential election. For example, @TodayNYCity stopped tweeting for almost eight months, from January to September 2016. @TodayPittsburgh went silent for three months, and the pattern can be seen across the whole group of local news accounts. After that, these accounts returned with slightly different characteristics: they used images and shortened links to cite sources. Even in this second phase of the strategy, they avoided engaging with “partisan” hashtags, like #BlackLivesMatter or #Maga. Instead, they used simple hashtags (e.g., #news, #NY, #sports), which appear consistently throughout the accounts' lifespan. In terms of content, most of the posts/headlines are almost real-time stories about ordinary topics (sports, local news, economy, and showbiz) published by other news outlets, from Fox News to CNN and The Wall Street Journal.

6. Discussion

How can the hunt for trolls be prevented from escalating into political censorship of controversial positions? The problem is that nowadays every side accuses the other side(s) of being trolls, based not on evidence that they work for the IRA or other troll factories but on their political position.

7. Conclusions

From the analysis of the Twitter activity available in DMI-TCAT, it is possible to see a general, coherent strategy of the IRA accounts. The different approaches and methodologies show at least one consistent characteristic of the disinformation campaign: a substantial part of the IRA’s tweets was non-political or lifestyle-related. Much energy and many resources were spent on creating such ‘neutral’ but engaging content, whether through extensive hashtag games, posts about holiday topics or nondescript local news. Such activity has been termed ‘camouflage content’ (Boyd et al., 2018; DiResta et al., 2018). This content is meant to increase engagement, followers and, in the case of personal accounts, identification. The content that drives a certain agenda, whether one of division or one of a specific political direction, is thus camouflaged by harmless, funny or mundane non-political content. Since the accounts were not solely dedicated to pushing a political agenda, it was harder to identify them as trolls. A second common characteristic is the theatrical performance of user accounts to further increase their credibility: troll users have highly developed personas that consistently follow a certain character, trolls from different sides attack and respond to each other, and they warn others about fake users and vehemently protest account takedowns.