Data and Models

Overview

The Illuminating 2016 project has been collecting Facebook and Twitter messages from the official campaign accounts of all the major-party presidential primary candidates. We collect both candidate-generated messages and public commentary through the Facebook and Twitter Application Programming Interfaces (APIs). We are still collecting and analyzing the messages in real time, and the Illuminating 2016 website refreshes once an hour with the latest data we have collected.

For the past year, we have been building a system that automatically classifies each message into a category based on what the message is trying to do. For candidate-generated messages, the categories cover urging people to act, changing their opinions through persuasion, informing them about an activity or event, honoring or mourning people or marking holidays, and, on Twitter only, conversing with members of the public.

For public-generated messages, we currently identify messages that focus on a presidential candidate or surrogate, a political party, or another prominent politician. For these messages, we first identify whether each one criticizes or supports its primary subject on issues or image, and then we identify the target of the attack or support. Message targets are not currently featured on the site. Although we collect public commentary on both Facebook and Twitter, the Illuminating 2016 project currently analyzes only the public’s comments on campaign Facebook walls.

Each message is categorized based on particular features that we have identified using machine learning methods. Machine learning, as used here, is a computational process for classifying unstructured data, such as Facebook messages. Developing the algorithm requires that humans first define the categories and then sort a sample of messages into them. This hand-categorized data is then fed into software that looks for the patterns and features shared by messages in the same category.
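To make this train-then-classify process concrete, here is a minimal sketch of one classic supervised technique, a multinomial naive Bayes text classifier with add-one smoothing. It is not necessarily the method Illuminating 2016 uses; the tokenizer, the class name `NaiveBayesClassifier`, and the four hand-labeled training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase a message and split it into words, trimming edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count how often each category appears, and how often each word
        # appears within each category, in the human-labeled sample.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def scores(self, text):
        """Log-probability score of each category for a new message."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # prior probability of the category
            for token in tokenize(text):
                # Smoothed likelihood: unseen words still get a small nonzero probability.
                score += math.log(
                    (self.word_counts[label][token] + 1)
                    / (self.total_words[label] + vocab_size)
                )
            result[label] = score
        return result

    def predict(self, text):
        """Return the highest-scoring category."""
        s = self.scores(text)
        return max(s, key=s.get)


# Hypothetical hand-labeled training sample.
train_texts = [
    "Please donate today to support the campaign",
    "Sign up to volunteer and knock on doors",
    "The candidate will speak in Iowa on Tuesday",
    "Our office opens downtown on Monday",
]
train_labels = ["call-to-action", "call-to-action", "informative", "informative"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("Please donate and volunteer today"))   # call-to-action
print(clf.predict("The candidate will speak on Monday"))  # informative
```

In practice the human-coded sample would be far larger, and accuracy is typically measured on labeled messages held out from training.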

The candidate-generated data currently presented on Illuminating 2016 is categorized accurately approximately 75% of the time. Some categories, such as call-to-action and the persuasive categories (advocacy and attack), reach up to 80% accuracy. For the call-to-action subcategories digital engagement, media and debate appearances, giving money, and vote, the accuracy is about 80%. For the image, issue, and endorsement categories, the accuracy is 76%. For the informative and conversational categories, the accuracy is 70%. For the ceremonial category, the accuracy is lower, at around 40%, because there are far fewer of these messages and they often express a wider range of features, which makes them harder to classify.

For public commentary on Facebook, the accuracy for the politicians category is up to 86%. For attack messages, the accuracy is 74%, and for support messages, it is 70%.

Note that some messages may fit multiple categories. For example, a message can be both a strategic message and a call-to-action. Currently, such messages are assigned to only one of the two categories, based on the strength of the features that distinguish them. In the future, we hope to allow messages to receive multiple categories.
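A minimal sketch of the difference between single-category assignment and a possible multi-category one, assuming a classifier has already given each category a score between 0 and 1. The function names, the scores, and the 0.5 threshold are hypothetical.

```python
def assign_single(scores):
    """Single-category assignment: keep only the strongest category."""
    return max(scores, key=scores.get)


def assign_multi(scores, threshold=0.5):
    """Hypothetical multi-label variant: keep every category at or above a threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one message that both advocates and urges action.
scores = {"advocacy": 0.55, "call-to-action": 0.62, "informative": 0.10}
print(assign_single(scores))  # call-to-action
print(assign_multi(scores))   # ['advocacy', 'call-to-action']
```

With a single label, the weaker but still strong advocacy signal is discarded; a threshold keeps both.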

We are constantly working to improve the accuracy of the algorithms using additional techniques available to computational social scientists. As the algorithms improve, all the data are reclassified to give users of this site the most accurate possible view of the campaign.

The Visualizations

We present the data using interactive visualizations that let website visitors explore the data in ways that interest them. Visitors can filter the data by platform (Twitter or Facebook), candidate, time frame, and message category. The graphs update automatically each time a visitor changes the filter settings. For example, a visitor might uncheck all candidates except Donald Trump and then make sure that both "Image" and "Issue" under "Attack" are checked. The main plot then shows a comparison of the types of political attack messages Trump has used over the last month. The bar plots lower on the page also update to show the numbers of these kinds of messages alongside the change in Trump’s Twitter followers and Facebook page likes. If the visitor then includes Clinton by checking the box next to her name, they can instantly compare the attack message strategies of the two candidates.
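The filtering in this example can be sketched as a simple predicate over message records. The `Message` fields, the lowercase platform names, and the sample records are assumptions for illustration, not the site's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    platform: str     # "twitter" or "facebook"
    candidate: str
    posted: datetime
    category: str     # e.g. "attack"
    subcategory: str  # e.g. "image" or "issue"


def apply_filters(messages, platforms, candidates, start, end, categories):
    """Keep messages that pass every checked filter.

    `categories` holds the checked (category, subcategory) pairs.
    """
    return [
        m for m in messages
        if m.platform in platforms
        and m.candidate in candidates
        and start <= m.posted <= end
        and (m.category, m.subcategory) in categories
    ]


def count_by_type(messages):
    """Count filtered messages per (candidate, subcategory), as a bar plot might."""
    return Counter((m.candidate, m.subcategory) for m in messages)


# Hypothetical sample records.
now = datetime(2016, 3, 1)
messages = [
    Message("twitter", "Trump", now - timedelta(days=3), "attack", "image"),
    Message("facebook", "Trump", now - timedelta(days=10), "attack", "issue"),
    Message("twitter", "Trump", now - timedelta(days=45), "attack", "image"),  # older than a month
    Message("twitter", "Clinton", now - timedelta(days=2), "attack", "issue"),
    Message("twitter", "Trump", now - timedelta(days=1), "advocacy", "image"),
]

attack_types = {("attack", "image"), ("attack", "issue")}
platforms = {"twitter", "facebook"}
start, end = now - timedelta(days=30), now

trump_only = apply_filters(messages, platforms, {"Trump"}, start, end, attack_types)
print(count_by_type(trump_only))

# Checking Clinton's box adds her messages to the comparison.
both = apply_filters(messages, platforms, {"Trump", "Clinton"}, start, end, attack_types)
print(count_by_type(both))
```

Each change to a checkbox simply reruns the filter with a different set, which is why the plots can update immediately.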

If visitors click on a candidate’s name on the main page, they can see a candidate summary, which shows public responses to candidate-generated messages, including the number of retweets, shares, and likes. For Facebook, visitors can also see the categories of public commentary, e.g., the number of messages referring to politicians, surrogates, and parties, and the number of messages attacking or supporting the primary subject of the message.

Terms of Use

The visualizations and analysis are available for use under a Creative Commons 4.0 International License. You may adapt the visualizations and analyses provided here, and this research and the visualizations may be used for commercial purposes. Attribution to the Illuminating 2016 project is required. Questions about using our materials should be directed to illuminating/contact.

The Categories: Candidate-Generated Messages

Call-to-Action

Any message that is about supporters and focused specifically on encouraging them to do something (persuasion) is a call-to-action. There should be some pressure on the reader of the message to do something, even if it is softly expressed (e.g., “please come,” “we hope to see you there”). Includes questions such as “Have you seen our new ad?” Excludes rhetorical questions.

Subcategories include:

Traditional Engagement: Messages that invite supporters or their friends or family to volunteer for the campaign in traditional ways: making phone calls, door-knocking, registering people to vote, holding a house party, attending an event, helping get people to the polls, or joining a debate watch party.

Digital Engagement: Messages that invite supporters or their friends or family to volunteer for the campaign in digital ways, such as watching or creating videos or sharing photos to submit online, or that encourage supporters to retweet or share a message, including promoting particular hashtags, memes, or images.

Media and Debate Appearances: Messages that encourage supporters to listen to or watch the candidate on the news, call-in shows, debates, or forums.

Giving Money: Messages that invite supporters or their friends or family to give money to the campaign.

Buying Merchandise: Messages that urge people to buy items from the campaign, like hats, stickers, or buttons.

Vote: Messages that urge supporters to vote, or to vote for the candidate, including messages that have some policy or rationale component to them. Includes calls to register to vote, to vote early, or to mail in one’s absentee ballot, or urging people to “stand with” the politician on election day.

Ceremonial

Any message that is explicitly religious or that gives thanks, offers praise, pays tribute, honors, or expresses condolences (using the terms associated with those acts) to family members or to the public around national holidays or commemorative events. Includes messages about winning or losing the election, including hope, anticipation, or disappointment about the outcome. Includes messages that are simply jokes, puns, or humor without reference to any of the other categories. Includes praise or cheers for sports teams or games (but not statements about the candidate attending or watching events that contain no praise).

Conversational (Twitter only)

Messages that are simply a direct address to a single person or a small group (2-3) of people in the form of a response to a message they received. These messages should feel like conversational replies to prior messages, even if it’s only “thanks”, and must contain a direct reference to the person/people they’re responding to.

Informative

A message that is directed to or about supporters (and observers of the campaign, such as the news media) and focuses specifically on information about the campaign, presented neutrally without a persuasive appeal or action verbs, is an informative message.

Advocacy

A message that advocates for the candidate, highlighting their strengths as a leader, describing their prior policies or personal history, describing or featuring their family, describing or highlighting their current and future policy positions, or featuring their positive personality characteristics, is an advocacy message. Includes generic claims about the candidate being a good candidate, being supported, or being good for the state without explicit references to policy. Includes messages that position the candidate in opposition to policies or other candidates.

Image: a positive message about the candidate’s character, personality, style, values, or ability to lead.

Issue: a positive message about the candidate’s policy positions.

Attack

A message that criticizes the opponent or the opposing administration or party on their personality, leadership skills, past behaviors, family, policy issues, campaign events, or any other negative focus on the opponent (or their campaign, surrogates, or family) is an attack message. The message must contain an explicit or strongly implicit reference to the opponent, their party, or their surrogates. Includes generic references to responding to attacks or to attacking the opponent. Explicit references to the opponent are typically attacks, even if the attack itself is somewhat implicit.

Image: a negative message about the opponent’s character, personality, values, style, or ability to lead.

Issue: a negative message about the opponent’s policy positions.

Endorsement

Messages that feature an endorsement of, or support for, the candidate from an important political person, celebrity, or organization, such as law enforcement, unions, the local newspaper, or a prominent political figure.

The Categories: Public-Generated Messages

Politicians, surrogates, or parties

For public-generated messages, we currently only display messages that are on a given candidate’s Facebook wall and are referencing the candidate or their surrogates, political party, or another prominent, related politician (such as the President, a governor, or a senator).

Attack

A message that criticizes the candidate, their surrogates or party, primarily concerning issues/policies or on grounds of character, personality, style or values.

Support

A message that advocates or shows support for the candidate, their surrogates or party, primarily concerning issues/policies or on grounds of character, personality, style or values.

Messages can be assigned both attack and support codes, but messages cannot attack and support the same thing (e.g., “Jeb did great, too little too late” cannot be both attack and support).
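This constraint can be expressed as a consistency check on a message's codes. The sketch assumes each code is recorded as a (code, target) pair and reads "the same thing" as "the same target"; both the representation and that reading are assumptions.

```python
def codes_are_consistent(codes):
    """Check a message's attack/support codes.

    `codes` is an iterable of (code, target) pairs, where code is "attack" or
    "support". A message may carry both codes, but not for the same target.
    """
    codes = list(codes)
    attacked = {target for code, target in codes if code == "attack"}
    supported = {target for code, target in codes if code == "support"}
    return attacked.isdisjoint(supported)


# Supporting one candidate while attacking another is allowed.
print(codes_are_consistent([("support", "Jeb"), ("attack", "Trump")]))  # True
# "Jeb did great, too little too late" cannot be coded both ways for Jeb.
print(codes_are_consistent([("support", "Jeb"), ("attack", "Jeb")]))    # False
```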

Target

The target of a message is the person whose image or issues a public comment attacks or supports. On Facebook, attacking or supporting comments on a given candidate’s wall may or may not be about that candidate, so we want to investigate the targets of public talk to see whom the public is attacking or supporting. The rules we are building to identify these targets draw on several variables, such as whether a comment contains candidates’ names or pronouns (e.g., “you,” “he,” “she”) and whether it replies directly to the candidate or to a comment from another member of the public.
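A hypothetical sketch of rules built on these variables follows. The pronoun list, the matching on single-word candidate names, and the rule that a pronoun (or a comment naming no one) in a direct reply points at the wall owner are illustrative assumptions, not the project's actual rules.

```python
import re

# Assumed pronoun list; the project's actual list is not specified.
PRONOUNS = {"you", "your", "he", "him", "his", "she", "her"}


def identify_targets(comment, wall_owner, candidate_names, replies_to_candidate):
    """Hypothetical rule set for finding whom a comment attacks or supports.

    comment: text of the public comment
    wall_owner: candidate whose Facebook wall the comment is on
    candidate_names: single-word names to look for (e.g. last names)
    replies_to_candidate: True if the comment replies to the candidate's post,
        False if it replies to another member of the public
    """
    tokens = set(re.findall(r"[a-z']+", comment.lower()))
    # Rule 1: any candidate named in the comment is a target.
    targets = {name for name in candidate_names if name.lower() in tokens}
    # Rule 2: in a direct reply to the candidate, a pronoun or the absence of
    # any name points at the wall owner. In replies to other commenters,
    # pronouns are left unresolved because they may refer to that commenter.
    if replies_to_candidate and (tokens & PRONOUNS or not targets):
        targets.add(wall_owner)
    return targets


names = ["Clinton", "Sanders", "Trump"]
print(identify_targets("You will make a great president", "Clinton", names, True))   # {'Clinton'}
print(identify_targets("Sanders is wrong about trade", "Clinton", names, True))       # {'Sanders'}
print(identify_targets("He has no idea what he is saying", "Clinton", names, False))  # set()
```

Real rules would need to handle full names, nicknames, and pronouns that refer to someone other than the wall owner, which is part of why the targets are not yet shown on the site.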

Illuminating 2016 is supported by the Tow Center for Digital Journalism at Columbia University and the Center for Computational and Data Sciences at Syracuse University's School of Information Studies.