Examining Racism in Ferguson Tweets

July 25, 2015

Social media has been critical for organizing, spreading ideas, and sharing
streaming video of the Ferguson protests. In this project, I examine Twitter
users' general sentiment toward important topics related to Ferguson, police
brutality, protesting, and the media, both within Ferguson and around the world.

Methodology

Twitter activity related to Ferguson peaked around mid-August. During a period
of two hours on August 17, 262,999 tweets using the word ferguson were
collected using the Twitter Streaming API. The Twitter Streaming API returns a
random sample of 1% of all tweets containing a word or phrase, with or without a
hashtag (both ferguson and #ferguson were included in the data set).

Many of these tweets were non-original. A preliminary analysis showed that 81%
of the tweets collected were retweets of a subset of more popular tweets. To
prevent the opinions of several popular Twitter users from overpowering the
original opinions of other users in the analysis, these retweets were
eliminated, leaving 46,604 original tweets.
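
The retweet filter described above could be sketched roughly as follows. This is a minimal illustration, not the code used for the project: it assumes tweets arrive as dictionaries in the Twitter API v1.1 JSON shape, where native retweets carry a `retweeted_status` field, and it guesses manual retweets from an "RT @" prefix. The function names and sample tweets are hypothetical.

```python
def is_retweet(tweet):
    """Return True if a tweet dict looks like a retweet.

    Assumption: Twitter API v1.1-style JSON, where native retweets carry a
    "retweeted_status" field. Manual retweets are guessed from an "RT @" prefix.
    """
    if "retweeted_status" in tweet:
        return True
    return tweet.get("text", "").startswith("RT @")


def original_tweets(tweets):
    """Keep only the tweets that do not look like retweets."""
    return [t for t in tweets if not is_retweet(t)]


# Hypothetical sample data, not taken from the collected data set.
sample = [
    {"text": "Police are firing tear gas in #ferguson"},
    {"text": "RT @someone: Police are firing tear gas in #ferguson"},
    {"text": "Curfew starts at midnight", "retweeted_status": {"id": 1}},
]
kept = original_tweets(sample)
print(len(kept), "of", len(sample), "tweets are original")
```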

A simple measurement of word frequency was used to find the common themes of
Ferguson tweets. Each tweet was given a positivity score by using the MPQA
subjectivity lexicon to score each of its words. The positivity of a common
word was calculated as the mean positivity of the tweets containing that word.
The sentiment and co-occurrence frequency of these common words were then
combined to create a topic graph showing how the words relate to one another.
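
As a rough sketch of the three quantities involved (word frequency, mean positivity per word, and pairwise co-occurrence), the snippet below works on tweets that have already been scored. The tokenizer, the function names, and the example tweets with their scores are all assumptions made for illustration; the original analysis may have tokenized and counted differently.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase a tweet and split it into words, dropping '#' and punctuation."""
    words = (w.strip("#.,!?:;\"'()") for w in text.lower().split())
    return [w for w in words if w]


def word_stats(scored_tweets):
    """Return (frequency, mean positivity, co-occurrence counts) per word.

    scored_tweets is a list of (text, positivity_score) pairs.
    """
    freq = Counter()
    score_sum = defaultdict(int)
    cooccur = Counter()
    for text, score in scored_tweets:
        words = sorted(set(tokenize(text)))  # count each word once per tweet
        for w in words:
            freq[w] += 1
            score_sum[w] += score
        for pair in combinations(words, 2):  # alphabetically ordered pairs
            cooccur[pair] += 1
    mean_pos = {w: score_sum[w] / freq[w] for w in freq}
    return freq, mean_pos, cooccur


# Hypothetical tweets with made-up positivity scores.
tweets = [
    ("Tear gas fired at people", -2),
    ("People defy curfew", -1),
    ("Watching the live stream", 0),
    ("Tear gas again tonight", -3),
]
freq, mean_pos, cooccur = word_stats(tweets)
print(freq.most_common(3))
print(mean_pos["gas"], cooccur[("gas", "tear")])
```

Counting each word once per tweet keeps a single repetitive tweet from inflating a word's frequency; that is a design choice of this sketch rather than something stated in the methodology.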

The top five most common words were live, people, gas, tear, and
curfew. Less common words included livestream, media, and missouri.
Common words can be used as a simple way to determine the common subjects of
tweets. In terms of frequency, police violence is one of the main issues on
people’s minds. Racial words, like black and white, are present but less
common in conversation.

A subjectivity lexicon contains a list of words and their corresponding polarity
and magnitude. In the MPQA lexicon, words are labeled as positive, negative,
neutral, or both and strongly or weakly subjective. For this project, strong
words were given an absolute value score of 2 and weak words a score of 1, with
positive words being positive and negative words being negative. For example,
horrible would be scored as -2, while okay would be scored as 1.
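
The scoring rule can be illustrated with a short sketch. It assumes lexicon entries in a `key=value` line format with `type`, `word1`, and `priorpolarity` fields, and it assumes that words labeled neutral or both receive a score of 0, which the text above does not specify. The three entries are illustrative lines written for this example, not quotations from the lexicon file.

```python
STRENGTH = {"strongsubj": 2, "weaksubj": 1}
SIGN = {"positive": 1, "negative": -1}  # assumption: "neutral" and "both" score 0


def parse_entry(line):
    """Parse one 'key=value key=value ...' lexicon line into (word, score)."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    magnitude = STRENGTH.get(fields.get("type"), 0)
    sign = SIGN.get(fields.get("priorpolarity"), 0)
    return fields["word1"], sign * magnitude


# Illustrative entries written in the assumed format.
lines = [
    "type=strongsubj len=1 word1=horrible pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=okay pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=report pos1=noun stemmed1=n priorpolarity=neutral",
]
lexicon = dict(parse_entry(line) for line in lines)
print(lexicon)
```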

The polarity score of a tweet was simply the sum of the polarity scores of its
words. The distribution of tweet scores is shown above. One drawback of this
coding scheme is that it cannot distinguish between low-sentiment tweets that
contain mostly neutral words and mixed-sentiment tweets whose positive and
negative words cancel out. However, because tweets can contain only 140
characters, it is difficult to express more than one sentiment per tweet, so
this drawback should be less of a problem when applied to Twitter data.
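
A minimal sketch of the summing rule, and of the drawback just described, might look like this. The lexicon here is a tiny hypothetical stand-in for the real one, and the tokenization is an assumption.

```python
# Tiny hypothetical lexicon: word -> signed score (strong = 2, weak = 1).
LEXICON = {"horrible": -2, "okay": 1, "brave": 2, "sad": -1, "great": 2}


def tweet_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of a tweet's words; unknown words count as 0."""
    words = (w.strip("#.,!?:;\"'()") for w in text.lower().split())
    return sum(lexicon.get(w, 0) for w in words)


# A tweet with no sentiment words and a tweet with offsetting sentiment
# words receive the same score, which is the limitation noted in the text.
neutral = tweet_score("Curfew begins at midnight in #ferguson")
mixed = tweet_score("Brave protesters, horrible response")
negative = tweet_score("A horrible, sad night")
print(neutral, mixed, negative)
```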

The plot above shows the distribution of positivity scores for all 46,604
original tweets. The modal sentiment is zero. Zero-sentiment tweets made up 34%
of all original tweets, while 30% of tweets expressed an overall positive
sentiment and 35% expressed an overall negative sentiment. This is a fairly
even distribution of sentiment.
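
For reference, shares and a modal score of this kind can be computed from a list of tweet scores as in the sketch below. The scores are made-up values for illustration only and do not reproduce the figures reported here.

```python
from collections import Counter


def sentiment_shares(scores):
    """Return the fraction of negative, zero, and positive scores."""
    labels = Counter(
        "zero" if s == 0 else "positive" if s > 0 else "negative"
        for s in scores
    )
    total = len(scores)
    return {k: labels[k] / total for k in ("negative", "zero", "positive")}


# Hypothetical tweet scores.
scores = [0, 0, -1, 2, -3, 1, 0, -2, 1, -1]
shares = sentiment_shares(scores)
mode = Counter(scores).most_common(1)[0][0]  # most frequent single score
print(shares, "modal score:", mode)
```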

One possible problem with analyzing tweets is self-selection in who tweets in
the first place. People who feel strongly about Ferguson may be more likely to
tweet about it, which would produce a bimodal distribution of sentiment. The
results suggest this was not a problem in the case of Ferguson: the
distribution peaks at zero, and tweets are spread across a wide spectrum of
sentiment. The data appear to reflect more general sentiment rather than only
the sentiment of people with extreme opinions.

This graph uses spectral clustering to group some of the more commonly used
words by how frequently they are mentioned together in tweets. The red coloring
is proportional to the negativity expressed in tweets related to those topics.
While it’s often difficult to identify the meaning behind clusterings of this
kind, a number of distinct clusters emerged (clockwise, starting from the top):

Events on the ground of the Ferguson protests

Michael Brown, his autopsy, and the inciting event

Traditional media coverage and citizen journalism

General, broader topics from a black focus/perspective

General, broader topics from a white focus/perspective
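
To give a sense of how spectral methods group words from co-occurrence counts, here is a deliberately simplified sketch: a two-way spectral partition that splits a small weighted graph by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian), found with plain power iteration. It is not the procedure used to produce the graph above, which had more than two clusters and would normally rely on a library such as scikit-learn. The word list and co-occurrence weights are hypothetical.

```python
def spectral_bisect(weights, iterations=500):
    """Split a weighted graph in two using the sign of the Fiedler vector.

    weights is a symmetric n x n list of lists of co-occurrence counts
    with zeros on the diagonal.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2 * max(degree)  # upper bound on the Laplacian's eigenvalues

    def center_and_normalize(v):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    v = center_and_normalize([float(i) for i in range(n)])
    for _ in range(iterations):
        # Apply (shift * I - L), where L = D - W is the graph Laplacian.
        lv = [degree[i] * v[i] - sum(weights[i][j] * v[j] for j in range(n))
              for i in range(n)]
        v = center_and_normalize([shift * v[i] - lv[i] for i in range(n)])
    return [0 if x < 0 else 1 for x in v]


# Hypothetical co-occurrence counts: two tightly linked groups of words
# joined by one weak link between "curfew" and "media".
words = ["tear", "gas", "curfew", "media", "livestream", "coverage"]
W = [
    [0, 5, 5, 0, 0, 0],
    [5, 0, 5, 0, 0, 0],
    [5, 5, 0, 1, 0, 0],
    [0, 0, 1, 0, 5, 5],
    [0, 0, 0, 5, 0, 5],
    [0, 0, 0, 5, 5, 0],
]
labels = spectral_bisect(W)
for word, label in zip(words, labels):
    print(word, label)
```

Words that are frequently mentioned together end up on the same side of the split, because cutting the weak link between the two groups is much cheaper than cutting through either group.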

The most striking result is the treatment of the word black by Twitter users,
both in terms of sentiment and clustering. In most cases it is used in a
negative context; however, the reason for this may differ based on the Twitter
user’s political views and perspective on the situation.

Some Twitter users use the word black in tweets that make negative comments
about race, which counts as a negative usage of the word. However, many tweets
use the word in a negative context that concerns the situation rather than the
group of people. Because the situation is highly negative, the word black
receives a very low positivity score, even though the negative sentiment is
directed toward the police, Ferguson, the media, or the concept of police
brutality rather than toward black people.