
What Donald Trump is Tweeting (Analyzing Tweets with NLTK and Pandas)

How does @realDonaldTrump (Donald J. Trump) tweet? Before the President-elect became the President-elect, I didn’t pay much attention to his tweets, but I did know that he seemed to have a unique style of writing them. To me, it felt like he tweeted the way he spoke in public. But what was that tone, that Trump brand?

From a quick glance at his account, the usage of exclamation points seemed prominent. I wondered if that was consistent, and I wondered what else I could find. So I ran an analysis of Trump’s tweets, as well as those of @HillaryClinton, @CNN and @FoxNews, for some comparison.

My strategy for selecting the accounts:

Find another individual user’s account with opposing views, but who had a similar goal over the last year or more (to become President of the United States).

Compare the individual users’ accounts with “objective” news sources’ Twitter accounts. Since source objectivity is always a topic of hot debate (or a hot topic of debate), I took @CNN and @FoxNews.

The Method

Pull as many of Trump’s tweets as I could via the Twitter API. The best endpoint for this is statuses/user_timeline. It only returns the most recent 3,200 tweets from any particular user handle, which is unfortunate because plenty of accounts have authored far more than that (e.g. Trump has written over 30,000 tweets).

Take the tweets and do some Natural Language Processing with Python’s NLTK.

Get the parts of speech tags for every word in every tweet

Do word frequency counts

Classify the sentiment of the tweets

Classify the reading level or difficulty of each word. Determine the reading level of the account.

I didn’t implement this. I started, but then I got distracted.

Pull out an aggregate view of some other interesting tweet data, like #HASHTAG usage.

Compare data.
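The first step above, paging backwards through a timeline, can be sketched as follows. This is a minimal illustration, not the code from the repo: `fetch_page` is a hypothetical callable that you would wrap around your own statuses/user_timeline request, and the sketch assumes it returns tweet dicts with an integer `id`, newest first, honoring `max_id` as an inclusive upper bound.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: fetch_page(max_id) returns one page of tweet dicts,
# newest first, each carrying an integer "id". It stands in for a real
# statuses/user_timeline request and is NOT part of any library.
FetchPage = Callable[[Optional[int]], List[Dict]]


def collect_timeline(fetch_page: FetchPage, limit: int = 3200) -> List[Dict]:
    """Walk backwards through a timeline until it runs dry or hits the limit."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None  # None means "start from the newest tweet"
    while len(tweets) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # max_id is assumed inclusive, so step just below the oldest id seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets[:limit]
```

The fetching is injected on purpose, so the paging logic can be tried against a fake list of tweets without credentials or network access.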

And Now, The Data

Punctuation! Punctuation! Punctuation! Trumpunctuation?

If you’ve seen Trump speak, you have probably noticed his strong intonations and general emphatic demeanor. His tweets seem to capture this partially through punctuation alone. His words are sprinkled with the strongest punctuation mark in the English language, the exclamation mark.

“!” occurs 2336 times over 3200 Trump tweets.

And 1954 of those 3200 Trump tweets contain “!” at least once.

Which means that 61% of Trump’s tweets contain an exclamation mark (based on my 3200 tweet dataset). This seemed astonishingly high; when I first pulled this I thought my code was incorrect. So I opened my file of tweets and eyeballed it to confirm. I figured if 60+ % of tweets contained an exclamation mark, then it would be easy to confirm my sanity (or lack thereof) from the text file.
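For anyone who wants to repeat the sanity check, here is a minimal sketch of the three numbers involved: total marks, tweets containing at least one mark, and the resulting share. It assumes the tweets are already loaded as a list of plain strings; the function name and the placeholder tweets in the usage lines are made up for illustration.

```python
def exclamation_stats(tweets):
    """Summarize "!" usage over a list of tweet texts."""
    total_marks = sum(text.count("!") for text in tweets)
    tweets_with_mark = sum(1 for text in tweets if "!" in text)
    share = tweets_with_mark / len(tweets) if tweets else 0.0
    # The tweet with the most marks (None when the list is empty).
    busiest = max(tweets, key=lambda text: text.count("!"), default=None)
    return {
        "total_marks": total_marks,
        "tweets_with_mark": tweets_with_mark,
        "share": share,
        "busiest": busiest,
    }


# Illustrative input: one real tweet quoted in this post, two placeholders.
sample = [
    "#WheresHillary? Sleeping!!!!!",
    "placeholder tweet without the mark",
    "another placeholder!",
]
stats = exclamation_stats(sample)
```

With the figures reported above, the share works out to 1954 / 3200 = 0.610625, which rounds to the 61% quoted.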

Some of Trump’s Tweets

The highest frequency of “!” in a single tweet was five. And that occurred in the following tweet:

“#WheresHillary? Sleeping!!!!!”
Which was Retweeted 27,158 times and Favorited 61,084 times. Created on August 20, 2016.

Here’s the !!!! comparison among the entire group:

| @realDonaldTrump Absolute Count (!) | @HillaryClinton Absolute Count (!) | @CNN Absolute Count (!) | @FoxNews Absolute Count (!) |
|---|---|---|---|
| 2336 | 171 | 43 | 134 |

Adjectives Are Very Great

Trump’s top twenty favorite adjectives are listed below, along with the top 20 adjectives from the comparison group. It’s not really a surprise that “great” was number one for Trump because, among other things, his campaign’s slogan was “Make America Great Again.” Notably, I removed stopwords and only grabbed lowercase words to help filter out some noise from the data. Relative frequency is the percentage relative to the other adjectives in the dataset.

| trump_word | abs_frequency | rel_frequency |
|---|---|---|
| great | 211 | 4.953052 |
| bad | 81 | 1.901408 |
| many | 81 | 1.901408 |
| big | 77 | 1.807512 |
| last | 66 | 1.549296 |
| new | 56 | 1.314554 |
| good | 50 | 1.173709 |
| much | 34 | 0.798122 |
| amazing | 32 | 0.751174 |
| total | 30 | 0.704225 |
| wonderful | 30 | 0.704225 |
| nice | 27 | 0.633803 |
| massive | 25 | 0.586854 |
| first | 25 | 0.586854 |
| presidential | 23 | 0.539906 |
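The two frequency columns in the table above can be sketched like this. It is a hedged reconstruction rather than the repo's code: it assumes adjectives are the tokens NLTK tags as JJ, JJR or JJS, that "lowercase words" means tokens that are entirely lowercase, and that relative frequency is a word's count divided by the count of all adjectives kept. `tag_tweet` needs NLTK and its tokenizer and tagger data installed; `adjective_frequencies` is a made-up helper name that works on any list of (word, tag) pairs.

```python
from collections import Counter

ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags


def tag_tweet(text):
    """Tokenize and POS-tag one tweet (requires NLTK and its data files)."""
    import nltk  # imported lazily so the counting helper works without NLTK

    return nltk.pos_tag(nltk.word_tokenize(text))


def adjective_frequencies(tagged_tokens, stopwords=frozenset(), top_n=20):
    """Return (word, abs_frequency, rel_frequency in percent) rows."""
    adjectives = [
        word
        for word, tag in tagged_tokens
        if tag in ADJECTIVE_TAGS and word.islower() and word not in stopwords
    ]
    counts = Counter(adjectives)
    total = sum(counts.values())
    return [
        (word, n, 100.0 * n / total) for word, n in counts.most_common(top_n)
    ]
```

As a cross-check, the published numbers are consistent with 4,260 adjectives in total: 211 / 4260 is 4.953052%, and 81 / 4260 is 1.901408%.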

Below is the list of top 20 adjectives from @realDonaldTrump, @HillaryClinton, @CNN, @FoxNews from their last 3200 tweets.

General Sentiment of Tweets

To calculate the sentiment of the tweets, I used a function in the nltk library called demo_liu_hu_lexicon(), which classifies each word of the sentence as Positive, Negative, or Neutral, and then does a basic count of each word-classification category. Whichever group has the highest count is how the text will get assigned. There are definitely better ways to do this. I considered an integration with IBM’s Watson, but time was doing its thing, being time, and being of the essence and such. The table below shows, for each account, the fraction of tweets that landed in each category.

| | @realDonaldTrump | @HillaryClinton | @CNN | @FoxNews |
|---|---|---|---|---|
| Positive | 0.463750 | 0.468750 | 0.304688 | 0.235625 |
| Neutral | 0.285313 | 0.401875 | 0.425938 | 0.455937 |
| Negative | 0.250937 | 0.129375 | 0.269375 | 0.308437 |
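The word-counting idea behind these numbers can be sketched without NLTK. This is a simplified stand-in, not demo_liu_hu_lexicon() itself: it compares positive and negative word counts and calls a tie Neutral, which is my reading of how that demo behaves. The word sets are passed in as parameters; with NLTK you would presumably build them from `opinion_lexicon.positive()` and `opinion_lexicon.negative()`. The whitespace tokenizer and both function names are assumptions for illustration.

```python
import string
from collections import Counter


def classify(text, positive_words, negative_words):
    """Label one text by comparing positive and negative word counts."""
    tokens = [tok.strip(string.punctuation) for tok in text.lower().split()]
    pos = sum(1 for tok in tokens if tok in positive_words)
    neg = sum(1 for tok in tokens if tok in negative_words)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"  # ties, including texts with no lexicon words at all


def sentiment_shares(tweets, positive_words, negative_words):
    """Fraction of tweets per label, matching the layout of the table."""
    labels = Counter(
        classify(text, positive_words, negative_words) for text in tweets
    )
    total = len(tweets)
    return {
        label: (labels[label] / total if total else 0.0)
        for label in ("Positive", "Neutral", "Negative")
    }
```

One column of the table corresponds to one call of `sentiment_shares` on an account's tweets, so the three values in each column should sum to 1.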

Top Hashtags

This section contains a top 5 hashtag summary table for the entire analysis group, and also has the top 20 hashtags for each account listed afterward. You can infer what you will from this data.

I did find it interesting that the top-used hashtag by Trump was one of self-promotion, and Hillary Clinton used a lot of hashtags relating to the debates. Just looking at the hashtag data makes me think that Trump’s social media strategy was much stronger throughout the campaign.

In addition, all four Twitter accounts had at least one hashtag with “Trump” in it. From a marketing perspective, that’s good brand awareness.
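Counting hashtags is the simplest part of the analysis. Here is a minimal sketch that pulls them out of the tweet text with a regular expression; the function name and the choice to fold case (so #MAGA and #maga count together) are assumptions on my part. If you keep the full tweet objects from the API, the hashtags listed under each tweet's `entities` field are likely a more reliable source than a regex.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")


def top_hashtags(tweets, n=5):
    """Return the n most common hashtags as (hashtag, count) pairs."""
    counts = Counter(
        tag.lower()  # fold case so variants of one tag are counted together
        for text in tweets
        for tag in HASHTAG_RE.findall(text)
    )
    return counts.most_common(n)


# Illustrative input built from hashtags that appear in this post.
sample = [
    "#WheresHillary? Sleeping!!!!!",
    "you have to stop somewhere #MakeCodingProjectsSmallAgain",
    "still asking #wheresHillary",
]
top = top_hashtags(sample)
```

Running the same function once per account and comparing the results side by side gives a summary like the one described above.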

Peters: “Without the least exaggeration, we can say that President Obama has been the worst foreign policy presiden… https://t.co/JbIGCpwKUk

10263

Sat Dec 31 03:32:46 +0000 2016

.@realDonaldTrump: “Michelle Obama said yesterday that there’s no hope, but I assume she was talking about the past… https://t.co/V1BuKztapK

10021

Sat Dec 17 22:43:31 +0000 2016

.@realDonaldTrump: “We have to protect Israel. Israel, to me, is very, very important. We have to protect Israel.” https://t.co/R8CZWsGfvX

10021

Sun Jan 01 03:26:17 +0000 2017

Giuliani: “The U.S. Constitution doesn’t give anyone in this world the right to come to the U.S. That’s a privilege… https://t.co/TyKS7REAPR

8885

Wed Dec 21 03:41:24 +0000 2016

.@KatrinaPierson: This president is the divider-in-chief. His entire political career revolved around racism, sexis… https://t.co/vQP27Mxc1B

8700

Fri Dec 30 01:43:04 +0000 2016

.@GovMikeHuckabee: Can you name me one Muslim country that welcomes Christians to build & protect churches? No, you… https://t.co/baLyxGAkL6

8419

Thu Dec 29 01:39:42 +0000 2016

DJT: “I think the Democrats are putting it out because they suffered 1 of the greatest defeats in the history of po… https://t.co/2bBtHDwqu3

8175

Sun Dec 11 19:08:47 +0000 2016

About the Data

If you would like to access some of the code/data, it is publicly available on my GitHub repo. I’ve also included all four files that contain all of the tweets on which I ran the analysis in the data/ directory.

I had plans to include Date/Time data analysis in this post and many other things (if you’d like to see more data, let me know), but you have to stop somewhere great!!!!!! #MakeCodingProjectsSmallAgain