
Sentiment Analysis on Donald Trump using R and Tableau

Recently, presidential candidate Donald Trump has become controversial. In particular, he has faced strong criticism for his provocative call to temporarily bar Muslims from entering the US.
One of the many uses of social media analytics is sentiment analysis, where we evaluate whether posts on a specific issue are positive or negative.
We can integrate R and Tableau for text data mining in social media analytics, machine learning, predictive modeling, etc., by taking advantage of the numerous R packages and compelling Tableau visualizations.

In this post, let’s mine tweets and analyze their sentiment using R, and use Tableau to visualize the results. We will see the spatio-temporal distribution of the tweets and the cities and states with the highest tweet counts, and we will also map the sentiment of the tweets. This will help us see in which areas his comments are received positively and where they are perceived negatively.
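The post does not show the scraping step itself, but it can be sketched roughly as below with the twitteR package. This is a minimal, hedged sketch: the credential strings are placeholders you must replace with your own Twitter app keys, and the query and tweet count are illustrative.

```r
# Minimal sketch: scraping tweets with the twitteR package.
# The four credential strings are placeholders for your own app keys.
library(twitteR)

setup_twitter_oauth(consumer_key    = "YOUR_CONSUMER_KEY",
                    consumer_secret = "YOUR_CONSUMER_SECRET",
                    access_token    = "YOUR_ACCESS_TOKEN",
                    access_secret   = "YOUR_ACCESS_SECRET")

# Pull recent English-language tweets mentioning Donald Trump
donald <- searchTwitter("Donald Trump", n = 5000, lang = "en")
```

Each element of `donald` is a status object whose fields (text, coordinates, timestamps) are pulled out in later steps.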

We see from the word cloud that ‘muslim’, ‘muslims’, and ‘ban’ are among the most frequent words in the tweets. This suggests that most tweets concerned Trump’s recent idea of temporarily banning Muslims from entering the US.
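A word cloud like this can be built with the tm and wordcloud packages. The sketch below is illustrative: the three sample tweets stand in for the scraped tweet text, and the cleaning steps (lowercasing, punctuation and stopword removal) are the usual ones, not necessarily the exact pipeline used for the figure.

```r
# Hedged sketch: building a word cloud from tweet text with tm + wordcloud.
library(tm)
library(wordcloud)

# Sample tweets standing in for the scraped text
donaldtext <- c("Trump wants to ban Muslims from entering the US",
                "Muslims respond to Trump's proposed ban",
                "The ban on Muslims is controversial")

corpus <- VCorpus(VectorSource(donaldtext))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Term frequencies across all tweets, most frequent first
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

wordcloud(names(freq), freq, min.freq = 1)
```

With real data, raising `min.freq` keeps the cloud readable by dropping rare terms.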

The dashboard below shows a time series of the number of tweets scraped. We can switch the time unit between hour and day, and the dashboard updates based on the selected unit. The pattern of tweet volume over time lets us drill in and see how each activity or campaign is being perceived.
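Tableau handles the hour/day toggle, but the underlying hourly counts can also be produced in R before export. A small sketch with dplyr, where the made-up `created` timestamps stand in for the field returned by the Twitter API:

```r
# Hedged sketch: counting tweets per hour with dplyr.
library(dplyr)

# Made-up timestamps standing in for the 'created' field of scraped tweets
tweets <- data.frame(
  created = as.POSIXct(c("2015-12-10 10:15:00",
                         "2015-12-10 10:45:00",
                         "2015-12-10 11:05:00"), tz = "UTC")
)

# Truncate each timestamp to the hour, then count tweets per hour
by_hour <- tweets %>%
  mutate(hour = format(created, "%Y-%m-%d %H:00")) %>%
  count(hour)
```

Switching the `format()` string to `"%Y-%m-%d"` gives daily counts instead.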

Let’s get the full address of each tweet location using the Google Maps API. The ggmap package enables us to get the street address, city, zip code, and state of each tweet from its longitude and latitude. Since the Google Maps API does not allow more than 2,500 queries per day, I used a couple of machines to reverse geocode the latitude/longitude information into full addresses. Even so, I was not able to reverse geocode all of the tweets I scraped, so in the following visualizations I am showing only the portion of the scraped tweets that I was able to reverse geocode.
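The reverse geocoding step looks roughly like the sketch below. Two caveats: the coordinates shown are illustrative, and recent versions of ggmap require registering a Google API key with `register_google()` before any query will succeed.

```r
# Hedged sketch: reverse geocoding one tweet's coordinates with ggmap.
# Recent ggmap versions need register_google(key = "YOUR_API_KEY") first;
# the free tier has historically been capped around 2,500 queries per day.
library(ggmap)

# revgeocode() expects c(longitude, latitude), in that order
addr <- revgeocode(c(-74.0060, 40.7128), output = "address")
```

In practice you would loop (or `sapply`) over the tweet coordinates and pause between batches to stay under the daily quota.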

hist(score, xlab = "", main = "Sentiment of sample tweets\nthat have Donald Trump in them",
     border = "black", col = "skyblue")

Here is the plot:

We see from the histogram that the sentiment is slightly positive. Using Tableau, we will see the spatial distribution of the sentiment scores.
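The `score` vector plotted in the histogram above can be produced with a lexicon-based scorer in the spirit of Jeffrey Breen’s well-known `score.sentiment()` approach. The sketch below is a minimal stand-in: the tiny positive/negative word lists are illustrative, not the full lexicon used for the actual analysis.

```r
# Hedged sketch: a lexicon-based sentiment scorer.
# The word lists are tiny illustrative stand-ins for full lexicons.
pos.words <- c("great", "good", "love", "support")
neg.words <- c("ban", "hate", "bad", "racist")

score_sentiment <- function(tweets, pos.words, neg.words) {
  sapply(tweets, function(tweet) {
    # Strip punctuation, lowercase, and split into words
    words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", tweet)), "\\s+"))
    # Score = positive matches minus negative matches
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, USE.NAMES = FALSE)
}

score <- score_sentiment(c("I love and support Trump",
                           "The ban is bad and racist"),
                         pos.words, neg.words)
```

The resulting numeric vector is what gets passed to `hist()`.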

Save the data as a CSV file and import it into Tableau.
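The export itself is one line of base R. In this sketch, `tweet_data` is a made-up stand-in for the scored, geocoded data frame:

```r
# Minimal sketch: writing the processed tweets out for Tableau.
# 'tweet_data' is a made-up stand-in for the scored, geocoded data frame.
tweet_data <- data.frame(text  = "sample tweet",
                         score = 1,
                         lat   = 40.71,
                         lon   = -74.01)

write.csv(tweet_data, "trump_tweets.csv", row.names = FALSE)
```

Dropping row names keeps Tableau from importing an unnamed index column.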

The map below shows the tweets that I was able to reverse geocode. The size is proportional to the number of favorites each tweet got. In the interactive map, we can hover over each circle and read the tweet, the address it was tweeted from, and the date and time it was posted.

Similarly, the dashboard below shows the tweets and the size is proportional to the number of times each tweet was retweeted.
Here is the screenshot (View it live in this link)

In the following three visualizations, top zip codes, cities and states by the number of tweets are shown. In the interactive map, we can change the number of zip codes, cities and states to display by using the scrollbars shown in each viz. These visualizations help us to see the distribution of the tweets by zip code, city and state.

Sentiment of tweets

Sentiment analysis has myriad uses. For example, a company may investigate what customers like most about its product and what issues leave them dissatisfied. When a company releases a new product, has it been perceived positively or negatively? How does customer sentiment vary across space and time? In this post, we are evaluating the sentiment of the tweets we scraped about Donald Trump.

The viz below shows the sentiment score of the reverse geocoded tweets by state. We see that the tweets have the highest positive sentiment in NY, NC, and TX.
Here is the screenshot (View it live in this link)

Summary

In this post, we saw how to integrate R and Tableau for text mining, sentiment analysis and visualization. Using these tools together enables us to answer detailed questions.

We used a sample of the most recent tweets containing “Donald Trump”. Since I was not able to reverse geocode all the tweets I scraped because of the limit imposed by the Google Maps API, we used only about 6,000 tweets. The average sentiment is slightly above zero, and some states show strong positive sentiment. Statistically speaking, however, mining a sufficiently large sample is important for drawing robust conclusions.

The accuracy of our sentiment analysis depends on how fully the words in the tweets are covered by the lexicon. Moreover, since tweets may contain slang, jargon, and colloquial words that may not be included in the lexicon, sentiment analysis results need careful evaluation.

This is enough for today. I hope you enjoyed it! If you have any questions or feedback, feel free to leave a comment.

Hi Friends, this is a very inspiring article as it combines R and Tableau in the analysis.
When I tried doing a similar analysis, I had problems with the following functions: getLatitude(), getLongitude(), getCreated(), etc.
I would really appreciate any direction on fixing the error. Thanks.

Marc

Excellent article! When I try it, however, I do not get ANY long and lat values. Any thoughts on where this might go wrong?

Abdi Adan

Hi friends,
When I tried the code, I got the message in the attached screenshot.
What does it mean? Please help me find a solution.
Thank you.

Juliet Glidden

Hi! I keep getting the below error. Every other step works. I’m using my own twitter data loaded as a df. Could there be an issue there?

prateek

Hi, great article. Just an aside, though, that might come in handy: the twListToDF function easily converts the tweets data into a handy data frame! 🙂

Pavan Nayakanti

Aghh… it saves lots of code lines. Sad that I realized it too late. Anyway, thanks for the tip.

Pavan Nayakanti

Hello Fissheha, can you please help me identify the script that saves the data to a CSV file? Thanks.

Pavan Nayakanti

Hi Fissheha, while trying to gather the addresses of the tweets, I am seeing the errors below.
> data=filter(data, !is.na(lats),!is.na(lons))
Error in match.arg(method) : ‘arg’ must be NULL or a character vector
> lonlat=select(data,lon,lat)
Error: could not find function “select”

Please advise.

Pavan Nayakanti

Hi, another challenge I am facing is with unclean inputs. Please suggest if there is a smart way to handle this. I tried a few online suggestions, but they did not help much.

Thanks for the prompt response. I realized that I just need to enable the ROAuth for this. Thanks again.

library(ROAuth)

Ryan Wesslen

Can you explain these two lines of the code:
donaldtext=sapply(donald, function(x) x$getText())
donaldtext=unlist(donaldtext)

More specifically, where is the getText() function defined?

I’m getting a blank result. Trying to figure out what my problem is. Everything else works before this step and all the other fields (e.g., donaldlat, donalddate, retweeted, etc.) are being populated just not donaldtext.

Could you please explain the problem and tell me how to solve it?
Thanks in advance!

Andrew Borg

You’d need to set up a Twitter dev account, mate.

Patty

Thanks! Could you please explain how to do it? What is the website?

Shivi

Great Show of R and Tableau. Cool insights with the sentiments.
Just a request, if possible (I know it might sound like asking for too much): whenever an analysis is posted, would it be possible to add details/comments on what the code is doing? More and more people are moving to R and find it quite difficult to understand, just like I did.

Thank you for your suggestion. As mentioned in the summary, scraping an ample sample of tweets is necessary to draw robust conclusions. The tweets used here are a small percentage of the tweets from one week. The aim of this blog post is to show the steps, but by collecting a large enough sample of tweets and by using a good lexicon, we can get robust insights from social media mining.