Follow me on my journey to becoming a Data Scientist

Scraping Tweets from Twitter

Part One: Extracting Account Tweets and Tweets related to a Hashtag

In this post, I am going to show you how to extract tweets from Twitter! Specifically, we will extract the tweets from a certain account (or Twitter handle) and the tweets related to a hashtag. Why might you extract this information from Twitter at all?

Web scraping, or extracting text from the web, automates what would otherwise be manual typing or copying and pasting of the information we want from online sources. These could be regular websites or social media sites like Twitter. Social media supplies companies with direct reviews of their products and services, which makes it a valuable source of feedback for improvements.

Scraping Twitter data actually takes more work in setting everything up than in extracting the tweets. So, let’s take this one step at a time.

Step One: Install and Load Your Packages

Step 1.a. There are several packages that will make the process of scraping tweets from Twitter much easier. The first step is to install the packages below:
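The original package list is not shown here, so the following is a minimal sketch that assumes the three packages implied by the functions used later in this post: twitteR (for userTimeline() and searchTwitter()), purrr (for map_df()), and dplyr (for tbl_df()). Your own list may differ.

```r
# Assumed package list, inferred from the functions used later in this post
install.packages(c("twitteR", "purrr", "dplyr"))

# Load the packages into the current R session
library(twitteR)  # Twitter API client: userTimeline(), searchTwitter()
library(purrr)    # map_df() for turning a list of tweets into rows
library(dplyr)    # tbl_df() for a tidy data frame
```

You only need to run install.packages() once per machine, while the library() calls are needed in every new R session.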

Step Two: Connecting R to Twitter

This next part is a little tricky: you need to connect R with the Twitter API. The Twitter API, or application programming interface, is what allows you to access Twitter’s data. In order to connect to the Twitter API, you first need to create a Twitter App.

As you can see from my screenshots above, in honor of being in Paris today, I will be extracting tweets from the @ParisJeTaime account of the official Paris Tourist Office. I simply named my app ParisJeTaime Tweets, with a description saying that I will be extracting tweets. The application also asks for a website URL. I used my blog, since I will be publishing the data on here, but feel free to use any website as a filler in case you don’t have a blog of your own. Lastly, I provided information on how the app will be used.

Step 2.b. Once generated, head over to the “Keys and tokens” tab for your app (pictured below) where you will find the keys and tokens necessary to connect R, or RStudio, with the Twitter API.

Step 2.c. With all of your access keys and tokens, head back into R so you can establish a connection with Twitter. I recommend setting each of your keys and tokens to a variable; this way it will be easier to manage your code:
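As a rough sketch of this step, the snippet below stores each credential in a variable and passes them to twitteR's setup_twitter_oauth() function. The variable names and the placeholder strings are my own assumptions for illustration; replace the placeholders with the values from your app's “Keys and tokens” tab.

```r
library(twitteR)

# Placeholder values (hypothetical): paste in your own keys and tokens
consumer_key    <- "YOUR_API_KEY"
consumer_secret <- "YOUR_API_SECRET_KEY"
access_token    <- "YOUR_ACCESS_TOKEN"
access_secret   <- "YOUR_ACCESS_TOKEN_SECRET"

# Authenticate this R session with the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
```

Keeping the credentials in variables means they appear in only one place, so they are easy to swap out and easy to remove before sharing your script.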

Step Three: Scraping Tweets

The code for scraping tweets is fairly simple.

Step 3.a. If you want to scrape tweets from a user or account timeline, use the userTimeline() function. The code below shows how I used this function on the @ParisJeTaime account to extract their last 100 tweets.

ParisJeTaime <- userTimeline("ParisJeTaime", n = 100)

You can turn this data into a data frame with the tbl_df(map_df()) functions. Let’s see what this code and data frame look like:
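Here is a minimal sketch of that conversion, assuming ParisJeTaime is the list of tweets returned by userTimeline() above and that purrr and dplyr are loaded. The name ParisJeTaime_df matches the one used when saving the file later on.

```r
library(twitteR)
library(purrr)
library(dplyr)

# Convert each tweet in the list to a one-row data frame and stack the rows,
# then wrap the result as a tibble
# (newer dplyr versions deprecate tbl_df() in favor of as_tibble())
ParisJeTaime_df <- tbl_df(map_df(ParisJeTaime, as.data.frame))

# Peek at the first few rows
head(ParisJeTaime_df)
```

The twitteR package also ships its own helper, twListToDF(), which converts a list of tweets to a regular data frame if you would rather not load purrr and dplyr.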

As you can see from the data frame above, this data will need some cleaning before any analysis can be done on it. We will cover this in future posts, but at least we have scraped some Twitter data!

Lastly, you may want to save your data frame as a file for later use. The code below will save your data frame as a CSV file in your R working directory:

write.csv(ParisJeTaime_df, "ParisJeTaime.csv")

Step 3.b. If you want to scrape tweets with a certain hashtag, you will need to use the searchTwitter() function. The code below shows how I used this function to extract the latest 100 tweets with #parisjetaime:
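The hashtag search can be sketched as follows, assuming the Twitter connection from Step Two is already set up. The variable names parisjetaime_tag and parisjetaime_tag_df are hypothetical, chosen for illustration only.

```r
library(twitteR)
library(purrr)
library(dplyr)

# Search for up to 100 recent tweets containing the hashtag
parisjetaime_tag <- searchTwitter("#parisjetaime", n = 100)

# Same conversion as for the account timeline: list of tweets to data frame
parisjetaime_tag_df <- tbl_df(map_df(parisjetaime_tag, as.data.frame))
```

Note that n is an upper limit: if the search finds fewer matching tweets, you will get back fewer than 100.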