Accessing the Twitter API with Python

Introduction

One thing Python developers enjoy is the huge number of resources developed by its large community. Python-built application programming interfaces (APIs) are common for web sites, and it's hard to imagine a popular web service that doesn't have a Python API library to facilitate access to its services. Libraries of this kind exist for most of the popular web services. Strictly speaking, "Python wrapper" is a more accurate term than "Python API", because a web API usually provides a general application programming interface, while language-specific libraries "wrap" it in easy-to-use functions. Nevertheless, we'll use both terms interchangeably throughout this article.

In this blog post we concentrate on the Twitter API, show how to set up your credentials with Twitter, and compare a few Python wrappers based on community engagement. Then we show a few examples of using the Twitter API to search tweets and to create a stream of realtime tweets on a particular subject. Finally, we'll explore the saved data.

An Overview of the Twitter API

There are many APIs on the Twitter platform that software developers can work with, up to and including fully automated systems that interact with Twitter. While this could benefit companies by drawing insights from Twitter data, it's also suitable for smaller-scale projects, research, and fun. Here are a few of the most notable APIs provided by Twitter:

There are many more possibilities with the Twitter APIs that are not included in this list. Twitter is also constantly expanding its range of services by adding new APIs and updating existing ones.

Getting Credentials

Before using the Twitter API, you first need a Twitter account and a set of credentials. The process of getting credentials may change over time, but currently it is as follows:

Click on the "Create New App" button, fill in the details, and agree to the Terms of Service

Navigate to the "Keys and Access Tokens" section and take note of your Consumer Key and Secret

In the same section, click on the "Create my access token" button

Take note of your Access Token and Access Token Secret

And that's all. The consumer key and secret authenticate the app that is using the Twitter API, while the access token and secret authenticate the user. All of these values should be treated as passwords and should not be included in your code in plain text. One suitable way is to store them in a JSON file, "twitter_credentials.json", and load the values from your code when needed.
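As a minimal sketch of the JSON approach described above, the two helpers below write the four values to a file and read them back. The function names and the dictionary keys (CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET) are our own naming assumptions, not something Twitter prescribes.

```python
import json


def save_credentials(path, consumer_key, consumer_secret,
                     access_token, access_secret):
    """Write the four Twitter credentials to a JSON file.

    The key names used here are an arbitrary choice for this sketch.
    """
    credentials = {
        "CONSUMER_KEY": consumer_key,
        "CONSUMER_SECRET": consumer_secret,
        "ACCESS_TOKEN": access_token,
        "ACCESS_SECRET": access_secret,
    }
    with open(path, "w") as file:
        json.dump(credentials, file)


def load_credentials(path):
    """Read the credentials dictionary back from the JSON file."""
    with open(path) as file:
        return json.load(file)
```

You would typically call save_credentials once, from a script that is not kept under version control, and call load_credentials("twitter_credentials.json") wherever the keys are needed.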

Python Wrappers

Python is one of the programming languages with the largest number of wrappers developed for the Twitter API, so it's hard to compare them unless you have used each of them for some time. A good way to choose the right tool is to dig into their documentation and look at the possibilities they offer and how well they fit the specifics of your app. In this part, we'll compare the various API wrappers by the engagement of the Python community in their GitHub projects. A few suitable metrics for comparison are: the number of contributors, the number of stars received, the number of watchers, and the library's maturity, measured as the time since its first release.

Table 1: Python libraries for Twitter API ordered by number of received stars.

The above table lists some of the most popular Python libraries for the Twitter API. Now let's use one of them to search through tweets, get some data, and explore.

Twython Examples

We've selected the twython library for three reasons: its diverse features aligned with the different Twitter APIs, its support for streaming tweets, and its maturity. Although there's no information on when its first release was published, version 2.6.0 appeared around 5 years ago. In our first example we'll use the Search API to search for tweets containing the string "learn python", and later on we'll show a more realistic example using Twitter's Streaming API.

Search API

In this example we'll create a query for the Search API with the search keyword "learn python", which returns the most popular public tweets from the past 7 days. Note that since our keyword is composed of two words, "learn" and "python", both need to appear in the text of the tweet, but not necessarily as a continuous phrase. First, let's install the library. The easiest way is using pip, but other options are also listed in the installation docs.

$ pip install twython

In the next step, we'll import the Twython class, instantiate an object of it, and create our search query. We'll use only four arguments in the query: q, result_type, count and lang, which set the search keyword, type, count, and language of the results, respectively. Twitter also defines other arguments for fine-tuning the search query; these are listed in the Search API documentation.
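The step just described might look roughly like the sketch below. The helper names build_search_query and make_client are hypothetical, the credential key names are assumptions carried over from our JSON file layout, and the default count of 10 is an arbitrary choice. The twython import is placed inside the function only so that the query-building part can be run without the library installed.

```python
def build_search_query(keyword, result_type="popular", count=10, lang="en"):
    """Build the dictionary of arguments for a Search API request.

    q, result_type, count and lang are the four arguments used in this
    article; the default values are assumptions made for this sketch.
    """
    return {
        "q": keyword,
        "result_type": result_type,
        "count": count,
        "lang": lang,
    }


def make_client(creds):
    """Instantiate a Twython object from a credentials dictionary.

    The CONSUMER_KEY / CONSUMER_SECRET key names are our own convention.
    """
    from twython import Twython  # imported lazily; requires `pip install twython`
    return Twython(creds["CONSUMER_KEY"], creds["CONSUMER_SECRET"])


# Query for the two-word keyword used in this example
query = build_search_query("learn python")
```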

Finally, we can use our Twython object to call the search method, which returns a dictionary with two keys: search_metadata and statuses, the latter holding the queried results. We'll only look at the statuses part and save a portion of the information in a pandas DataFrame, to present it as a table.
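A hedged sketch of this step follows. Which fields to keep is a matter of taste; the four chosen here (screen name, creation date, text, and favorite count) are our own selection, and the function names are hypothetical. The pandas import is kept inside the second function so that the field extraction can be tried on its own.

```python
def statuses_to_columns(statuses):
    """Pick a few fields from each status and arrange them column-wise."""
    columns = {"user": [], "date": [], "text": [], "favorite_count": []}
    for status in statuses:
        columns["user"].append(status["user"]["screen_name"])
        columns["date"].append(status["created_at"])
        columns["text"].append(status["text"])
        columns["favorite_count"].append(status["favorite_count"])
    return columns


def search_to_dataframe(client, query):
    """Run the search with a Twython client and return a DataFrame.

    `client` is assumed to be a Twython instance and `query` a dictionary
    of Search API arguments such as q, result_type, count and lang.
    """
    import pandas as pd  # imported lazily; requires pandas
    results = client.search(**query)
    return pd.DataFrame(statuses_to_columns(results["statuses"]))
```

Sorting the resulting DataFrame, for instance with sort_values(by="favorite_count", ascending=False), puts the most liked tweets at the top of the table.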

So we got some interesting tweets. Note that these are the most popular tweets containing the words "learn" and "python" from the past 7 days. To explore data further back in history, you'll need to purchase the Premium or Enterprise plan of the Search API.

Streaming API

While the previous example showed a one-off search, a more interesting case is to collect a stream of tweets. This is done using the Twitter Streaming API, and Twython provides an easy way to do it through the TwythonStreamer class. We'll need to define a class MyStreamer that inherits from TwythonStreamer and overrides the on_success and on_error methods.

The on_success method is called automatically when Twitter sends us data, while on_error is called whenever a problem occurs with the API (most commonly due to constraints of the Twitter APIs). The added method save_to_csv is a convenient way to store tweets in a file.
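One possible shape for such a class is sketched below. The factory function make_streamer_class and the helper append_row are our own additions; wrapping the class in a function merely defers the twython import until the class is actually needed. What on_success stores (here just the tweet text) and how on_error reacts (print and disconnect) are assumptions for illustration.

```python
import csv


def append_row(csv_path, row):
    """Append one row of values to a CSV file."""
    with open(csv_path, "a", newline="", encoding="utf-8") as file:
        csv.writer(file).writerow(row)


def make_streamer_class(csv_path="saved_tweets.csv"):
    """Return a TwythonStreamer subclass that saves tweets to csv_path."""
    from twython import TwythonStreamer  # imported lazily; requires twython

    class MyStreamer(TwythonStreamer):
        def on_success(self, data):
            # Called for every message Twitter sends; keep only those
            # that carry tweet text (other message types lack this key).
            if "text" in data:
                self.save_to_csv([data["text"]])

        def on_error(self, status_code, data, headers=None):
            # Called when the API reports a problem; report it and stop.
            print(status_code, data)
            self.disconnect()

        def save_to_csv(self, row):
            append_row(csv_path, row)

    return MyStreamer
```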

Similar to the previous example, we won't save all the data in a tweet, only the fields we are interested in: the hashtags used, the user name, the user's location, and the text of the tweet itself. There's a lot of interesting information in a tweet, so feel free to experiment with it. Note that we'll store the location given on the user's profile, which might not correspond to the current or real location of the user sending the tweet. This is because only a small portion of Twitter users provide their current location, which usually appears in the coordinates key of the tweet data.
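Extracting just these fields could be done with a small helper like the one below. The function name process_tweet and the output key names are our own choices; the input keys (entities, hashtags, text, user, screen_name, location) follow the structure of the tweet data delivered by the API.

```python
def process_tweet(tweet):
    """Reduce a full tweet dictionary to the few fields we want to keep."""
    return {
        # Each hashtag entity is a dictionary; keep only its text
        "hashtags": [tag["text"] for tag in tweet["entities"]["hashtags"]],
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
        # Location as entered on the user's profile, not a GPS position
        "user_loc": tweet["user"]["location"],
    }
```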

The next thing to do is to instantiate an object of the MyStreamer class, passing our credentials as arguments, and use the filter method to collect only the tweets we're interested in. We'll create our filter with the track argument, which provides the filter keywords, in our case "python". Besides the track argument, there are more ways to fine-tune your filter, listed in Twitter's basic streaming parameters, such as collecting tweets from selected users, languages, or locations. The paid versions of the Streaming API provide many more filtering options.
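Put together, starting the stream might look like the following sketch. The function run_stream is hypothetical, and the credential key names are the ones we assumed for our JSON file. Passing the streamer class in as an argument is only a convenience for this example; in a script you would simply instantiate MyStreamer directly.

```python
def run_stream(streamer_class, creds, keyword="python"):
    """Instantiate the streamer with our credentials and start filtering.

    `streamer_class` is assumed to be a TwythonStreamer subclass such as
    MyStreamer. With a real streamer, the filter call blocks and keeps
    delivering tweets until the connection is closed.
    """
    stream = streamer_class(
        creds["CONSUMER_KEY"],
        creds["CONSUMER_SECRET"],
        creds["ACCESS_TOKEN"],
        creds["ACCESS_SECRET"],
    )
    # `track` holds the keyword(s) the tweets must contain
    stream.statuses.filter(track=keyword)
    return stream
```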

Using this approach, we collected data for around 10,000 tweets containing the keyword "python". In the next part, we'll do a brief analysis of the included hashtags and user locations.

Brief Data Analysis

The Twitter API is a powerful tool, well suited to researching public opinion, market analysis, quick access to news, and other use cases your creativity can support. A common thing to do after you've carefully collected your tweets is to analyse the data, where sentiment analysis plays a crucial role in systematically extracting subjective information from text. However, sentiment analysis is too large a field to be addressed in a small portion of a blog post, so in this part we'll only do some basic data analysis of the locations and hashtags used by people tweeting "python".

Please note that the point of these examples is just to show what the Twitter API data could be used for. Our small sample of tweets should not be used to draw conclusions, because it is not representative of the whole population of tweets, nor were its collection times independent and uniform.

First let's import our data from the "saved_tweets.csv" file and print out a few rows.

What are the most common hashtags that go with our keyword "python"? Since all the data in our DataFrame are represented as strings, including the brackets in the hashtags column, to get a list of hashtags we'll need to go from a list of strings, to a list of lists, to a flat list of hashtags. Then we'll use the Counter class to count the hashtag entries in our list, and print a sorted list of the 20 most common hashtags.
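The conversion and counting described above can be sketched as follows. We use ast.literal_eval to turn each bracketed string back into a list, which is one of several possible ways to do the parsing; the function name count_hashtags is our own. The sketch assumes every cell is a string such as "['python', 'coding']" or "[]".

```python
import ast
from collections import Counter


def count_hashtags(hashtag_column, top_n=20):
    """Return the top_n most common hashtags as (hashtag, count) pairs.

    `hashtag_column` is any iterable of strings, each being the text form
    of a Python list, e.g. the hashtags column of the DataFrame.
    """
    all_hashtags = []
    for cell in hashtag_column:
        # String -> list of hashtags, then flatten into one long list
        all_hashtags.extend(ast.literal_eval(cell))
    return Counter(all_hashtags).most_common(top_n)
```

Hashtags differing only in letter case are counted separately here; lowercasing each tag before counting would merge them.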

Next, we can use the user location to answer the question: which areas of the world tweet most about "python"? For this step, we'll use the geocode method of the geopy library, which returns the coordinates of a given input location. To visualise a world heatmap of tweets, we'll use the gmplot library. A reminder: our small dataset is not truly representative of the world.
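A rough sketch of this step is given below. The function names, the Nominatim geocoder, its user_agent string, the map centre and zoom level, and the output file name are all assumptions made for illustration. Geocoding requires network access and public geocoding services limit the request rate, so this can take a while for thousands of locations.

```python
def collect_coordinates(locations, geocode):
    """Geocode each location string, skipping those that fail.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when the place is not found.
    """
    latitudes, longitudes = [], []
    for place in locations:
        try:
            result = geocode(place)
        except Exception:
            # Free-text profile locations often fail to geocode; skip them
            continue
        if result is not None:
            latitudes.append(result.latitude)
            longitudes.append(result.longitude)
    return latitudes, longitudes


def draw_heatmap(locations, output_path="python_heatmap.html"):
    """Geocode the locations and write a heatmap to an HTML file."""
    from geopy.geocoders import Nominatim  # requires geopy
    import gmplot                          # requires gmplot

    geolocator = Nominatim(user_agent="tweet-heatmap-example")
    latitudes, longitudes = collect_coordinates(locations, geolocator.geocode)

    # Centre the map roughly on the world at a low zoom level
    gmap = gmplot.GoogleMapPlotter(30, 0, 3)
    gmap.heatmap(latitudes, longitudes)
    gmap.draw(output_path)
```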

The above code produced the heatmap in the following figure, showing higher activity in "python" tweets in the US, the UK, Nigeria, and India. One downside of the described approach is that we didn't do any data cleaning; there turned out to be many machine-generated tweets coming from a single location, or multiple locations producing the same tweet. These samples should of course be discarded to get a more realistic picture of the geographical distribution of humans tweeting "python". A second improvement would simply be to collect more data over longer and uninterrupted periods.

Conclusions

In this blog post we presented a fairly modest part of the Twitter API. Overall, Twitter is a very powerful tool for understanding public opinion and for doing research and market analysis, and its APIs are therefore a great way for businesses to create automated tools for drawing insights related to their scope of work. Beyond businesses, individuals can also use the APIs to build creative apps.

We also listed a few of the most popular Python wrappers, but it's important to note that different wrappers implement different parts of the Twitter APIs, so you should choose a Python wrapper according to your purpose. The two examples we showed, with the Search and Streaming APIs, briefly described the process of collecting tweets and some of the insights that can be drawn from them. Feel free to create some yourself!