Python 101: How to Grab Data from Rotten Tomatoes

Today we’ll be looking at how to acquire data from the popular movie site, Rotten Tomatoes. To follow along, you’ll want to sign up for an API key with Rotten Tomatoes. When you get your key, make a note of your usage limit, if there is one. You don’t want to make too many calls to their API or you may get your key revoked. Finally, it’s always a good idea to read the documentation of the API you will be using.

Once you’ve perused that or decided that you’ll save it for later, we’ll continue our journey.

Starting the Show

Rotten Tomatoes’ API provides a set of JSON feeds that we can extract data from. We’ll be using requests and simplejson to pull the data down and process it. Let’s write a little script that gets the movies currently playing in theaters.
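Here is a minimal sketch of such a script. The feed URL is an assumption based on the v1.0 API, so confirm it against the docs before relying on it. The helper names build_url and get_titles are hypothetical and exist only to keep the download separate from the parsing. If simplejson is not installed, the sketch falls back to the standard library’s json module, which offers the same loads() call.

```python
try:
    import simplejson as json  # third-party parser used in this article
except ImportError:
    import json  # the standard library module offers the same loads() call

# Assumed URL of the "in theaters" feed; confirm it against the API docs.
IN_THEATERS_URL = (
    "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json"
)


def build_url(api_key):
    """Attach the API key to the feed URL as a query parameter."""
    return "%s?apikey=%s" % (IN_THEATERS_URL, api_key)


def get_titles(js):
    """Return the title of every movie in the decoded feed."""
    return [movie["title"] for movie in js["movies"]]


def getInTheaterMovies(api_key):
    """Download the feed and print the title of each movie in it."""
    import requests  # third-party; imported here so the helpers run without it

    response = requests.get(build_url(api_key))
    response.raise_for_status()
    js = json.loads(response.text)
    for title in get_titles(js):
        print(title)
```

Calling getInTheaterMovies with a valid key should print one title per line, assuming the feed still has the shape described in the docs.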

In the code above, we build a URL using our API key and use requests to download the feed. Then we hand the data to simplejson, which returns a nested Python dictionary. Next we loop over the list of movies in that dictionary and print out each movie’s title. Now we’re ready to create a function to extract additional information from Rotten Tomatoes about each of these movies.
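A function along these lines could do the job. Treat it as a sketch under stated assumptions: the field names (mpaa_rating, release_dates, ratings, abridged_cast and so on) are my assumptions about the feed’s layout, so compare them with the sample feed in the API docs. It works on a movie entry that is already in the feed, and every lookup uses .get() so that a missing field becomes None instead of raising a KeyError.

```python
def getMovieDetails(movie):
    """Flatten one movie entry from the feed into a simple dictionary."""
    # Nested sections default to an empty dict so the lookups below are safe.
    ratings = movie.get("ratings", {})
    release_dates = movie.get("release_dates", {})
    return {
        "title": movie.get("title"),
        "rated": movie.get("mpaa_rating"),
        "synopsis": movie.get("synopsis"),
        "critics_consensus": movie.get("critics_consensus"),
        "release_date": release_dates.get("theater"),
        "critics_score": ratings.get("critics_score"),
        "audience_score": ratings.get("audience_score"),
        # One (actor name, characters played) pair per cast member.
        "cast": [
            (actor.get("name"), ", ".join(actor.get("characters", [])))
            for actor in movie.get("abridged_cast", [])
        ],
    }
```

Returning a plain dictionary keeps the result easy to print, compare, or hand to another function later on.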

This new code pulls out a lot of data about each movie, but the JSON feed contains quite a bit more that is not shown in this example. You can see what you’re missing by printing the js dictionary to stdout, or you can look at the example JSON feed on the Rotten Tomatoes docs page.

If you’ve been paying close attention, you’ll notice that the Rotten Tomatoes API doesn’t cover a lot of the data on their website. For example, there is no way to query by actor: if we wanted to know what movies Jim Carrey was in, there is no URL endpoint to query against. You also cannot look up anyone else involved in a film, such as the director or producer. The information is on the website, but it is not exposed by the API. For that, we would have to turn to the Internet Movie Database (IMDb), but that will be the topic of a different article.

Let’s spend some time improving this example. One simple improvement would be to put the API key into a config file. Another would be to actually store the information we’re downloading into a database. A third improvement would be to add some code that checks if we’ve already downloaded today’s current movies because there really isn’t a good reason to download today’s releases more than once a day. Let’s add those features!

Adding a Config File

I prefer and recommend ConfigObj for dealing with config files. Let’s create a simple “config.ini” file with the following contents:

api_key = API KEY
last_downloaded =

Now let’s change our code to import ConfigObj and change the getInTheaterMovies function to use it:
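Here is one way that change might look. As before, the feed URL is an assumption, and build_url is a hypothetical helper that accepts anything dictionary-like, which includes a ConfigObj instance. This version returns the list of movies instead of printing titles, which is my own choice for the sketch.

```python
# Assumed URL of the "in theaters" feed; confirm it against the API docs.
IN_THEATERS_URL = (
    "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json"
)


def build_url(config):
    """Build the feed URL from the api_key entry of a dict-like config."""
    return "%s?apikey=%s" % (IN_THEATERS_URL, config["api_key"])


def getInTheaterMovies(config_path="config.ini"):
    """Read the API key from the config file and download the feed."""
    # Third-party imports live here so build_url can be tried without them.
    from configobj import ConfigObj
    import requests

    config = ConfigObj(config_path)  # a fully qualified path works too
    response = requests.get(build_url(config))
    response.raise_for_status()
    return response.json()["movies"]
```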

As you can see, we import configobj and pass it our filename. You could also pass it the fully qualified path. Next we pull out the value of api_key and use it in our URL. Since we have a last_downloaded value in our config, we should go ahead and add that to our code so we can prevent downloading the data multiple times a day.
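A small sketch of that check follows. The names today_stamp and should_download are hypothetical, and the functions accept any dictionary-like config. With ConfigObj you would also call config.write() afterwards so the new date is saved back to the file.

```python
import datetime


def today_stamp(now=None):
    """Return the given (or current) date as a YYYYMMDD string."""
    now = now or datetime.datetime.now()
    return now.strftime("%Y%m%d")


def should_download(config, today):
    """Return True and record the date unless we already downloaded today."""
    if config.get("last_downloaded") == today:
        return False  # already fetched today, so do nothing
    config["last_downloaded"] = today
    return True
```

Passing the date in as an argument, instead of reading the clock inside should_download, makes the check easy to try out with any date you like.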

Here we import Python’s datetime module and use it to get today’s date in the following format: YYYYMMDD. Next we check if the config file’s last_downloaded value equals today’s date. If it does, we do nothing. However, if they don’t match, we set last_downloaded to today’s date and then we download the movie data. Now we’re ready to learn how to save the data to a database.

Saving the Data with SQLite

Python has supported SQLite natively since version 2.5, so unless you’re using a really old version of Python, you should be able to follow along with this part of the article without any problems. Basically, we just need to add a function that can create a database and save our data into it. Here is the function:
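What follows is a sketch of such a function, not a definitive version. The table and column names are my assumptions, as is the shape of the movie dictionary: a title key, optional rated, synopsis, release_date, critics_score and audience_score keys, and a cast list of (name, role) pairs. The db_path parameter is also something I added so the function can be pointed at a scratch file.

```python
import os
import sqlite3


def saveData(movie, db_path="rotten_tomatoes.db"):
    """Insert one movie and its cast, creating the tables on first use."""
    is_new = not os.path.exists(db_path)
    conn = sqlite3.connect(db_path)  # creates the file when it is missing
    try:
        cursor = conn.cursor()
        if is_new:
            # Hypothetical three-table layout: movies, actors, and a link table.
            cursor.execute(
                "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, "
                "rated TEXT, synopsis TEXT, release_date TEXT, "
                "critics_score INTEGER, audience_score INTEGER)"
            )
            cursor.execute(
                "CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT)"
            )
            cursor.execute(
                "CREATE TABLE movie_actors (movie_id INTEGER, "
                "actor_id INTEGER, role TEXT)"
            )
        cursor.execute(
            "INSERT INTO movies (title, rated, synopsis, release_date, "
            "critics_score, audience_score) VALUES (?, ?, ?, ?, ?, ?)",
            (
                movie["title"],
                movie.get("rated"),
                movie.get("synopsis"),
                movie.get("release_date"),
                movie.get("critics_score"),
                movie.get("audience_score"),
            ),
        )
        movie_id = cursor.lastrowid
        for name, role in movie.get("cast", []):
            cursor.execute("INSERT INTO actors (name) VALUES (?)", (name,))
            # cursor.lastrowid now holds the id of the actor just inserted.
            cursor.execute(
                "INSERT INTO movie_actors (movie_id, actor_id, role) "
                "VALUES (?, ?, ?)",
                (movie_id, cursor.lastrowid, role),
            )
        conn.commit()
    finally:
        conn.close()
```

The question-mark placeholders let sqlite3 handle quoting, which matters here because titles and synopses often contain apostrophes. Note that this sketch happily inserts the same movie or actor twice if you call it twice.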

This code first checks whether the database file already exists. If it does not, the function creates the database along with three tables. Otherwise it simply opens a connection and creates a cursor object. Next it inserts the data from the movie dictionary that is passed to it. Finally, it commits the data to the database and closes the connection. We’ll call this function with the movie dictionary produced by the getMovieDetails function.

If you use Firefox, there’s a fun plugin called SQLite Manager that you can use to visualize the database that we’ve created. Here is a screenshot of what was produced at the time of writing:

Wrapping Up

There are still lots of things that should be added. For example, we need some code in the getInTheaterMovies function that will load the details from the database if we’ve already got the current data. We also need to add some logic to the database to prevent us from adding the same actor or movie multiple times. It would be nice if we had some kind of GUI or web interface as well. These are all things you can add as a fun little exercise.
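If you want a head start on the duplicate problem, one possible approach (an assumption on my part, not the only answer) is to let SQLite enforce uniqueness with a UNIQUE column and use INSERT OR IGNORE. The add_actor helper and the single-table layout below are hypothetical and kept tiny on purpose.

```python
import sqlite3


def add_actor(conn, name):
    """Insert an actor only once and return the row id, new or existing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS actors "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    # The UNIQUE constraint makes this a no-op when the name is already stored.
    conn.execute("INSERT OR IGNORE INTO actors (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM actors WHERE name = ?", (name,)
    ).fetchone()
    return row[0]
```

The same idea would apply to movies, although two different films can share a title, so a movie table would probably need a more specific key than the title alone.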

By the way, this article was inspired by the Real Python for the Web book by Michael Herman. It has lots of neat ideas and samples in it. You can check it out here.
