This article directly follows "Exploring the Wikipedia JSON API", except that instead of viewing the Wikipedia API via the web browser (i.e. via human-friendly graphical means), it demonstrates how to do the same thing in Python code. If "doing the same thing, but in Python (or in any programming language)" sounds like masochism, it's because it is masochism. So this guide will also cover how to leverage loops and functions to do efficient, scalable data mining, which is generally the main purpose of programming.

titles=Stanford%20University (%20 is how a space character is represented in a URL)

If writing this out seemed painfully tedious and hard on the eyes, you're right: that's why API stands for application programming interface. It's not meant for humans to read or to write, but it's relatively easy for a program to consume.

Turning a Python dictionary into a URL

The most immediate annoyance of accessing the Wikipedia Query API is writing out all of those key-value pairs in the URL, i.e.

?action=query&prop=info&format=json&titles=Stanford%20University

One nice feature of the Requests library is that these attributes can be represented as a Python dictionary. In the following example, I'm showing how each key-value pair in the dictionary corresponds to the URL string:
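As a minimal sketch of that correspondence, the snippet below builds the dictionary one key at a time and then turns it into a query string. It uses the standard library's urlencode() rather than Requests, purely so the result can be inspected without sending anything over the network. The dictionary name my_atts matches the one used later in this article.

```python
from urllib.parse import quote, urlencode

# The same key-value pairs that appear in the URL, written as a dictionary
my_atts = {}
my_atts['action'] = 'query'                # ?action=query
my_atts['prop'] = 'info'                   # &prop=info
my_atts['format'] = 'json'                 # &format=json
my_atts['titles'] = 'Stanford University'  # &titles=Stanford%20University

# urlencode() joins the pairs with "&"; quote_via=quote writes a space as %20
query_string = urlencode(my_atts, quote_via=quote)
print('?' + query_string)
```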

One thing to note: in the dictionary, I pass the string value of "Stanford University" to the titles key, instead of "Stanford%20University"; the Requests library will take care of that conversion for us later.

Passing parameters in URLs

Recall that the basic usage of the Requests get method looks like this:
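A minimal sketch of that usage, assuming the English Wikipedia endpoint https://en.wikipedia.org/w/api.php (the endpoint and the helper name fetch_json are assumptions for illustration). The first part builds the request without sending it, so the final URL can be inspected; the helper shows the actual get call with the dictionary passed as params.

```python
import requests

# Assumed endpoint for the English Wikipedia API
BASE_URL = 'https://en.wikipedia.org/w/api.php'

my_atts = {
    'action': 'query',
    'prop': 'info',
    'format': 'json',
    'titles': 'Stanford University',
}

# Build the request without sending it, just to inspect the final URL
prepared = requests.Request('GET', BASE_URL, params=my_atts).prepare()
print(prepared.url)


def fetch_json(url, params):
    """Send a GET request and return the parsed JSON response."""
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()  # raise an exception on HTTP error codes
    return resp.json()

# To actually call the API (requires network access):
# data = fetch_json(BASE_URL, my_atts)
```

The space in "Stanford University" is URL-encoded automatically; Requests may write it as a plus sign rather than %20, which means the same thing inside a query string.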

Note that the ordering of key-value pairs in the URL (and in the Python dictionary) does not matter to the API.

Joining lists of strings

Some of the parameters can take in multiple values. For example, the following snippet:

inprop=protection|watchers

specifies that we want the additional "info properties" of protection and watchers. It's pretty easy to write out a list of 2 items, so let's just add that to the my_atts dictionary in our previous example:

my_atts['inprop']='protection|watchers'

The titles parameter can also take multiple pipe-separated values. To get the data for Stanford University and Palo Alto, here's the code re-written from the beginning, with the tedious dictionary-assignments condensed into one line:
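One possible version of that rewrite is sketched below, with the whole dictionary written as a single literal. The endpoint is an assumption, and the request is only prepared, not sent, so the encoded URL can be examined. Decoding the query string again shows that the pipe characters survive the round trip.

```python
import requests
from urllib.parse import parse_qs, urlsplit

BASE_URL = 'https://en.wikipedia.org/w/api.php'  # assumed endpoint

# All parameters in one dictionary literal; multiple values are pipe-separated
my_atts = {'action': 'query', 'prop': 'info', 'format': 'json',
           'inprop': 'protection|watchers',
           'titles': 'Stanford University|Palo Alto'}

# Prepare (but do not send) the request to see the URL that would be used
prepared = requests.Request('GET', BASE_URL, params=my_atts).prepare()
print(prepared.url)

# Decode the query string back into values to confirm nothing was lost
decoded = parse_qs(urlsplit(prepared.url).query)
print(decoded['titles'])

# To send it for real (requires network access):
# resp = requests.get(BASE_URL, params=my_atts, timeout=10)
```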

But what if we want the info data for all of the Ivy League schools? There are only 8 of them, but typing those in and carefully separating them with pipe characters is extremely tedious. And look at just how ugly a list of 8 items can get:
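For illustration, here is roughly what the hand-typed version looks like (the variable name ivy_titles is an assumption):

```python
# Typed by hand: eight titles, seven pipe characters, and easy to get wrong
ivy_titles = 'Brown University|Columbia University|Cornell University|Dartmouth College|Harvard University|University of Pennsylvania|Princeton University|Yale University'

print(ivy_titles.count('|'))  # 7
```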

This is definitely a repetitive problem that can be helped by programming. If you think of the school names as being a list of strings, then we can use the str.join() method to join the strings together, using whatever the value of str is as the delimiter:
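A short sketch of the join described above, with the pipe character as the delimiter string. The variable names ivy_schools and ivy_titles are assumptions; my_atts follows the dictionary used earlier.

```python
# The school names as an ordinary list of strings
ivy_schools = [
    'Brown University',
    'Columbia University',
    'Cornell University',
    'Dartmouth College',
    'Harvard University',
    'University of Pennsylvania',
    'Princeton University',
    'Yale University',
]

# '|' is the delimiter; join() places it between each pair of items
ivy_titles = '|'.join(ivy_schools)
print(ivy_titles)

# The joined string can go straight into the parameters dictionary
my_atts = {'action': 'query', 'prop': 'info', 'format': 'json',
           'titles': ivy_titles}
```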

Writing functions

Even if you're copying and pasting the code, making multiple Wikipedia API requests is going to be tedious. At the very least, we can reduce the amount of code that needs to be copied and pasted by writing a function (I sloppily interchange the word function with method, though for our purposes, the two are very close in meaning).

If you think of variables as a way to store values that you want to reuse later, think of functions as a similar mechanism for storing a set of commands for later use. Functions are a bit different from values, though, in that they are meant to be executed. Think of the humble print function, which takes in any number of arguments and prints them to the screen:

a = 42
b = "Apples"
print(a, b)  # 42 Apples

Defining functions has its own syntax: here are a couple of guides:

http://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/functions.html

http://learnpythonthehardway.org/book/ex18.html
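As a quick taste of that syntax, here is a minimal sketch of a function definition (the name describe is made up for illustration). The def line names the function and its arguments, the indented body holds the stored commands, and return hands a value back to the caller.

```python
def describe(count, thing):
    """Return a short description string instead of printing it."""
    return str(count) + ' ' + thing


a = 42
b = "Apples"

# The body of describe() runs only when the function is called
result = describe(a, b)
print(result)  # 42 Apples
```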

Basic get_wikipedia_basic_info() function

And here's one way to define a function that wraps up all the code we've written to call the Wikipedia basic info API:
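The following is a sketch of what such a function might look like, not necessarily the exact original. The endpoint, the timeout, and the choice of inprop values are assumptions; the function simply wraps the dictionary-building and the get call from the earlier sections.

```python
import requests

BASE_URL = 'https://en.wikipedia.org/w/api.php'  # assumed endpoint


def get_wikipedia_basic_info(title):
    """Fetch the basic "info" properties for one page title (or a
    pipe-delimited string of titles) and return the parsed JSON."""
    my_atts = {
        'action': 'query',
        'prop': 'info',
        'inprop': 'protection|watchers',
        'format': 'json',
        'titles': title,
    }
    resp = requests.get(BASE_URL, params=my_atts, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP error codes
    return resp.json()

# Example call (requires network access):
# data = get_wikipedia_basic_info('Stanford University')
```

Wikimedia asks API clients to identify themselves with a descriptive User-Agent header; Requests accepts one through the headers argument of requests.get.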

Refined get_wikipedia_basic_info() function

Recall that it was a pain to manually generate that pipe-delimited string of titles. So let's give the user the option to pass in a list of titles, and we'll take care of the details of joining them together:
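One way this refinement could be sketched: accept either a single string or a list of strings, and join the list with pipe characters before building the request. As before, the endpoint, timeout, and inprop values are assumptions.

```python
import requests

BASE_URL = 'https://en.wikipedia.org/w/api.php'  # assumed endpoint


def get_wikipedia_basic_info(titles):
    """Fetch basic info for one title (a string) or several (a list)."""
    if isinstance(titles, str):
        titles_str = titles              # already a single string
    else:
        titles_str = '|'.join(titles)    # list -> pipe-delimited string

    my_atts = {
        'action': 'query',
        'prop': 'info',
        'inprop': 'protection|watchers',
        'format': 'json',
        'titles': titles_str,
    }
    resp = requests.get(BASE_URL, params=my_atts, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Example calls (require network access):
# get_wikipedia_basic_info('Stanford University')
# get_wikipedia_basic_info(['Stanford University', 'Palo Alto'])
```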

Get All The Schools

Now that we've seen how writing a function can encapsulate some relatively complex code, it should be even more apparent that fetching the data for "Stanford University" is pretty much the same as it is for "Harvard University" or for any Wikipedia page. Moreover, the computer doesn't care if we want just one page or, theoretically, 1000 pages. And this is where the power of programming starts to really shine.

If you try to fetch too many titles in a single request, though, you're going to get an error. The debug message is hard to trace (since our function doesn't have any fallback for an error situation), but basically, Wikipedia only allows you to retrieve 50 titles at a time.
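One way to work within that limit is to split a long list of titles into batches of at most 50 and make one request per batch. The helper below is a sketch; the name batches and the placeholder titles are assumptions for illustration.

```python
def batches(items, size=50):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder titles, just to show how a long list is split up
titles = ['Page %d' % i for i in range(120)]

batch_sizes = [len(batch) for batch in batches(titles)]
print(batch_sizes)  # [50, 50, 20]

# With the refined get_wikipedia_basic_info() function, each batch
# would become one request (requires network access):
# results = [get_wikipedia_basic_info(batch) for batch in batches(titles)]
```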