How to connect to an API and parse XML (and why you would want to)

Many beginner programmers see the acronym API all over the place. Why are API's everywhere? What do you do with them? How do they work? At the same time, many beginner programmers see or encounter XML. Why is XML everywhere? How do you turn XML into the integers or strings that I know how to deal with? These are excellent questions that aspiring programmers may ask themselves. For me it was difficult to grasp the big picture and see exactly why these two acronyms were talked about so often in the programming world. In this article I'll explain what API's are, why XML is so often associated with them, and at the end give a short example of how to "connect" to an API, grab some XML from it, and parse it to turn it into the integers or strings that you probably know how to manipulate on a regular basis.

So what is an API (besides an Application Programming Interface)?

Imagine you worked for a large company named Word Co. that organized words, specifically English language words. Perhaps your company scanned a bunch of textbooks and collected all of the words, counted the words, and created a big database full of useful information related to words. Basically you have a big set of information and one day your company (Word Co.) decides it wants to make all of that data available for other companies or allow individuals to see or access it. What are your options?

1) Give people the actual database

2) Make a website that pulls from the database

3) Make an API that allows programmers to interact with the database

The first option is probably not a good one because the database can be huge (potentially gigabytes or terabytes of information), you may be using a proprietary database (such as Google's BigTable) or software, or maybe you just spent millions of dollars collecting this information and you want to charge people for accessing it.

The second option may be a really neat idea but might not work if you wanted a mobile device or app to access it, or if you wanted to present the information in some way other than a chart or web form. Imagine if someone wanted to make a Hangman game where you try to guess a random word (maybe a random word that was pulled from the big database of English language words) before a stick figure is "hanged". This is something the website cannot directly support.

An API allows people to grab information (or use services) that are part of a huge data set in ways that might not be imagined by the people who created that large data set. If Word Co. organized English words and created an API to access those words, let's take a minute to imagine what others can create with it:

Which are all applications or tools that Word Co. doesn't have the time or desire to create. API's are usually intended to allow third parties to create awesome things using existing data that a company has already harvested and collected. What are some other services that might have API's?

Weather services usually have API's

Google has a ton of API's (like their Maps API, their search engine, and just about everything else)

Facebook allows third-party developers to interact with the Facebook data

So how do I "connect to" or use an API?

Although many API's are different, it often boils down to making a request and getting some data. Some API's give you a bunch of code or libraries that you add to your project, and then you use that code to make the requests, but many other API's are quite simple. If you are new to programming, I'd suggest looking for REST or so-called "RESTful" API's. Other ways to access API's such as SOAP also exist, but in my opinion are a little harder to get started with. Fortunately many API's that used to be SOAP-based are now REST-based. Let's outline how you would use a typical REST-based API:

1) Make an HTTP request to a web server. Usually you'll include a variable or two that is passed in through the URL

2) Get some data back (typically XML)

3) Parse the XML (the XML is just a big character stream and you'll want to grab certain pieces of it and turn it into other data types or create an object)

4) Use that data to do neat things! (Like create a Hangman game with a random word you just grabbed)
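The steps above can be sketched in Python. The Random Word API here is hypothetical: the host api.example.com, the length parameter, and the <word>/<length> response format are all assumptions made for illustration, and a canned string stands in for the real response so the sketch runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://api.example.com/randomword"

def build_request_url(length):
    """Step 1: pass a variable in through the URL's query string."""
    return BASE_URL + "?" + urlencode({"length": length})

def parse_word(xml_text):
    """Step 3: grab the pieces we care about and convert their types."""
    root = ET.fromstring(xml_text)
    word = root.findtext("word")            # stays a plain string
    length = int(root.findtext("length"))   # text converted to an integer
    return word, length

def mask(word, guessed):
    """Step 4: show guessed letters and hide the rest, Hangman style."""
    return " ".join(c if c in guessed else "_" for c in word)

# Step 2 would normally be urllib.request.urlopen(url).read().
# A canned response in the assumed format stands in for it here.
SAMPLE_RESPONSE = "<result><word>banana</word><length>6</length></result>"

url = build_request_url(6)
word, length = parse_word(SAMPLE_RESPONSE)
print(url)                # http://api.example.com/randomword?length=6
print(mask(word, {"a"}))  # _ a _ a _ a
```

Keeping the request, the parsing, and the game logic in separate functions means the parsing step can be tried on saved XML before any real API is involved.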

Notice that the data that comes back from an API is typically XML. Why XML? Because it's a great intermediary "language". Imagine if you wrote your Hangman game in Java and the Random Word API gave you Python code back. That wouldn't be very useful. The same goes if you wrote something in C/C++ and an API gave you a serialized Java object.

What makes XML so popular (especially with API's) is that it allows you to use whichever language you want, and gives you data that is both human readable and computer readable. Just about any programming language comes with standard libraries to parse XML quickly and easily. If you're an advanced programmer, it also allows you to build objects or data structures (like if you're dealing with A TON of data) exactly how you want them instead of forcing you to accept whatever the API gives you.

A concrete example in Java

Let's make something! Imagine you wanted to create your own Android weather app. Since we aren't meteorologists, we'll get all of the weather information from someone else: Google's Weather API. Other options are the National Weather Service (in the U.S.) or maybe Weather Underground. Most of the API's out there are well documented and tell you how you should connect to, use, or interface with them. Google's Weather API is a little weird in that there is no documentation. I think it's sort of a secret API. But here's how you use it:

Make an HTTP request to http://www.google.com/ig/api?weather=Location where Location is whatever you want (A postal code or city).
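Building that URL from a location string is a one-liner in most languages. Here is a minimal Python sketch; the helper name weather_url is an assumption made for illustration. The standard library's urlencode turns a space into "+", so "Seattle WA" becomes Seattle+WA.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/ig/api"

def weather_url(location):
    """Return the request URL for a postal code or city name."""
    return BASE_URL + "?" + urlencode({"weather": location})

print(weather_url("Seattle WA"))
# prints http://www.google.com/ig/api?weather=Seattle+WA
```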

That's it! You'll get a bunch of XML back with the current weather and forecast information. You can even try it out in your web browser (since your web browser makes HTTP requests on a very regular basis). Let's see what happens when we use Seattle WA as an example (from http://www.google.com/ig/api?weather=Seattle+WA):

And let's imagine we want to extract the highs and lows in this XML so we can use them in our Android weather app. As mentioned, many programming languages have built-in libraries that allow you to parse the XML. Since XML is so popular, there are even multiple approaches to parsing it, even within a given language. Java has both a DOM parser and a SAX parser built in. Python also has a DOM parser and a SAX parser built in. What are DOM and SAX parsers?

SAX (Simple API for XML) parsers are stream-oriented parsers and typically use less memory and are faster

DOM (Document Object Model) parsers are tree traversal parsers and can consume more memory if you're dealing with large amounts of XML

When should you use one over the other? The choice mostly matters when you are dealing with HUGE amounts of data, where a SAX parser's lower memory use pays off. Most of the time (such as right now) you don't need to worry and can use whichever one you're comfortable with. I'll be using the Java SAX parser in this example.
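To make the difference concrete, here is a small Python sketch that pulls the same values out of the same XML with both kinds of parser. The <forecast> and <day> tags and the numbers are made up for illustration, and the class name HighHandler is an assumption.

```python
import xml.dom.minidom
import xml.sax

# Illustrative XML with made-up values.
SAMPLE = (
    "<forecast>"
    '<day><low data="40"/><high data="52"/></day>'
    '<day><low data="38"/><high data="49"/></day>'
    "</forecast>"
)

# DOM: the whole document is loaded into a tree first, then queried.
tree = xml.dom.minidom.parseString(SAMPLE)
dom_highs = [int(node.getAttribute("data"))
             for node in tree.getElementsByTagName("high")]

# SAX: the parser calls back once per tag as it streams through the text,
# so only the values we choose to keep stay in memory.
class HighHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.highs = []

    def startElement(self, name, attrs):
        if name == "high":
            self.highs.append(int(attrs["data"]))

handler = HighHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)

print(dom_highs)      # [52, 49]
print(handler.highs)  # [52, 49]
```

Both approaches give the same answer here. The DOM version is shorter, while the SAX version never holds the full document in memory.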

Remember the steps to do this? 1) Make an HTTP request to the API, typically passing in a URL variable, 2) Get the data back and then parse it, and finally 3) Do neat things! Let's see what that looks like in Java code:

So at this point we have some simple Java code that connects to the Google Weather API and receives some data back. In the above case, we are getting our data (the XML) in the form of an InputStream. In other languages you'll still probably be receiving the data as a stream. Streams and I/O are a pretty big part of programming, so if you're not sure how to work with these, now is a good time to start.

Anyway, we now need to set up the XML parser. As mentioned I am picking the SAX parser for this example, and as the SAX project's website explains, you need to create a handler for handling the XML. In other words, you need to tell the parser what to do when it encounters specific parts of the XML. In this case we'll look for <low>, <high>, and <day_of_week> tags. To define this behavior we'll extend SAX's DefaultHandler (meaning we give it more functionality than the default functionality). Let's see what this looks like:

The GoogleHandler contains System.out.println() calls, but it also adds the integers and strings to their own array lists, which you can then access in a more familiar way (such as calling days.get(0) to get the first day of the week in that array list).

A concrete example in Python 3

And finally let's take a quick look at how to do this in Python, again using a SAX parser. As you can see, Python does quite a bit of heavy lifting for you (such as making the HTTP request and getting the XML -- which is one line of code). Go ahead and copy/modify this code for any of your projects. It was built and tested with Python 3.2.2 in October 2011.
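What follows is a minimal sketch of that approach, not the original listing. It assumes each day's forecast sits inside a <forecast_conditions> element and that every value is stored in a data attribute. The names WeatherHandler, parse_forecast, and fetch_forecast are illustrative, and the sample values are made up. A canned byte stream stands in for the live response so the sketch runs offline.

```python
import io
import urllib.request
import xml.sax
from urllib.parse import urlencode

class WeatherHandler(xml.sax.ContentHandler):
    """Collect day names, lows, and highs as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.days, self.lows, self.highs = [], [], []

    def startElement(self, name, attrs):
        # Assumed format: <day_of_week data="Tue"/>, <low data="40"/>, <high data="52"/>
        if name == "day_of_week":
            self.days.append(attrs["data"])
        elif name == "low":
            self.lows.append(int(attrs["data"]))
        elif name == "high":
            self.highs.append(int(attrs["data"]))

def parse_forecast(stream):
    """Parse XML from any file-like byte stream and return the filled handler."""
    handler = WeatherHandler()
    xml.sax.parse(stream, handler)
    return handler

def fetch_forecast(location):
    """Make the HTTP request and parse the response (needs a live service)."""
    url = "http://www.google.com/ig/api?" + urlencode({"weather": location})
    with urllib.request.urlopen(url) as response:
        return parse_forecast(response)

# Canned response with made-up values, standing in for the network call.
SAMPLE = b"""<xml_api_reply><weather>
  <forecast_conditions>
    <day_of_week data="Tue"/><low data="40"/><high data="52"/>
  </forecast_conditions>
  <forecast_conditions>
    <day_of_week data="Wed"/><low data="38"/><high data="49"/>
  </forecast_conditions>
</weather></xml_api_reply>"""

forecast = parse_forecast(io.BytesIO(SAMPLE))
for day, low, high in zip(forecast.days, forecast.lows, forecast.highs):
    print(day, low, high)
```

Because parse_forecast accepts any byte stream, the same handler works on a saved file, a test string, or the response object returned by urlopen.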

I hope this tutorial was helpful. If you have questions please ask away. I'll also add that our fictional Word Co. API (mentioned at the top of this article) isn't just a made-up concept to explain API's. It actually exists!