Developing RESTful Web Services in Perl

If you are a web developer and aren't familiar with the term "REST," you should know that you already work within this software architecture regularly. REST is a term that describes the interactions that occur between client and server on the World Wide Web. However, as with many terms, its meaning has broadened as different parties have used it. In this article, I use REST more specifically to describe communication between a client and server as part of a specific API.

This is a guide to getting started developing your own RESTful API. The guide is split into two parts. In this part, I provide a very brief description of REST as it applies to this guide. I then describe how to build your very own RESTful server using a CGI script (which should be easy to adapt to FastCGI, mod_perl, or another web framework). In the next article, we'll go over how to access this server from a RESTful client written using libwww-perl. I will end with a couple of useful extensions that should be considered, and I will share some other important resources.

What is REST?

In the most generic terms, REST (Representational State Transfer) is a software architecture originally published by Roy Fielding in his dissertation. More specifically, this term has been used to define web service APIs for the management of resources that may be created, read, updated, and deleted (CRUD) over HTTP. This is the focus of this article. RESTful APIs are used by many big names. Off the top of my head, I know that Amazon (Amazon Web Services), Intuit (Quickbase), and Facebook all provide RESTful interfaces to their applications. There are many others.

We will spend a lot of time discussing resources in this article. A resource in a RESTful web service is just some unit of data useful to your site. This is probably a record in your SQL database, but it could be an account on an LDAP server, a segment of an XML data file, or just about any other unit of data you want to share with others. The sample server, for example, will be reading from and writing to files on the disk.

RESTful Principles

Before I talk about some principles, I should state the disclaimer that I am coming at REST from a completely practical background. REST is a software architectural style and, as such, it has a lot of papers and theory and purists attached to it. This guide is about getting stuff done in Perl. If I fail to present a pure message on REST itself, I will only excuse myself by saying that I never claimed to do so. See the resources in the RESTful Resources section if you want more information on REST as an architecture.

With that out of the way, let's talk about some principles. You won't get far into the literature on REST without running into the "REST Triangle" (see Figure 1 below). Essentially, there are three key concepts in REST: nouns, verbs, and content types. I will discuss each briefly here.

Figure 1. The REST Triangle

Nouns: Know Your URIs

A noun is an identifier for a resource. This is generally a URL (a link you can GET to retrieve the resource) when we talk about REST in HTTP. It might also be a URN (a name that identifies the resource, whether via HTTP or by some other means) or another kind of URI (URLs and URNs are both kinds of URI, or Uniform Resource Identifier). You probably want each resource to have at least one noun that uniquely identifies it, but you might also provide nouns that are not unique. For example, I might have an interface identifying the same record with the following nouns:

http://example.com/=/model/person/id/157

http://example.com/=/model/person/last_name/Hanenkamp

http://example.com/=/model/person/irc_name/zostay@irc.freenode.net

Each of these might be a URL in my interface. The first uses the unique ID of my account on the system. The second uses my last name, which is fairly rare in the United States, but not unique there and certainly not unique worldwide. The last is an example of a noun that is unique according to an external authority. How you identify your resources is a choice worth considering carefully.

For a more detailed treatment of nouns and URLs, you may want to read more about URL Construction.

Verbs: Know Your CRUD

CRUD is an acronym referring to the common changes made to data: Create, Read, Update, and Delete. This set of operations generally encompasses everything that can be done to a piece of data. When we talk about these operations within the context of REST, we will use specific HTTP request methods to implement each. In REST nomenclature, these are called the verbs of the architecture.

GET

A GET request is used to perform a read operation. This will be used to return the content of your resource.

POST

A POST request is used to perform a create operation. A POST will create the resource on the server and assign a noun to it.

PUT

A PUT request is used to perform an update operation. A PUT operation performs the opposite of GET: it updates the resource on the server when the client pushes content to the server.

DELETE

A DELETE request is used to perform a delete operation (whoa, deep). That's it.

In theory, you don't really need all of these in every RESTful web service. If you don't allow modification of resources, you can just use GET. A simple REST interface might provide only GET and PUT or GET and POST, depending on your needs. These are not the only available verbs either.
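To make the mapping between verbs and CRUD concrete, here is a minimal sketch of how a CGI script might dispatch on the request method. It is an illustration only, not code from the sample server: the %crud_for table and the operation_for function are hypothetical names, and the script reads the method straight from the REQUEST_METHOD environment variable that CGI provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each HTTP verb to the CRUD operation it implements.
# (Hypothetical table for illustration; a real service may support fewer verbs.)
my %crud_for = (
    GET    => 'read',
    POST   => 'create',
    PUT    => 'update',
    DELETE => 'delete',
);

# Return the CRUD operation for a request method, or undef if the
# service does not support that verb.
sub operation_for {
    my ($method) = @_;
    return $crud_for{ uc( $method // '' ) };
}

# CGI passes the request method in the REQUEST_METHOD environment variable.
my $method    = $ENV{REQUEST_METHOD} // 'GET';
my $operation = operation_for($method);

if ( defined $operation ) {
    print "Status: 200 OK\r\nContent-Type: text/plain\r\n\r\n";
    print "Would perform a $operation operation\n";
}
else {
    # 405 tells the client that this verb is not allowed here.
    print "Status: 405 Method Not Allowed\r\n";
    print "Allow: ", join( ', ', sort keys %crud_for ), "\r\n";
    print "Content-Type: text/plain\r\n\r\n";
    print "Unsupported method: $method\n";
}
```

A service that only allows reading would simply leave the other verbs out of the table, and every other request would fall through to the 405 response.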

Content Types: Know Your MIME

The final piece of the triangle is the content type of your resources. The content types provide the format for the data that will take part in your RESTful discussion. You will specify these with the "Content-Type" header in the requests (client-side) and responses (server-side). When traveling on the information superhighway of the World Wide Web, you are pretty constrained to using some variant of HTML as the main document type. In a web service, however, you can use whatever suits your application.

The format you want to use will depend on the needs of your application. If you are exchanging organized data, like the sample server and client included with this article, you will probably want a data interchange format like XML, YAML, JSON, or CSV. If your application deals with documents, you will probably want a document format such as HTML, DocBook, SGML, ODF, PDF, or PostScript. Your application might manipulate photos (JPG, PNG, BMP) or calendar information (iCal) or categorized links (OPML) or whatever else. You can use microformats or whatever you happen to like.

If you want to be really cool, you can even permit the data to be described in multiple formats. For example, you might allow updates to your data to come as XML, YAML, and JSON by examining the "Content-Type" header sent in the request and treating the given data accordingly. You can allow the client to request that data back in a custom format by examining the "Accept" header and choosing a format based on the client's preference. Ultimately, if your data can be requested and posted in formats that are convenient to your clients, you will probably have happier clients.
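Choosing a format from the "Accept" header can be sketched as follows. This is a simplified illustration under stated assumptions, not the sample server's code: the @supported list, the choose_format function, and the "text/x-yaml" media type are my own choices for the example, and the parser handles only exact media types, "*/*", and "q" values (it ignores wildcards such as "text/*").

```perl
use strict;
use warnings;

# Formats this hypothetical service can produce, in order of server preference.
my @supported = ( 'text/x-yaml', 'application/json', 'application/xml' );

# Pick a response format from an Accept header value.
# Returns undef when nothing the client accepts is supported.
sub choose_format {
    my ($accept) = @_;

    # No Accept header means the client will take anything.
    return $supported[0] unless defined $accept && $accept =~ /\S/;

    # Parse "type/subtype;q=0.8" entries into [type, quality, position].
    my @wanted;
    my $position = 0;
    for my $entry ( split /,/, $accept ) {
        $entry =~ s/^\s+|\s+$//g;
        my ( $type, @params ) = split /\s*;\s*/, $entry;
        next unless defined $type && length $type;
        my $q = 1;    # quality defaults to 1 when not given
        for my $param (@params) {
            $q = $1 if $param =~ /^q=([01](?:\.\d{1,3})?)$/i;
        }
        push @wanted, [ lc $type, $q, $position++ ];
    }

    # Highest quality first; ties keep the order given in the header.
    for my $want ( sort { $b->[1] <=> $a->[1] || $a->[2] <=> $b->[2] } @wanted ) {
        my ( $type, $q ) = @$want;
        next if $q <= 0;    # q=0 means "not acceptable"
        return $supported[0] if $type eq '*/*';
        for my $have (@supported) {
            return $have if $have eq $type;
        }
    }
    return;
}

# CGI exposes the Accept header as HTTP_ACCEPT.
my $format = choose_format( $ENV{HTTP_ACCEPT} );
if ( defined $format ) {
    print "Content-Type: $format\r\n\r\n";
}
else {
    print "Status: 406 Not Acceptable\r\nContent-Type: text/plain\r\n\r\n";
    print "No acceptable format available\n";
}
```

The same idea works in the other direction: a server accepting updates can look at the request's "Content-Type" header and pick a matching parser.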

RESTful Server

I have written a very simple RESTful web service using a CGI script. Now that we've gotten the theory out of the way, I'm going to walk through how this server works to help explain the concepts in practical terms. This server manages the books in my library. Information about each book is stored as a YAML file in a certain folder. I've avoided using a database to store the information in this guide because I don't want to worry about serializing and unserializing the data. I want to focus on the REST protocol itself as much as possible.

I have based the interface of this REST server upon the work being undertaken on the Jifty web application framework. I believe they have had some good ideas with respect to RESTful implementation. If you are familiar with Jifty, some of the code and policy decisions I've made will look familiar.

Sample Server URLs

Before diving into the implementation, I want to make a note of the URLs I have chosen for it. These URLs are meant to be easy to comprehend in pieces and extensible. These are very similar to the URLs chosen for the Jifty REST plugin.

First, I've chosen to make all the REST interface URLs start with "/=". This may seem a little odd at first glance. However, it provides a very simple way to set your REST URLs apart from the rest of your site. I think it's a nice idiom for "IM IN YR API!"

Second, I've made the next component of the API "/model". This borrows the word "Model" from MVC. I do this because one might extend this REST API to include additional features like "/action/" to execute remote procedures or "/search/" to execute a search for data, etc. Those aren't necessarily RESTful, but they are certainly useful.

Third, I've made the next component of the API "/book" to specify the kind of data we're working with. Again, a future extension might add models for other kinds of data. I might store author biographies in "/author" or information about friends I've loaned books to under "/loan".

These are policy decisions that you should think about ahead of time so that your API can accommodate future enhancements without breaking what already exists. The Jifty developers chose these policies for these and similar reasons.
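The URL scheme described above can be taken apart with a few lines of Perl. The following is a minimal sketch, assuming the script receives the REST path in the CGI PATH_INFO variable; the parse_rest_path function, the hash keys it returns, and the example ID are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Split a REST path such as "/=/model/book/id/1234567890" into its parts.
# Returns a hash reference, or undef if the path is outside the REST namespace.
sub parse_rest_path {
    my ($path) = @_;
    return unless defined $path;

    # Drop empty components produced by leading or doubled slashes.
    my @parts = grep { length } split m{/}, $path;

    # Every REST URL starts with "=".
    return unless @parts && shift(@parts) eq '=';

    my ( $kind, $model, $field, $key ) = @parts;
    return {
        kind  => $kind,     # "model" (or a future "action", "search", ...)
        model => $model,    # "book" (or a future "author", "loan", ...)
        field => $field,    # "id"
        key   => $key,      # value identifying one resource; undef for a listing
    };
}

# Under CGI, everything after the script name arrives in PATH_INFO.
my $request = parse_rest_path( $ENV{PATH_INFO} // '/=/model/book/id/1234567890' );
if ($request) {
    printf "kind=%s model=%s field=%s key=%s\n",
        map { $_ // '' } @{$request}{qw(kind model field key)};
}
```

Because each component has its own slot, adding "/action" or "/author" later means adding a new branch on the kind or model value rather than changing the URLs that clients already use.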

GET to Document

I have chosen to provide some documentation within the API itself. If you install the library.cgi script into your local cgi-bin directory and go to the top-level URL in your browser, http://localhost/cgi-bin/library.cgi/= (or something like that depending on where you installed it), you will get an HTML response documenting how the interface works.

Self-documenting services are, in my opinion, a good idea. If I were building this server for production use and my project timeline allowed for it, I would add documentation to the various error messages to explain how to use the interface. That way, the service can recommend to the developer (or to the end user who ended up in the wrong place) how to fix the problem.

GET to List

The first real aspect of the API we'll cover is the most fundamental one: listing. Listing isn't one of the CRUD operations discussed above, but if you don't know what resources are available, it is difficult to fetch or update them.

The URL for accessing this list of resources is "/=/model/book/id" and it lists all the IDs for the book model.

The code in the sample server is pretty simple. It looks for all the available resources, which are stored as YAML files on the disk. It then outputs an HTML file containing links to the resources found.

The ID of the book (generally the ISBN as we'll see later) is in the filename, as well as stored within the file for reference. The server's Perl code outputs an unordered list in HTML of links to the books in my library.
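A listing handler along these lines might be sketched as follows. This is my own illustration of the approach, not the code from library.cgi: the data directory, the LIBRARY_DATA_DIR variable, and the function names are assumptions, and the sketch assumes each book lives in a file named after its ID with a ".yaml" extension and that IDs are URL-safe (as ISBNs are).

```perl
use strict;
use warnings;

# Hypothetical storage location, overridable through the environment.
my $data_dir = $ENV{LIBRARY_DATA_DIR} // '/tmp/library';

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build an HTML unordered list with one link per "<id>.yaml" file in $dir.
sub list_books_html {
    my ( $dir, $base_url ) = @_;

    # A missing or unreadable directory is treated as an empty library.
    opendir my $dh, $dir or return "<ul>\n</ul>\n";
    my @ids = sort map { /^(.+)\.yaml$/ ? $1 : () } readdir $dh;
    closedir $dh;

    my $html = "<ul>\n";
    for my $id (@ids) {
        my $safe = escape_html($id);
        $html .= qq{<li><a href="$base_url/$safe">$safe</a></li>\n};
    }
    return $html . "</ul>\n";
}

# SCRIPT_NAME is the CGI variable holding the path to the script itself.
my $base_url = ( $ENV{SCRIPT_NAME} // '' ) . '/=/model/book/id';
print "Content-Type: text/html\r\n\r\n";
print list_books_html( $data_dir, $base_url );
```

Note that the listing only needs the filenames; the YAML files themselves are not opened until a client follows one of the links.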

You can try this one out in your browser directly. The URL will be something like this:

http://localhost/cgi-bin/library.cgi/=/model/book/id

If you have an empty resource library (i.e., you just installed it and haven't used the client to add any books), the page will be empty. If you have one or more books stored, you will see bullets with linked IDs.

If you click on one of the links, you will access the book's YAML description.