Archive for category RESTful Design

As part of an ongoing effort to modernize the look and feel of the CSE web, we have begun to re-implement many of our internal tools in a more “Web 2.0-y” manner. The front-end tool we’ve chosen is ExtJS, a JavaScript framework that lets you create beautiful AJAX forms that communicate with your datastore using webservices.

The link between ExtJS and your backend data is handled via ExtJS Stores, Readers, and Writers. Stores are configured with the URLs of your webservice. One of the more interesting configuration options for these stores is restful, a boolean which tells the proxy whether to behave in a RESTful manner or to use its own POST-based CRUD operations. The default for restful is false.

The “Times Away” tool is an internal utility that CSE uses to allow faculty and staff to enter planned absences and which permits the display of a weekly calendar showing people who will be away from the department. We have been using RESTful webservices for internal data exchange for some time, but when I recently re-implemented the Times Away tool, I chose to write the backend in the non-RESTful way in order to see what insight I might gain.

Mechanics

The default data exchange scheme for ExtJS is quite simple. All interactions with the backend use POST (though this can be reconfigured to GET). Each POST contains an action parameter which can be any of the four well-known create, read, update, or destroy operations. Other POST parameters can be supplied as needed, enabling us to do typical things, like read all records for a given person.
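As a rough illustration of the scheme just described, here is a minimal Python sketch of a backend that dispatches on the action parameter. It is not the Times Away code: the handle_request function, the in-memory RECORDS list, and the field names are all hypothetical, and the success/rows/message envelope simply mirrors the payload format shown later in this post.

```python
# Hypothetical in-memory stand-in for the times-away table.
RECORDS = [
    {"id": 1, "user": "boren", "type": "vacation", "month": 4},
    {"id": 2, "user": "boren", "type": "sick", "month": 5},
]


def handle_request(params):
    """Dispatch one POST's parameters on its 'action' value."""
    action = params.get("action")
    # Everything except 'action' is treated as record data or a filter.
    fields = {k: v for k, v in params.items() if k != "action"}

    if action == "read":
        # Remaining parameters act as equality filters on the rows.
        rows = [r for r in RECORDS
                if all(str(r.get(k)) == str(v) for k, v in fields.items())]
        return {"success": True, "rows": rows}

    if action == "create":
        new_id = max((r["id"] for r in RECORDS), default=0) + 1
        row = {"id": new_id, **fields}
        RECORDS.append(row)
        return {"success": True, "rows": [row]}

    if action in ("update", "destroy"):
        for index, row in enumerate(RECORDS):
            if str(row["id"]) == str(params.get("id")):
                if action == "destroy":
                    del RECORDS[index]
                    return {"success": True, "rows": []}
                row.update({k: v for k, v in fields.items() if k != "id"})
                return {"success": True, "rows": [row]}
        return {"success": False, "message": "Record not found"}

    return {"success": False, "message": "Unknown action"}
```

Calling handle_request({"action": "read", "user": "boren", "type": "vacation"}) would return only the matching vacation row, which is the “read all records for a given person” style of request mentioned above.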

ExtJS supplies subclasses for JSON and XML readers and writers, both reasonable formats that are easily handled when writing your back end. Data from a webservice can be bound to forms and grids, and can also be read directly from the store should you desire such access. You need to make a few concessions in the format of your data (more on this below) but none of them is unreasonable.

Advantages of the Plain CRUD Webservice

Data are more tightly coupled to a traditional database architecture. You’re free to convert this to a disadvantage if you like. But in my mind, this is an application fed from a rectangular database table, and thinking in CRUD terms still feels the most natural.

Querying data is more natural. Asking a RESTful webservice for a subset of a resource is often accomplished using a query string. This can seem awkward and a violation of the purity of the RESTful esthetic. Knowing that all times away for boren might be at:

http://my.server.com/times_away/boren

is well and good, but finding his vacation days for the month of April might look like:

http://my.server.com/times_away/boren?type=vacation&month=4

On the other hand, supplying parameters specifying the desired user, type, and month, along with an action=read, seems perfectly self-consistent and in line with the way programmers commonly think.
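To make the comparison concrete, the following Python sketch builds both kinds of request for the same query. The helper names restful_request and crud_request are made up for illustration, and the parameter names (user, type, month) are the ones used in the example URLs above.

```python
from urllib.parse import urlencode

BASE = "http://my.server.com/times_away"


def restful_request(user, **query):
    """GET on the resource URL; any filters go in the query string."""
    url = f"{BASE}/{user}"
    if query:
        url += "?" + urlencode(query)
    return ("GET", url, None)


def crud_request(**params):
    """POST to one endpoint; the action and every filter are plain parameters."""
    body = urlencode({"action": "read", **params})
    return ("POST", BASE, body)


print(restful_request("boren", type="vacation", month=4))
print(crud_request(user="boren", type="vacation", month=4))
```

In the RESTful form the user is part of the address and only the filters are parameters; in the CRUD form everything, including the user, travels in the POST body.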

Side-steps the POST/PUT problem. It has always bothered me that PUT is not widely supported by our HTTP infrastructure, leading to the need to overload the POST operation to support it.

Disadvantages of the Plain CRUD Webservice

Resources are not really addressable. For this limited application, this is not really a disadvantage, but it’s not going to scale well either. Web caches and search engines will respond well to http://my.server.com/times_away/boren, but not to http://my.server.com/times_away?user=boren.

Resources are not easily browsable. One of the best parts of developing a RESTful webservice (or developing against one) is simply being able to type its address into your browser and have a look at it. With the CRUD webservice you’ll need to either write yourself a simple form or use some other tool.

Data format must be customized. For most imaginable uses, your payload needs to be formatted so that error conditions are communicated to the client. For an ExtJS application, reading boren’s vacation might return:

{
success: true,
rows: [...]
}

or

{
success: false,
message: "User not found"
}

While this format can easily be re-used in other ExtJS applications, it is not as universally portable and re-usable as a purely RESTful payload would be.

Error handling is fragmented. For the error condition, there is a discontinuity between an HTTP 404 meaning the service itself could not be found, and a returned success=false along with an application specific error code or description explaining the problem.
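The fragmentation shows up on the client as two separate checks. The sketch below is a hypothetical Python client helper (interpret_response is not part of ExtJS or of the Times Away tool) that assumes the body is strict JSON using the success, rows, and message keys shown above.

```python
import json


def interpret_response(status, body):
    """Return (ok, detail) for a CRUD-style response.

    Two independent checks are needed: the HTTP status for transport-level
    failures, then the success flag for application-level failures.
    """
    if status != 200:
        # The service itself could not be reached or failed outright.
        return (False, f"HTTP error {status}")
    payload = json.loads(body)
    if not payload.get("success"):
        # The service answered, but the application reported a problem.
        return (False, payload.get("message", "Unknown application error"))
    return (True, payload.get("rows", []))


print(interpret_response(404, ""))
print(interpret_response(200, '{"success": false, "message": "User not found"}'))
```

A RESTful service could report “User not found” with the status code alone, so the first check would cover both cases.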

Conclusions

For an application of this limited scope, the CRUD webservice approach is fine. It offers many of the same advantages as a RESTful webservice: it uses HTTP for transport, decouples read/write operations from your datastore, and handles data in an easily human-readable format. Writing this webservice to be RESTful would have been a tiny bit more work on the back-end but not enough to matter. And the advantages of scalability, use of widely-accepted standards for communicating success and error conditions, and the ease of universal re-use outweigh the small extra effort.

Although I started this project with a fairly open mind, I half expected to prefer the CRUD/POST approach by the end of the project, believing that it would be simpler to implement and understand. Instead, I ended up reinforcing many of the advantages of REST in my own mind. The next webservice I write will definitely be RESTful.

The ROA Tech team recently had a discussion about the form a payload might take if you were authorized to see only part of the data requested. Imagine the following hypothetical resource and its XML representation:

For example, suppose that a receptionist requested such a record and that he was authorized to see name and phone number, but not the paygrade or sick leave balance. What should be returned? There are several obvious ways of handling this situation.

One approach might be thought of as a “variable representation” solution. The requester only gets back fields he is allowed to see. Our receptionist would get back this simplified payload, with no mention of the forbidden data:

<employee>
<id/>
<name/>
<phone/>
</employee>

If we chose instead always to return a fixed-form representation, several possibilities arise. We might implement access to the resource by separate public and private URIs, returning only the appropriate fields:

/employee/public/{id}
/employee/private/{id}

Alternatively, we could stick with the single URI and an isomorphic payload, but fill the disallowed fields either with nulls or some sort of error value to indicate that access is not allowed (a sort of per-field 403).
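The difference between the variable and the isomorphic payloads can be sketched in a few lines of Python. This is only an illustration: the sample record, the paygrade and sick_leave element names, and the status="forbidden" marker are assumptions made for the example, not part of any of our services.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the paygrade and sick_leave element names are assumed.
EMPLOYEE = {"id": "42", "name": "A. Person", "phone": "555-0100",
            "paygrade": "7", "sick_leave": "12"}
FIELDS = ["id", "name", "phone", "paygrade", "sick_leave"]


def variable_representation(record, allowed):
    """Emit only the fields the requester may see."""
    root = ET.Element("employee")
    for field in FIELDS:
        if field in allowed:
            ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def isomorphic_representation(record, allowed):
    """Emit every field; mark disallowed ones instead of omitting them."""
    root = ET.Element("employee")
    for field in FIELDS:
        child = ET.SubElement(root, field)
        if field in allowed:
            child.text = record[field]
        else:
            # Hypothetical per-field marker, loosely modelled on HTTP 403.
            child.set("status", "forbidden")
    return ET.tostring(root, encoding="unicode")


receptionist = {"id", "name", "phone"}
print(variable_representation(EMPLOYEE, receptionist))
print(isomorphic_representation(EMPLOYEE, receptionist))
```

The first function produces a payload whose shape depends on who is asking; the second always produces the same five elements, with the forbidden ones flagged rather than filled in.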

Is one of these solutions better than the others? Each solution has its own strengths, and no consensus emerged during our discussion.

The “variable representation” solution is the one found in our own person webservice: you only get back what you’re allowed to see. This keeps the security close to the datasource. But it can also be hard to code against, because the client needs to be smart enough to know that some fields might be invisible to some users.

The private/public approach is straightforward in its expectations, but cumbersome in its own way. Now, the client might need to be smart enough to know which resource to request. And is the public version always a subset of the private version? A “no” answer could complicate parsing of the payloads.

Isomorphic payloads are probably the easiest for the client to handle because the client can understand that payload without need of external knowledge of the resource’s structure or authorization model. If all fields in the resource are to be returned (and ease of client use is the goal), it is preferable to include an error value for disallowed fields in order to distinguish data that are truly empty from data that are forbidden.

One final question troubles me about this last solution, though: are there cases where merely acknowledging the existence of a disallowed field represents a security concern? Is it potentially worse to disclose that a datum exists but can’t be viewed than it is to omit it silently? I have not been able to come up with an example of such a field, nor have I come up with a convincing argument that it should not be a concern. Anybody want to chime in?

Recently, in the ROA technical group, we discussed what constitutes a version change in our RESTful services. Many of us have adopted the versioning convention of embedding a version integer in the URI, as supported in the Richardson and Ruby book:

/student/v3/course

So when do we need to change this integer? As we add new data elements to our fledgling services, we are concerned about a proliferation of versions, and in particular, URIs. Do we need to change URIs with the addition of each new element? To do this too often seems to add unnecessary complexity and maintenance cost. Instead we have decided that minor changes that are “additive” do not necessitate an integer version change and consequently do not require a change to the URI.

By “additive” changes, we mean changes that wouldn’t break xpath queries based upon the class attribute:

/html/body/div/span[@class='foo']

as opposed to queries based upon the ordinal relationship of elements:

/html/body/div[2]/span[3]

(We consider the use of queries based on the class attribute a better development practice.)
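A small Python experiment shows why the distinction matters. The two documents below are invented for the example (they are not real service output), and the queries use the XPath subset that Python’s standard xml.etree.ElementTree supports, written relative to the html root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical representation before the change.
V1 = """<html><body><div>
  <span class="name">Intro to Widgets</span>
  <span class="credits">5</span>
</div></body></html>"""

# The same resource after an additive change: a new element appears first.
V2 = """<html><body><div>
  <span class="term">Autumn</span>
  <span class="name">Intro to Widgets</span>
  <span class="credits">5</span>
</div></body></html>"""


def credits_by_class(doc):
    """Locate the value by its class attribute."""
    root = ET.fromstring(doc)
    return root.find("./body/div/span[@class='credits']").text


def credits_by_position(doc):
    """Locate the value by assuming it is the second span."""
    root = ET.fromstring(doc)
    return root.find("./body/div/span[2]").text


print(credits_by_class(V1), credits_by_class(V2))
print(credits_by_position(V1), credits_by_position(V2))
```

The class-based query returns the credits value for both documents, while the positional query silently starts returning the course name once the new element is added.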

One concern is that even if we don’t break clients with a version change, how do we communicate the change? We are recommending the inclusion of a programmatically parsable XHTML version element in the root of each service that describes a fuller version, perhaps with major, minor, revision, and build.

Work continues on the Financial Web Service targeting “summer” for release. There will be six resources that comprise v1: budget, budget search, organization, organization search, vendor and vendor search; inquiry only. A data dictionary will be provided within the resource representations. The service will be publicly available. A couple of clients have already signed up to start using the service. If you are interested in using the service or reviewing the proposed representation attributes, let us know!

Since this blog launched last week I have received a couple of questions asking us “What is ROA?”. Since this blog is named “On the ROA” it seemed necessary to answer that question!

So what is ROA?

“Resources Oriented Architecture is a way of turning a problem into a Restful web service: an arrangement of URIs, HTTP, and XML that works like the rest of the Web, and that programmers will enjoy using.” – Restful Web Services (Leonard Richardson, Sam Ruby)

Our answer is that Resources Oriented Architecture (ROA) is an overall system design that embraces RESTful design philosophy. It has a lot of overlap with Services Oriented Architecture (decentralization and small interoperating services) but it means that instead of treating our functionality and data as service calls, we treat them as resources in the RESTful sense.

In the early days of grappling with web services here at the UW many of us working in various IT departments investigated SOA (Service Oriented Architecture). During this investigation we were introduced to Pete Lacey, who introduced us to Restful web services. This led to a SOA workshop which many developers and technologists on campus attended. The idea of building Restful web services vs building SOAP web services on campus seemed to take hold probably because of the heterogeneous nature of computing here at the UW. The need to be technology agnostic in our approach to deliver data and automate processes seems to make a lot of sense in order to ensure all parts of the UW can take part in leveraging information services. In addition, the REST (Representational State Transfer) approach to constructing web services allows a very simple interface between systems (services and clients), and it scales well.

Following the Restful path, we soon discovered that there was a term that was being used to describe an architecture based on Restful design practices: Resource Oriented Architecture.

With that said, the bottom line is that the UW has a lot of information that people and applications need to access in order to keep the UW well positioned for the future. That information needs to have a low barrier for access and at the same time be safely accessible. We believe that planning our data access needs for the future using ROA and building web services based on a Restful design is one way that can help us meet this need.

Since we’re following RESTful principles here, we already have a tool to communicate these errors: the HTTP status code. Now – this is a pretty good system as far as the transport protocol goes, and the formally defined codes available cover lots of the generic cases such as “I just don’t know what you’re asking for” and “I can’t let you see that.”

But I’m pretty sure you see the problem. My service for letting instructors grade a student’s homework may choke on a request that tries to set the grade to ‘4.2’ and there’s no HTTP status code for “Grades don’t go that high.” Some people have tried to extend or build on top of HTTP to cover some important domains but you can’t cover everything (or get everyone to use your extension).

Luckily, HTTP accounts for this with some general status codes such as “400 – The request could not be understood.” HTTP recommends that you provide not only a status code of 400 but a document body that describes the error. When a client sees a 400, it can dive into that payload to get more information.

Similarities Emerge

The ROA Technology group recently looked at the error documents that a variety of services produce, and found that the approaches were quite similar. Based on examples from the Student Web Service, Catalyst, and Amazon’s Web Services offerings, we would like to offer the following recommendation.

Recommendation

The format should be easily parseable. We recommend XHTML because of the huge number of tools that know how to parse and display it, but many other formats would do just as well.

Three distinct elements for three distinct audiences. Three types of viewers will probably see your error messages: HTTP clients that are actually doing the transport, programmatic client code that understands the content or purpose of the service, and humans who are programming to, debugging, or using your service. So:

error_http_code Include the numeric HTTP status code used to report this error. Anything that knows how to react to generic HTTP codes will thank you.

error_key A short, unique string that identifies the error. This should be used to allow client code to programmatically react to errors at a more granular level than the HTTP status code. These should remain as static as possible to avoid breaking code!

error_description A human-readable chunk of text or hypertext intended to help a user of the service identify and resolve the error.

These documents could be extended with new elements. Your service might have good reason to include other useful bits of information, and we encourage this as appropriate.
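As one possible shape for such a document, here is a Python sketch that builds a bare-bones XHTML body carrying the three elements and reads the key back out. The choice of span elements identified by class attributes, the build_error_document and read_error_key names, and the sample description text are all assumptions for illustration; a real document would also carry the XHTML namespace declaration and doctype.

```python
import xml.etree.ElementTree as ET


def build_error_document(http_code, key, description):
    """Build a minimal error document with the three recommended elements."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for css_class, text in (
        ("error_http_code", str(http_code)),
        ("error_key", key),
        ("error_description", description),
    ):
        # Assumption: each element is a span identified by its class attribute.
        ET.SubElement(body, "span", {"class": css_class}).text = text
    return ET.tostring(html, encoding="unicode")


def read_error_key(document):
    """What programmatic client code might do: pull out the stable key."""
    root = ET.fromstring(document)
    return root.find("./body/span[@class='error_key']").text


doc = build_error_document(
    400, "InvalidHomeworkScore",
    "The submitted grade is outside the allowed range.")
print(doc)
print(read_error_key(doc))
```

The service would send this document as the body of a response whose actual HTTP status is also 400, so the code in the payload and the code on the wire agree.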

Great! Now, my HTTP library can avoid caching this since it’s an error, my client code can notice “InvalidHomeworkScore” and tell my end user that they need to try again, and when I was programming my client, it was easy for me to figure out what the heck was going wrong.