Technology, Open Source and Identity

RESTful Authentication

My last post on RESTful transactions sure seemed to attract a lot of attention. Several REST discussion topics tend to get a lot of hand-waving from the REST community, but concrete answers never seem to be forthcoming. I believe the most fundamental reason is that the existing answers are unpalatable – both to the web services world at large, and to REST purists. Once in a while, when REST purists do propose a solution to a tricky REST-based issue, the web services world responds violently, mostly because the purists give answers like “just don’t do that” to questions like “How do I handle session management in a RESTful manner?”

I recently read an excellent treatise on the subject of melding RESTful web services concepts with enterprise web service needs. Benjamin Carlyle’s Sound Advice blog entry, entitled “The REST Statelessness Constraint”, hits the mark dead center. Rather than try to persuade enterprise web service designers not to do non-RESTful things, Benjamin instead tries to convey the purposes behind REST constraints (in this case, specifically statelessness). This allows web service designers to make rational tradeoffs in REST purity for the sake of enterprise goals, functionality, and performance. Nice job, Ben!

The fact is that the REST architectural style was designed with one primary goal in mind: to create web architectures that would scale well to the Internet. The Internet is large, representing literally billions of clients. To make a web service scale to a billion-client network, you have to make hard choices. For instance, HTTP is stateless and does not require a connection to persist between requests. Protocols like this scale very well to large numbers of clients. Can you imagine a web server that had to manage 500,000 simultaneous long-term connections?

Server-side session data is a difficult concept to shoehorn into a RESTful architecture, and it’s the subject of this post. Lots of web services – I’d venture to say 99 percent of them – manage authentication using SSL/TLS and the HTTP “basic auth” authentication scheme. They use SSL/TLS to keep from exposing a user’s name and password over the wire, essentially in clear text. They use basic auth because it’s trivial. Even banking institutions use this mechanism because, for the most part, it’s secure. Those who try to go beyond SSL/TLS/basic auth often do so because they have special needs, such as identity federation of disparate services.

To use SSL/TLS effectively, however, these services try hard to use long-term TCP connections. HTTP 1.0 had no built-in mechanism for allowing long-term connections, but Netscape hacked in an add-on mechanism in the form of the “Connection: keep-alive” header, and most web browsers still support it today. HTTP 1.1 specifies that connections remain open by default. If an HTTP 1.1 client sends the “Connection: close” header in a request, the server will close the connection after sending the response; otherwise, the connection remains open.
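The persistence rules above can be condensed into a single decision. What follows is a simplified sketch that looks only at the request's protocol version and its `Connection` header. The function name `connection_persists` is hypothetical, and real servers weigh additional factors such as response framing and timeouts.

```python
from typing import Optional


def connection_persists(http_version: str, connection_header: Optional[str]) -> bool:
    """Decide whether the connection stays open after the response (simplified).

    HTTP/1.1: persistent by default unless the client lists "close".
    HTTP/1.0: closed by default unless the client lists "keep-alive"
    (the Netscape-era extension).
    """
    tokens = set()
    if connection_header:
        # The Connection header is a comma-separated, case-insensitive token list.
        tokens = {t.strip().lower() for t in connection_header.split(",")}
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens
    return False  # unknown versions: be conservative and close
```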

This is a nice enhancement, because it allows underlying transport-level security mechanisms like SSL/TLS to optimize transport-level session management. Each new SSL/TLS connection requires a handshake in which the server (and optionally the client) is authenticated and session keys are negotiated, and this process costs a few round-trips between client and server. By allowing multiple requests to occur over the same authenticated session, the cost of transport-level session management is amortized over several requests.
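To make the amortization argument concrete, here is a back-of-the-envelope calculation. It assumes roughly three round trips of connection setup (one for the TCP handshake plus two for a full TLS 1.2 handshake) and one round trip per request/response exchange. The setup figure is an assumption, since TLS versions and session resumption change it.

```python
def round_trips_per_request(requests: int, handshake_rtts: int = 3) -> float:
    """Average round trips per request when `requests` share one TLS connection.

    Assumes `handshake_rtts` round trips of connection setup (default: 1 for
    TCP + 2 for a full TLS 1.2 handshake) and 1 round trip per request.
    """
    if requests < 1:
        raise ValueError("need at least one request")
    return (handshake_rtts + requests) / requests
```

With the default of three setup round trips, a single request costs four round trips. Ten requests sent over the same connection average 1.3 round trips each.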

In fact, by using SSL/TLS mutual authentication as the primary authentication mechanism, no application state need be maintained by the server at all for authentication purposes. For any given request, the server need only ask the connection layer who the client is. If the service requires SSL/TLS mutual auth, and the client has made a request, then the server knows that the client is authenticated. Authorization (resource access control) must still be handled by the service, but authorization data is not session data; it’s service data.
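The following sketch uses Python's standard `ssl` module to show what "ask the connection layer who the client is" can look like. The server context refuses clients that lack a certificate signed by a trusted CA. The client's identity is then read from the dictionary returned by `SSLSocket.getpeercert()`. The certificate file paths, the use of the subject commonName as the identity, and the `ACL` table are all illustrative assumptions.

```python
import ssl
from typing import Dict, Optional, Set


def make_mutual_tls_context(certfile: str, keyfile: str, client_ca_file: str) -> ssl.SSLContext:
    """Server-side context that rejects clients without a CA-signed certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # the server's own identity
    ctx.load_verify_locations(cafile=client_ca_file)         # CA(s) that issue client certs
    ctx.verify_mode = ssl.CERT_REQUIRED                      # a client certificate is mandatory
    return ctx


def client_identity(peercert: dict) -> Optional[str]:
    """Return the subject commonName from a getpeercert() dict, if present."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Hypothetical access-control list: this is service data, not session data.
ACL: Dict[str, Set[str]] = {"alice": {"/orders"}, "bob": set()}


def authorized(identity: Optional[str], path: str) -> bool:
    return identity is not None and path in ACL.get(identity, set())
```

In a request handler, `authorized(client_identity(conn.getpeercert()), path)` is all the per-request work needed. Nothing about the client is remembered between requests.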

However, SSL/TLS mutual auth has an inherent deployment problem: key management. No matter how you slice it, authentication requires that the server know something about the client in order to authenticate that client. For SSL/TLS mutual auth, that something is a public key certificate. Somehow, each client must create a public key certificate and install it on the server. Thus, mutual auth is often reserved for the enterprise, where key management is done by IT departments for the entire company. Even then, IT departments cringe at the thought of key management issues.

User name and password schemes are simpler, because often web services will provide users a way of creating their account and setting their user name and password in the process. Credential management done. Key management can be handled in the same way, but it’s not as simple. Some web services allow users to upload their public key certificate, which is the SSL/TLS mutual-auth equivalent of setting a password. But a user has to create a public/private key pair, and then generate a public key certificate from this key pair. Java keytool makes this process as painless as possible, but it’s still far from simple. No – user name and password is by far the simpler solution.

As I mentioned above, the predominant solution today is a combination of CA-based transport-layer certificate validation for server authentication, and HTTP basic auth for client authentication. The web service generates a public/private key pair and obtains a certificate for the public key signed by a well-known Certificate Authority (CA). This is done by generating a certificate signing request using either openssl or the Java keytool utility (or by using less mainstream tools provided by the CA). Most popular web browsers today ship well-known CA certificates in their truststores, and clients implicitly trust services that present certificates signed by these well-known CAs. As a result, people tend to feel warm and fuzzy because no warning messages pop up on the screen when they connect to one of these services. Should they fear? Given the service verification process used by CAs like Entrust and Verisign, they probably should, but that problem is very difficult to solve, so most people just live with this stop-gap solution.
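On the client side, this implicit trust in well-known CAs is usually the default behavior. In Python, for example, `ssl.create_default_context()` loads the platform's default CA certificates and enables both chain and hostname verification. The helper names below are illustrative.

```python
import http.client
import ssl


def make_client_context() -> ssl.SSLContext:
    # Loads the system's default trusted CA certificates (the truststore) and
    # requires the server's certificate chain and hostname to verify.
    return ssl.create_default_context()


def open_connection(host: str) -> http.client.HTTPSConnection:
    # No network I/O happens until the first request is sent.
    return http.client.HTTPSConnection(host, context=make_client_context())
```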

On the server side, the web service needs to know the identity of the client in order to know what service resources that client should have access to. If a client requests a protected resource, the server must be able to validate that client’s right to the resource. If the client hasn’t authenticated yet, the server challenges the client for credentials using a WWW-Authenticate response header and a “401 Unauthorized” response code. Using the basic auth scheme, the client base64-encodes the string “username:password” and sends it back in the Authorization request header. Now, base64 encoding is not encryption, so the client is essentially passing his user name and password in what amounts to clear text. This is why SSL/TLS is used. By the time the server issues the challenge, the SSL/TLS encrypted channel is already established, so the user’s credentials are protected from even non-casual snoopers.
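To illustrate why base64 offers no secrecy, here is the client-side encoding of the Basic scheme (RFC 7617) alongside its trivial reversal. The helper names and the realm in the sample challenge are hypothetical.

```python
import base64
from typing import Tuple

# Example challenge a server might send with a 401 (the realm name is made up).
CHALLENGE = 'Basic realm="orders", charset="UTF-8"'


def basic_auth_header(username: str, password: str) -> str:
    """Value for the Authorization request header using the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Anyone who sees the header can do this: base64 is an encoding, not a cipher."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

`basic_auth_header("Aladdin", "open sesame")` produces the RFC's example value, `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, and `decode_basic_auth` recovers the password from it with no key at all.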

When the credentials arrive in the next attempt to request the protected resource, the server decodes the user name and password and verifies them against its user database. It then either returns the requested resource or fails the request: with “401 Unauthorized” again if the credentials are invalid, or with “403 Forbidden” if the authenticated user doesn’t have the requisite rights to the requested resource.
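The server side of this exchange can be sketched as a pure function of the request. It decodes the header, checks the password against a stored salted hash, and then checks rights. The in-memory `USERS` and `RIGHTS` tables are hypothetical stand-ins for a real user database. The PBKDF2 hashing and constant-time comparison are sensible choices made for this sketch, not something the post prescribes.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Optional, Set, Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user database: username -> (salt, password hash).
_salt = os.urandom(16)
USERS: Dict[str, Tuple[bytes, bytes]] = {"alice": (_salt, hash_password("s3cret", _salt))}
# Hypothetical rights table: username -> resources that user may access.
RIGHTS: Dict[str, Set[str]] = {"alice": {"/orders"}}


def authenticate(authorization: Optional[str]) -> Optional[str]:
    """Return the username if the Basic credentials verify, else None."""
    if not authorization or not authorization.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(authorization[6:], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    username, sep, password = decoded.partition(":")
    if not sep or username not in USERS:
        return None
    salt, expected = USERS[username]
    ok = hmac.compare_digest(hash_password(password, salt), expected)
    return username if ok else None


def handle(path: str, authorization: Optional[str]) -> int:
    """Status code for one request; nothing is remembered between calls."""
    user = authenticate(authorization)
    if user is None:
        return 401  # sent along with a WWW-Authenticate: Basic realm="..." challenge
    return 200 if path in RIGHTS.get(user, set()) else 403
```

Because `handle` consults only the request and the service's own data, every request is authenticated independently. This is exactly the stateless property described in the next paragraph.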

If this were the extent of the matter, there would be nothing unRESTful about this protocol. Each subsequent request contains the user’s name and password in the Authorization header, so the server has the option of using this information on each request to ensure that only authorized users can access protected resources. No session state is managed by the server here. Session or application state is managed by the client, using a well-known protocol for passing client credentials on each request – basic auth.

But things don’t usually stop there. Web services want to provide a good session experience for the user – perhaps a shopping cart containing selected items. Servers typically implement shopping carts by keeping a session database, and associating collections of selected items with users in this database. How long should such session data be kept around? What if the user tires of shopping before she checks out, goes for coffee, and gets hit by a car? Most web services deal with such scenarios by timing out shopping carts after a fixed period – anywhere from an hour to a month. What if the session includes resource locks? For example, items in a shopping cart are sometimes made unavailable to others for selection – they’re locked. Companies like to offer good service to customers, but keeping items locked in your shopping cart for a month while you’re recovering in the hospital just isn’t good business.
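The timeout approach can be seen in a minimal in-memory cart store that discards carts left untouched for too long. The class and its injectable clock are hypothetical; a production system would typically use a shared database or cache. Note that the store holds an entry for every user with an active cart, which is precisely the per-client server state that REST's statelessness constraint is trying to avoid.

```python
import time
from typing import Callable, Dict, List, Tuple


class CartSessionStore:
    """Server-side carts that expire `ttl_seconds` after their last change."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._carts: Dict[str, Tuple[float, List[str]]] = {}  # user -> (last change, items)

    def add_item(self, user: str, item: str) -> None:
        self.expire()  # an expired cart starts over empty
        _, items = self._carts.get(user, (0.0, []))
        items.append(item)
        self._carts[user] = (self.clock(), items)

    def items(self, user: str) -> List[str]:
        self.expire()
        entry = self._carts.get(user)
        return list(entry[1]) if entry else []

    def expire(self) -> None:
        now = self.clock()
        stale = [u for u, (touched, _) in self._carts.items() if now - touched > self.ttl]
        for user in stale:
            del self._carts[user]
```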

REST principles dictate that keeping any sort of session data is not viable for Internet-scalable web services. One approach is to encode all session data in a cookie that’s passed back and forth between client and server. While this approach allows the server to be completely stateless with respect to the client, it has a flaw: even though the data is application state, it’s still owned by the server, not the client. Most clients don’t even try to interpret this data; they just hand it back to the server on each successive request. Since it is application state, the client should manage it, not the server.
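A common way to implement the cookie approach is to sign the serialized state, so the server can stay stateless and still detect tampering. The sketch below uses an HMAC over base64-encoded JSON. The signing step and the `SECRET_KEY` value are assumptions added here. The payload is signed but not encrypted, so the client can read it but cannot alter it undetected.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def encode_state(state: Any) -> str:
    """Serialize state into a cookie value of the form <payload>.<signature>."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    return payload.decode("ascii") + "." + _sign(payload)


def decode_state(cookie: str) -> Optional[Any]:
    """Return the state if the signature verifies, else None."""
    payload, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    expected = _sign(payload.encode("utf-8"))
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```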

There are no good answers to these questions yet. What it comes down to is that service design is a series of trade-offs. If you really need your web service to scale to billions of users, then you’d better find ways to make your architecture compliant with REST principles. If you’re only worried about servicing a few thousand users at a time, then perhaps you can relax the constraints a bit. The point is that you should understand the constraints, and then make informed design decisions.

Because cookies are specific to HTTP, they are inherently non-RESTful from the Fielding-purist perspective, since REST is theoretically not tied to HTTP. (This usually leads to ridiculous discussions about whether REST could be used over something like FTP, which also misses the point.)

Additionally, while I understand that it’s an easy example to use, I believe that a shopping cart is not what REST prohibits as session state. It’s pretty easy to represent the shopping cart and its items completely within the REST model (by coincidence I just wrote an example of this at Stack Overflow earlier this evening) if you think of the cart as a collection resource and the items as individual resources, both accessible through URIs. Then the client only needs to know the cart ID to access the correct series of URIs, and really not even that if you couple it to the identity via basic auth.

I may be wrong but I feel reasonably sure that the REST prohibition on session state refers to what most people think of when they hear the term — session IDs, server-based session variables, and so on.
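The cart-as-resource idea in the comment above can be sketched as a small dispatcher. It maps HTTP methods on `/carts/{cart}` and `/carts/{cart}/items/{item}` onto a collection and its member resources. The URI layout, the use of a quantity as the item representation, and the in-memory `CARTS` store are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical backing store: cart id -> {item id: quantity}
CARTS: Dict[str, Dict[str, int]] = {}

ITEM_URI = re.compile(r"^/carts/(?P<cart>[^/]+)/items/(?P<item>[^/]+)$")
CART_URI = re.compile(r"^/carts/(?P<cart>[^/]+)$")


def dispatch(method: str, path: str, body: Optional[int] = None) -> Tuple[int, object]:
    """Return (status, representation) for a request against the cart resources."""
    m = ITEM_URI.match(path)
    if m:
        cart_id, item = m["cart"], m["item"]
        if method == "PUT":
            if body is None:
                return 400, None
            CARTS.setdefault(cart_id, {})[item] = body  # create or replace the item
            return 200, body
        cart = CARTS.get(cart_id, {})
        if method == "GET":
            return (200, cart[item]) if item in cart else (404, None)
        if method == "DELETE":
            return (204, None) if cart.pop(item, None) is not None else (404, None)
        return 405, None
    m = CART_URI.match(path)
    if m and method == "GET":
        return 200, dict(CARTS.get(m["cart"], {}))  # the collection's representation
    return 404, None
```

The cart lives in the service's resource space rather than in a session, so any server that can reach `CARTS` can serve any request.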

@Mac: Hmmm. Yes, I recognize that a persistent shopping cart can also be thought of as a server-side resource that’s manipulated by (specific) client activity.

However, if you recall that the purpose for NOT managing session state on a server is for server scalability, then you must also see that if a billion users connected to your service and set up a shopping cart that lasted for a month, you’d have a resource issue to deal with – your shopping cart cache would probably outweigh your product database by a factor of 1000.

But that’s the purist perspective. The way most folks deal with this issue is by saying, “A billion users! That’ll *never* happen!” And they’re probably right, as is shown by the number of such services out there that are successfully running with this philosophy.🙂

From a security perspective (not that I’m an expert on security, by any means), the RESTful method of SOA really doesn’t cut it, unless of course you aren’t worried about people hacking your front end and thereby gaining access to your web services. If you want to do a lot of work writing code to secure your services, you could.

In the second-to-last paragraph you mention cookies as an approach to state management. An alternative is the whole HATEOAS approach, which to me boils down to transmitting URLs in your REST responses that hold enough state that the server can indeed remain stateless, while the state is actually ‘carried around’ via the request/response cycle. I thought it was worth mentioning with respect to this post.
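Here is one possible reading of the commenter's suggestion. The server returns a representation whose links encode the client's state in the URL, so a later request to that link carries everything the server needs. The URIs and the `item` query parameter are hypothetical. In practice such URLs would likely also need signing, as with cookies, to prevent tampering.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse


def cart_representation(items: List[str]) -> Dict[str, object]:
    """A response whose hypermedia links carry the client's state forward."""
    return {
        "items": items,
        "links": {
            "checkout": "/checkout?" + urlencode({"item": items}, doseq=True),
            "catalog": "/products",
        },
    }


def items_from_link(url: str) -> List[str]:
    # The server recovers the state from the URL alone; nothing was stored.
    return parse_qs(urlparse(url).query).get("item", [])
```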

Very good point. It’s interesting to note that the SAML authN/Z protocol uses a hybrid of this mechanism to avoid the use of cookies entirely. State is maintained on the SAML server, but the way users are connected to server-side state is via a token passed around in the redirect URLs. Thanks for the comment.

A few other problems with the server-state approach are how to manage server farms and cart volatility. If you have an in-memory session on a specific server in a server farm, you must ensure that the session is sticky (routed back to the same server) and won’t go to another server where the cart session does not exist.

If that specific server needs to be cycled for whatever reason, then the in-memory session is also lost.
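The sticky-routing requirement is often met by hashing the session ID onto the list of servers. This minimal sketch (with hypothetical server names) shows both why it works and why it is fragile.

```python
import hashlib
from typing import List


def pick_backend(session_id: str, backends: List[str]) -> str:
    """Sticky routing: the same session ID always maps to the same server."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

If one server is cycled out of the list, the modulus changes and many sessions get remapped to servers that never saw their in-memory carts. This is the failure described above.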

Keeping the cart on the client side also has volatility issues. What if the user closes that client and opens up another one somewhere else? Where did the cart go?

Probably the best solution to address these issues is to store the cart in the database as Mac indicates. Storing a few primary keys will not take up much space. As well, if you need 1000 servers for every 1 billion users, so be it. A nice problem to have.
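Storing the cart in a database, as suggested above, needs little more than a table of keys. The sketch uses SQLite from Python's standard library. The `cart_item` schema is a hypothetical illustration that holds only user and product keys plus a quantity. Any server in the farm that can reach the database can serve the cart, and it survives both server restarts and client switches.

```python
import sqlite3
from typing import List, Tuple


def open_cart_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS cart_item (
               user_id    TEXT    NOT NULL,
               product_id INTEGER NOT NULL,
               quantity   INTEGER NOT NULL CHECK (quantity > 0),
               PRIMARY KEY (user_id, product_id)
           )"""
    )
    return db


def add_to_cart(db: sqlite3.Connection, user_id: str, product_id: int, quantity: int = 1) -> None:
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "UPDATE cart_item SET quantity = quantity + ? WHERE user_id = ? AND product_id = ?",
            (quantity, user_id, product_id),
        )
        if cur.rowcount == 0:  # no existing row, so insert a new one
            db.execute(
                "INSERT INTO cart_item (user_id, product_id, quantity) VALUES (?, ?, ?)",
                (user_id, product_id, quantity),
            )


def cart_contents(db: sqlite3.Connection, user_id: str) -> List[Tuple[int, int]]:
    return db.execute(
        "SELECT product_id, quantity FROM cart_item WHERE user_id = ? ORDER BY product_id",
        (user_id,),
    ).fetchall()
```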

Hi, I am new to REST. In my application I need to provide authentication using REST, and with it secure the RESTful web services in my application. Could you please help me with sample code (for both client and server)?

You should probably take that question to StackOverflow.com. Make sure you include what language/framework you will be using and whether or not the auth data is stored in a directory service like Active Directory. It would also be helpful to include some samples of things you will be doing with the services.