Wed Mar 17 2010 21:24 Alternate ETag Validation Functions:
Yes, months after driving away everyone who read this weblog hoping I would talk about RESTful topics, here's some REST stuff. This is an idea I got from my co-worker Björn Tillenius. I hope someone else has come up with the same idea and given it a better name.

Here's the problem, on a high level of abstraction. Consider a representation (#1):

<p id="1">Forklift</p>
<p class="read-only" id="2">Green</p>

And let's say the ETag of this representation is the string "x".

According to the protocol governing this media type, you can modify the text in any paragraph unless its class is "read-only". So maybe you can PUT a document like this (#2):

<p id="1">Hovercraft</p>
<p class="read-only" id="2">Green</p>

Or PATCH a document like this (#3):

<p id="1">Hovercraft</p>

OK, that's easy. Now suppose that the read-only text changes randomly according to conditions on the server. Let's say the read-only text suddenly changes from "Green" to "Red". If I were to GET the document again, I'd get this document (#4):

<p id="1">Forklift</p>
<p class="read-only" id="2">Red</p>

And let's say the ETag of this document is "y". If I sent a conditional GET with an If-None-Match of "x", I'd get 200 and a new representation instead of 304 ("Not Modified").

OK, but I don't send a conditional GET. I don't get the document again at all. Instead, I PUT document #2, with an If-Match of "x", and the request fails with 412 ("Precondition Failed"). Maybe it should fail anyway; maybe the server is very strict and thinks I'm trying to change a read-only paragraph from "Red" to "Green", which would probably be 400 ("Bad Request"). But we don't even get to that point because the ETags don't match.

The request also fails with 412 if I PATCH document #3 with an If-Match of "x". But there's nothing really wrong with that request. The point of If-Match in conditional writes is to avoid conflicts with other clients, and there are no other clients here. The ETag is different because a read-only paragraph changed on the server side.
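
To make the failure concrete, here is a minimal Python sketch. It assumes the server derives the ETag by hashing the whole document; the helper names and the tuple layout (id, text, read-only flag) are made up for illustration and don't come from any real server.

```python
import hashlib


def whole_document_etag(paragraphs):
    """Hash every paragraph, read-only or not (assumed ETag scheme)."""
    digest = hashlib.sha1()
    for para_id, text, read_only in paragraphs:
        digest.update(f"{para_id}|{text}|{read_only}\n".encode("utf-8"))
    return digest.hexdigest()


def conditional_write_status(if_match, current_etag):
    """412 unless the client's ETag equals the current one."""
    return 200 if if_match == current_etag else 412


doc1 = [("1", "Forklift", False), ("2", "Green", True)]
doc4 = [("1", "Forklift", False), ("2", "Red", True)]

etag_x = whole_document_etag(doc1)  # what the client holds
etag_y = whole_document_etag(doc4)  # what the server has after Green -> Red

# The client only wants to change paragraph 1, but the write is refused.
status = conditional_write_status(etag_x, etag_y)
```

Here `status` comes out as 412 even though no other client touched the writable paragraph.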

One obvious solution is to calculate the ETag only from the read-write portion of the document. This fixes conditional writes, but it breaks conditional reads. A client that requests document #1 and then makes conditional requests will never get document #4. The ETag is no longer a strong validator (update: actually, it's not any kind of validator); the document might change significantly without the ETag changing. So that's no good.
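
A sketch of why this breaks reads, again in Python and again with invented helper names: if only the writable paragraphs feed the hash, documents #1 and #4 get the same ETag, so a conditional GET answers 304 and the client never sees "Red".

```python
import hashlib


def read_write_only_etag(paragraphs):
    """Hash only the writable paragraphs (the 'obvious' but broken scheme)."""
    digest = hashlib.sha1()
    for para_id, text, read_only in paragraphs:
        if not read_only:
            digest.update(f"{para_id}|{text}\n".encode("utf-8"))
    return digest.hexdigest()


def conditional_get_status(if_none_match, current_etag):
    """304 when the client's ETag still matches, 200 otherwise."""
    return 304 if if_none_match == current_etag else 200


doc1 = [("1", "Forklift", False), ("2", "Green", True)]
doc4 = [("1", "Forklift", False), ("2", "Red", True)]

client_etag = read_write_only_etag(doc1)
server_etag = read_write_only_etag(doc4)

# The representation changed, but the server claims it did not.
status = conditional_get_status(client_etag, server_etag)
```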

The solution Björn came up with is to split the ETag into two parts. The first part is derived from the read-only portions of the document, and the second part is derived from the read-write portions. The ETag is a totally opaque string to the client, but the server knows what it means. On a conditional read, the server checks the entire ETag. On a conditional write, the server only checks the second half.
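
One way this could look in code, as a sketch under assumptions: the two halves are short hashes joined by a ".", and the function names are hypothetical. The post doesn't say how the real implementation builds or splits the string, and the sketch ignores details like the "*" wildcard in If-Match.

```python
import hashlib

SEPARATOR = "."  # assumed delimiter; hex digests never contain it


def _digest(paragraphs):
    h = hashlib.sha1()
    for para_id, text in paragraphs:
        h.update(f"{para_id}|{text}\n".encode("utf-8"))
    return h.hexdigest()[:8]


def split_etag(paragraphs):
    """First half from read-only paragraphs, second half from writable ones."""
    read_only = [(i, t) for i, t, ro in paragraphs if ro]
    read_write = [(i, t) for i, t, ro in paragraphs if not ro]
    return _digest(read_only) + SEPARATOR + _digest(read_write)


def read_condition_matches(client_etag, current_etag):
    """Conditional reads compare the entire ETag."""
    return client_etag == current_etag


def write_condition_matches(client_etag, current_etag):
    """Conditional writes compare only the second (read-write) half."""
    return client_etag.split(SEPARATOR)[-1] == current_etag.split(SEPARATOR)[-1]


doc1 = [("1", "Forklift", False), ("2", "Green", True)]
doc4 = [("1", "Forklift", False), ("2", "Red", True)]
edited = [("1", "Hovercraft", False), ("2", "Red", True)]

old, new, after_write = split_etag(doc1), split_etag(doc4), split_etag(edited)
```

With these values, `read_condition_matches(old, new)` is false (so a conditional GET returns the new document), `write_condition_matches(old, new)` is true (so the write goes through), and `write_condition_matches(old, after_write)` is false once someone has actually changed the writable paragraph.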

In this example, the ETag for document #1 might be "1.a" and the ETag for document #4 might be "2.a". A conditional GET of document #4 with If-None-Match="1.a" would fail, but a conditional write with If-Match="1.a" would succeed. When the write went through, the document's ETag would change to "2.b", and "1.a" would not be good for either conditional reads or writes.
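
The same sequence can be traced with a toy in-memory resource. This is only an illustration: the `Resource` class, its revision counters, and the method names are invented here to reproduce the "1.a" / "2.a" / "2.b" labels, not a description of a real server.

```python
import string


class Resource:
    """Toy resource whose ETag is '<read-only rev>.<read-write rev letter>'."""

    def __init__(self):
        self.read_only_rev = 1
        self.read_write_rev = 0  # index into a, b, c, ... (toy: at most 26)

    @property
    def etag(self):
        letter = string.ascii_lowercase[self.read_write_rev]
        return f"{self.read_only_rev}.{letter}"

    def server_side_change(self):
        """The read-only text changes on its own, e.g. Green -> Red."""
        self.read_only_rev += 1

    def conditional_get(self, if_none_match):
        return 304 if if_none_match == self.etag else 200

    def conditional_write(self, if_match):
        # Only the second half of the ETag is checked on writes.
        if if_match.split(".")[-1] != self.etag.split(".")[-1]:
            return 412
        self.read_write_rev += 1
        return 200


resource = Resource()
first = resource.etag          # the client fetches document #1
resource.server_side_change()  # the server is now at document #4

trace = [
    resource.conditional_get(first),    # full ETag differs
    resource.conditional_write(first),  # second half still matches
    resource.conditional_get(first),    # stale for reads
    resource.conditional_write(first),  # and now stale for writes too
]
```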

From the client's perspective everything just works: your conditional read returns 200 iff the representation has changed, and your conditional write returns 412 iff someone else is messing with the resource. But is this okay from a standards perspective? Section 13.3.3 of RFC 2616 says "The only function that the HTTP/1.1 protocol defines on [ETags] is comparison." That doesn't seem to prohibit me from defining another one.

If "x" is a strong validator then so is "1.a", but the new comparison function ignores some of its information about the resource state, effectively treating it as a weak validator (update: or as something that's not a validator at all). Is that okay? Would you believe the following definition of a strong validation function? "In order to be considered equal, the second halves of both validators MUST be identical in every way, and both MUST NOT be weak." (cf 13.3.3 again)
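
For concreteness, the proposed function might be sketched like this in Python. The "W/" prefix for weak validators and the surrounding double quotes are standard ETag syntax; the "." delimiter and the function names are assumptions of this sketch.

```python
def is_weak(etag):
    """Weak validators carry a W/ prefix in the header value."""
    return etag.startswith("W/")


def second_half(etag):
    """Return the part after the assumed '.' delimiter, or None if absent."""
    opaque = etag.strip('"')
    _, sep, tail = opaque.partition(".")
    return tail if sep else None


def write_validators_equal(a, b):
    """Equal iff neither is weak and the second halves are identical."""
    if is_weak(a) or is_weak(b):
        return False
    half_a, half_b = second_half(a), second_half(b)
    return half_a is not None and half_a == half_b


same_write_state = write_validators_equal('"1.a"', '"2.a"')
weak_rejected = write_validators_equal('W/"1.a"', '"1.a"')
changed_write_state = write_validators_equal('"1.a"', '"1.b"')
```

Whether a function like this still deserves to be called "strong" is exactly the open question; the code only shows that it is easy to define.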

I'm interested in your thoughts on this. Smartass comments like "you should have two resources" will not be dismissed out-of-hand but also will probably not convince me. If you're curious, here's the real-life bug that spawned this thinking.

Hi Leonard - my first reaction is that this approach is treating strong validators as weak. Since the differences between representations in the example are "semantically significant", does it make sense to use weak validators?

Secondly, the downside in generalizing this approach is that the client may not want to do the PUT if it knows that the color is "red". A 412 gives the client a chance to decide whether it should proceed or not. The client would lose this ability if the server allows the PUT on the stale validator.

The PUT request will still fail if the color has changed, it'll just fail with a 400 instead of a 412. The second paragraph in the PUT becomes an assertion that the color hasn't changed. And if you don't care about the color, you can use PATCH. (If you don't buy this, then suppose the server validates the full ETag on PUT but only the second half of it on PATCH.)
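
The parenthetical variant, where PUT checks the whole ETag and PATCH checks only the second half, could be sketched as a single dispatch on the method. The function name and the "." delimiter are assumptions made for this example.

```python
def precondition_holds(method, if_match, current_etag):
    """PUT compares the full ETag; PATCH compares only the read-write half."""
    if method == "PATCH":
        return if_match.split(".")[-1] == current_etag.split(".")[-1]
    return if_match == current_etag


# The client holds "1.a"; the server has moved on to "2.a".
put_ok = precondition_holds("PUT", "1.a", "2.a")
patch_ok = precondition_holds("PATCH", "1.a", "2.a")
```

Under this variant the stale PUT gets a 412 and the PATCH is allowed through.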

"1.a" is a strong validator no matter what. The server always serves strong validators and the client always sends strong validators. But I don't know whether my new validation function is believable.

And the "a" part of "1.a" isn't even a weak validator, since the meaning of the entity can change without the "a" part changing. (If the color change wasn't semantically significant it wouldn't matter if a client never got document #4.)

I don't know. If this was easy I wouldn't have to write a weblog post about it.

"A conditional GET of document #4 with If-None-Match="1.a" would fail, but a conditional write with If-Match="1.a" would succeed."

I think I know what you mean by "fail" in the first half of that sentence, as in fail to generate a 304 response, but this is infelicitous as IIUC it would cause a 200 response in that case, which many people would not consider a failure.

Am I missing something?

I was going to say that, assuming the second half of that sentence, you have a problem with the client assuming "Green" when "Red" is what the server has, but then your answer to Subbu just confuses me. "The PUT request will still fail if the color has changed, it'll just fail with a 400 instead of a 412" seems to contradict "a conditional write with If-Match="1.a" would succeed". ISTM that the binary 2xx/412 is the whole point of this exercise, while other 4xx's are excluded from this analysis. Only a psychic client would follow a 400 response with a GET of that resource, whereas 412 is a pretty clear direction to do exactly that. OTOH in many cases only the client knows if it cares whether the color is "Red" or "Green", so the server would have to be psychic to know how to respond. In the interests of salvaging something from all of this, could the server respond with 205 to accept the PUT but let the client know it might want to follow up with another GET?

"A conditional GET of document #4 with If-None-Match="1.a" would fail" -- I meant that the condition would fail. To avoid confusion I'll use status codes below instead of saying "succeed" or "fail".

I didn't explain the PUT example well. My main failing was that I described two possible ways it could work. If you think an unconditional PUT that tries to change a read-only field should 400, then a conditional PUT that tries to change a read-only field should 412.

If you think an unconditional PUT that tries to change a read-only field should 20x (with the change to the read-only field being ignored), then a conditional PUT that tries to change a read-only field should also 20x.

A common rhetorical technique (which I've used myself) is to say "if you were more hard-ass about the literal meaning of RFC2616 this problem would be moot". That works for PUT, if you want to go that way. But it doesn't work for PATCH, where a conditional request can 412 even though it doesn't take a psychic to see there's no real conflict.

Oh. In my previous comment, I had not understood the PUT as attempting to update the read-only portion. I thought our client was just changing "Forklift" -> "Hovercraft", without the client realizing that something on the server side had changed "Green" -> "Red". [Incidentally, I had also assumed conditional rather than unconditional PUTs, probably because I've been looking at CouchDB quite a bit recently.]

"If you think an unconditional PUT that tries to change a read-only field should 400, then a conditional PUT that tries to change a read-only field should 412."

If 'read-only' has any meaning, both of these should 400.

I think we're straying a bit from rfc2616. If a client didn't know anything about resources with read-only portions, it would expect both:

* a 2xx response to a conditional PUT indicates that it knows exactly (for some provisional meaning of exactly, which really must exclude "Green"="Red") what representation corresponds to the ETag it holds. Something might have changed since then, but the client knows the state of the resource at the time of the PUT. If the client had wanted to ignore the resource state entirely it would have just sent an unconditional PUT.

* a 412 response to a conditional PUT indicates that it should GET and then optionally re-PUT.

ISTM 1) neither of these scenarios holds in a world of partially read-write/partially read-only resources, 2) in such a world these scenarios are not replaced by equally-clear alternative scenarios, and 3) even if they were it's problematic to expect any client to know about those alternatives. Your anticipation of "two resources" advice is correct, in that 'read-only' is then implicitly defined by a 403 or even a 405 response.

PUT was just my way of explaining the problem without introducing new concepts for people who don't know PATCH. I agree that the problem doesn't come up if you take a strict view of PUT ("you can't do that"). Instead you have other problems ("you can't do that"), and PATCH is useful for solving those other problems.