HTTP

There’s more than a little confusion and angst out there about HTTP status codes. I’ve received
more than a few e-mails (and IMs, and DMs) over the years from stressed-out developers (once at
2am, their time!) asking something like this:

Today, the IESG approved publication of “An HTTP Status Code to Report Legal Obstacles”. It’ll be an RFC after some work by the RFC Editor and a few more process bits, but effectively you can start using it now.

One of the things that came up at the HTTP Workshop was “distributed HTTP” — i.e., moving the Web from a client/server model to a more distributed one. This week, Brewster Kahle (of Archive.org fame) talked about similar thoughts on his blog and at CCC. If you haven’t seen that yet, I’d highly suggest watching the latter.

The IESG has formally approved the HTTP/2 and HPACK specifications, and they’re on their way to the RFC Editor, where they’ll soon be assigned RFC numbers, go through some editorial processes, and be published.

A few months ago I went to the Internet Governance Forum, looking to understand more about the IGF and its attendees. One of the things I learned there was a different definition of “intermediary” — one that I think the standards community should pay close attention to.

Chrome is looking at adding support for RFC5861’s stale-while-revalidate, which is really cool. I wrote about the details of SwR when it first became an RFC, but its application to browsers is something new. Seems like a good time to answer a few potential questions.
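
For the unfamiliar: stale-while-revalidate lets a response declare a window after freshness expires during which a cache may serve it stale while revalidating in the background. Here’s a rough Python sketch of that classification logic (my own function and parameter names, not Chrome’s implementation):

```python
def swr_state(age, max_age, swr_window):
    """Classify a cached response, in the spirit of RFC 5861.

    age:        seconds since the response was stored
    max_age:    the response's Cache-Control max-age value
    swr_window: its stale-while-revalidate value
    """
    if age <= max_age:
        return "fresh"                   # serve from cache, nothing to do
    if age <= max_age + swr_window:
        return "stale-while-revalidate"  # serve stale now, revalidate in background
    return "stale"                       # must revalidate before serving

# e.g. Cache-Control: max-age=600, stale-while-revalidate=30
print(swr_state(age=605, max_age=600, swr_window=30))  # stale-while-revalidate
```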

When TLS was defined, it didn’t allow more than one hostname to be served on a single IP address / port pair, leading to “virtual hosting” problems; each secure Web site, for example, now requires a dedicated IP address.
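
The TLS extension that addresses this is Server Name Indication (SNI), which lets the client name the host it wants during the handshake itself, so one IP address / port pair can serve many sites. A minimal Python client sketch (the hostname is illustrative):

```python
import socket
import ssl

# SNI puts the desired hostname inside the handshake, so the server
# can pick the right certificate for that site before TLS completes.
ctx = ssl.create_default_context()
with socket.create_connection(("example.com", 443)) as sock:
    # server_hostname sets the SNI extension (and drives certificate checks)
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version(), tls.getpeercert()["subject"])
```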

The IETF now considers “pervasive monitoring” to be an attack. As Snowden points out, one of the more effective ways to combat it is to use encryption everywhere you can, and “opportunistic encryption” keeps on coming up as one way to help that.

Recently, one of the hottest topics in the Internet protocol community has been whether the newest version of the Web’s protocol, HTTP/2, will require, encourage or indeed say anything about the use of encryption in response to the pervasive monitoring attacks revealed to the world by Edward Snowden.

There’s been a lot of interest in and effort expended upon “hypermedia APIs” recently. However, I see a fair amount of resistance to them from developers and ops folks, because the pragmatic benefits aren’t often clear. This is as it should be, IMO; if you’re not able to describe concrete benefits without hand-waving about the “massive scale of the Web,” you shouldn’t expect people to invest in the approach.

One of the major mechanisms proposed by SPDY for use in HTTP/2.0 is header compression. This is motivated by a number of things, but heavy in the mix is the combination of having more and more requests in a page, and the increasing use of mobile, where every packet is, well, precious. Compressing headers (separately from message bodies) both reduces the overhead of additional requests and of introducing new headers. To illustrate this, Patrick put together a synthetic test that showed that a set of 83 requests for assets on a page (very common these days) could be compressed down to just one round trip – a huge win (especially for mobile). You can also see the potential wins in the illustration that I used in my Velocity Europe talk.
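
This isn’t SPDY’s actual compressor (which maintains a shared compression context across the whole connection, among other things), but a quick zlib experiment gives a feel for why near-identical headers compress so well; the header values below are made up:

```python
import zlib

# Typical request headers repeat almost verbatim across a page's requests.
template = (
    "GET /assets/img/{n}.png HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64) Gecko/Firefox\r\n"
    "Accept: image/png,image/*;q=0.8\r\n"
    "Cookie: session=abc123; prefs=dark\r\n\r\n"
)
requests = "".join(template.format(n=i) for i in range(83)).encode()

print(len(requests), "bytes of raw headers")
print(len(zlib.compress(requests)), "bytes compressed")  # dramatically smaller
```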

A common problem for APIs is partial update: the client wants to change just one part of a resource’s state. For example, imagine that you’ve got a JSON representation of your widget resource that looks like:
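
(The representation from the original post is elided here; what follows is a hypothetical stand-in, along with a simplified merge-style patch to illustrate the idea.)

```python
# A hypothetical widget representation; the post's actual example is elided.
widget = {"name": "Foo", "color": "blue", "size": 5}

def merge_patch(target, patch):
    """Simplified merge-style partial update: None deletes a member,
    nested dicts recurse, anything else replaces the old value."""
    for key, value in patch.items():
        if value is None:
            target.pop(key, None)
        elif isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_patch(target[key], value)
        else:
            target[key] = value
    return target

# e.g. PATCH /widgets/1 with body {"color": "red"} changes one member only.
print(merge_patch(widget, {"color": "red"}))
# {'name': 'Foo', 'color': 'red', 'size': 5}
```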

One thing I didn’t cover in my previous rant on HTTP API versioning is an anti-pattern that I’m seeing a disturbing number of APIs adopt: using an HTTP header to indicate the overall version of the API in use. Examples include CIMI, CDMI, GData and I’m sure many more.
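
To make the anti-pattern concrete: the client pins the entire API’s version with a request header, something like this (hypothetical endpoint and header name):

```python
import urllib.request

# The anti-pattern: one header versions the *whole* API surface at once,
# rather than letting individual resources and formats evolve.
req = urllib.request.Request("https://api.example.com/widgets/1")
req.add_header("X-API-Version", "2")  # hypothetical header name
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.headers.get("Content-Type"))
```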

In discussing my whinge about AppCache offline with a few browser vendor-y folks, I ended up writing down my longstanding wishlist for making browser caches better. Without further ado, a bunch of blue-sky ideas:

HTML5’s AppCache mechanism is one confused little puppy. Purporting to be for taking web applications offline — a compelling and useful thing — it’s more often used by performance-hungry sites that want to use it as an online cache.

Since SPDY has surfaced, one of the oft-repeated topics has been its use of TLS; namely that the SPDY guys have said that they’ll require all traffic to go over it. Mike Belshe dives into all of the details in a new blog entry, but his summary is simple: “users want it.”

One of the nagging theoretical problems in the Web architecture has been finding so-called “site-wide metadata”; i.e., finding something out about a Web site before you access it. We wrestled with this in P3P way back when, and the TAG took it up after that.

Resource Packages is an interesting proposal from Mozilla folks for binding together bunches of related resources (e.g., CSS files, JavaScript and images) and sending them in one HTTP response, rather than the many that browsers typically require.

If you haven’t seen it already, check out the Call for Papers for the First International Workshop on RESTful Design (WS-REST 2010), where I’m on the program committee, along with many of the usual suspects.

A long time ago*, the word in high-performance proxy caching was Inktomi’s Traffic Server. It was so fast that it was referred to as “carrier grade” (and this could be said without people smirking), and it was deployed by the likes of AOL, when AOL was still how most people accessed the Internet.

I had a lovely holiday weekend in Canberra with the family, without Web access. Perhaps I’ll blog about that soon — Canberra being in my opinion one of the nicest overlooked cities in the world — but that will have to wait. Going offline for a few days always brings a certain dread of what one’s inbox will hold when you get back, and this one was no exception.

Some folks at work were having problems debugging HTTP with LWP’s command-line GET utility; it turned out that it was inserting Link headers — HTTP headers, mind you — for each HTML <link> element present.

Having complained before about the sad state of HTTP APIs, I’m somewhat happy to say that people seem to be getting it, producing more capable server-side and client-side tools for exposing the full range of the protocol; some frameworks are even starting to align object models with resource models, where HTTP methods map to method calls on things with identity. Good stuff.

The stale-while-revalidate and stale-if-error extensions aren’t the only fiddling we’ve been doing with the HTTP caching model. Now that Squid 2.7 is starting to see daylight, I can explain about a much more ambitious project — Cache Channels.

I’ve been hoping to avoid this, but ETags seem to be popping up more and more often recently. For whatever reason, people latch onto them as a litmus test for RESTfulness, as the defining factor of HTTP’s caching model, and much more.
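
For the record, what ETags actually buy you is response validation, nothing more mystical; a quick sketch against any resource that serves an ETag:

```python
import urllib.request, urllib.error

URL = "https://example.com/"  # any resource that returns an ETag

# First fetch: remember the validator the server hands us.
etag = urllib.request.urlopen(URL).headers.get("ETag")

if etag:
    # Conditional refetch: a 304 means our cached copy is still valid.
    req = urllib.request.Request(URL, headers={"If-None-Match": etag})
    try:
        urllib.request.urlopen(req)
        print("200: representation changed, new copy returned")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("304: cached copy is still valid")
        else:
            raise
```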

A while back I wrote up the state of browser caching, after writing a quick-and-dirty XHR-based test page, with the idea that if people know how their content is handled by common implementations, they’d be able to trust caches a bit more.

The QCon presentation (slides) was ostensibly about how we use HTTP for services within Yahoo’s Media Group. When I started thinking about the talk, however, I quickly concluded that everyone’s heard enough about the high-level benefits of HTTP and not nearly enough about the details of what it does on the ground. So, I decided to concentrate on one aspect of the value that we get from using HTTP for services, as an example: intermediation.

I’ve published a revision of the Caching Tutorial for Web Authors and Webmasters, the first non-trivial edit in some time (almost since I wrote it in 1998). That said, there aren’t any substantial changes; this is mostly tweaking and incorporation of new information.