Two middleware classes check ETags for unmodified responses, CommonMiddleware and ConditionalGetMiddleware, and they do it inconsistently.

If the response's ETag matches the request's If-None-Match:

ConditionalGetMiddleware changes the response code to 304, preserving all headers; the content gets removed later on.

CommonMiddleware creates a new HttpResponseNotModified without content and simply restores the cookies.

As a consequence, CommonMiddleware returns a response without ETag, which is wrong. I detected this with RedBot on a Django site I run. Any site with USE_ETAGS = True has this problem.
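To make the difference concrete, here is a minimal sketch that uses plain dictionaries in place of Django's response objects. The function names and the dictionary layout are made up for illustration; only the two behaviours described above are modelled, and the content is shown as already removed in both cases.

```python
def conditional_get_style(response):
    """Mimic the ConditionalGetMiddleware behaviour: same headers, status 304."""
    return {
        "status": 304,
        "headers": dict(response["headers"]),  # all headers survive
        "cookies": dict(response["cookies"]),
        "content": b"",
    }


def common_style(response):
    """Mimic the CommonMiddleware behaviour: fresh 304, only cookies restored."""
    return {
        "status": 304,
        "headers": {},  # ETag and everything else is lost
        "cookies": dict(response["cookies"]),
        "content": b"",
    }


original = {
    "status": 200,
    "headers": {"ETag": '"abc123"', "Cache-Control": "max-age=60"},
    "cookies": {"sessionid": "xyz"},
    "content": b"hello",
}

kept = conditional_get_style(original)
wiped = common_style(original)
print("ETag" in kept["headers"])   # True
print("ETag" in wiped["headers"])  # False
```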

In general, wiping headers sounds like a bad idea. A 304 is supposed to have the same headers as the 200. (Well, the RFC is more complicated, but I think it's the general idea. Future versions of HTTP will likely require the Content-Length not to be 0.)

I believe that CommonMiddleware should simply generate the ETag and not handle conditional content removal; that's the job of ConditionalGetMiddleware.

For example, if one is using GZipMiddleware, the correct response chain is:

CommonMiddleware computes the ETag,

GZipMiddleware compresses the content and modifies the ETag,

ConditionalGetMiddleware uses the modified ETag to decide if the response was modified or not.
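The chain above can be sketched as three plain functions applied in order. This is only an illustration under stated assumptions: the responses are dictionaries, compute_etag() is a hypothetical helper that hashes the body, and the ";gzip" suffix stands in for whatever modification the gzip step makes to the ETag.

```python
import gzip
import hashlib


def compute_etag(body):
    # Hypothetical helper: a quoted hash of the uncompressed body.
    return '"%s"' % hashlib.sha256(body).hexdigest()


def common_step(response):
    # Step 1: generate the ETag, nothing else.
    response["headers"]["ETag"] = compute_etag(response["content"])
    return response


def gzip_step(response):
    # Step 2: compress the body and mark the ETag as the gzipped variant.
    response["content"] = gzip.compress(response["content"])
    response["headers"]["Content-Encoding"] = "gzip"
    etag = response["headers"]["ETag"]
    response["headers"]["ETag"] = etag[:-1] + ';gzip"'
    return response


def conditional_get_step(request, response):
    # Step 3: compare against the ETag as it will actually be sent.
    if request.get("If-None-Match") == response["headers"].get("ETag"):
        response["status"] = 304
        response["content"] = b""
    return response


def handle(request):
    response = {"status": 200, "headers": {}, "content": b"hello world"}
    return conditional_get_step(request, gzip_step(common_step(response)))


first = handle({})
second = handle({"If-None-Match": first["headers"]["ETag"]})
print(first["status"], second["status"])  # 200 304
```

Because the check runs last, it sees the same ETag the client received, so the second request is answered with a 304.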

This is a good reason to keep the "ETag generation" and "ETag checking" concerns separate. The same argument applies to any middleware that sets or modifies ETags.

Unfortunately, CommonMiddleware is documented to "take care of sending Not Modified responses, if appropriate", so this would be a backwards incompatible change.

This still leaves a dependency on the order of middleware for correct operation. I don't think ETags should be set or checked in middleware at all. We already have a USE_ETAGS setting. Why not just return the HttpResponseNotModified after all middleware has executed, when USE_ETAGS is True?

We don't need middleware to create ETags, either. We could hook into content and header assignment for HttpResponse, so that ETags will always be set or updated whenever content or headers change, when USE_ETAGS is True.

Another benefit would be that we remove the repetitive requirement for ALL middleware that alters content or headers to check the USE_ETAGS setting and then conditionally recalculate and update the ETag header. It's currently very easy for a middleware author to alter content without realising that they must do this, which means we could be serving responses with stale ETag headers.
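A rough sketch of what hooking into content assignment could look like follows. SketchResponse is a hypothetical stand-in, not Django's HttpResponse, the module-level USE_ETAGS variable stands in for the setting, and only content changes are covered here (header changes are left out for brevity).

```python
import hashlib

USE_ETAGS = True  # stand-in for the Django setting


class SketchResponse:
    """Hypothetical response class that refreshes its ETag on every content change."""

    def __init__(self, content=b""):
        self.headers = {}
        self.content = content  # goes through the setter below

    @property
    def content(self):
        return self._content

    @content.setter
    def content(self, value):
        self._content = value
        if USE_ETAGS:
            # Recompute the ETag so it can never be stale.
            self.headers["ETag"] = '"%s"' % hashlib.sha256(value).hexdigest()


response = SketchResponse(b"first")
before = response.headers["ETag"]
response.content = b"second"  # e.g. a middleware rewriting the body
after = response.headers["ETag"]
print(before != after)  # True
```

With this shape, a middleware author who rewrites the body gets a fresh ETag without having to know about the setting at all.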

Have those who prefer option 1 commented on the fact that all middleware written in the future would have to check the USE_ETAGS setting whenever it alters content or headers, and recalculate the ETag? In many cases this seems very unlikely to happen. Not all middleware authors will know or care about ETags, and if they write a middleware class that alters content or headers, it will become buggy when deployed by other developers who have enabled this setting.

I think that option 1 just patches the symptom of coupled middleware and repetitive ETag handling in Django's own middleware. It doesn't do much for the wider middleware ecosystem, or to prevent developers from making the same mistakes in their own code.

there, based on the original code, a new response is generated as not-modified. Only cookies are kept if they are there.

I just started working on a new ticket that will deprecate / remove the USE_ETAGS setting and move the conditional-get handling out of CommonMiddleware. ConditionalGetMiddleware will then generate an ETag if there is none. With that middleware simply running last, many of the edge cases will be fixed.

Does anyone know whether we should keep all the headers in a 304 response? Before the refactor we had places that handled it differently.

At the moment there are several interrelated problems with the way Django handles ETags. Briefly:

As stated in the original report above, we're not returning the correct headers with our 304s.

Gzipped responses are not getting the benefits of 304s since their ETags are not compared properly. See tickets #16035 and #26771.

Our processing of conditional requests does not follow the specification's rewritten precedence rules from 2014.

The condition(), vary*(), cache_control(), and gzip_page() decorators work properly in some undocumented orders but not in others.
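On the precedence point, the part that matters most for ETags is, as far as I can tell, that If-None-Match takes priority over If-Modified-Since: when the former is present, the latter is ignored. A rough sketch of just that rule follows. The is_not_modified() helper is hypothetical, timestamps are plain integers rather than HTTP dates, the header is split naively on commas, and weak comparison and the If-Match / If-Unmodified-Since steps are left out.

```python
def is_not_modified(request, etag, last_modified):
    """Return True when a GET could be answered with 304 (simplified)."""
    if_none_match = request.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present: decide on ETags alone and
        # ignore If-Modified-Since entirely.
        candidates = [token.strip() for token in if_none_match.split(",")]
        return "*" in candidates or etag in candidates

    if_modified_since = request.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        return last_modified <= if_modified_since

    return False


# The ETag does not match, so the date must not rescue the 304.
stale = {"If-None-Match": '"v0"', "If-Modified-Since": 200}
print(is_not_modified(stale, '"v1"', 100))  # False
```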

Here are the changes I would suggest to address these problems:

Move ETag handling out of CommonMiddleware and into ConditionalGetMiddleware. This change has already been accepted (#26447) and partially implemented (PR 6393). (Though I'm less confident about the deprecation of USE_ETAGS in that ticket.)

Change GZipMiddleware to remove the ;gzip token from the incoming ETags in process_request(). That will allow comparisons to work properly. Document this.

Change our use of GzipFile to specify a modification time of 0. That will make our gzip output dependent only on the response body, which means that we can usefully compare ETags on gzipped content, which means that we don't have to care about the order of the gzip_page() decorator.

Document that the condition() decorator should be below vary() and cache_control() so that those headers can be set properly.
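The two gzip-related suggestions above can be sketched with the standard library alone. The helper names gzip_body() and strip_gzip_token() are made up for this example and are not part of Django; the regular expression assumes the ";gzip" marker sits directly before the closing quote of an ETag.

```python
import gzip
import io
import re


def gzip_body(body):
    """Compress with MTIME fixed to 0 so the output depends only on the body."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(body)
    return buf.getvalue()


def strip_gzip_token(if_none_match):
    """Drop the ;gzip marker from every ETag in an If-None-Match value."""
    return re.sub(r';gzip"', '"', if_none_match)


a = gzip_body(b"hello")
b = gzip_body(b"hello")
print(a == b)  # True
print(strip_gzip_token('"abc;gzip", W/"def"'))  # "abc", W/"def"
```

With a fixed modification time, two compressions of the same body give identical bytes, so an ETag computed over the compressed output stays stable between requests.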

In more detail:

mrmachine's comment raises a fair point. The fundamental issue here is that Django's architecture strives to be layered and decoupled, but ETags are by definition highly coupled to the response, so it's difficult to isolate them to one layer (whether that be middleware or a decorator).

That said, the specification only requires that the ETag change when the response body changes, and I don't think that's a very common middleware behavior. Only GZipMiddleware meets that definition among the middleware in core, for example. And if the body is changed, the ETag can be changed in a manner similar to what I'm suggesting for GZipMiddleware. I do think this needs to be documented, though.

The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary. Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate representation metadata other than the above listed fields unless said metadata exists for the purpose of guiding cache updates (e.g., Last-Modified might be useful if the response does not have an ETag field).

So I think we should return those headers from the response, along with the cookies (since that is what Django already does, and since it's apparently very common despite not being mandated by the standard).
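As a sketch of that proposal, the following builds a 304 from a 200 by copying only the header fields named in the quoted passage, plus the cookies. Responses are plain dictionaries, not_modified_from() is a hypothetical helper, and keeping Last-Modified only when no ETag is present is my reading of the "might be useful" remark rather than a requirement.

```python
PRESERVED_HEADERS = (
    "Cache-Control", "Content-Location", "Date", "ETag", "Expires", "Vary",
)


def not_modified_from(response):
    """Build a 304 that keeps the listed headers and the cookies."""
    source = response["headers"]
    headers = {name: source[name] for name in PRESERVED_HEADERS if name in source}
    # Assumption: Last-Modified is only worth sending without an ETag.
    if "ETag" not in headers and "Last-Modified" in source:
        headers["Last-Modified"] = source["Last-Modified"]
    return {
        "status": 304,
        "headers": headers,
        "cookies": dict(response["cookies"]),
        "content": b"",
    }


ok = {
    "status": 200,
    "headers": {
        "ETag": '"abc123"',
        "Vary": "Accept-Encoding",
        "Content-Type": "text/html",
        "Last-Modified": "Tue, 01 Mar 2016 10:00:00 GMT",
    },
    "cookies": {"sessionid": "xyz"},
    "content": b"<html></html>",
}

print(sorted(not_modified_from(ok)["headers"]))  # ['ETag', 'Vary']
```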

I'm able to work on this, but the next step depends on the fate of PR 6393, since that accomplishes some of the reorganization mentioned above.

Another solution to the gzip problem (that is, that we never match a gzipped ETag because we modify it on the way out but not on the way in) would be to have GZipMiddleware change the ETag to a weak ETag. That would be simple, consistent with the specification, and well-precedented:

I'm not aware of any downside, and the advantage is that ConditionalGetMiddleware will work (that is, produce 304 Not Modified responses) on gzipped content (e.g. if the order of the middlewares is reversed, or if the gzip_page() view decorator is used).

This usage is allowed by the specification ("MTIME = 0 means no time stamp is available", Section 2.3.1 of RFC 1952) and is in common use (for example, Java's GZIPOutputStream sets the MTIME to 0).