CDN Caching Caution

Cash may be king in the real world, but cache is king in the world of the internet. From DNS record TTLs to HTTP max-age directives, caching at different levels makes the internet faster and our lives easier. One technology that has contributed extensively to web performance over the years is the Content Delivery Network, or CDN. Caching content and serving it from servers closer to the end user has been revolutionary and has drastically reduced the overall time it takes to load web content.

CDNs use different caching techniques very effectively, but these mechanisms (especially HTTP headers) do not always behave the same way. Some of them need particular attention when configuring a CDN; otherwise it can be difficult to get the maximum benefit from it.

Let’s talk about some of these important headers, which at times can make a CDN’s life a bit arduous.

Vary header

HTTP/1.1 200 OK

Content-Length: 24183

Cache-Control: max-age=86400

Content-Encoding: gzip

Content-Type: text/html; charset=UTF-8

Vary: Accept-Encoding

In simple words, this header tells the browser and/or proxy to cache the response cautiously, because the content varies with the compression method. Let’s say a request comes in with Accept-Encoding: gzip, and the server has the content in both compressed (gzip) and uncompressed formats. The server will send the compressed version, along with a Vary header telling caches to reuse this copy only for requests that also carry Accept-Encoding: gzip. This can cause some issues when a CDN is involved.

Consider this from a CDN’s perspective: the request goes to the origin with Accept-Encoding: gzip, and the origin sends the compressed content with the Vary header shown above. According to this Vary header, a CDN can cache the content but cannot use it to answer requests that carry no Accept-Encoding header or that carry Accept-Encoding: deflate. When such requests arrive, the CDN has to go back to the origin and cache two more copies separately, one uncompressed and one encoded with deflate. This still works fine, which is why most CDNs support this value in the Vary header: the content is the same and only the compression differs, which their edge servers can handle.

Now, consider a server that holds 20 copies of a particular piece of content, one for each of 20 languages. This can become both a scalability and a storage issue for CDNs. Hence, most CDNs do not cache a response from the origin if its Vary header names anything other than Accept-Encoding. Handling this header inappropriately can therefore cause cache misses and skew performance results.
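To make the cache-key effect of Vary concrete, here is a minimal sketch in Python. It is not how any particular CDN works; the names `cache_key` and `VaryAwareCache` are hypothetical, and the sketch assumes the cache simply appends the request headers named in Vary to the URL when it builds its lookup key.

```python
def cache_key(url, request_headers, vary_value):
    """Build a lookup key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    varying = sorted(h.strip().lower() for h in vary_value.split(",") if h.strip())
    parts = [url]
    for name in varying:
        # A missing header is treated as an empty value, so "no Accept-Encoding"
        # gets its own cache entry.
        parts.append(f"{name}={headers.get(name, '')}")
    return "|".join(parts)


class VaryAwareCache:
    """Hypothetical in-memory cache that keeps one entry per Vary variant."""

    def __init__(self):
        self._vary_by_url = {}  # url -> Vary value last seen from the origin
        self._store = {}        # cache key -> response body

    def put(self, url, request_headers, response_headers, body):
        # Assumption: response header names are passed in canonical case.
        vary = response_headers.get("Vary", "")
        self._vary_by_url[url] = vary
        self._store[cache_key(url, request_headers, vary)] = body

    def get(self, url, request_headers):
        vary = self._vary_by_url.get(url)
        if vary is None:
            return None  # nothing cached for this URL yet
        return self._store.get(cache_key(url, request_headers, vary))
```

With a gzip response stored under Vary: Accept-Encoding, a later request with Accept-Encoding: deflate, or with no Accept-Encoding at all, misses the cache and has to go to the origin. Every additional header named in Vary multiplies the number of possible entries in the same way, which is the storage problem described above.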

Age header

HTTP/1.1 200 OK

Age: 12567

Content-Length: 24183

Cache-Control: max-age=86400

Content-Encoding: gzip

Content-Type: text/html; charset=UTF-8

The Age header was introduced in HTTP/1.1 to help check the freshness of content cached at a browser or proxy. Its value indicates how old the content is, in seconds, since it was generated at the origin, counting both the time it spent in proxy caches and the time it spent on the network. So if the content has an age of, say, 12567 seconds (as shown above), the effective remaining TTL at the browser becomes max-age minus Age, i.e. 86400 - 12567 = 73833 seconds.

Ideally, an Age header value of 0 means the content has just been generated at the origin. Static content can be cached for long intervals with high TTL values, which allows CDNs to offload the origin and improve performance significantly. When the origin sends a high Age value, however, the effective TTL shrinks, which affects caching at the CDN edge servers and can cause cache misses. So both origins and CDNs have to be careful in handling the Age header.
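The arithmetic above can be sketched as a small Python helper. The function name `remaining_ttl` is hypothetical, and the sketch is deliberately simplified: it only looks at the max-age directive and the Age header, and ignores other freshness inputs such as Expires or s-maxage.

```python
import re


def remaining_ttl(response_headers):
    """Return the remaining freshness lifetime in seconds, or None if unknown.

    Simplified: effective TTL = max-age - Age, floored at zero.
    """
    cache_control = response_headers.get("Cache-Control", "")
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        return None  # no max-age directive, so this sketch cannot tell
    max_age = int(match.group(1))
    # A missing Age header is treated as 0, i.e. fresh from the origin.
    age = int(response_headers.get("Age", "0"))
    return max(max_age - age, 0)


headers = {"Age": "12567", "Cache-Control": "max-age=86400"}
print(remaining_ttl(headers))  # 73833
```

If the origin itself sends a large Age value, the same subtraction happens at the edge, so the object expires from the CDN cache much sooner than the max-age value suggests.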

Now, let’s have a look at another header that affects performance, though not primarily through caching. A few such headers deserve attention, Via being one of them.

Via header

Via: 1.0 fred, 1.1 example.com

Via is another header that needs care, especially where CDNs are involved.

If an origin receives a request with a Via header like the one shown above, it means that the request passed through those proxies en route from the browser to the origin. One problem is that some origins then do not send compressed content (even if they support compression), because they cannot be sure that the proxy supports the corresponding compression technique. The extra bandwidth needed to transport the uncompressed content can become a hassle.

A CDN sits between the client and the server as a proxy, so the origin behaves the same way when a request arrives from the CDN: it sends uncompressed content, which hurts performance because a larger object always takes longer to transport than a smaller one. Again, the CDN should be configured to handle this header carefully.
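The origin behavior described above can be sketched in Python as follows. This is an illustration, not the logic of any specific web server; `parse_via`, `origin_should_compress`, and the `compress_for_proxies` switch are all hypothetical names, and the Accept-Encoding check is simplified (it ignores quality values such as gzip;q=0).

```python
def parse_via(via_value):
    """Split a Via header value into (protocol_version, received_by) pairs."""
    hops = []
    for entry in via_value.split(","):
        fields = entry.split()
        if len(fields) >= 2:
            hops.append((fields[0], fields[1]))
    return hops


def origin_should_compress(request_headers, compress_for_proxies=False):
    """Decide whether a cautious origin would gzip its response.

    compress_for_proxies is a hypothetical setting: when False, the origin
    refuses to compress for any request that arrived through a proxy.
    """
    # Simplified check: quality values like "gzip;q=0" are not interpreted.
    accepts_gzip = "gzip" in request_headers.get("Accept-Encoding", "").lower()
    via_proxy = bool(parse_via(request_headers.get("Via", "")))
    if not accepts_gzip:
        return False
    if via_proxy and not compress_for_proxies:
        return False
    return True


print(parse_via("1.0 fred, 1.1 example.com"))
# [('1.0', 'fred'), ('1.1', 'example.com')]
```

In this sketch, a request with Accept-Encoding: gzip and any Via header gets an uncompressed response unless `compress_for_proxies` is turned on, which mirrors the extra bandwidth cost a CDN can incur when fetching from such an origin.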

Given the impact these headers have on the overall performance of web pages, it is crucial to review them when configuring a CDN. Stay tuned for the next article, where I will follow up on how synthetic monitoring can help both CDNs and customer origins review this performance.