Is the Vary: User-Agent HTTP Header Broken?

Mobile SEO practices got a little easier last summer when Google’s Pierre Far announced definitive guidelines for mobile SEO at SMX Advanced in Seattle. The outline was easy to follow, allowed for multiple scenarios, and explained the search engine’s blue-sky proposal: responsive design.

Like most SEOs, we took this announcement, incorporated it into our best practices, and proceeded to implement it with clients.

According to those recommendations, the Vary HTTP header has two important and useful implications:

It signals to caching servers used in ISPs and elsewhere that they should consider the user-agent when deciding whether to serve the page from cache or not. Without the Vary HTTP header, a cache may mistakenly serve mobile users the cache of the desktop HTML page, or vice versa.

It helps Googlebot discover your mobile-optimized content faster, as a valid Vary HTTP header is one of the signals we [Google] may use to crawl URLs that serve mobile-optimized content.
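As a rough illustration of what the recommendation asks for, here is a minimal sketch of a handler that serves different HTML on the same URL and announces that fact with the Vary header. The function name `build_response`, the crude `MOBILE_PATTERN`, and the page bodies are all assumptions made up for this example; real device detection is considerably more involved.

```python
import re

# Hypothetical, deliberately crude pattern; not a real device-detection library.
MOBILE_PATTERN = re.compile(r"Mobile|Android|iPhone", re.IGNORECASE)


def build_response(user_agent):
    """Return (headers, body) for one URL that serves device-specific HTML."""
    is_mobile = bool(MOBILE_PATTERN.search(user_agent or ""))
    if is_mobile:
        body = "<html><body>Mobile page</body></html>"
    else:
        body = "<html><body>Desktop page</body></html>"
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Tell caches and crawlers that the body depends on the
        # User-Agent request header.
        "Vary": "User-Agent",
    }
    return headers, body
```

Both the mobile and the desktop response carry the same `Vary: User-Agent` header, since the header describes how the URL behaves rather than which variant was served.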

Tech teams often ask us why they should implement the header, which is fairly easy to do, and the explanations above are very well received. That was until one client implemented the header and saw a very large spike in traffic and resource usage on its web servers. To our knowledge, this was not supposed to happen, so it was evident we needed to investigate the issue further.

It’s worth noting that the client uses Akamai as their CDN. This is not uncommon – we have many clients who leverage this geographically dispersed platform to off-load resources and decrease load times.

When our client talked with Akamai, they learned that the massive traffic increase to their website was due to the implemented Vary header.

When the upstream providers (in this case, Akamai) aren’t able to cache, they have to keep asking the web server for documents and assets. As a result, the CDN sent more traffic directly to the client’s web servers.

This is explained further in Akamai’s documentation:

The HTTP Vary header is used by servers to indicate that the object being served will vary (the content will be different) based on some attribute of the incoming request, such as the requesting client’s specified user-agent or language. The Akamai servers cannot cache different versions of the content based on the values of the Vary header. As a result, objects received with a Vary header that contains any value(s) other than “Accept-Encoding” will not be cached. To do so might result in some users receiving the incorrect version of the content (wrong language, etc.)
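The rule in that passage can be expressed as a small predicate. This is only a sketch of the behavior as the quoted documentation describes it, not Akamai's actual implementation; the function name `cdn_will_cache` is an assumption for illustration.

```python
def cdn_will_cache(vary_header):
    """Apply the quoted rule to a raw Vary header value.

    Cacheable only if the header is absent/empty or every listed field
    is Accept-Encoding. Anything else (User-Agent, Accept-Language, *)
    makes the object uncacheable under this rule.
    """
    if not vary_header:
        return True
    fields = [f.strip().lower() for f in vary_header.split(",") if f.strip()]
    return all(f == "accept-encoding" for f in fields)


for value in (None, "Accept-Encoding", "User-Agent", "Accept-Encoding, User-Agent"):
    print(repr(value), "->", cdn_will_cache(value))
```

Under this rule, adding `User-Agent` to an existing `Vary: Accept-Encoding` header is enough to turn a cacheable page into one that is fetched from the origin on every request.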

“Vary: User-Agent is broken for the Internet in general. …the basic problem is that the user-agents vary so wildly that they are almost unique for every individual (not quite that bad but IE made it a mess by including the version numbers of .Net that are installed on users machines as part of the string). If you Vary on User-Agent then intermediate caches will pretty much end up never caching resources (like Akamai).”

Meenan’s explanation complicates this issue further for clients that use CDNs alongside Google’s mobile recommendation. Because user-agent strings come in so many varieties, resources served with Vary: User-Agent effectively stop being cached at all. IE exacerbates the problem by including the versions of .NET installed on the requesting computer in its user-agent string.

“Many HTTP caches decide that Vary: User-Agent is effectively Vary: * since the number of user-agents in the wild is so large. By asking to Vary on User-Agent you are asking your CDN to store many copies of your resource which is not very efficient for them, hence their turning off caching in this case.”
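To make the fragmentation concrete, here is a toy simulation, assuming a cache that stores one copy per (URL, User-Agent) pair when it honors Vary: User-Agent. The `hit_rate` function and the made-up `ExampleBrowser` strings are hypothetical; real caches and real traffic are far messier.

```python
def hit_rate(requests, vary_on_user_agent):
    """Fraction of requests answered from a simple in-memory cache."""
    cache = set()
    hits = 0
    for url, user_agent in requests:
        # With Vary: User-Agent, the user-agent string becomes part of the key.
        key = (url, user_agent) if vary_on_user_agent else (url,)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch from origin, then store
    return hits / len(requests)


# 1,000 requests for one URL spread across 900 distinct made-up user agents.
requests = [
    ("/index.html", f"ExampleBrowser/1.0 (build {i % 900})")
    for i in range(1000)
]

print("without Vary:", hit_rate(requests, vary_on_user_agent=False))
print("with Vary: User-Agent:", hit_rate(requests, vary_on_user_agent=True))
```

In this toy setup the hit rate falls from 0.999 to 0.1 once the user-agent string is part of the cache key, which is the inefficiency the quote describes.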

For now, we’re hunting for a work-around solution to this problem, with help from Google, and we will update this blog post when more possible solutions become available.

Comments

“And thus Chrome used WebKit, and pretended to be Safari, and WebKit pretended to be KHTML, and KHTML pretended to be Gecko, and all browsers pretended to be Mozilla, and Chrome called itself Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13, and the user agent string was a complete mess, and near useless, and everyone pretended to be everyone else, and confusion abounded.”

Jody,
Thank you very much for this post. We had a project on the log to add “Vary” headers, but I guess it will not happen as we’re using Akamai as well.
I hope Google can come up with a better solution than this. Wondering if a preference within WMT would work?

I am a bit intrigued by the article. If Google’s recommended configuration for smartphone-optimized sites is to use responsive web design, why would you even recommend “Dynamically serving different HTML on the same URL” as a best practice?

Just to clarify, what I actually meant to say with the previous comment is: why don’t you recommend responsive web design as a best practice, rather than going to the trouble of serving different content on the same URL?

@TraiaN – There are times when the client isn’t ready or willing to do responsive design. E-Commerce, in particular, can be very difficult with responsive design. We try to push for the blue-sky recommendations, but often have to deal with the pragmatic limitations of time, effort and/or money.

@svolinsky – This would be a great solution for Google, but I would rather see the search engines work with a public solution so we SEOs don’t have to have multiple solutions to make something work.

So for now, if we’ve implemented all other aspects of Google’s recommended set up (two-way “bidirectional” annotation), would you agree that it’s best to just leave the Vary: User-Agent HTTP header out of the mix?

@mike – I would do that if you have someone using a CDN. So far, this is only an issue with those types of setups. Specifically, we haven’t tested outside of Akamai. If you are talking about a normal website setup without the CDN, we haven’t seen this same issue.

I sent a question, and Matt Cutts answered that they still recommend using the HTTP Vary header in these cases, even if Akamai or other CDNs don’t cache the URLs with it (as he states, there are other ways to cache).

@Christian Oliver – I agree, it is rather a problem. We did have a client who fully implemented it with Akamai (prior to us understanding the problem), and their web server became overloaded from the number of requests coming back from Akamai. While I understand what Matt Cutts is trying to say, implementing it with Akamai will incur the site latency and web server traffic you were hoping to avoid.

Anyway, it seems there is a way to solve this with Akamai: configure Akamai to ignore the Vary: User-Agent header for its own caching (so it continues to cache your webpages) but still send the header to clients. Guy Podjarny, from Akamai, explains it here:

What are people’s thoughts on just creating this Vary: User-Agent header attribute for Googlebot only? (IF Googlebot, then add header attribute). I’ve added this header attribute, but only really to help Google. I’m not using Akamai, but am concerned my own server will be overloaded with ISP requests (if I’m understanding the issue correctly). If the header is for Googlebot only, I help Google’s understanding of my site without the extra traffic from other sources.

The thread Christian is talking about has some very granular detail from Akamai that gives both an explanation of the issue and a possible fix. If anyone gives the solution a whirl, please let us know how it turns out!

@Andrew – I am not sure the header is only for Google. Google says it uses it as a signal to better understand your mobile configuration, but there are caching servers that also use this information. I do like the idea of using it as a signal to Google if you are running into caching problems because of the implementation.

I’m a bit late commenting on this post, but was the original problem on every resource or only on web pages? The Vary header should be set ONLY on the text/html MIME type, not on every resource. Of course, it strictly depends on how the CDN is working.

But if the CDN is used only to cache the static resources (js, css, images, …) and not the HTML, adding the Vary header only on that MIME type should solve the problem.
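The suggestion in this comment, adding Vary: User-Agent only to HTML responses, could look roughly like the sketch below. The function name `add_vary_header` is hypothetical, and the sketch assumes headers arrive as a plain dict with canonical key names such as `Content-Type`. Whether this actually avoids the caching problem depends on how the CDN is set up, as the commenter notes.

```python
def add_vary_header(headers):
    """Return a copy of headers with User-Agent added to Vary for HTML only."""
    result = dict(headers)
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = result.get("Content-Type", "").split(";")[0].strip().lower()
    if media_type == "text/html":
        existing = [v.strip() for v in result.get("Vary", "").split(",") if v.strip()]
        if "user-agent" not in (v.lower() for v in existing):
            existing.append("User-Agent")
        result["Vary"] = ", ".join(existing)
    return result


print(add_vary_header({"Content-Type": "text/html; charset=utf-8"}))
print(add_vary_header({"Content-Type": "text/css"}))
```

Static assets such as CSS, JavaScript, and images pass through unchanged, so they remain cacheable under a rule that only tolerates `Vary: Accept-Encoding`.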

[...] in order to successfully understand and deploy any of these approaches. And beyond that, it takes technical SEO to understand when a Google best practice could potentially create a sub-optimal user experience for your [...]

ABOUT THIS BLOG

The RKGBlog is a continuing discussion of online marketing written by the employees of RKG.