Google Analytics is mis-reporting Yahoo! Slurp as Firefox OS

Google Analytics is a great platform, but its user agent detection can be a bit strange. New user agents aren’t backported, so when support is added for a new browser or OS the stats tend to change dramatically going forward. An example from earlier this year is when I tracked down what Google Analytics meant by the browser “Safari (in-app)”.
More recently I ran through my analytics for May 2013 and saw an operating system called “Firefox OS”. This is the much-hyped offering from Mozilla, and while I am excited to test it out and perhaps develop for it, there are not yet any shipping devices. It’s strange to see traffic for an OS that doesn’t have any real users (especially the amount I saw), so I dug in a bit to see what Google Analytics was doing.

Firefox OS’s user agent

Firefox OS uses a user agent that follows the mold Mozilla has stuck with for years. At launch it will be (where X is the version number):

Mozilla/5.0 (Mobile; rv:X.0) Gecko/14.0 Firefox/X.0

The user agent is not in my logs

I grepped through my logs and couldn’t find anything close to a match, despite Google suggesting I receive about 2,000 visits a day from Firefox OS. This confirmed for me that Google Analytics and I disagree about what Firefox OS is.
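The grep above can be sketched in a few lines of Python. This is only a rough illustration: the regular expression is my reading of the user agent template quoted earlier, the function name `count_firefox_os_hits` is made up for this example, and the log lines are assumed to contain the full user agent string somewhere in each line.

```python
import re

# Pattern built from the template quoted above:
#   Mozilla/5.0 (Mobile; rv:X.0) Gecko/14.0 Firefox/X.0
# The version numbers are left loose on purpose (an assumption), since the
# template only shows one Gecko version.
FIREFOX_OS_UA = re.compile(
    r"Mozilla/5\.0 \(Mobile; rv:\d+\.\d+\) Gecko/\d+\.\d+ Firefox/\d+\.\d+"
)


def count_firefox_os_hits(log_lines):
    """Count log lines that contain a Firefox OS style user agent."""
    return sum(1 for line in log_lines if FIREFOX_OS_UA.search(line))
```

To run it against a real log, something like `count_firefox_os_hits(open("access.log"))` should work, where `access.log` is a placeholder path. A count near zero next to thousands of reported visits points at the same disagreement described above.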

The plot thickens

Using secondary dimensions in Google Analytics I was able to find that the visits were all from the US, the screen resolutions were all 1024×768, Flash was supported (version 11.2 r202), no OS version was available and, most interestingly, the network domains all resolved back to yahoo.com. I had previously tracked down some weird issues with Yahoo’s bot showing up in Google Analytics and this was starting to look quite similar.

Google Analytics doesn’t give you any raw data (IP addresses, full user agents, timestamps, etc.), so it’s hard to line its reports up with server logs. Using landing pages I was able to pick out some oddballs and find their corresponding log entries to see the exact user agent strings. That is how I found the culprit.
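The landing page matching described above can be sketched roughly as follows. It assumes the server writes the common "combined" access log format, which may not match every setup, and the function name `agents_for_landing_page` is made up for this example.

```python
import re
from collections import Counter

# Assumes the "combined" access log layout:
#   ip ident user [time] "METHOD path protocol" status bytes "referrer" "agent"
COMBINED_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_for_landing_page(log_lines, landing_path):
    """Tally the user agents that requested one rarely visited landing page."""
    counts = Counter()
    for line in log_lines:
        match = COMBINED_LINE.match(line)
        # Lines that do not fit the assumed format are skipped silently.
        if match and match.group("path") == landing_path:
            counts[match.group("agent")] += 1
    return counts
```

Picking a landing page that gets very little real traffic keeps the tally short, so the user agent behind the suspicious visits stands out when you call `.most_common()` on the result.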

The new Yahoo! bot is executing the pages

Curiously, the new bot is sophisticated enough to execute the pages (load external resources, run JavaScript, register ad impressions, etc.). This is probably the reason that Google Analytics is seeing it at all, but it also seems pretty rude. I don’t want to charge advertisers for fake impressions, but Yahoo! is making that pretty difficult by using a harmful user agent and going through the effort to load remote ads. What’s worse, Slurp doesn’t support Expires headers, so each impression means it downloads all resources (images, stylesheets, JavaScript) again. A huge waste. The Googlebot is savvy about this sort of thing. Considering the scale of Slurp, it seems like respecting Expires headers could save millions of dollars in very short order. They have wasted gigabytes of bandwidth and unknown gobs of CPU time just on my site. That said, since Bing powers Yahoo’s search results, I am not sure what this bot even does these days.

I was experiencing the same issue with Yahoo Slurp being tracked as FF 3.5. It was inflating my daily visitors by about 400. After three months, it finally stopped four days ago. I’ll have to dig into my server logs to see if Yahoo has discontinued the practice, or if GA has fixed the problem.

I had been seeing this on a few sites but noticed today that it seems to have stopped completely over the weekend. Visits from Firefox OS were down significantly on Sat 27 June and I have had none since Sun 28 June. I haven’t found an explanation yet, but it seems like either Google Analytics has stopped including this traffic or the bot has been shut down. Is anyone able to confirm that this traffic has stopped?