Tag: google-analytics

Over the last couple of months we have been implementing AMP on many of our sites. I’m not going to discuss the pros and cons of AMP, nor the wider trend of disintermediation of content off site and across various platforms and ecosystems. Instead, this is about a more practical little (but potentially a lot bigger) pitfall we have noticed in how Google Analytics handles this traffic.

The story begins when our main anomaly detection system (Anodot) recently started picking up spikes and increasing referral traffic from “cdn.ampproject.org”.

As we looked into this we could see the following example source, medium and referral path information in Google Analytics:

So it seems like the mysterious new referrer “cdn.ampproject.org” is actually just a result of users clicking through to our sites from content that happens to be hosted on AMP.

In the examples above we can see that the first case is not really a referral as such, but just the result of someone clicking through from one of our own AMP pages.

The other two examples are indeed referrals, but the traffic source of “cdn.ampproject.org” hides the fact that these are actually referrals from specific third parties (refinery29.com and gizmodo.com).

So the fact that this traffic really came from different places, and that “cdn.ampproject.org” is not really a proper referrer in the traditional sense, is now hidden in the referral path in Google Analytics. As a result, all the out-of-the-box reports and dashboards that typically revolve around the traffic source field in GA will miss this new complication.

This is not really a bug in Google Analytics. It’s more of a potential unintended consequence of the way AMP works.

In a world where every publisher uses AMP for everything, you can imagine how bad this could get, with “cdn.ampproject.org” becoming one of your main sources in GA and hiding the fact that beneath it is a much more complicated sea of actual third parties who linked to your content.

Potential Workarounds

We will probably create a new field in our data mart that reads the true source from the referral path and overwrites “cdn.ampproject.org” with the ‘proper’ source. This would not, however, fix things on the front end of GA for business users. It would only fix our internal downstream reporting, which builds on the backend raw data we have as a 360 customer.
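To make the data mart idea a bit more concrete, here is a minimal sketch in Python of how such a cleaned field could be derived. It is not our production code. The function name `clean_source` is made up for illustration, and it assumes the referral path follows the usual AMP cache layout of `/c/s/<publisher host>/...` (with `/v/` for the viewer and the `s/` segment only present for HTTPS origins), so check it against the referral paths you actually see in your own data.

```python
import re

AMP_CACHE_HOST = "cdn.ampproject.org"

# Assumed AMP cache path layout: /c/ or /v/, an optional "s/" for HTTPS,
# then the host of the page that was actually served from the cache.
_AMP_PATH = re.compile(r"^/(?:c|v)/(?:s/)?([^/?#]+)")


def clean_source(source, referral_path):
    """Return the 'proper' source for a hit (hypothetical helper).

    Non-AMP sources are returned unchanged. For AMP cache referrals the
    publisher host is read from the referral path. If the path does not
    match the assumed layout, the original source is kept.
    """
    is_amp = source == AMP_CACHE_HOST or source.endswith("." + AMP_CACHE_HOST)
    if not is_amp:
        return source
    match = _AMP_PATH.match(referral_path or "")
    if not match:
        return source
    host = match.group(1).lower()
    # Drop a leading "www." so the value lines up with other GA sources.
    return host[4:] if host.startswith("www.") else host
```

Called with a source of `cdn.ampproject.org` and a referral path such as `/c/s/www.gizmodo.com/some-article`, this would give `gizmodo.com`. Where the extracted host is one of your own domains, you would probably want to treat the hit as internal rather than as a referral, which this sketch leaves to the downstream reporting.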

Another option that would surface a fix on the front end could be to use a custom dimension to house this cleaned version of the traffic source. The downside, however, is that this would require creating custom reports anywhere you wanted to use the cleaned traffic source.

There may be other workarounds I’ve missed here. If you think of any, please add them in the comments.

We have reached out to Google to point this out to them. As it’s not a bug, I’m not sure if or how they might deal with this. It is a tricky one for sure, as implementing some sort of override for AMP traffic might be a little too ad hoc. There may very well be other use cases where the current behavior is exactly what someone wants. As a publisher, though, I can’t really see any from our point of view.

Anyway, if you use GA to understand your web traffic, go check it out for yourself to see if you see the same thing. Feel free to share your story in the comments below, as we are keen to hear from others affected by this.