
Introducing GA Spy For Google Analytics

_This is a guest post by Stephen Harris from Seer Interactive. He was kind enough to share his awesome solution in this blog, so I’m very grateful indeed for his contribution._

If Google Tag Manager is loaded as the primary instrument for tracking on a webpage (as it should be), then all webpage tracking could and should be configurable via GTM. But we don’t always control the circumstances, and it’s not uncommon to face hardcoded Google Analytics tracking outside of GTM.

Perhaps GA tracking cannot be removed from the website source code quickly enough. Maybe the website is on a publishing schedule that doesn’t suit measurement needs, or maybe the folks responsible for the website don’t have sufficient resources. Whatever the reason, you got a “no” when you insisted on removal. (You did insist on removal, didn’t you?)

Or what about when there are hardcoded GA commands in a platform or plugin? It could be screwing with your data. But chances are it’s not doing anything at all (because it’s tracking to the nonexistent default GA tracker) and there seems to be no way to get it to work through GTM.

And even when all tracking runs through Google Tag Manager, some things can be difficult (or impossible) to implement using the tag templates (such as adding GA plugins, or defining Universal Analytics tracker field defaults).

1. Spying On GA

You can grab the code on GitHub. Copy-paste it either into a Custom HTML Tag in Google Tag Manager, or load it in the page template proper.

It’s called GA Spy because it can silently hijack and control all tracking that relies on Google’s Universal Analytics library (analytics.js). Effectively, this is a Google Analytics listener that does karate.

To put this in layman’s terms, the script listens for interactions with the ga() interface, returning the arguments passed to the function for you to process for whatever purpose you want. For example, you can automatically copy all calls to ga('send'...) to dataLayer.push() syntax, allowing you to fluently replicate hardcoded Universal Analytics tracking in Google Tag Manager.

This blog post is written assuming the hardcoded tracking is for Universal Analytics, but there is a version of the script for the async GA library (ga.js), to which the same core concepts apply.

1.1. Spy Code 101

Let’s walk through the script logic. Before implementing, it’s important to get a basic idea of what this script actually does. Once you understand the basics, you’ll also understand why this should be considered a temporary solution. Indeed, you should document the use of this solution, so that when the hardcoded Universal Analytics code is eventually removed from the site, you won’t be left high and dry with methods that don’t really do anything.

This section also explains this script’s biggest caveat: GA Spy is not guaranteed to intercept (or even detect) tracking that fires while the page is loading (e.g. the standard GA snippet) if the hack is deployed via GTM. Capturing these requests requires that you deploy the script on the page template itself, which unfortunately might detract from the usefulness for those without developer resources handy.

1.2. A Good Spy Starts With Research

Let’s start with an overview of how the Universal Analytics tracker works on the page.

Many GA users are familiar with the JavaScript namespace ga. It’s the global object for tracking with the Universal Analytics library (analytics.js). When the browser loads the page, this object is actually processed twice. First, it’s created by the Universal Analytics snippet, and it queues all commands to ga() while waiting for the library to load. Then, once the library has loaded, the namespace is converted to a full interface which processes each command as it is executed.

So the standard Universal Analytics snippet instantiates ga as a tiny function that saves all the arguments you pass to it as an Array in ga.q. This is called the command queue. It’s created by the second line of the GA install snippet function, shown here (beautified with more readable variable names):
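A beautified equivalent of that line might look like the following sketch. The names `win` and `gaObjName` are illustrative stand-ins for the snippet's minified parameters, and `win` is a plain object here (rather than the real `window`) so the example runs anywhere.

```javascript
// Stand-in for window; in a browser this would be the real global object.
var win = {};
var gaObjName = 'ga'; // the global name the snippet was asked to use

// Keep an existing ga if there is one; otherwise create the queue stub.
win[gaObjName] = win[gaObjName] || function () {
  // Lazily create the queue array, then store this call's arguments.
  (win[gaObjName].q = win[gaObjName].q || []).push(arguments);
};

// Commands issued before analytics.js loads simply pile up in ga.q.
win.ga('create', 'UA-XXXXX-Y', 'auto');
win.ga('send', 'pageview');

console.log(win.ga.q.length); // 2
```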

Once this queue is established, the snippet creates a script loader, which starts an asynchronous request to download the analytics.js library from Google’s servers.

Once analytics.js loads, it will run the commands in the command queue and replace the queue function ga with a much more robust object, which we call the ga object or function. Once this happens, ga stops keeping track of previously called commands, and will run commands as soon as they are called.

1.3. Spy Moves

The core of this script is the hijack function. What it does is simple: it saves a private reference to the global ga object, then updates the global reference (window.ga or simply ga) to point to a custom function, which we’ll call our “proxy” function. Note that if you’ve chosen to rename the global function as something other than ga, it will still work nicely with GA Spy.

The proxy function mimics the data members and methods of the original ga function. Thus it behaves like the original, which we accomplish by passing its arguments back to the ga function (and returning the result). But, only if we so choose. We can reject the message, simply by not sending the message to the original ga function, and thus intercept and block the hardcoded GA from running its requests altogether.
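The idea can be sketched roughly as follows. This is not GA Spy's actual source: `hijack`, `win`, and `sent` are hypothetical names used for illustration, and the "original" ga here is a dummy function that only records what reaches it.

```javascript
// Stand-in for window; in a browser this would be the real global object.
var win = {};
var sent = []; // records what reaches the "original" ga in this demo

// Pretend this is the ga function that already exists on the page.
win.ga = function () {
  sent.push(Array.prototype.slice.call(arguments));
  return 'ok';
};
win.ga.loaded = true;

// Hypothetical hijack helper: swaps the global for a proxy that consults a listener.
function hijack(gaObjName, listener) {
  var gaOrig = win[gaObjName]; // private reference to the original
  function proxy() {
    var args = Array.prototype.slice.call(arguments);
    // Only a strict false blocks the command; any other return value lets it through.
    if (listener(args) === false) { return; }
    return gaOrig.apply(gaOrig, args); // pass through and return the result
  }
  // Mimic the original's data members so other scripts see the same shape.
  for (var key in gaOrig) {
    if (Object.prototype.hasOwnProperty.call(gaOrig, key)) { proxy[key] = gaOrig[key]; }
  }
  proxy._gaOrig = gaOrig; // property name the post uses to detect a hijacked ga
  win[gaObjName] = proxy;
}

// Example listener: block events, let everything else pass.
hijack('ga', function (args) {
  return !(args[0] === 'send' && args[1] === 'event');
});

win.ga('send', 'pageview');               // forwarded to the original
win.ga('send', 'event', 'Video', 'play'); // blocked by the listener
```

Because the proxy looks up the global by name, the same approach works if the global function has been renamed to something other than ga.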

1.4. Time To Spy

As soon as GA Spy is run, it will instantiate the command queue if it doesn’t already exist, just as the Universal Analytics snippet would. Both in this script and in the Universal Analytics snippet, we’re careful not to blindly set a new variable and lose any previous data.
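A minimal sketch of that guarded initialization, assuming a hypothetical helper named `ensureQueue` and plain objects standing in for `window`:

```javascript
// Hypothetical helper; win stands in for window so the sketch runs anywhere.
function ensureQueue(win, gaObjName) {
  if (!win[gaObjName]) {
    // Nothing there yet: create the same kind of queue stub the snippet would.
    win[gaObjName] = function () {
      (win[gaObjName].q = win[gaObjName].q || []).push(arguments);
    };
  }
  return win[gaObjName];
}

// Case 1: the snippet has not run yet, so a fresh queue stub is created.
var emptyPage = {};
ensureQueue(emptyPage, 'ga');
emptyPage.ga('send', 'pageview');

// Case 2: a queue already holds a command; it must survive untouched.
var busyPage = {};
ensureQueue(busyPage, 'ga');
busyPage.ga('create', 'UA-XXXXX-Y', 'auto');
var before = busyPage.ga;
ensureQueue(busyPage, 'ga');

console.log(busyPage.ga === before, busyPage.ga.q.length); // true 1
```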

Note: the above code does the exact same thing as the second line of the GA install snippet function, shown beautified under A Good Spy Starts With Research above.

Then, regardless of whether it already existed or was just initiated, GA Spy hijacks ga.

If ga is found to be the fully-formed Universal Analytics tracking method, then analytics.js has already loaded and we will be tapped into all subsequent calls to ga until the page is unloaded. Job done. Since the global ga object does not keep track of previously run ga() commands, we cannot access the information passed in those commands. It’d be useful if we could access the command history, but we still would not be able to block those hits, since JavaScript does not support time reversal (not gonna work: setTimeout( hijack, -5 )).

However, if GA Spy finds the command queue instantiated in ga, then we know that if/when analytics.js loads, it will replace the hijacked command queue (our proxy) with the actual Universal Analytics tracking method. We can easily hijack it again, but timing is important. We want the global ga object to be in existence for as short a period as possible to avoid missing any commands before we manage to hijack it.

Any function in the command queue will be executed as soon as the ga object is ready. So we add the hijack function, thereby setting a trap the ga object will trigger immediately upon loading, and put our proxy in its place before the ga object is even available to other scripts.
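The timing can be illustrated with a small simulation. Everything here is a stand-in: `simulateLibraryLoad` only imitates the moment analytics.js takes over the global and runs queued functions, and `hijack` is a stripped-down hypothetical version with no listener, to keep the focus on the trap itself.

```javascript
var win = {};  // stand-in for window
var seen = []; // commands observed by the proxy
var ran = [];  // commands executed by the simulated library

// Queue stub, as created by the install snippet.
win.ga = function () { (win.ga.q = win.ga.q || []).push(arguments); };

// Hypothetical hijack (listener omitted): wraps whatever ga currently is.
function hijack() {
  var gaOrig = win.ga;
  var proxy = function () {
    seen.push(Array.prototype.slice.call(arguments));
    return gaOrig.apply(gaOrig, arguments);
  };
  proxy.q = gaOrig.q; // keep any existing queue visible on the proxy
  proxy._gaOrig = gaOrig;
  win.ga = proxy;
}

hijack();                   // wrap the queue stub right away
win.ga(hijack);             // set the trap: queue hijack as a ready callback
win.ga('send', 'pageview'); // a hardcoded command, queued through the proxy

// Crude simulation of analytics.js finishing its load (not the real library).
function simulateLibraryLoad() {
  var queued = win.ga.q || [];
  var realGa = function () { ran.push(Array.prototype.slice.call(arguments)); };
  win.ga = realGa; // the library replaces the global, dropping our proxy
  for (var i = 0; i < queued.length; i++) {
    if (typeof queued[i][0] === 'function') { queued[i][0](); } // trap fires here
    else { realGa.apply(realGa, queued[i]); }
  }
}

simulateLibraryLoad();
win.ga('send', 'event', 'Video', 'play'); // goes through the re-installed proxy
```

In this simulation the proxy is back in place before any later script gets a chance to call ga, which is the point of queueing the hijack function rather than polling for the library.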

The only other thing we need to do is run the stack of existing commands through the listener, and when it returns false just filter that command out of the queue.
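That filtering step could look something like this sketch, where `filterQueue` is a hypothetical helper and the demo queue is shaped like ga.q (one entry per call):

```javascript
// Hypothetical helper: keeps only the queued commands the listener does not reject.
function filterQueue(queue, listener) {
  var kept = [];
  for (var i = 0; i < queue.length; i++) {
    var args = Array.prototype.slice.call(queue[i]); // arguments object -> array
    // Only a strict false removes the command from the queue.
    if (listener(args) !== false) { kept.push(queue[i]); }
  }
  return kept;
}

// Demo queue; each entry holds the arguments of one ga() call.
var q = [
  ['create', 'UA-XXXXX-Y', 'auto'],
  ['send', 'pageview'],
  ['send', 'event', 'Video', 'play']
];

// Listener that rejects hardcoded pageviews only.
var remaining = filterQueue(q, function (args) {
  if (args[0] === 'send' && args[1] === 'pageview') { return false; }
});

console.log(remaining.length); // 2
```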

2. Reliability And Caveats

This should be an extremely reliable solution. In fact, the method of listening by “hijacking” a function is used by a number of Google libraries, including GTM’s own Data Layer and Autotrack. Note that it’s possible GA Spy can be broken by an update to analytics.js, but the ga interface GA Spy relies on is wholly defined in the standard Universal Analytics install snippet, so breakage due to unannounced library updates is highly unlikely.

Nevertheless, there are some important caveats to note before using this.

2.1. Uncommon Behavior For A Common Library

Whenever this is used to modify or block a ga command, I would consider this a “hacky” solution. Although the code is sound, such usage modifies the typical behavior of the global ga() function, making it work differently than it works on 99% of other websites. This can impede troubleshooting and confuse those who are new to Google Analytics or JavaScript.

Tip: You can tell that you’re dealing with hijacked GA by checking for the presence of the property _gaOrig in the global Google Analytics object. By default it would be: window.ga._gaOrig.
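As a quick illustration, the check could be wrapped in a small helper. `isHijacked` is a hypothetical name, and the demo uses plain objects in place of `window`:

```javascript
// Hypothetical helper: true when the named global carries the _gaOrig marker.
function isHijacked(win, gaObjName) {
  var ga = win[gaObjName || 'ga'];
  return typeof ga === 'function' && '_gaOrig' in ga;
}

// Demo with stand-ins for window (in a browser console, pass window itself).
var plainPage = { ga: function () {} };
var spiedPage = { ga: function () {} };
spiedPage.ga._gaOrig = function () {};

console.log(isHijacked(plainPage));       // false
console.log(isHijacked(spiedPage, 'ga')); // true
```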

In some cases, hijacking the GA function is necessitated by third-party vendors that require their JavaScript to be implemented with a hardcoded GA tracker.

In other cases, such as when completing a migration from hardcoded Google Analytics to Google Tag Manager, this treatment should be acceptable only as a short-term solution.

2.2. Cannot Intercept Hardcoded Pageviews From GA

If GA Spy runs before analytics.js loads, you will be able to access and block 100% of the commands queued to analytics.js. But if GA Spy runs after, then it has zero insight into what hits (if any) were previously fired. So if you want to intercept all hardcoded commands, GA Spy needs to be deployed directly on the page. However, for many purposes, this is not necessary at all.

Unfortunately, in some cases where deploying on-page is necessary, the inability to make on-page changes is the very problem that prompted the need for GA Spy in the first place, making this solution a catch-22. For GTM migrations, one way to mitigate this is to request to have GA Spy placed on the page at the same time that GTM is added.

Note: if synchronous loading of Google Tag Manager is ever supported, hardcoding GA Spy will no longer be required. It will still be recommended, though, because loading anything synchronously in the <head> degrades page load speed.

2.3. Incomplete

Currently, GA Spy only intercepts GA commands that are called upon the global ga object, but there are other ways to send commands to analytics.js. Many GA plugins and some custom implementations call commands directly upon the tracker object (e.g. ga.getAll()[0].send('event')). Support for intercepting these commands may be added in the future.

Another limitation is that GA Spy can process only the values passed to GA commands. It doesn’t provide access to the default values for all the fields that are populated by analytics.js.

One effect of these two facts is that GA Spy will not pick up hit modifications (or extra hits) done by GA plugins (and other implementations using the Tasks API). This is by design and no support for this is planned. (If you see a need for this, describe the use case and upvote this enhancement here.)

2.4. More Code!

As a developer I have many protests against the following statement, but I can’t deny its essential truth: code is bad. Meaning, no matter how good the code, no code is better. Code leads to bugs which lead to more code, which leads to more bugs. Even bullet-proof code requires some degree of maintenance. Perhaps the biggest issue is that a code-based solution imposes a higher skill barrier on managing and customizing the solution. As a general rule, we should avoid custom code whenever there is a sound alternative.

3. Usage

3.1. Placement

If you need to intercept GA commands that run upon page load or shortly thereafter (such as the tracker creation or 'pageview' in the base GA install snippet), then you’ll need to deploy GA Spy in an inline script tag or external script (with no async or defer attributes) above the first GA command (usually in the base GA snippet).

In the listener callback, you can block the hit by returning false (returning merely “falsy” values like 0 or undefined will not block the hit).

And everything else you can do in JavaScript :)

In many cases, we’ll want to push the hit values onto the dataLayer so they can be accessed in GTM.

<script>
gaSpy(function(obj) {
  // Do something with the arguments to ga():
  var args = obj.args;
  // Do something with details about the tracker itself:
  var details = obj.the;
});
</script>

Naturally, any calls to gaSpy() need to be timed so that they take place after you’ve loaded the GA Spy code itself.

See the README file for more details on what data is passed to the callback function, as well as additional GA Spy configuration options.

3.3. Listening From GTM

If using GA Spy in GTM, the listener should typically ignore all commands from GTM. Failure to do this could result in blocking GTM hits or in an infinite loop. We can do this by checking the tracker name. GTM uses tracker names starting with “gtm” followed by the timestamp of when the tracker was created.

However, this method will not work when callbacks are passed in place of a GA command. GTM does not appear to use these currently, but we cannot do much with them anyway, so as a precaution any listener should ignore callbacks too.

Use this code at the top of your callback function to avoid issues:
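A minimal sketch of such a guard follows. It assumes the object passed to the callback exposes `the.callback` and `the.trackerName`; those field names are assumptions here, so check the README for the exact shape. `shouldIgnore` and `listener` are hypothetical names.

```javascript
// Returns true when the listener should leave the command alone.
// Assumption: obj.the carries `callback` and `trackerName` fields (see the README).
function shouldIgnore(obj) {
  var the = obj.the || {};
  if (the.callback) { return true; } // a callback was passed, not a command
  var name = String(the.trackerName || '');
  return name.indexOf('gtm') === 0;  // tracker created by GTM
}

// Intended use at the top of a listener callback:
function listener(obj) {
  if (shouldIgnore(obj)) { return; } // returning undefined does not block the hit
  // ...handle hardcoded commands here...
  return 'handled';
}

console.log(listener({ the: { trackerName: 'gtm1500000000000' } })); // undefined
console.log(listener({ the: { trackerName: 't0' } }));               // handled
```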

This is not foolproof, but it’ll work unless hardcoded tracking is using a tracker name starting with “gtm” (or if the tracker names in GTM are customized, which is probably done as an alternative to this solution). For example, this method will not work for Wistia’s built-in tracking, because Wistia’s code (erroneously) sends hits through every named tracker on the page.

4. Examples

4.1. Log GA Commands

Logging GA commands is a useful way to easily see what commands are being picked up by GA Spy, letting you see which things GA Spy can block and/or latch on to in order to execute custom behavior. Compared with browser extensions and the verbose analytics_debug.js, GA Spy logging can be very minimalistic. You could even log using only emojis if you wanted! And it has the advantage of working even when the hit does not fire (such as when using GA opt-out or tracking blockers).

This listener callback will print ga() arguments exactly as they are given:
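One way to write such a callback is sketched below. It relies only on the `args` property shown in the Usage section; `formatCommand` and `logListener` are hypothetical helper names, and the registration is guarded so the snippet does nothing harmful where GA Spy is not loaded.

```javascript
// Formats the raw arguments of one ga() call as a readable string.
function formatCommand(args) {
  var parts = [];
  for (var i = 0; i < args.length; i++) {
    // Functions cannot be serialized, so show a placeholder for them.
    parts.push(typeof args[i] === 'function' ? '[function]' : JSON.stringify(args[i]));
  }
  return 'ga(' + parts.join(', ') + ')';
}

// Listener callback: log every command and let it through (no return value).
function logListener(obj) {
  console.log(formatCommand(obj.args));
}

// Register only where GA Spy is actually loaded on the page.
if (typeof gaSpy === 'function') { gaSpy(logListener); }

// Standalone demo of the formatting:
console.log(formatCommand(['send', 'event', { nonInteraction: true }]));
// ga("send", "event", {"nonInteraction":true})
```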

Since this is all one script, you could turn this into a bookmarklet. This could also block all hits, so it serves as a ‘test mode’ bookmarklet, which both logs and prevents any hits from being fired.

4.2. Access Hardcoded GA Command Data Via GTM

If you know exactly what you’re looking for, you can simply look for that format and send it to dataLayer. Remember to define an event name in the dataLayer.push(), so you can create a GTM trigger based on the event. Here’s an example that clocks hardcoded events from the social sharing plugin ShareThis and forwards them to GTM:

Most of the code necessary for this simply normalizes the command arguments (since GA commands have a flexible format for defining field values). It also handles data scope (different trackers, plugins) and also wipes hit-only fields after the relevant GTM event.
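The normalization described above can be sketched in a simplified form. This example handles only the positional `ga('send', 'event', ...)` format on the default tracker; it ignores named trackers, plugins, and the object-only form. The dataLayer event name `hardcodedGaEvent` and the sample values are hypothetical, and `dataLayer` is a local array here rather than the page's `window.dataLayer`.

```javascript
// Field order for positional arguments of ga('send', 'event', ...).
var EVENT_FIELDS = ['eventCategory', 'eventAction', 'eventLabel', 'eventValue'];

// Turns the arguments after 'send', 'event' into a single fields object.
function normalizeEvent(args) {
  var fields = {};
  var rest = Array.prototype.slice.call(args, 2);
  for (var i = 0; i < rest.length; i++) {
    var arg = rest[i];
    if (arg && typeof arg === 'object') {
      // A fieldsObject: copy its properties over the positional ones.
      for (var key in arg) {
        if (Object.prototype.hasOwnProperty.call(arg, key)) { fields[key] = arg[key]; }
      }
    } else if (i < EVENT_FIELDS.length) {
      fields[EVENT_FIELDS[i]] = arg;
    }
  }
  return fields;
}

var dataLayer = []; // on a GTM page this would be window.dataLayer

// Listener callback: forward hardcoded events to GTM, then wipe hit-only fields.
function forwardEvents(obj) {
  var args = obj.args;
  if (args[0] !== 'send' || args[1] !== 'event') { return; }
  var fields = normalizeEvent(args);
  fields.event = 'hardcodedGaEvent'; // hypothetical name for the GTM trigger
  dataLayer.push(fields);
  // Reset the hit-only keys so they do not leak into later events.
  var reset = {};
  for (var i = 0; i < EVENT_FIELDS.length; i++) { reset[EVENT_FIELDS[i]] = undefined; }
  dataLayer.push(reset);
}

forwardEvents({ args: ['send', 'event', 'Share', 'click', 'Twitter'] });
console.log(dataLayer[0].eventLabel); // Twitter
```

The listener returns nothing, so the original hit still fires; returning false instead would block it once the GTM tag has taken over.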

5. Next Steps

Even though this is a reliable and relatively simple method for working with hardcoded GA tracking, hopefully this was not your first choice. Even though this fixes real data issues and helps consolidate your implementation in GTM, it’s an awkward way of doing so. You can get the same result with a much simpler configuration: no hardcoded tracking, and no spying! Don’t be satisfied with getting tracking working; pursue a clean implementation. Even if it will take a while, plan to have the hardcoded tracking removed.

Your script is making your Universal Analytics code work differently than it would work on virtually every other website, confusing troubleshooters and learners. So, even if you believe hardcoded tracking removal is imminent, be sure to note your use of GA Spy (prominently) in your tracking documentation.

If you are using GA Spy to deal with code from a third party platform or plugin, contact the developers. Let them know their script is not compatible with GTM. (Actually, you should do that before implementing GA Spy; they might be responsive and fix their code quickly!)

6. Feedback

This method was designed to be flexible and was tested with various setups, but that’s not to say it’s bulletproof. On the contrary, I’m eager to discover bugs, incompatibilities with browser plugins, and similar issues. Naturally, I admit it’d be nicer to find out there are none :)

Please post any problems or suggestions as a new issue on GitHub if it has not already been added. Be sure to upvote fixes and enhancements you want to see implemented (by clicking the little thumbs-up icon)!

7. Summary (by Simo)

This is some top-notch JavaScript right here! What Stephen has built is a Swiss Army knife that lets you take full control over how analytics.js functions as a tracking interface on your website. Even though translating the hardcoded ga() calls to dataLayer.push() commands is the obvious choice, there are lots of use cases for this library, and you can check some of the examples out here.

One of the things this enables is what we’ve been waiting for so long with Google Tag Manager: the ability to intercept and modify the payload sent by GTM’s Tags to Google Analytics. There’s no way to add your own custom plugins, for example, but this library with its hijack function lets you modify the tracker object between its creation and when the data is dispatched. It’s not entirely trivial, but I might just have an article in the pipeline showing some cool use cases for this particular library, so stay tuned!

In any case, thank you so much to Stephen for sharing this incredible resource with the community. If you have implementation trouble with the library or any other type of feedback to share, please sound off in the comments. And please respect Stephen’s request to report any actual bugs and issues in the issue tracker on GitHub.