IP Anonymization in Analytics

A technical explanation of how Analytics anonymizes IP addresses

At a glance

When a customer of Analytics requests IP address anonymization, Analytics anonymizes the address at the earliest technically feasible stage of the collection network. The IP anonymization feature sets the last octet of IPv4 addresses and the last 80 bits of IPv6 addresses to zeros in memory, shortly after the request reaches the Analytics Collection Network. In this case, the full IP address is never written to disk.

In depth

Since 25 May 2010, Analytics has provided the _anonymizeIp feature in the ga.js JavaScript library (and, more recently, ga('set', 'anonymizeIp', true) in the analytics.js library) to let website owners request that all of their users' IP addresses be anonymized within the product. This feature is designed to help site owners comply with their own privacy policies or, in some countries, with recommendations from local data protection authorities that may prohibit storing full IP addresses. The IP anonymization (masking) takes place as soon as data is received by the Analytics Collection Network, before any storage or processing takes place.

IP anonymization in Analytics involves two steps in the collection pipeline: the JavaScript tag and the Collection Network. Both steps are explained below.

The Analytics JavaScript Tag

When a JavaScript-enabled web browser loads a page with the Analytics tag (ga.js or analytics.js), it does two things asynchronously: it loads and processes the Analytics function queue, and it requests the Analytics JavaScript. The function queue is a JavaScript array into which the various Analytics configuration and collection functions are pushed. These functions, which the site owner sets when implementing Analytics, can include specifying the Analytics account number and actually sending page view data to the Analytics Collection Network for processing.
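The queue-then-process behavior described above can be modeled in a few lines. The following is a minimal sketch, not the analytics.js source: the stub ga function only records commands, and the hypothetical helpers createCommandQueue and runQueue stand in for the library loading later and replaying the commands in order. It illustrates why the anonymization setting has to be applied before the command that sends the page view.

```javascript
// Hypothetical model of a command queue like the one described above.
// Not the analytics.js source; createCommandQueue and runQueue are illustrative.

function createCommandQueue() {
  const queue = [];
  // Stub defined by the page before the library loads: it only records commands.
  const ga = (...args) => {
    queue.push(args);
  };
  ga.q = queue;
  return ga;
}

function runQueue(queue) {
  // State a tracker might keep while processing commands (hypothetical).
  const tracker = { trackingId: null, anonymizeIp: false, hits: [] };
  for (const [command, ...args] of queue) {
    if (command === 'create') {
      tracker.trackingId = args[0];
    } else if (command === 'set' && args[0] === 'anonymizeIp') {
      tracker.anonymizeIp = Boolean(args[1]);
    } else if (command === 'send') {
      // The setting is captured at send time, so command order matters.
      tracker.hits.push({ type: args[0], anonymizeIp: tracker.anonymizeIp });
    }
  }
  return tracker;
}

// Commands pushed by the page, in the order the site owner wrote them.
const ga = createCommandQueue();
ga('create', 'UA-XXXXX-Y', 'auto');
ga('set', 'anonymizeIp', true);
ga('send', 'pageview');

// Later, once the library has "loaded", the queue is replayed.
const tracker = runQueue(ga.q);
console.log(tracker.hits);
```

If the 'set' command were pushed after 'send', the recorded hit in this model would carry anonymizeIp: false.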

When the Analytics JavaScript runs a function from the function queue that triggers data to be sent to the Analytics Collection Network (typically ga('send', 'pageview') in the analytics.js library and _trackPageview in the ga.js library), it sends the data as URL parameters attached to an HTTP request for http://www.google-analytics.com/_utm.gif (for ga.js) or http://www.google-analytics.com/collect (for analytics.js). If the anonymization function has been called before the page tracking function, an additional parameter is added to this pixel request. The IP anonymization parameter looks like this: &aip=1
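A hit URL of the kind described above could be assembled as in the following sketch, which uses the standard URL and URLSearchParams classes. The endpoint is the one named above; the helper buildHitUrl and the other parameters (v, tid, t, dp) are illustrative assumptions, and only aip=1 is the anonymization flag this article describes.

```javascript
// Sketch: encode a tracking hit as URL parameters on the collection endpoint.
// buildHitUrl and the non-aip parameters are illustrative, not a library API.
const COLLECT_ENDPOINT = 'http://www.google-analytics.com/collect';

function buildHitUrl(params, anonymizeIp) {
  const url = new URL(COLLECT_ENDPOINT);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  // The anonymization request travels with the hit itself as aip=1.
  if (anonymizeIp) {
    url.searchParams.set('aip', '1');
  }
  return url.toString();
}

const hit = buildHitUrl({ v: 1, tid: 'UA-XXXXX-Y', t: 'pageview', dp: '/home' }, true);
console.log(hit);
```

Because the flag is part of each request, the collection servers can decide whether to mask the address without consulting any stored configuration.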

The Analytics Collection Network

The Analytics Collection Network is the set of servers that provides two main services: serving the Analytics JavaScript (ga.js and analytics.js) and collecting the data sent via requests for _utm.gif and /collect.

When a request for ga.js, analytics.js, _utm.gif, or /collect arrives, it includes additional information in the HTTP request header (e.g. the type of browser being used) and in the TCP/IP header (e.g. the IP address of the requester).

As soon as a request for _utm.gif or /collect arrives, it is held in memory for anonymization. If the &aip=1 parameter is found in the request URL (as placed there by the Analytics JavaScript after processing the anonymization function in ga.js or analytics.js), the last octet of the user's IP address is set to zero while still in memory. For example, an IP address of 12.214.31.144 is changed to 12.214.31.0. (If the IP address is an IPv6 address, the last 80 of its 128 bits are set to zero.) Only after this anonymization process is the request written to disk for processing. When IP anonymization is used, the full IP address is never written to disk, because all anonymization happens in memory almost immediately after the request is received.
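The collection-side step can be sketched as follows, assuming a hypothetical processHit function that receives the raw request URL and the requester's address from the TCP/IP connection. Masking keeps the first 24 of 32 bits of an IPv4 address and the first 48 of 128 bits of an IPv6 address (128 - 80 = 48, i.e. the first three 16-bit groups). This is an illustration under stated assumptions, not the production Collection Network code: it does not handle IPv4-mapped IPv6 forms or zone identifiers, and it returns IPv6 results in uncompressed form.

```javascript
// Hypothetical sketch of in-memory IP masking at the collection side.
// Not production code: IPv4-mapped IPv6 and zone IDs are not handled.

function maskIPv4(ip) {
  const parts = ip.split('.');
  const valid = parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error('invalid IPv4 address: ' + ip);
  // Zero the last octet (the last 8 of 32 bits).
  return [...parts.slice(0, 3), '0'].join('.');
}

function expandIPv6(ip) {
  const halves = ip.split('::');
  if (halves.length > 2) throw new Error('invalid IPv6 address: ' + ip);
  const head = halves[0] ? halves[0].split(':') : [];
  const tail = halves.length === 2 && halves[1] ? halves[1].split(':') : [];
  const missing = 8 - head.length - tail.length;
  // Without "::" there must be exactly 8 groups; with it, at least one is elided.
  if (halves.length === 1 ? missing !== 0 : missing < 1) {
    throw new Error('invalid IPv6 address: ' + ip);
  }
  const groups = [...head, ...new Array(missing).fill('0'), ...tail];
  return groups.map((g) => {
    if (!/^[0-9a-fA-F]{1,4}$/.test(g)) throw new Error('invalid IPv6 group: ' + g);
    return parseInt(g, 16);
  });
}

function maskIPv6(ip) {
  // Keep the first 3 groups (48 bits); zero the remaining 5 groups (80 bits).
  const groups = expandIPv6(ip).map((g, i) => (i < 3 ? g : 0));
  return groups.map((g) => g.toString(16)).join(':');
}

function maskIp(ip) {
  return ip.includes(':') ? maskIPv6(ip) : maskIPv4(ip);
}

function processHit(rawUrl, remoteAddress) {
  // Hypothetical base URL, needed only to parse a path-only request target.
  const url = new URL(rawUrl, 'http://collector.invalid');
  const anonymize = url.searchParams.get('aip') === '1';
  // Masking happens here, in memory, before the record is handed on.
  const ip = anonymize ? maskIp(remoteAddress) : remoteAddress;
  // Only this record would be passed to storage (not modeled here).
  return { path: url.pathname, params: Object.fromEntries(url.searchParams), ip };
}

console.log(processHit('/collect?v=1&t=pageview&aip=1', '12.214.31.144'));
```

In a Node server, the remote address would come from the accepted connection (for example, req.socket.remoteAddress in the http module), and the record returned by processHit is the only form that would reach disk.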