When you use auto-tagging with your Adwords campaign, all requests that are generated by Google Adwords contain a gclid parameter in the request URL. Adwords uses this to pass some information to Analytics for traffic analysis.

I was curious about what data the gclid parameter contained. My guess was that it contained some encoded or encrypted information regarding the origin of the click, so I did some analysis on the clicks that I received. Some discussion about it was available on this post.

I ended up writing a quick PHP script that parses through an Apache log file. It finds requests that contain a gclid and then produces a report of which letters occur in which positions of the gclid.
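For anyone who wants to try the same idea without the download, here is a minimal Python sketch of that kind of analysis. It is not the PHP script itself. The function names are made up for illustration, and the regular expression assumes that a gclid consists only of letters, digits, '-' and '_'.

```python
import re
from collections import Counter, defaultdict

# Assumption: gclid values use only URL-safe characters (letters, digits, '-', '_').
GCLID_RE = re.compile(r"[?&]gclid=([A-Za-z0-9_-]+)")


def extract_gclids(lines):
    """Yield the gclid value from every log line that contains one."""
    for line in lines:
        match = GCLID_RE.search(line)
        if match:
            yield match.group(1)


def position_counts(gclids):
    """Map each 0-based character position to a Counter of characters seen there."""
    counts = defaultdict(Counter)
    for gclid in gclids:
        for pos, char in enumerate(gclid):
            counts[pos][char] += 1
    return counts


def print_report(counts):
    """Print one row per position: position, distinct count, characters seen."""
    for pos in sorted(counts):
        chars = "".join(sorted(counts[pos]))
        print(f"{pos + 1:3d}  {len(counts[pos]):3d} distinct  {chars}")
```

Opening a log file (say, a hypothetical access.log) and passing it through extract_gclids, position_counts and print_report in turn prints one row per character position.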

The script is available for download here, and it generates a report like this:

This makes it clear that the parameter has some structure, but I’m still no closer to determining what it contains. Counting up the unique values, it would seem that they have about 95 bits of information available, which might be enough room to store everything it would need to know about the search that created it. Based on the reporting details in Analytics, I would presume that it somehow contains at least the following information:

Campaign (id)

Keyword (id)

Ad Variation (id)

Position
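As for the "about 95 bits" figure mentioned above, one plausible way to arrive at a number like that is to count the distinct characters seen at each position and add up log2 of those counts. The Python sketch below does that. It is only my reading of how such an estimate could be made, and it gives an upper bound that assumes every position varies independently of the others. The function names are hypothetical.

```python
import math


def distinct_per_position(gclids):
    """Return the number of distinct characters seen at each position."""
    seen = {}
    for gclid in gclids:
        for pos, char in enumerate(gclid):
            seen.setdefault(pos, set()).add(char)
    return [len(seen[pos]) for pos in sorted(seen)]


def estimate_bits(distinct_counts):
    """Upper bound on capacity: sum of log2(distinct values) over all positions.

    Assumes the positions are independent, which a real gclid may not satisfy.
    """
    return sum(math.log2(n) for n in distinct_counts)
```

A position that only ever shows one character contributes 0 bits, while a position that uses 64 different characters contributes 6 bits.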

I did some research by clicking an ad multiple times and examining the gclids for those:

I noticed that most of the positions, which take on 32-64 different characters, vary quite a bit, except for character #9, which was always an ‘8’, and character #10, which was a ‘p’ for the first two clicks and then a ‘5’ for all subsequent clicks. That likely has some significance, but I’m out of time for playing with it for now.
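Spotting positions like character #9 by eye gets tedious, so here is a small Python sketch that lists the positions that stay the same across a set of gclids. The function name is hypothetical, and anything you feed it in an experiment should be real values from your own logs.

```python
def constant_positions(gclids):
    """Return {1-based position: character} for positions identical in every gclid."""
    if not gclids:
        return {}
    first = gclids[0]
    # Only compare positions that exist in every value.
    shortest = min(len(g) for g in gclids)
    return {
        pos + 1: first[pos]
        for pos in range(shortest)
        if all(g[pos] == first[pos] for g in gclids)
    }
```

Running it separately on the first two clicks and on the later clicks would show whether a position such as #10 is constant within each group but differs between them.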

Hopefully the script and this basic analysis will be of use to somebody else who wants to dig into it further.

One other thought that I had is that the data (or each field) is somehow encrypted and when you ‘link’ your Analytics account to your Adwords account it shares the decryption key so that it can get at the detail.

“The “(stuff)” that is added appears to be unique for each advert impression, and appears to be unique in a clever way… The first part of the ID varies rapidly and the last part varies slowly. This is clever because when you are looking for string matches, you get an early failure in the string match, helping to speed the search up – an indication that some smart people may have been working on this.”

“I’ll guess that the last part of the gclid value encodes, or more likely references in some way, the advertiser ID, the keyword, adgroup, campaign and account ID’s. The first part, that changes rapidly, is probably some combination of timestamp and instance ID or advertising channel (where the advert was published). I suspect that the account and keyword part is a database ID that delivers a row with the account ID, campaign and so on – rather than being an encoding. I suspect that the first part is a timestamp and instance ID, which will also be recorded on Google servers and will tell them when the advert impression was delivered, on which site and how long it was between that impression and the click.”

Yeah, I read all of those and then found that I had wasted a couple hours without really accomplishing anything. I’m pretty certain that there is some interesting information contained in there, but since this is their main method of generating revenue, it is likely very well thought out and well enough encoded that I will never be able to extract any useful information (although that click fraud detector idea might be useful if it pans out).