This is the first of what I hope to make a regular Monday-morning series: Q&A. Send in your questions throughout the week, and I’ll pick one to answer on Monday to get your week off to a great start! This first question came in over the weekend and relates to a topic that has certainly been written about before. Rather than simply pointing to a Google help article or another blog, I thought I’d write my own response so I can bring in a bit of my own experience and a slightly different, and I think improved, take on the filter for the job.

The Question:

I am new to GA but have worked with WebTrends in the past. I’ve implemented the GA code on the portion of the site I’m responsible for, which runs on a sub-domain of our main domain [sub.domain.co.uk]. The page hits are appearing fine in Content Reports without the domain name [i.e. “page.html”], but the domain name is not shown properly in the URL reported within Google Analytics. When clicking the “view page” link the reported page is our main domain [www.domain.co.uk].

How do I get the sub-domain name to show in the pages listed in the Content Reports?

The Answer:

What you’re seeing is standard given how GA works. The “page” is the portion of a URL after the domain name. To get the domain into the content reports you need a filter that copies the hostname field into the Request URI field in Google Analytics.

First, I’d verify what hostnames are being recorded by GA: go to Visitors > Network Properties > Hostnames. Here’s the hostnames report for my site:

Caleb’s tip about hostnames in Google Analytics

Note how the hostnames report contains both “www.analyticspros.com” and “analyticspros.com”. Why is that? Some visits came in while browsing http://analyticspros.com/, while most came from users browsing https://www.analyticspros.com/. This difference is not as subtle as it looks. Technically, “www.” is a sub-domain. However, since it has been around as long as, oh, the World Wide Web, it is basically treated as a de facto standard part of a domain name. Google Analytics treats the “www” and “no-www” domains as the same insofar as the cookies it sets; the hostnames, however, are still reported as-is, www included or not.

This presents a problem for filtering the hostname into your content reports: you’ll introduce fragmentation into the reported content. The homepage could be “/analyticspros.com/index.php” or it could be “/www.analyticspros.com/index.php” since some visits used the non-www and others the www versions of the domain. The solution to this problem is included in my version of the hostname-to-URI filter. Note: there are SEO implications to hostname issues as well, and there’s an easy fix. Check out this article from SEOmoz.org on the topic of domain redirection and canonicalization (see the part about “redirecting canonical hostname” towards the bottom).
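To make the fragmentation concrete, here is a minimal Python sketch. The three hits and the `naive_page_key` helper are hypothetical, invented only for illustration; the helper mimics what a filter would do if it prepended the hostname unchanged.

```python
from collections import Counter

# Hypothetical hits as (hostname, request URI) pairs; not real data.
hits = [
    ("www.analyticspros.com", "/index.php"),
    ("analyticspros.com", "/index.php"),
    ("www.analyticspros.com", "/index.php"),
]

def naive_page_key(hostname, request_uri):
    """Prepend the hostname as-is, with no www handling."""
    return "/" + hostname + request_uri

# Count pageviews per reported page.
pageviews = Counter(naive_page_key(host, uri) for host, uri in hits)

for page, count in sorted(pageviews.items()):
    print(count, page)
```

The same homepage ends up on two separate rows, one per hostname variant, so its pageviews are split between them.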

The perfect Hostname to Request URI filter

To get the domain (or “hostname” as it’s referred to in Google Analytics) showing in the Top Content and other content reports, you can create a filter to copy the hostname field into the Request URI field like this:

Filter Type: Custom Filter

Type: Advanced

Field A: Request URI

Field A pattern: (.*)

Field B: Hostname

Field B pattern: (^www.)?(.*)

Output field: Request URI

Output Pattern: /$B2$A1

Field A required: yes

Field B required: yes

Override Output: yes

Notes about this Google Analytics content filter:

The leading slash before the “hostname” output pattern is essential. If you leave this off, Content reports will have no leading slash. The leading slash is important for certain reports within Google Analytics to operate correctly.

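In the Field B pattern, the “(^www.)?” part means “find the ‘www’ at the beginning of the hostname, if it exists, and put it in memory slot B1”. The second part, “(.*)”, means “find everything after the ‘www’, if it’s even there, and put it into memory slot B2”. The output pattern then joins the leading slash, B2, and the original Request URI (A1). The result is that hostnames with or without the “www” are reported as the same thing in your content reports: without www. Strictly speaking, the unescaped dot in the first part matches any single character; writing it as “(^www\.)?” restricts the match to a literal dot.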

Since this filter modifies the URL reported, it will render the Site Overlay report unusable, as well as the “view this page” links under Content reports. The reason is that both of these features rely on an implicit understanding of your website’s location and the content you’re viewing. Site Overlay works by comparing links in the HTML of your page with the URLs reported in your Content Reports. Unless there is an exact match, it won’t work. Since you’re modifying the URL reported by GA, that match can’t happen when you’re using this filter. For the “view this page” links, GA relies on the domain name entered in your profile configuration section and the URL in your content report, so once you copy the hostname field into the URL those links won’t work because there is inherently no match.
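If you want to check what the filter does to a given hit before applying it to a profile, its logic can be approximated in a few lines of Python. This is only a sketch under assumptions: the `apply_hostname_filter` function is a hypothetical helper, and Python’s `re` module stands in for GA’s own regular expression engine, which may differ in edge cases.

```python
import re

# Patterns exactly as entered in the filter form above.
FIELD_A_PATTERN = re.compile(r"(.*)")          # Field A: Request URI
FIELD_B_PATTERN = re.compile(r"(^www.)?(.*)")  # Field B: Hostname

def apply_hostname_filter(hostname, request_uri):
    """Approximate the filter: build the new Request URI as /$B2$A1."""
    a = FIELD_A_PATTERN.match(request_uri)
    b = FIELD_B_PATTERN.match(hostname)
    # Both fields are marked "required": with no match, leave the URI alone.
    if a is None or b is None:
        return request_uri
    # b.group(2) is slot B2 (hostname minus a leading www), a.group(1) is A1.
    return "/" + b.group(2) + a.group(1)

print(apply_hostname_filter("www.analyticspros.com", "/index.php"))
print(apply_hostname_filter("analyticspros.com", "/index.php"))
print(apply_hostname_filter("sub.domain.co.uk", "/page.html"))
```

The first two calls both produce “/analyticspros.com/index.php”, which is the de-fragmentation the filter is after, and the third produces “/sub.domain.co.uk/page.html”. None of these rewritten values equals the original path (“/index.php” or “/page.html”), which is why the exact-match features described above stop working.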