In looking over how little web analytic tools have changed over the last few years, I’ve been struck by one of the things that has held the tools back: the adherence, not (just) to the page view, but to:

The Session.

Aka the visit, the session has required no end of explanations. It’s one of those things, like unique visitors, that creates confusion for non-analytic folks. I always dread that part of the conversation where we explain that the “30 minute timeout” doesn’t mean that we chop sessions off after 30 minutes.

When you think about “processing” your own logs or tracking, what’s the first compute intensive task that comes to mind? “Sessionizing”. (Counting Unique Users gets partial credit, but that’s something databases do pretty well on their own, no special coding required).

In fact, I even start to wonder if it’s really relevant these days, at least the way we currently define it. It made sense back when we started all this stuff, but perhaps our definition, like so many things, needs some modernization to reflect current needs.

Define a Session
First, let’s all get on the same page (view. Hah!). Every web analytics tool these days follows the WAA definition, found in this PDF of Web Analytic Definitions:

A visit is an interaction, by an individual, with a web site consisting of one or more requests for an analyst-definable unit of content (i.e. “page view”). If an individual has not taken another action (typically additional page views) on the site within a specified time period, the visit session will terminate.

As you all know, we usually use 30 minutes as a timeout. (For those newbies out there, basically, we stitch all the requests together by some ID (cookie, whatever) and “walk along them”, starting a session with the first time we see the ID, and when we don’t see any activity from the ID for 30 minutes, we “close” the session. If the user pokes the site in a measurable way every 29 minutes for a day, we can have a 24 hour session totally legit.)
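For the code-minded, here is a minimal sketch of that "walk along them" logic. It assumes hits arrive as (visitor_id, timestamp) pairs; the names `sessionize` and `TIMEOUT` are mine, not from any particular tool, and real tools add rules this leaves out.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # the usual timeout


def sessionize(hits, timeout=TIMEOUT):
    """Group (visitor_id, timestamp) hits into sessions by inactivity gap."""
    sessions = []
    current = None
    # Sort by visitor, then time, so each visitor's hits are walked in order.
    for visitor_id, ts in sorted(hits):
        starts_new = (
            current is None
            or current["visitor_id"] != visitor_id
            # Assumption: a gap of exactly 30 minutes keeps the session open.
            or ts - current["end"] > timeout
        )
        if starts_new:
            current = {"visitor_id": visitor_id, "start": ts, "end": ts, "hits": 0}
            sessions.append(current)
        current["end"] = ts
        current["hits"] += 1
    return sessions


# A visitor who pokes the site every 29 minutes for a day stays in one session.
day = datetime(2024, 1, 1)
pokes = [("cookie-a", day + timedelta(minutes=m)) for m in range(0, 24 * 60, 29)]
print(len(sessionize(pokes)))  # prints 1
```

Note that the timeout only ever looks at the gap since the previous hit, never at the total length of the session, which is why the all-day session is legal.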

The Problem with Sessions
What’s the problem? A session is the wrong level of data tracking. Just like Page Views are what the tools want but we are all moving to Events, Sessions roll up lots of really important information (and by roll up, I mean either don’t track, or make it near impossible to get out of the tool).

Back in the day, when sessions were invented in a world where there was no Facebook or Twitter, the average website visit or session looked something like this:

We might see that most users are spending 30 seconds or less on each page, and a site visit may be 7-10 minutes in total; some a bunch longer, most shorter. But, it’s all pretty contiguous: the user enters, they read or shop or play a game or whatever, and then they leave.

But anyone looking at their data these days might see the following session:

What happened in this second session? Why did we have 7 minutes on a page, when we are used to much less? It’s hard to tell. You see, in that 2nd visit, our tool sessionized the entire thing, so we have what appears to be a user spending 7 minutes on a page that others spend usually 30 seconds on… and then converting with a coupon code from an affiliate that wasn’t present as part of the session campaign tracking. The session was attributed to the SEM vendor; why don’t we see the click from the affiliate? They actually closed the sale, but they aren’t getting credited. In some ways, this is the opposite of “last click” attribution: It’s “first click in the session” attribution!

So, we dig in with our web analytics tool. Did this user leave the site during that time, or open another tab to hunt around? Well, I can’t tell: since it was sessionized, most every tool throws away external referrers that are not part of a session initiation. That is, if it’s not the first page in a session, the tool doesn’t keep the referrer, even if it’s external. When it’s in the middle of a session, the referrer is, by our session definition, the previous page in the session… even when we kind of know it’s not.

Now, when I go to my logs (as sports fans know it, “Let’s go to the tape!”), it becomes more obvious what happened. User saw a product page, then popped a new tab/browser, hunted for a better price, didn’t find it (we rock!). They then hunted for a coupon code on an affiliate (which we recognize as one that does lots of good SEM/SEO work, hence the suspicion that the user was searching), clicked on that link… and since that was less than 30 minutes of dead time, the tool just plops that marketing query string into the middle of the previously open session, we lose the referrer and potentially the query string… and we miss what really happened.

Session-Absorption
Most tools will throw away that mid-session query string and other mid-session data, unless we take special efforts to tuck it in somewhere. The campaign code query string won’t show up in the usual marketing reporting places, for example. We can make it an event in GA or Coremetrics, or dedicate a “track em’ all!” eVar in Omniture, but at the end of the day, the tools mostly assume a session has one source, and it’s on the first request of the session.

(Now, to be fair, Omniture does have a workaround of sorts for their conversion variables (including s.campaign): http://blogs.omniture.com/2008/08/19/conversion-variables-part-ii/ points out that in the admin console, you can set First, Last, and Linear allocation in-session approaches. That’s pretty good… but it doesn’t solve the issues with referrers, nor other “in-session” questions like order effects of which marketing came first, etc. And, if you want to have more than one of these types of attributions (to compare First- vs. Last-in-session, for example), you need to fork out more variables, and Omniture only uses s.campaign in all its “fancy” marketing dashes. So, workaround, but not a solution, imho.)

Do Sessions Really Answer Our Business Questions?
In fact, sessions kind of peanut butter over lots of interesting behaviors when we realize that a) our sites are part of a larger internet, where users hop between sites at the drop of a mouse, and b) modern large sites are often made up of multiple business interests (merchandisers of toys vs. dresses, business units of B2B arm vs. B2C arm, etc.) and their “sub-sections” need to be treated like mini-sites on their own.

This leads to some tough questions to answer with modern tools. Some users don’t care about “the site”, they care about their section: what drives traffic there? How do external-to-deep-drops work for them? How about internal promotions? How can they track time-per-visit “in their section” (vs. total site visit)? Others pay for clicks, and they want to account for all of them: sessionizing can hide away the interaction of how these multiple influencers drive your business. Besides the “view through” and earlier-in-the-funnel clicks, what about all those that happen in the session?

In fact, so many of the questions I get are intra-session questions: How do people use this “Section” of my site? Does “any one of these pages or types of pages drive certain behaviors”? Do people use multiple marketing channels in a session? Are people clicking into my site multiple times in a row from social as people point out different parts of my new products or new pages? How do I track internal promotions with the same powerful tools given for external marketing?

(If you got this far, I hope it’s clear that I’m not talking about last click attribution across sessions, I’m talking about multiple marketing drivers impacting the same session: people looking for pricing breaks by clicking on offers, clicking on coupons, trying aggregators, whatnot: they may “leave and come back” to your site 3-5 times in a session to see if a coupon code or cookie makes a difference. And guess what: that’s all 1 session to your tool.)

This session issue, btw, also affects reports like “exit pages”, “entry pages”, anything that is a by-product of the sessionizing experience. If we need to wait 30 minutes for a session to end, then an exit-and-then-an-entry within 20-30 minutes is still part of the same session… when we may really want to understand what’s going on here, the tool has already got its answer for us (hammer, meet nail).

So, if a user comes back and forth like that, hopping out to compare prices, coming back, then hopping out to find free shipping or discount codes, should we consider all of that behavior one session? Point, meet counterpoint:

Yes: to the consumer, it’s all one flow of their shopping experience, so we should track it that way to reflect their user experience. Also, it’s how it’s been done for over 10 years, and it’s how Google does it, so it must be right.

No: we munge up events, Page views, all sorts of things that consumers don’t see. Our goal as analysts is to optimize the experience, and if that means restructuring the data to get at the right analysis, so be it. This is why we now have Big Data systems: I should never have to ignore any data just because it’s easier to process it in a defective way.

So, yes, like every analyst, I want to solve the cross-session attribution problem. But I also spend time trying to explain why my affiliate vendor reports, on first glance, seem so much higher than what my web analytics tool would report based on campaign codes. Ignoring the usual slop of bad redirectors, yadda, yadda, we often find that it’s not just multiple channels that add up to drive online behaviors, but that they add up even in the same session... and my web analytics tool is not helping me understand this.

So, My Proposal
We can still do Visits, or Sessions and not lose history… but we should also “sub-sessionize” on arbitrary boundaries for the problems at hand. By arbitrary, I mean using boundaries that answer your business questions or your need for a deeper level of analysis.

Some options:

If a page has an external referrer, that immediately triggers a new “adjusted session”.

If a page has a campaign code/query string, that triggers a new “adjusted session”. The previous one can become an “assist” session in marketing tracking.

If a user goes from one “section/hierarchy/area” to another, that triggers a new “adjusted session”.
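The three options above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any tool does it: the hit fields (`url`, `referrer`, `section`), the host `www.example.com`, and the campaign parameter name `cid` are all hypothetical, and the input is one already-sessionized visit in time order.

```python
from urllib.parse import urlparse, parse_qs

SITE_HOST = "www.example.com"  # assumption: our own hostname
CAMPAIGN_PARAM = "cid"         # assumption: campaign code query parameter


def is_external(referrer):
    """True when the referrer is present and points at another host."""
    if not referrer:
        return False
    host = urlparse(referrer).hostname
    # Assumption: exact host match, so subdomains count as external.
    return host is not None and host != SITE_HOST


def campaign_code(url):
    """Return the campaign code on this URL, or None if there isn't one."""
    values = parse_qs(urlparse(url).query).get(CAMPAIGN_PARAM)
    return values[0] if values else None


def sub_sessionize(session_hits, split_on_section=False):
    """Split one session's hits into 'adjusted sessions'."""
    adjusted = []
    for hit in session_hits:
        code = campaign_code(hit["url"])
        new_boundary = (
            not adjusted                           # first hit of the visit
            or is_external(hit.get("referrer"))    # option 1
            or code is not None                    # option 2
            or (split_on_section                   # option 3, opt-in
                and hit.get("section") != adjusted[-1]["hits"][-1].get("section"))
        )
        if new_boundary:
            adjusted.append({"campaign": code,
                             "referrer": hit.get("referrer"),
                             "hits": []})
        adjusted[-1]["hits"].append(hit)
    return adjusted
```

Run on the coupon-hunter visit from earlier, the SEM click and the affiliate click would each open their own adjusted session, and the mid-session referrer is kept instead of thrown away.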

Yes, these “adjusted sessions” will be bigger counts than the usual sessions, and they will cause pv/session and other ratios to change. We aren’t throwing away the old stuff, so no worries, but just consider these new metrics a different unit of analysis, and all those worries go away. You can now analyze and optimize processes in your site that sum up to the big picture, only you couldn’t see them under the old metrics.

The point is a commonly overlooked one: Web analytics isn’t about analyzing your web site. It’s about analyzing how your web site drives your goals. Don’t get stuck trying to cram your business into how web analytics decided things would be 10 years ago. Think instead about how you can bend the tool to talk your business language, and around your needs.

Some benefits of “sub-sessions”: we can use the same technology and approaches to handle fractional attribution, pathing, etc, and understand what’s really driving conversions, both across sessions and now also within a session. If a bunch of my SEM budget is starting a session, but I’m also paying my affiliates in the same session, well, that just feels like a waste of money. And in most web analytic tools, I’d never see it (sure, I can use other tools to track this, but a web analytic tool seems like a good place to handle this, right?).

Is this happening to me?
How to tell? Besides going to your logs, one way is to set a tag/variable to always look for the campaign code info and tuck it into a page-level variable or event. Then count these up. If you get a different count here than you do from the various marketing-dashboard visit-level tools, then you know you have this “session-absorption” problem. If it’s pennies, don’t worry about it.
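If you do go to your logs, the same comparison can be roughed out in a few lines. This sketch assumes you already have each session as an ordered list of request URLs and that your campaign code lives in a query parameter called `cid`; both are assumptions for illustration, as is the function name `absorption_report`.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

CAMPAIGN_PARAM = "cid"  # assumption: campaign code query parameter


def campaign_code(url):
    """Return the campaign code on this URL, or None if there isn't one."""
    values = parse_qs(urlparse(url).query).get(CAMPAIGN_PARAM)
    return values[0] if values else None


def absorption_report(sessions):
    """Per campaign code: page-level count minus session-level count."""
    page_level = Counter()     # every hit that carries a code
    session_level = Counter()  # only codes on the first hit of a session
    for urls in sessions:
        for position, url in enumerate(urls):
            code = campaign_code(url)
            if code is None:
                continue
            page_level[code] += 1
            if position == 0:
                session_level[code] += 1
    return {code: page_level[code] - session_level[code] for code in page_level}


sessions = [
    ["/product?cid=sem1", "/product/42", "/product/42?cid=aff9", "/checkout"],
    ["/home?cid=aff9"],
]
print(absorption_report(sessions))  # prints {'sem1': 0, 'aff9': 1}
```

Any code with a non-zero number is a click that landed mid-session and got absorbed.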

But I think you’ll find that customers are doing a lot of things inside your session that you wish you could break apart. In some ways, this is just a recursion: Take the same tech and approaches used for “last click, first click, any click” attribution across sessions and run it across these “adjusted sessions” to show how multiple channels are being used to “seal the deal”.
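As a rough sketch of that recursion, assume you have the ordered campaign codes of the adjusted sessions inside one converting visit. The function name `allocate` and the model labels are mine; this is a toy version of first, last, and linear allocation, not any vendor's implementation.

```python
def allocate(touches, value, model="last"):
    """Split a conversion's value across the campaign touches in one visit."""
    if not touches:
        return {}
    if model == "first":
        return {touches[0]: value}
    if model == "last":
        return {touches[-1]: value}
    if model == "linear":
        share = value / len(touches)
        credit = {}
        for code in touches:
            # A code seen twice in the visit collects two shares.
            credit[code] = credit.get(code, 0.0) + share
        return credit
    raise ValueError(f"unknown model: {model}")


touches = ["sem1", "aff9"]  # hypothetical codes, in order of arrival
for model in ("first", "last", "linear"):
    print(model, allocate(touches, 100.0, model))
```

Under "first" the SEM vendor gets everything, which is what the session-level report shows today; "last" and "linear" are where the affiliate finally shows up.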

It’s the forest for the trees: while standing in “Website Meadow” in the center of the forest and staring out to the horizon to see how those “first touches” impact your customers way at the edge of the forest, you may be missing the fact that down at the treeline in this meadow, you have too many last touches munching away at your sandals.

P.S.
(BTW: I too thought that Omniture’s Linear Allocation or Participation concepts would solve these in-session problems, but they only apply to pages as of this writing, not other data types: http://blogs.omniture.com/2009/01/13/participation-inside-omniture-sitecatalyst/. And the Cross-Visit-Participation plugin is really for cross-visit attribution, not in-session, though it could be used here, with the same problems as mentioned for the allocation issues above. But excellent Omniture-Fu you had in thinking those would apply! Call me for a job!)
