The Ill-Defined Problem Of Attribution

For the past few years, I’ve sat on the board of a company that audits audience for various publications. One of the challenges the entire audience measurement industry has faced is the
explosion of channels traditional publishers have been forced to use. It’s one thing to tally up the audience of a single newspaper, magazine or radio station. It’s quite another to try to
get an aggregate view of an audience of publishers that, in addition to their magazines, have a website, several blogs, various email newsletters, a full slate of webinars, a YouTube channel, multiple
Twitter accounts, Facebook pages, other social destinations, digital versions of magazines and an ever-growing collection of tablet and smartphone apps. Consider, for instance, how you would estimate
the size of MediaPost’s total audience.

The problem, one quickly realizes, is how you find a common denominator across all these various points of audience engagement. It’s the
classic “apples and oranges” challenge, multiplied several times over.

This is the opposite side of the attribution problem. How do you attribute value, whether it’s in terms
of persuading a single prospect, or the degree of engagement across an entire audience, when there are so many variables at play?

Usually, when you talk about attribution, someone in the room
volunteers that the answer can be found by coming up with the right algorithm, usually with a caveat along these lines: “I don’t know how to do it, but I’m sure
someone far smarter than I could figure it out.” The assumption is that if the data exists, a solution must be hiding in it somewhere.

No disrespect to these hypothetical
“smart” data-crunchers out there, but I believe there is a fundamental flaw in that assumption. The flaw is that we’re treating the problem as a
“well-defined” one when in fact it’s an “ill-defined” problem.

We would like to believe that this is a solvable problem that could be reduced to a simplified
and predictable model. This is especially true for media buyers (who use the audience measurement services) and marketers (who would like to find a usable attribution model). The right model, driven
by the right algorithm, would make everyone’s job much easier. So, let’s quit complaining and just hire one of those really smart people to figure it out!

However, if we’re
talking about an ill-defined problem, as I believe we are, then we have a significantly bigger challenge. Ill-defined problems defy clear solutions because of their complexity and unpredictability.
They usually involve human elements impossible to account for. They are nuanced and “grey” as opposed to clear-cut “black and white.” If you try to capture an ill-defined
problem in a model, you are forced to make heuristic assumptions that may be based on extraneous noise rather than true signals. This can lead to “overfitting.”

Let me give you an example. Let’s take that essential human goal: finding a life partner. Our task is to build an
attribution model for successful courtship. Let us assume that we met our own lifelong love in a bar. We would assume, then, that bars should have a relatively generous attribution of value in the
partnership “conversion” funnel. But we’re ignoring all the “ill-defined” variables that went into that single conversion event: our current availability, the
availability of the prospect, our moods, our level of intoxication, the friends we were with, the song that happened to be playing, the time of night, the necessity to get up early the next morning to
go to work, etc.

In any human activity, the list of variables that must be considered to truly “define” the problem quickly becomes impossibly long. If we assume that bars are good
places to find a partner, we must simplify to the point of “overfitting.” It may turn out that a grocery store, ATM or dentist’s waiting room would have served the purpose
equally well.
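
To make that concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the four venues above are equally likely meeting places, so no venue has any real advantage. The function name `observed_shares`, the sample sizes and the seed are all hypothetical choices, not drawn from any real data.

```python
import random
from collections import Counter

# Hypothetical venues, taken from the example in the text.
VENUES = ["bar", "grocery store", "ATM", "dentist's waiting room"]


def observed_shares(n_couples, seed):
    """Return the share of simulated couples who met at each venue.

    Assumption: every venue is equally likely, so any difference
    between venues in the result is sampling noise, not signal.
    """
    rng = random.Random(seed)
    counts = Counter(rng.choice(VENUES) for _ in range(n_couples))
    return {venue: counts[venue] / n_couples for venue in VENUES}


# A handful of observations versus a very large sample.
small = observed_shares(n_couples=12, seed=1)
large = observed_shares(n_couples=100_000, seed=1)

print("12 couples:     ", small)
print("100,000 couples:", large)
```

With only a dozen couples, chance alone will usually make the shares look uneven, and it is tempting to credit whichever venue comes out ahead. With a hundred thousand, the shares settle near 25% each. A model fitted to the small sample would be crediting noise, and this toy version still leaves out every human variable listed above.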

Of course, you could take a purely statistical view, based on backward-looking data. For example, we could say that, hypothetically, 23.7% of all couples met in bars. That may give us
some very high-level indications of “what” is happening, but it does little to help us understand the “why” of those numbers. Why do bars act as a good meeting ground?

In the end, audience measurement and attribution, being ill-defined problems, may end up as rough approximations at best. And that’s OK. It’s better than nothing. But I feel it’s
only fair to warn those who believe there’s a “smarter” whiz out there who can figure all this out: Human nature is notoriously tough to predict.

Great post, Gord. On RKG Blog I made a similar observation on the limits of statistical methods when it comes to attribution. http://www.rimmkaufman.com/blog/attribution-myths-vs-reality-part-1-statistical-limits/09072013/
Part of the scientific method is understanding the limits of what you can do with the tools available. Snake oil salesmen have had the cure for the common cold for centuries; it's only the scientists who've struggled with that problem.