Heavy Lifting

Another eMetrics (Toronto) has passed and I have to say this: Web Analysts and Marketers proved once again they are up to the task of continuously improving the Productivity of their efforts!

At the same time (and as I expressed during the sessions on the analytical culture), I fear that many in the web analyst community are becoming very “inwardly focused”. They tend to talk more among themselves about the pennies they are making / saving while tripping over the dollars that are right there to be had if they reached out to other analytical disciplines in the company or measurement community.

Many among us knew this was a danger from our BI experiences. If all you ever do is talk to each other about new shiny objects, your contribution to the business effort can suffer. BI struggles every day with this weight, the challenge of being labeled “really smart but irrelevant”. I don’t think we want this to happen to WA.

So with this backdrop, some of the conversations I heard at eMetrics Toronto about certain measurement practices were disturbing. For example, it seems very few people are measuring their customer contact efforts properly, and in time this lack of analytical rigor is going to damage the WA effort for all practitioners.

In the rest of the Marketing Measurement world outside Web Analytics, the fundamental measurement concept is not Response. Response is, in fact, practically irrelevant. This is because people appear to “respond” to campaigns when in fact the campaign was just a coincidence – these “responders” would have taken action anyway.

The rest of the Marketing Measurement world acknowledges this “coincidence effect” and uses the far more rigorous concept of Lift – the incremental response directly attributable to the campaign.

If you are one of those peeps who is constantly talking about “Pull Marketing” and the power of Interactivity and Social and all that, you cannot say in the same breath you think your Marketing campaigns actually generate all the response you are claiming. If there was ever a reason to think analytically in terms of Lift instead of Response, Interactive tops the damn list; you can’t have it both ways.

Lift is measured by using control groups: a random sample of people targeted for the campaign who are held back and do not receive it. Lift is the incremental response of the targeted population over the control response.

In other words, if 2% of the control group ends up taking the desired action, and the “response” of the targeted group is 2.5%, the campaign is credited with driving a 0.5% response rate, not 2.5%.
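The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `lift` and the group sizes (10,000 targeted, 2,000 held back) are assumptions made up to reproduce the 2.5% versus 2% example, not figures from any real campaign.

```python
def lift(test_responders, test_size, control_responders, control_size):
    """Return (test_rate, control_rate, lift), all as fractions."""
    test_rate = test_responders / test_size
    control_rate = control_responders / control_size
    # Lift is the response the campaign added on top of what
    # would have happened anyway (the control response).
    return test_rate, control_rate, test_rate - control_rate


# Hypothetical counts chosen to match the 2.5% vs 2% example.
test_rate, control_rate, campaign_lift = lift(250, 10_000, 40, 2_000)

# Actions the campaign can actually take credit for in the targeted group.
incremental_actions = campaign_lift * 10_000

print(f"test {test_rate:.1%}, control {control_rate:.1%}, lift {campaign_lift:.1%}")
print(f"incremental actions: {incremental_actions:.0f}")
```

With these made-up counts, 250 people "responded", but only about 50 of those responses are incremental; the rest would have happened without the campaign.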

While most easily used in contacting known populations, you can create control groups any number of ways. For example, mass media campaigns are often tested for Lift geographically, comparing “no media” markets to the markets where the media is running. This is how Marketing Mix models are built. Online, you could do the same with PPC, banners, just about anything.

My Point is this: The rest of the world measures Marketing success using control groups. In fact, the business world measures a lot of things using “variance to control” models. So if WA wants to be taken seriously, this practice of measuring Lift rather than Response has to come into play as a best practice.

You will be surprised how much more seriously your analytical cases will be taken when you use controls. Finance people in particular are very used to the concept of “variance analysis”. When you show Finance people two identical groups, one who received the campaign and the other who did not, and claim the influence of the campaign to be the “Lift” or difference between the two groups, Finance people just nod and say “I get that”. No challenge, self-evident.

This is a wonderful thing, you know, having Finance people really understand Marketing Measurement. It leads to much bigger budgets.

Now, will introducing (requiring if you are a manager?) control groups and Lift Measurement be popular with the people whose campaigns you analyze? Probably not to start, I’d guess. Because their results are going to be different. Could be much better than they ever dreamed. But could be worse.

And this is what leads to my 2nd point about the analytical culture: It is not the job of an analyst to be popular. It is not the job of an analyst to “support” an effort, Marketing or otherwise. The job of an analyst is to seek the truth – an ongoing process, with “better truths” exposed as you move forward.

Those of you who were around when we moved the WA industry from Hits to Visits know what I’m talking about. This was not a particularly fun or popular exercise, but it had to be done, because Visits were a “better truth”. People moving WA practices from log files to tags are often faced with a similar problem of explaining a better truth. It’s not easy, but it’s the right thing to do.

Why should you care about this Lift issue?

At some point, a boss, someone in Finance, or a BI person is going to require you to prove the effectiveness of online marketing campaigns using a control group. And when using the control group tells you the campaigns are not nearly as effective as you thought they were, well, there’s going to be a little bit of a problem.

So, as I have done at many an eMetrics Summit, on this blog, and elsewhere, I strongly encourage you to start exploring the use of controls and the results they produce so you are ready for that day.

The Lift approach to measurement will change your mind about a lot of things you may now take for granted because they are part of the echo chamber you listen to all the time in the blogosphere. From folks who wouldn’t know a control group if it bit them on the arse, I might add. This is the problem with WA being so “inwardly focused” as a group – the crowd is often not as wise as you think they are.

Now, let me just add that as an analyst, it’s not your job to decide if a marketing program that loses money should be continued. There are perhaps several reasons this might be OK, e.g. “We lose money on that PPC campaign but it delivers Branding, so we’re fine with that”.

Whatever. The analytical truth is to know how much that “PPC Branding” really costs. So decisions can be made to (for example) buy the same Branding impact at a much lower cost.

Your job is to properly measure campaigns and deliver an accurate analysis so whoever makes these decisions has all the facts – good, bad, and ugly. Otherwise, it will be your fault the proper decision on how these programs should be executed was not made.

So, if you have to save this little exercise in the truth until your next job, I can buy that. But please, when you interview for or walk into that next job, express the appropriate level of surprise and dismay:

Avinash, I can only dream of getting as many “Great Post” comments as you do ;)

Mattress example is great, really do love that. There was a geo-control example from The Gap at #eMetricsTO using (offline) Bus Posters to drive traffic online and into stores – posters are here, but they are not here and here, compare traffic. Sweet little test. A bit messy statistically, but actual results were blowout. Better measured than not, I say.

Here’s my hope for web analysts on this control group topic – they resist the desire to focus on all the reasons why using control groups can be “messy” and concentrate on the simplicity of the general idea – compare “doing something” to “doing nothing”.

But Avinash / Jim, we can’t find statistically equal geo-markets, we can’t run a 2-tailed significance test on this, the markets don’t have defined “edges”, we can’t get the data we need in our WA tool, etc. etc.

Folks, Marketing can be a messy thing, please don’t use that reality as an excuse not to measure the impact of it. For me, if the results come in close between Test and Control, I assume failure of Test.

If Test blows out Control, that’s significant to me.

And then I repeat the Test to see if I can do the same again.

If I can get “sizable spread” on Test versus Control, and I can repeat these results, I have “Marketing Significance”.

Which is much better than simply believing your Campaigns are responsible for every response that comes in.

Wow, I totally feel like I’ve been chided by my boss. But in a way that makes me want to do a lot better. Thank you.
Funny, too, while reading your post I had the same arguments you mentioned later in your comment (“But Avinash / Jim, we can’t find statistically equal geo-markets, we can’t run a 2-tailed significance test on this […]”) because I’ve run into the same issues. So, how would you respond to my situation? Our advertising is hyper-local, i.e., we have maybe 2,000 visits total to a website a month for most of our clients and, even more, the majority of that traffic is from the same geographic region (it’s not hyperbole to say it’s usually the same borough of NYC). Nevertheless, we are dealing with products that are very high-end and now I am challenged by your post. Any advice?
On a side note, for someone who does not have time to take a refresher course in statistics, any books, websites, etc., that you would recommend?

While it’s important to have the right data, clean data and all that, we can’t let these issues prevent us from discovering better truths. So what we do, if I can be permitted to use the term, is “hack” a solution. These hacks can be extremely creative; I’ve seen some wondrous stuff along these lines over time.

There’s not really enough info in your comment to propose a solution, but to generalize, in hyper-local situations I try to use zip codes or (even more hyper) street maps. Assuming you are selling something, do you have street addresses?

For example, pinpoint a location to stage a street promotion of some kind, say, simply a kiosk where you do a product demo or simply display product and engage people passing by.

Look at net lift in customers for those streets versus the whole borough. In this case, the rest of the borough is “control” and “test” is the area, say a 1/2 mile radius, around where you do the street promo. You should see customer counts growing more rapidly (%) in the test area versus control.

This is a rough example, fine-tune for location and product. But it does give you an example of how powerful the test / control idea can be.
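To make the street promo comparison concrete, here is a minimal sketch in Python. The customer counts and the `growth_rate` helper are hypothetical, invented only to show the shape of the calculation: percentage growth in the test area minus percentage growth in the rest of the borough.

```python
def growth_rate(before, after):
    """Fractional growth in customer count between two periods."""
    return (after - before) / before


# Hypothetical customer counts before and after the street promo.
test_growth = growth_rate(before=200, after=230)         # half-mile radius around the kiosk
control_growth = growth_rate(before=5_000, after=5_150)  # rest of the borough

# Net lift: growth in the test area beyond what the borough did anyway.
net_lift = test_growth - control_growth

print(f"test {test_growth:.1%}, control {control_growth:.1%}, net lift {net_lift:.1%}")
```

Comparing growth rates rather than raw counts matters here because the control area is far larger than the test area.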

Here’s my particular situation: our firm is small and specializes strictly in luxury real estate, mostly on the east coast (we have accounts in Costa Rica, Atlanta, New Jersey, but the majority are in NYC).

So, by definition, we have a small number of people who visit our websites. On top of that, the number of leads we generate is smaller still. So there’s the first problem: our data set (to me) seems teeny-tiny. Next, I would think our data is difficult to separate out into control groups. A lot of our advertising is hyper-local, such as banners on websites that only people in NYC would know about. And, I would argue, most of the visits to the website are because of our marketing efforts. Finally, we only use Google Analytics on our sites and I think the most I can zoom in is by city.

But, as Avinash has taught me, there is a purpose to our website, and so (almost) every metric we look at in Analytics is measured against one goal: generating leads. And now, as you have instructed me, I need to figure out “lift” in our campaigns. I feel like I’ve struck gold with this idea… I’ve had financial guys try to calculate ROI in front of me at meetings and rail against us if they think they are not getting their money’s worth from their investment. Next, our clients are pulling back dollars and it would be wonderful to have hard data to say Publication A and Publication B are *proven* to “lift” leads so they don’t feel like a guinea pig.

So… a control group. I would love any thoughts on the situation. Thanks!

Thanks for supplying the additional info. By saying “Publication A and Publication B are *proven* to “lift” leads” I take it you advertise on other sites to attract traffic, then attempt to convert these visitors into leads.

Control = no advertising. Sometimes in a micro market like this, the only way to create control is to stop the advertising, then compare results when ads are running versus not running. You could do this one publication at a time, if desired.

Critics will cry this is not a true control, since different time periods are used and “something” might happen in one period that did not happen in the other to affect results. True, from a statistical perspective.

But clearly, the analytics Ninja could monitor traffic and look for the presence of outlier events, eventually declaring the two time periods to be equal *except* for the change in advertising. If outlier events did occur, the traffic from these events could be excluded from the analysis.

While “on/off” testing like this is not perfect from a stats perspective, it’s certainly better than not testing at all, and if the results can be repeated 3 times, you likely have “Marketing Significance”.
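A rough sketch of the on/off approach described above, in Python. Everything here is an assumption for illustration: the daily lead counts are made up, the `on_off_lift` helper is hypothetical, and which days count as "outlier events" is a judgment the analyst makes by hand, not something the code detects.

```python
from statistics import mean


def on_off_lift(on_days, off_days, excluded=frozenset()):
    """Average daily leads with ads running minus average with ads paused.

    on_days / off_days map a day label to the leads generated that day.
    Days listed in `excluded` (outlier events) are left out of both averages.
    """
    on = [leads for day, leads in on_days.items() if day not in excluded]
    off = [leads for day, leads in off_days.items() if day not in excluded]
    return mean(on) - mean(off)


# Three hypothetical on/off rounds for one publication.
rounds = [
    # Round 1: "wed" had an unrelated traffic spike, so it is excluded.
    ({"mon": 4, "tue": 5, "wed": 30}, {"thu": 2, "fri": 3, "sat": 1}, {"wed"}),
    ({"mon": 5, "tue": 4, "wed": 6}, {"thu": 3, "fri": 2, "sat": 1}, set()),
    ({"mon": 6, "tue": 5, "wed": 4}, {"thu": 2, "fri": 2, "sat": 2}, set()),
]

lifts = [on_off_lift(on, off, excluded) for on, off, excluded in rounds]

# The result counts as repeatable only if every round shows ads-on beating ads-off.
repeatable = all(value > 0 for value in lifts)

print(lifts, repeatable)
```

Whether a positive difference is a "sizable spread" is still a call for the analyst; the sketch only checks that the direction holds across all three rounds.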

If you were testing a new drug for efficacy or determining defect rates on parts for military aircraft, I would agree the above method for creating control is not statistically valid. But we’re talking about Marketing here, and the search for better truths using limited resources.

Thank you, I think that was the lightbulb I needed. I specifically remember one time we paused PPC advertising for a month for a smaller client and there was a noticeable decline in the leads generated.

If you repeatedly get the same result when you take out the advertising, it’s difficult to dispute there is a hard linkage. You may never know how to specifically describe that linkage between the ads and goal behavior, but you don’t have to understand the linkage to determine it exists.

This curiosity about “linkage” is in fact the basis of much scientific experimentation – I can see the linkage, but why does it exist?

For example, I see that when flies are on meat, people tend to get sick when they eat the meat.

Proving why these linkages exist is the basic work of the scientific method. Fortunately for most Marketing applications, proving the specific mechanics of these linkages is not nearly as important as proving they exist in the first place, determining with reasonable clarity there is cause and effect.

I can cover the meat and get real benefits without knowing for a century why, specifically, covering the meat prevents disease.

Many #wa folks would often be better off just going ahead and “covering the meat” rather than letting the flies gather while they figure out the precise mechanism that leads to disease.

We’re big fans of hack analysis when clean tests aren’t possible. I may have to quote you on the notion that just because we can’t see every causal connection doesn’t mean we shouldn’t try. Too many marketers simply throw up their hands and spend more money.