Dealing with Quantitative Dissonance and Other Disciplines of a Successful A/B and Multivariate Testing Program


7 Deadly Sins of Testing – No Single Success Metric

In my nearly 9 years working in testing and data, I have worked with or evaluated close to 300 different sites and their testing programs. While I wish I had great stories of amazing programs and amazing results left and right, the sad truth is that there is no perfect program, and there are very few you would even want to take as a starting point. The reasons programs get into this state are legion, but there are 7 common “sins” that destroy programs. I want to go through each of these 7 deadly sins and look at how they manifest and how to fight them. What you will find is that all of these sins come from the same place: a lack of understanding, willful or not, of how to think about testing, or of the difference between being the hero and being the villain. What you choose to do about these sins is up to you, as there can be no greater retribution than evaluating your own actions, finding your own weaknesses, and then turning them into strengths.

The first of these sins, and by far the most evil and damaging, is the failure to align your program on a single success metric. So many programs fail because they optimize to their KPIs, to the concept of the test, or, even worse, to whatever the group running the test is measured on. They optimize to improve their concepts, not to improve the site. What makes this sin especially dangerous is that it will make you look greatly successful: you will get a return, and because the thing you are tracking is not a site-wide revenue metric, you will often find that the magnitude of change is dramatically higher.

The reason this is a sin is that you are mistaking the concept or the area for the end result. You are ignoring the unintended consequences of the test in order to focus on what you want to find out, not what you need to find out. You are assuming that the world works exactly how you think it does, and you are abusing the data to prove your point. A classic example is testing to improve “bounce rate” or clicks. In both cases you are mistakenly thinking in a linear fashion, assuming that the rate of an action is the same as the value of the action. Only when rate and value are equivalent would the two move together, and you cannot know the value unless you look at the global impact. To put it more simply: if the reduction in bounce rate or the increase in clicks matters, it will impact the bottom line. If it does not, you will see that in the bottom line as well. In both cases the intermediary action, the bounce or the click, is simply a means to an end, but we forget this in order to make the concept easier to understand. You can see a massive increase in either, but because it is not tied to the end result, you have no clue whether it is helping or hurting your site's ability to make additional revenue or be more efficient.
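The divergence between an intermediary action and the end result is easy to see in numbers. Here is a minimal sketch, using entirely invented figures, of how a challenger can “win” on click-through rate while losing on the metric that actually matters, revenue per visitor:

```python
# Hypothetical per-variant results illustrating why a sub-metric
# (click-through rate) can diverge from the site-wide metric
# (revenue per visitor). All numbers are invented for illustration.
variants = {
    "control":    {"visitors": 10_000, "clicks": 1_200, "revenue": 25_000.0},
    "challenger": {"visitors": 10_000, "clicks": 1_800, "revenue": 23_500.0},
}

for name, v in variants.items():
    ctr = v["clicks"] / v["visitors"]   # intermediary action rate
    rpv = v["revenue"] / v["visitors"]  # site-wide success metric
    print(f"{name}: CTR={ctr:.1%}, revenue/visitor=${rpv:.2f}")

# The challenger wins on clicks (a 50% higher CTR) but loses on revenue
# per visitor -- judging by the sub-metric alone would ship a losing change.
```

If your report only showed CTR, you would call this test a big winner; only the site-wide metric reveals the loss.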

In far too many cases, when you go back and evaluate global impact after the fact, you find that the increase you were shooting for came at the cost of higher-value actions, meaning that improving the click-through rate to your section is costing your site total revenue. If you don't do the work to find out, you will continue to waste money and decrease performance, while at the same time having many impressive lifts to talk about to your boss.

What is especially frustrating about this sin is that there are many groups and “experts” out there more than happy to propagate the myth, or to abuse it to make themselves look good. Agencies are especially notorious for this behavior. They let you pick a sub-metric and optimize to that, which has the double advantage of feeding your ego and avoiding the core issues that will define your success. Even worse, they will talk you into, or let you pick, multiple metrics, and if the first one doesn't show how amazing they are, they will find one deeper in to show how big an impact they had for you. Look at any of the hundreds of posted “success stories” that flood the market and count how many of them are based on improving a metric that has nothing to do with site success. This is their fail-safe to make you feel better about your program while simultaneously sucking more money out of your pocket.

For many groups, figuring out what the purpose of their site is, or what defines success site-wide (almost always revenue), is a difficult and time-consuming task. It is also the single greatest determinant of whether you will receive any value from your tests. I refuse to work with a group unless they have figured out what they are trying to do for the entire site, and I will only run a test if they agree to make decisions solely off the impact to that bottom line. The results can then tell you so much. If you find that you are not moving the bottom line much, that means you are testing only what you want to test, not what matters. If you find that promoting item X increases revenue for that item or group but the site loses money, you should re-evaluate your merchandising priorities. If you find that getting more people to your cart doesn't increase revenue, then you are not optimizing to value. In all cases, the actual reason why things are happening is almost completely irrelevant; the value comes from acting, in a meaningful way, only on site-wide goals.

Look at what you are doing and ask whether you are committing this sin. Are you tracking different success metrics for different tests? Do you look at dependent metrics, such as a limited product set or only conversions from people who clicked on something? Are you looking at metrics that have no tie to any site-wide success, like bounce rate or clicks? If you are doing any of these, then you are committing the greatest sin of testing: you are wasting your time and energy sub-optimizing, and you are ensuring that you can never know the real impact of your tests.
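The “dependent metric” trap in that checklist can be made concrete. Conditioning a metric on an intermediate action, such as measuring conversion only among people who clicked, compares two different populations and can point in the opposite direction from the whole-population result. A small sketch with invented numbers:

```python
# Hypothetical example of the dependent-metric trap: conversion rate
# among clickers looks WORSE for the challenger, even though the
# challenger converts MORE of all visitors. Numbers are invented.
control    = {"visitors": 10_000, "clickers": 1_000, "orders": 200}
challenger = {"visitors": 10_000, "clickers": 2_000, "orders": 300}

for name, v in (("control", control), ("challenger", challenger)):
    conditional = v["orders"] / v["clickers"]  # dependent metric (clickers only)
    site_wide   = v["orders"] / v["visitors"]  # success metric (all visitors)
    print(f"{name}: conv|click={conditional:.1%}, conv overall={site_wide:.1%}")
```

Here the challenger drops conversion-per-clicker from 20% to 15%, yet it produces 50% more orders from the same traffic. Judged on the dependent metric alone, you would kill the better experience.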

Finding your single success metric can be difficult and can cause a lot of headaches getting buy-in, but unless you are willing to do the hard work, what is the purpose of your program? You have no chance of finding real value, and the best you can do is make someone think you are having a much greater impact than you really are. There are many bricks on the path into and out of the darkness; it is up to you which direction you travel on them.