Everybody says they’re doing it …

Just like sex in high-school, split-testing is all the rage.

Everyone likes to pretend they’re an expert. Buzzwords and rumors abound … stories about increasing conversion rates by an order of magnitude by changing the color of a checkout button (but nobody shares the magic color!).

Most importantly, nobody wants to admit that they don’t really know what they’re doing, or (gasp!) have never done it themselves. Many join the conversation without wanting to let on that they don’t even know what split testing is!

Split testing (also called A/B testing) means showing two or more variations of a page to different visitors, at random. Then you measure how many people take the desired action (like buying a product) on each variation, to see which one works better.

Now that we’ve explained it, let’s be honest.

You don’t split-test, do you? Maybe you did something once — a small, unsatisfying and inconclusive experiment, but you’re not testing on a regular basis … right?

Most people don’t want to admit this, because they feel like they’re the only ones not doing it. Everybody knows that split-testing is absolutely critical to effective marketing online — so who wants to admit that they’re the only ones who aren’t doing it?

Well, relax.

It turns out that “most people” can’t be the “only one” — funny thing, right?

Hardly anyone is really doing it…

The truth is that many of the exploits that you hear about are fueled by a vivid imagination, rather than experience; only a very small proportion of the talkers are actually doing the things that they describe.

And that’s okay … maybe you aren’t ready.

To do split testing right, you don’t just need to test different variations of a page, you need to measure results, and the differences between the results generated by each variation.

This is challenging, and often impossible for websites that are just starting out and don’t have much traffic.

Let’s explain why with a short example:

Variation 1: One page received 974 visits, and 5 people converted.

Variation 2: The modified version of the page received 961 visits, and 7 people converted.

You’d think that Variation 2 is the clear winner, right?

Wrong.

Crunching the numbers, we find that there is only a 45.27% chance that over time, Variation 2 will continue to outperform Variation 1.

In other words, there’s a 54.73% chance that the difference between their success rates was the result of random chance.
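If you're curious where numbers like these come from, here's a minimal sketch of the standard two-proportion z-test that significance calculators typically use (the function name is my own; the specific tool behind the article's figures isn't stated, but this approach reproduces them):

```python
import math

def confidence(visits_a, conversions_a, visits_b, conversions_b):
    """Two-sided confidence that two conversion rates genuinely differ,
    using a two-proportion z-test (normal approximation)."""
    p_a = conversions_a / visits_a
    p_b = conversions_b / visits_b
    # Standard error of the difference between the two conversion rates
    se = math.sqrt(p_a * (1 - p_a) / visits_a + p_b * (1 - p_b) / visits_b)
    z = abs(p_b - p_a) / se
    # Convert the z-score into a two-sided confidence level via the normal CDF
    return math.erf(z / math.sqrt(2))

print(f"{confidence(974, 5, 961, 7):.2%}")  # ≈ 45.27%
```

Plugging in the example above yields roughly 45.27% confidence, i.e. far short of the 95% you'd want before declaring Variation 2 the winner.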

Okay … where did I get these numbers?

Split testing is all about finding results that you can be confident in based on statistical significance. This isn’t a touchy-feely kind of confidence — it’s calculated mathematically, and you want it to be at least 90%, and ideally 95% or more to choose a winner.

You don’t have to worry about calculating the numbers yourself. There are free tools that can calculate the statistical significance of your results (you just plug in the number of impressions and actions for each variation, and the rest is done for you), and split-testing tools like Google Website Optimizer will do the calculation for you as well (and Google Website Optimizer plugs right into Premise).

If you don’t want to calculate the actual significance of your test, here’s a rule of thumb that you can use (borrowed from Tim Ash’s book Landing Page Optimization):

If there are 100 impressions in your sample, you need to see a 20% difference between variations to be sure that they actually mean something.

If there are 1,000 impressions, you need a 6.3% difference.

If there are 10,000 impressions, you need a 2% difference.

If there are 100,000 impressions, you need a 0.63% difference.

Do you notice the trend here?

To detect small differences in improvements (which are what most split-tests are likely to reveal), you need a pretty large sample size.
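The trend is a square-root law: every tenfold increase in impressions shrinks the detectable difference by a factor of √10 ≈ 3.16. Here's an illustrative sketch — the 5% baseline conversion rate and 95% confidence level are my own assumptions, so the absolute numbers differ from Tim Ash's table, but the scaling behavior is the same:

```python
import math

def min_detectable_lift(impressions, baseline=0.05, z=1.96):
    """Rough minimum relative lift detectable at ~95% confidence,
    assuming a 5% baseline conversion rate (illustrative only)."""
    # Standard error of the difference between two equal-sized groups
    se = math.sqrt(2 * baseline * (1 - baseline) / impressions)
    return z * se / baseline  # express as a relative lift

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} impressions -> {min_detectable_lift(n):.1%} lift needed")
```

Each step down the list divides the required lift by √10, which is exactly why low-traffic sites can only detect enormous differences.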

The moral of the story is that if you don’t have much traffic, then maybe you need a solid growth strategy instead of better split-testing.

But what if you do have the traffic?

After all, most sites and blogs have at least a bit of traffic, which is enough to test the more important things, like headlines and opt-in placement.

Most aren’t doing it very well …

Like sex in high school, split-testing is something at which even those who are doing it don’t have much experience, and their actions are often controlled by impulses and urges, rather than skilled intent.

Let’s take a quiz, and see if you’re making any of the mistakes of most would-be split-testers:

Do you test one thing at a time? Most wannabe split-testers don’t; they change half-a-dozen things at a time, based on the latest and greatest ideas to have entered their minds. The trouble with this is that when things work (or don’t work), you don’t know which changes are responsible. To effectively split test, you need to isolate variables, which means testing one thing at a time!

(Okay, yes, it is technically possible to test multiple things at once — it’s called multivariate testing. In practice, though, doing it requires huge traffic numbers, and a much more complex setup — if you’re not already doing it, then it’s probably not for you.)

Are you measuring results? I mean actually measuring, with numbers? This is also a rarity — more often, it’s an anecdotal “I feel like we’re getting more sign-ups” kind of ‘measurement’. Be careful with this, because as human beings we all suffer from confirmation bias, which means that we’re much more likely to favor evidence that supports what we want to believe. Measuring with actual numbers is critical to effective split testing!

Do you let your experiments run until you’ve reached a 95% confidence level? This is where the greatest number of mistakes are made: an experiment is set up and allowed to run until the experimenter feels that “this one is working better”. This occurs before reaching the point at which the numbers actually prove what you’re trying to prove, which means that the results are really inconclusive and can’t be trusted. And what’s the point of doing experiment after experiment if none of the results can be trusted? You absolutely have to let experiments run until you reach statistical confidence in the results!

Are you tracking your experiments? Rather than flitting from experiment to experiment, keep a journal that documents each experiment and the lessons that you learned from it. This will prevent you from running repeated experiments that test more or less the same thing, without ever learning your lessons. Set up your experiments as hypothesis tests — each experiment is meant to test a guess about something that you think will influence your audience!

Do you focus your experiments on your conversion goals? There’s no point in experimenting just for the sake of experimenting, and yet it’s more common than you might believe. There’s no point testing something unless you think it will contribute to the conversion goals that you have for your site. So rather than setting up test after test, consider first what your objectives are, and what you might be able to test that will contribute to reaching that objective!

You’ve probably answered “no” to at least some of these questions, but that’s fine — the important thing is to learn and adjust your practices, so that the experiments that you run tomorrow will be more effective and fruitful than the experiments that you ran yesterday.

Now that you’ve got the processes worked out, let’s talk about some of the things that you might want to experiment with.

Do you feel like experimenting?

Experimentation can be great, but if you’re a professional blogger or business owner, you’re not just in it for the fun — you need to focus on the experimentation that will be most gratifying to your bottom line.

Here are some of the most important things that you should be sure to split-test:

The headline. This is the single most important thing that you can split-test, because the headline is the first “gateway” that your readers have to pass through. You will lose more people at the headline than anywhere else on the page, so test the headline first.

Opt-in placement, text, and colors. Try different placements of the opt-in box on your site, different calls to action, and different box and button colors. Since you probably get more sign-ups than sales, this is a much better place to start your testing.

The order button text and colors. Experiment with changing the text of the order button (options include “Get It Now”, “I Want Access”, “Buy Now”, “Add to Cart”, “Proceed to Checkout”, and more), and with the color of the buttons (yellow, red, blue and green are good places to start). This applies to your email opt-in box as well.

The format of the offer. This is a little more work to test, but if you have the option to do it, you might find that a lot more customers are willing to buy one format than another. Experiment with your offer as an ebook, report, video series, podcast training program, infographic and so forth.

The price. This isn’t always possible to test, but if it is, you might find that you’re leaving a lot of money on the table; it’s possible that increasing the price will not affect sales, and it’s even possible that increasing the price will increase sales as well!

The style of the introduction. After the headline, the first thing that your audience will read is the opening paragraph. Experiment with different styles — try making bold statements, vs. telling a story about their problem, vs. describing the ideal outcome. See what works best for your audience.

The product imagery. Try different versions of your product picture — you’d be surprised how much of an effect this sort of thing can have.

Trust seal choice and placement. Different audiences will respond to different trust seals, and will want to see them in different places. Good places to test are near the description of your guarantee, and of course near your order button.

Email subject line. This is just as important as the headline of your sales page, particularly if you’re using confirmed opt-in, where abandonment rates of 20-30% are common. Split-test the subject line of your email confirmation messages to make sure that as many subscribers as possible actually get on your list.

There are lots of other things that you could test — for more ideas than you’ll ever be able to test, check out the Landing Page tutorials here on Copyblogger.

Getting started with split testing…

If this is the first time you’re hearing about split-testing, then your head is probably spinning right now.

That’s okay — it’s a lot of information to take in.

Even if you’ve been thinking about split testing for a while (and have even tried a few experiments), you might be wondering about one thing: how to actually get the experiments going.

That’s where Premise comes in — it’s a drop-dead simple and complete landing page package that plugs right into WordPress, and that you can use to:

So enough fence-sitting … if you want to get serious about split-testing, go get Premise and get started!

Okay, over to you …

Have you experimented with split-testing? What has your experience been? Where did you get stuck?

Do you have a Premise success story to share?

About the Author: Danny Iny is an author, strategist, serial entrepreneur, and proud co-founder of Firepole Marketing, the definitive marketing training program for small businesses, entrepreneurs, and non-marketers. Visit his site today to download a free split test checker, or follow him on Twitter @DannyIny.


Comments

I just want to point out one thing. Yes, to have accurate results you need to test one variable at a time, but don’t hesitate to test a completely different page. Sometimes you’re just working with a bad sales or conversion page and you can tweak all the colors and headlines you want and you’ll never get insane results.

How would you know you’re working with a bad sales or conversion page? Is there a benchmark you should use – and if you’re way below the benchmark you should try changing the complete page – or is it a ‘gut’ thing, or scientific?

Hey Paul, it’s hard to know, because norms are so specific to industries and offers. Ideally, you can work off of comparables, but realistically speaking, you almost never have access to representative data… the best thing you can do is test and test and test…

Absolutely right, Derek. Single-variable testing will get you to a local maximum (“local peak”); you’ll need multi-variate if you’re aiming for the global maximum (“choosing the peak”). Good to start with the multi and move to single as you gain more confidence that you’re scaling the right peak.

Also, Danny, users will want to control for differences in traffic characteristics. Most glaringly, date/time and traffic sources. If you don’t control for these, you can get spurious correlations.

As I read through that I felt like it was written specifically for me. I have made all of those errors . . . in fact I still am! I tend to come up with ideas and then change a ton of stuff and then leave it a month to see what happens. Whether the site made more or less money is the main metric I tend to go by.

However, your article has hit me like a kick in the nuts! Don’t just talk the talk, walk the walk. How can I pass on killer strategies to my list if I still haven’t mastered this KEY marketing fundamental?

You bet, Jimmy, I’m glad it was helpful. The key is to test one variable at a time; that means only changing one thing at a time, but it also means controlling for other changes – for example, if one month you ran three guest posts and the other you didn’t run any, then you’ve got to account for that in your test results!

Most don’t split test because of laziness. They are in this game because they want to do the least amount of work for the most profit. What they don’t realize is this industry is a full time business. Not a hobby. To make real money means you need to put in real effort and time. That’s exactly what the wealthiest online marketers understood.

Wow, that is a really useful post!! The change-per-impressions percentages were really useful. I have done split testing before, testing something over a month. When the number of sales went up (even by one) I would sit back and congratulate myself on being a killer marketer. However, invariably the next month, after not touching it, the number of sales would go down and I would be back to square one. Now I will only pat myself on the back when the change in action is significant enough to warrant it.

Hey Pete, I’m glad you liked it! Yeah, the statistical significance portion is almost always overlooked – most people don’t realize that they have to measure it, and a lot of people who do realize don’t feel comfortable doing the math (they don’t realize that you don’t have to – there are tools that can do it for you).

Another thing people forget to check is effectiveness. A may have fewer click throughs than B, but maybe A has a better conversion rate…and that’s what we’re really interested in, right? Fewer tire kickers, more buyers.

That’s a very important point, Lara, and it comes down to what outcome you are measuring – clicks isn’t always the best one (in fact, with PPC, it often isn’t – clicks is more a measure of how much you’re paying than how much you’re making!).

The simple fact is, most lists aren’t large enough to effectively split test. Unless you have a really large list, and I’m talking six figures minimum, then you really don’t have enough to split test.

Say you have a list of 30-thousand. First you test the headline: 15,000 to one list, 15,000 to the other. Now you aren’t sure about the lead graph, better test it. But you don’t want to test the same list. So you take 15,000 and split it in half: 7,500 to one list, 7,500 to the other. Okay, but what about that graphic? Wouldn’t a Clickbank clip look better than that monitor with the money flying out of it? Okay, 3,750/3,750. And so forth. It isn’t long before your samples become so small they are insignificant.

I can see the value of split testing. I can also see how worthless it becomes if it’s overused.

You’re right, Joelin, that the numbers need to be large, but I think you’re wrong about how large the numbers have to be… what you’re describing is multi-variate testing, in which you’re testing a lot of things at the same time – for that, you need really huge numbers.

If you’re just testing one thing at a time, though, and changing to a new experiment once you’ve reached statistical significance, then you don’t need nearly as large a sample size to do the tests (see the post above for specific numbers, or the tools that were linked to).

Love the headline too… Thanks for the great breakdown of split testing and the fond memories of high school shenanigans.

I’ve also tested pages that have a video intro or not, and I always find that video most often gets better results. Back in the day, I’d put an audio greeting on sales pages and that had the same warmth effect. I haven’t tried Premise, but one thing I’m personally clicking away from is template look-alike sales pages. I’ll have to check it out to see if it gives me the click-away urge or the stick-around urge.

Yup, those are common reasons… but I think overwhelm is the most common one – the unfortunate reality is that split-testing is just complicated enough to be intimidating, which is why tools like Premise are so important – especially if the math makes you squirm…

I find that when I get something that works, I get stuck in a comfort zone and do not want to change anything; it is only when something is not working that I test to improve it. I should aim to split test more often, but it is hard to actually get down to it.

That’s a really common challenge, Phil! I find the best way to go about it is to plan a bunch of things to test in advance, and that way each time an experiment is done, you pick the winner and continue on to the next test – all you need is the implementation, which is really quick.

I am running my first serious split testing on two Facebook ads. Exactly the same market criteria; both ads point to the same website page for a women’s event.
The second ad initially had a much higher click through rate, on a lower display rate – but has now levelled out to be the same numbers as the first one. With no sales, we have now cancelled the campaign.

Hey Lesley, if click-throughs started off different, and now they’ve become the same, it is possible that you’ve saturated the audience with the ads, so the first one isn’t “pulling” as well anymore – are you running your ad to a small, targeted group of people?

One thing, though – if the click-throughs vary depending on the ads, and the conversions are remaining flat, it usually means that the problem is with the sales page or offer, rather than the ad.

Thanks, Steven! Yeah, that’s the thing, most people skip over the statistical significance portion – I think it’s because they’re intimidated by the math. But without it, split-testing doesn’t really mean anything…

Wow, thanks so much for this article. I will be honest that I cringe every time I see articles in praise of split testing – what IS that and how do you DO it?! Anyway, looks like I’ve kinda been doing it and didn’t even know it. My blog isn’t quite a year old yet, but I’ve continually tweaked the opt-in look and location, tag lines, colors, etc., and get such a kick out of finding things that work. I’m a very visual person anyway so I actually enjoy coming up with new things to test. At least I understand what split testing is now. Thanks again!

I have to say, I have never ever done split testing on the internet. I have heard the stories. Not to say that I don’t experiment with design changes… all the time… but market testing the changes would probably be very useful, and dull, tedious, and all those things us creative types don’t really want. It would also be most useful to test over a week on a website that gets a lot of traffic rather than one that gets almost no traffic.

I really enjoyed the article btw. It really highlights the kind of marketing (research) you can do on the internet.

Hey Brian, thanks for the kind words, I appreciate it! You’re right – if traffic is minimal, then you’re wasting your time split testing, because you aren’t going to get results that are meaningful anyway – your efforts should go to traffic first. But until then you shouldn’t really focus on design changes either, right? 😉

I definitely engage in testing, but not anything too high-level. Mostly because too much “data” makes my brain freeze. That said, I can vouch for the fact that testing can make all the difference in knowing what your target market really wants. If you don’t have anything to assess, you’re constantly throwing darts blindly. And in all likelihood you’re hurting because of it.

Hey Nivin, honestly, unless your traffic is in the stratospheric numbers, multi-variate testing isn’t practical. Maybe Brian can weigh in, but does even Copyblogger have the traffic to effectively do multi-variate testing?

I think you’ve really got to be on the scale of an AOL or Yahoo or Google to be doing that stuff – the rest of us can stick to A/B testing. And Premise is a great tool for that!

The point is well made. Don’t believe the numbers unless you know what they are telling you. Monitoring is the greatest gift the Internet has to offer a business. But it is a complex area, and whilst numbers may not lie, it is crucial to make sure that we understand what they are telling us.

I love what you say about how everyone is talking about it but no one is really doing it. I feel this way about a lot of things. A great idea comes along and everyone whines that it will be ruined because everyone will jump up and down and copy said idea. Thing is, unless you’re a daily deal site, this rarely happens.

Why? Because people are lazy.

Figuring out how to do a good split test takes time to learn and, yes, it’s frustrating. But take a day and devote it to testing colors, fonts, the first sentence of your intro, whatever. If you actually make a commitment and do it you’ll be way ahead of the game.

Love this! I’m always trying to encourage people to split test – or at the very least sequential test where you run one page and then another and then compare results. Maybe for some of us that’s an easier entry?

Any testing is better than no testing, but I think simultaneous is a lot better, because it controls for variables like different posts, different promotions, and so forth – testing is only useful when the data is valid, after all.

The idea is that you create two alternatives of something (e.g. the form at the top right of this page): one called A that looks as it is now, and another called B that uses the word “subscribe” instead of “join us” (hence split testing is also called A/B testing).

Then you let both versions run for some time (see the original blog entry for details of what this “some time” might need to be), and then you figure out which version generated more new sign-ups, and whether the difference is significant enough. Based on this you either keep version A or version B.
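In code, the split the commenter describes is just a stable assignment of each visitor to A or B. A hypothetical sketch (the function name and digest-based bucketing are my own; real tools like Google Website Optimizer handle this for you):

```python
import hashlib

def assign_variation(visitor_id, variations=("A", "B")):
    """Assign a visitor to a variation, stably: hashing the visitor ID
    means the same visitor sees the same version on every visit."""
    digest = hashlib.md5(visitor_id.encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]

# A given visitor always lands in the same bucket:
print(assign_variation("visitor-42"))
```

Across many visitors the hash spreads traffic roughly evenly between the variations, which is what makes the later significance comparison valid.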

I agree with everything but one thing: it’s easy to say that you should let a test run until you have 95% confidence, but some tests just never get there. I’ve seen sites (and owned/own them) where 50k impressions still did not amount to a high enough confidence level.

And that’s a big problem. A huge problem. A problem that’s not often addressed: tests with minor differences. Don’t test headline A versus B; in my experience even that is too small a difference on many occasions. At first, test radical differences: long copy versus short copy, video + opt-in form versus text + opt-in form. Then test smaller differences.

I’ve set a 3-month max on a test at a certain point in time, because too many tests never reached 95% confidence. If there are still no significant differences on a site with plenty of traffic, then there is NO winner. The different version needs to become even more different. Can’t stress this enough.

That’s a really good point, Dennis – thank you for pointing it out. If the difference is too small, then even a large number of impressions won’t easily get you to statistical significance – which is why it is important to test things that are likely to make a big difference, just as you said.

I really enjoyed this post Danny. As a rule of thumb (I realize this would change depending on the industry), would you recommend changing the format every month? Do you often find yourself going back to the original format after you have changed it?

Great question, Brendan! It isn’t a function of time, it’s a function of exposure to enough people to get statistically meaningful results. So it will depend on how much traffic you get. Does that make sense?

Hi Marc, I’m glad you liked the post. It’s not a question of how many visitors, it’s a question of statistical significance; the more extreme the difference in results, the fewer visitors you need. Check out our split test checker (linked to in the post) to see how the numbers work out.

Great post Danny, you brought up very awesome points and I agree with most of them. I think people don’t do split testing enough because of the time it takes to set up and execute some of these systems. Most people just want to do small things, like split-test a single image.

This is why I created ClickAppeal. I wanted something that would allow the average blogger or Internet marketer the ability to split test images on their site with out having to split test an entire page. Something you could setup within 2 min. http://ClickAppealapp.com