How to Check the Significance of an A/B Split Test

So you’re doing some split testing, right? I’m sure you are (or should be! I mean it!). Anyway, are you sure that the results you’re getting are statistically significant?

Here’s what I mean

Suppose you have two different banners. Whether you’re a web designer or an internet marketer, you want to know which one is better. The easiest thing to do is to perform some A/B split testing.

After the test is over, you find out that 600 impressions of banner A resulted in 50 clicks, while 600 impressions of banner B resulted in 70 clicks. Great! So banner B is the better one, you may think. The only problem is that these results are not statistically significant at the usual 95% level. This means they should not really be relied on when deciding which banner is the better one. It seems like banner B is better, but the number of impressions and clicks is so small that the difference might be completely accidental. So how do you find out whether or not the results of a split test are significant?
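To make the banner example concrete, here is a minimal Python sketch of one common way to check such numbers: a two-sided two-proportion z-test. This is an assumption for illustration only; the function name `two_proportion_z_test` is made up here, and the checker described in this post may use a different calculation.

```python
import math


def two_proportion_z_test(trials_a, actions_a, trials_b, actions_b):
    """Return (z, two_sided_p_value) for the difference between two rates."""
    rate_a = actions_a / trials_a
    rate_b = actions_b / trials_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (actions_a + actions_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Banner A: 600 impressions, 50 clicks. Banner B: 600 impressions, 70 clicks.
z, p_value = two_proportion_z_test(600, 50, 600, 70)
certainty = (1 - p_value) * 100
print(f"z = {z:.2f}, p = {p_value:.3f}, certainty = {certainty:.1f}%")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

With these numbers the z value comes out at roughly 1.92 and the certainty lands just under 95%, so under this particular test the difference between the banners narrowly misses the threshold.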

There’s a solution

You just have to get yourself a cool, easy to use mathematical tool.

Introducing the statistical significance checker:

It’s really easy to use. Just input the number of trials and the number of actions for both variants of your A/B split test and press the “calculate” button.

This tool will tell you whether or not your results are significant.

Here’s an example:

And another, significant one:

As you can see you will also get a level of certainty associated with your test. (A hint: everything above 95% is a great result.)

Now the best part. The tool is free. Just use the link below to download it. Have fun!

Actually, quite recently I made the tool available online. Here it is: LINK.

Download Statistical Significance Checker Here

(Quick note. An internet connection is required for this tool to work.
The tool runs on Windows and needs a thingy called .NET Framework.)

I thought most internet-based testing tools had significance built in. But I guess this tool is handy for things like A/B testing in PHP etc.

Karol K.

Yes, many of them have this kind of functionality but sometimes you just have to do some split testing on your own.

For example, when you’re running an AdWords campaign with 2 ads set to display evenly. Or when you notice that your 2 main traffic sources generate slightly different conversions – it’s worth knowing whether or not this difference is statistically significant.

http://www.reedge.com Dennis

Hi Karol, you’re right. I can see the use of the tool. Why did you decide to make it downloadable and not just online?

The tool was originally developed by me and my team for our own use. Basically it’s not a complicated solution. However, I wanted to be sure that I have a tool that implements proper mathematical calculations so that’s why it was created. There was an online version too but we figured that this kind of stand-alone version would be easier to use for us.

I’ve watched the video at reedge.com and the tool looks quite cool. I will have to give it a closer look in the near future.

http://www.reedge.com Dennis van der Heijden

I think it would be handy in reedge to have some sort of prediction tool that uses historical data to estimate how long a test will take to finish, or how much longer a running test still has to go.

I guess that would be handy for people to know!

Try the reedge.com test on the homepage (you can fill in any website URL and see a bit of how it works; the multivariate test is the best one to look at).

Karol K.

A prediction tool is actually the next feature that will be implemented in my significance checker. It can be done with some PHP code. And you are right – the information about how many additional trials/impressions/actions you need before a test becomes significant can be really valuable.

Feel free to contact me if you’d like to have a look at the algorithm of this feature once we get it done.
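For readers curious what such a prediction could look like, here is a rough Python sketch based on the standard sample-size formula for comparing two proportions. It is not the algorithm mentioned in the comment above (that one is not shown here); the function name `trials_needed_per_variant` and the 80% `power` setting are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def trials_needed_per_variant(rate_a, rate_b, confidence=0.95, power=0.80):
    """Estimate trials per variant needed to detect rate_a vs rate_b."""
    # z value for the two-sided confidence level (about 1.96 for 95%).
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # z value for the desired chance of detecting a real difference.
    z_beta = NormalDist().inv_cdf(power)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    n = (z_alpha + z_beta) ** 2 * variance / (rate_a - rate_b) ** 2
    return math.ceil(n)


# Observed so far: 50/600 clicks for banner A and 70/600 for banner B.
needed = trials_needed_per_variant(50 / 600, 70 / 600)
remaining = max(0, needed - 600)
print(f"about {needed} impressions per banner, {remaining} more to go")
```

For the banner numbers from the post this gives somewhere around 1,270 impressions per banner, assuming the observed click rates are close to the true ones. Dividing the remaining impressions by the daily traffic would turn that into a rough "days left" estimate.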

http://www.reedge.com Dennis van der Heijden

Hi Karol,

Thanx for the invite. I think it’s important for the impatient clients. They might stop the test too soon, but if they know they (only) have to wait 2 more days… they might wait for that significant result.

I have seen +450% on 60% significance and -5% on 95% significance for the same variable. So it’s very, very important to wait.

Karol K.

That’s pretty much the case with split testing.

However, it’s worth remembering that even with a 95% level of certainty (statistically speaking) you will still be wrong in roughly 1 in 20 decisions: about that share of tests where there is no real difference will still look significant. … Nothing’s perfect.
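The 1 in 20 figure can be illustrated with a small Python simulation: run many tests in which both banners have exactly the same true click rate and count how often the result still looks significant at 95%. This is only a sketch; the two-proportion z-test, the 10% click rate, and the function names are all assumptions chosen for the example.

```python
import math
import random


def is_significant(trials_a, actions_a, trials_b, actions_b, alpha=0.05):
    """Two-sided two-proportion z-test; True if p-value is below alpha."""
    pooled = (actions_a + actions_b) / (trials_a + trials_b)
    if pooled in (0, 1):
        return False  # no variation at all, nothing to compare
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (actions_b / trials_b - actions_a / trials_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2)) < alpha


def false_positive_rate(true_rate=0.10, trials=600, experiments=2000, seed=1):
    """Share of tests on two identical banners that look significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # Both banners share the same true click rate.
        a = sum(rng.random() < true_rate for _ in range(trials))
        b = sum(rng.random() < true_rate for _ in range(trials))
        hits += is_significant(trials, a, trials, b)
    return hits / experiments


rate = false_positive_rate()
print(f"{rate:.1%} of identical-banner tests looked significant")
```

The printed share should hover around 5%, which is the "1 in 20" mentioned above: even when the two banners are identical, a 95% threshold lets a false winner through now and then.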