App Store Page A/B Testing: How Not to Flink Up Your Test

December 14, 2015

8 min read

These days you would rarely find an app developer or app marketing manager who hasn't heard of or tried app store page A/B testing. Everybody seems to understand its importance: the app listing page is your storefront, and you want it to be as attractive to potential buyers as possible.

The concept seems simple: show two variants to an equal number of visitors and compare their conversion rates. However, A/B testing is not as easy as it looks. When you decide to run a test and start planning it out, questions arise. What should I test first? How do I come up with the variants? How many visitors should I drive to the page? How do I know whether the results are trustworthy?

Over the last year and a half, our team has been working closely on App Store page A/B testing. I've seen both indie developers and giant game publishers test their app store pages incorrectly because they don't follow a few important rules. By creating and analyzing experiments over that time, we've learned a lot about running proper A/B tests on an app page, and that's what I'd like to share with you today.

Sample size

Any book on mathematical statistics will tell you that the bigger the sample size, the better: the more people participate in the experiment, the more accurate the results. When it comes to app store page testing, however, there's always a balance between your budget and the accuracy of the experiment. You want your results to be statistically significant, but you don't want your budget to go through the roof.

So what is the optimal sample size for an experiment? According to our calculations and practical experience, 1,500 visits of good-quality traffic per variant is sufficient in the vast majority of cases. (Good-quality traffic can come from Facebook or Instagram ads, which give you conversion rates of at least 5-7%.) This means a standard split test with two variations requires at least 3,000 visits to the experiment page. If you run an experiment on a smaller sample, you risk collecting results that won't confidently tell you which variation is better.
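As a sanity check on the "visits per variant" question, here is a rough back-of-the-envelope calculation using the standard two-proportion normal-approximation sample-size formula. The baseline conversion rate (5%) and the lift you hope to detect are illustrative assumptions, not numbers from our experiments; the required sample size is very sensitive to both.

```python
import math

def sample_size_per_variant(p1, p2, ):
    """Approximate visitors needed per variant to detect a change in
    conversion rate from p1 to p2 (two-sided z-test, normal approximation,
    95% confidence, 80% power)."""
    z_alpha = 1.959964  # z-score for two-sided 95% confidence
    z_beta = 0.841621   # z-score for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical numbers: 5% baseline conversion, hoping to detect a lift to 7%.
print(sample_size_per_variant(0.05, 0.07))  # ~2,200 visits per variant
# A bigger lift (5% -> 10%) needs far fewer visitors:
print(sample_size_per_variant(0.05, 0.10))
```

Note that with these particular assumptions the formula asks for more than 1,500 visits per variant: treat 1,500 as a practical floor that works when variations differ substantially, and expect subtler differences to demand noticeably more traffic.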

When and how long to run an app store page A/B test

Keep in mind that the users you are running your experiment against believe your experiment page is an actual app store page. As such, the timing of the experiment should be appropriate. Think carefully about what users expect to see on the app store page. For instance, if you're gearing up for a Christmas update, run your Christmas icon or screenshot experiment around November or December, not during Halloween or another holiday. On the other hand, experiments on new approaches to screenshot messaging for games can be run all year round.

As for duration, we recommend running the test for 7-14 days (never shorter than 7 days!). In most cases, users' installation behavior differs between weekdays and weekends, so you want to make sure you capture complete data.

Traffic sources

You want to run your experiment on the same audience you normally have (or plan to have after launch) in the app store. That's why precise targeting is key. We've found that the easiest source of high-quality traffic is Facebook or Instagram ads, even though they are not the cheapest. The majority of app developers already use Facebook ads as their main paid acquisition channel, so it's easy for them to reuse the same targeting for experiments, or to play with new targeting settings.

It is also worth mentioning that more advanced users sometimes prefer multiple traffic sources, to more closely simulate the traffic they normally get on the app store page. In this case, simulating organic traffic gets a bit tricky. Organic users come from app store search: they (1) are intentionally looking for what they want (versus randomly seeing a banner while scrolling through their feed), and (2) are using specific keywords to find it. Facebook users exhibit neither behavior. Similar people may search for your app in the store, but they won't be in the same frame of mind when you reach them with a Facebook banner.

While this advanced approach makes sense, most apps won't need that level of accuracy. Our advice is to use it only if you plan to spend a significant amount on user acquisition, where every small change in conversion rate makes a difference. In all other cases, good-quality traffic with good targeting will do the job just fine.

Make significant differences in variations

One of the most common mistakes new A/B testers make is testing very small changes. Small changes tend to produce approximately the same results with no obvious winner, wasting both time and money.

When creating variations to test (generating hypotheses), remember to test significant differences. For icons, that means different characters, colors, and/or styles. For screenshots, you may want to test different approaches to delivering the app's message. These may include (but are not limited to) describing the app's benefits versus showcasing its features, or showing one smartphone screen per screenshot versus none at all.

Number of variations (understand false-positives)

When planning an experiment, impatient marketers may skip the standard 2-3 variations and instead test 7-10 variations in a single experiment. This may look like a good idea at first: all your variations can have significant differences, with enough traffic to reach statistically significant results. However, you'll end up with false positives.

False positives become a problem when you test a high number of variations: the more comparisons you run against each other, the higher the chance of a false positive becomes. For example, testing 10 variations at once at a 95% confidence level per comparison leaves roughly a 40% chance of at least one false positive, i.e. a 40% chance that the "winning" result is false. Testing just 2 variations at 95% confidence carries only a 5% chance of a false positive.
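The arithmetic behind those percentages is the family-wise error rate: if each comparison independently has a 5% chance of producing a false positive, the chance that at least one of k comparisons fires falsely is 1 − 0.95^k. A quick sketch (treating the comparisons as independent is a simplifying assumption):

```python
def family_wise_error(comparisons, alpha=0.05):
    """Chance of at least one false positive across several independent
    comparisons, each run at significance level alpha."""
    return 1 - (1 - alpha) ** comparisons

print(family_wise_error(1))  # a single A/B comparison: 5% chance
print(family_wise_error(9))  # 10 variations vs. a control: ~37%, roughly the 40% figure above
```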

Put simply, if you test multiple variations at once, the chance of the winning result being false increases. As such, it's best to A/B test 2 variations at a time. Here is a great video that explains it (the false positive discussion starts at 21:10):

What to test first?

That's the most popular question from people just starting out with app store page A/B testing. It always helps to start with the parts that influence your conversions the most: the graphical assets of your page, including the app icon, screenshots, preview video, and the feature graphic (Google Play only).

Once you're comfortable running graphical experiments, you may want to try text-based ones: name, description, in-app purchases, and so on. These assets tend to influence conversions less than graphical elements, but depending on your audience, they can still move the conversion rate.

Another thing to keep in mind is the app name and description. Both are scanned by the app stores for keywords, so optimizing them has two components: search discoverability and attractiveness to your users. You may come up with a name or description that users prefer when they land on your app page, yet the words you choose may be highly competitive in keyword search, and you may end up losing organic traffic volume.

Setting realistic expectations

Like any other means of optimization, app listing page A/B testing is not a panacea; it is a tool. Split testing most likely won't skyrocket your app store page conversions. However, used correctly and systematically, it can eventually deliver significant results. It will also help you understand your audience better, making sure you communicate the benefits of your application in the app store in a clear and captivating manner.

Start testing now

With TestNest, we took everything we know about app store testing and built it into an app listing page A/B testing platform. Now that you know the most important basics of split testing, creating and launching an experiment will take you less than 5 minutes. Sign up for an account and start testing now!