Split testing, or “A/B testing,” is a popular marketing tactic for optimizing collateral and improving conversions. Product managers can borrow it. Rather than rolling out testable variations of products, which are far more time-intensive and costly to build than landing pages, they simply need to be lean like marketers and split test prototypes. Using prototypes as the vehicle for generating feedback is the critical difference.

We’ve developed a capability at Alpha UX to perform split testing that efficiently generates quantitative data and actionable user insight, even in pre-development (read: before coding). Let me walk you through an example of how we’ve been able to answer some of the most sensitive questions in healthcare using prototype split testing.

The Product Manager’s Compass

Ahead of a major healthcare conference in New York, we invited leaders in healthcare to discuss challenges and trends in the industry.

We showed them our whitepaper, which was the culmination of weeks of market research surveys. One of the most intriguing data points illustrated that only 20% of respondents have used an app made by their health insurance provider, but more than 80% said they would probably or definitely try such an app. In fact, of the respondents who used apps made by their insurance providers, more than 50% were actually satisfied with the digital offering.

The healthcare industry leaders were in near disbelief. Consumers supposedly harbor deep mistrust for their health insurance providers and would never use apps those providers offer, even if it meant forgoing features and benefits insurers are uniquely capable of providing. It’s a point we’d heard many times before but weren’t about to counter without data. We could easily turn to split testing for a definitive answer.

The Split Testing Process

To quantifiably determine whether a health insurance provider is capable of providing a trusted app, we set up an experiment similar to hundreds we’d run before.

We designed four variations of an initial splash screen, each for a different brand: one for a health industry startup, one for a well-known healthcare product maker, one for a popular medical information aggregator, and one for a major health insurance provider. We then took the most highly ranked features from our market research surveys and mocked up a first-time user experience that presented them in three sequential screens. They are illustrated below (we blurred the logos for this article partly to show that prototype split testing doesn’t require significant risk exposure to your brand to be effective, and partly to appease our lawyers):

We imported the designs into InVision to create an interactive prototype and used Validately for split testing. In Validately, we directed four cohorts of targeted users to the four splash screen variations, one cohort per variation. All cohorts were then directed to the identical series of three screens. In other words, every cohort had exactly the same user experience except for the splash screen, which showed a different brand logo as the maker of the app.
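The cohort split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Validately actually assigns participants: the variant labels are hypothetical, and hashing a participant ID into a bucket is just one common way to keep each person in the same cohort.

```python
import hashlib

# Hypothetical labels for the four splash-screen variants; only the brand differs.
VARIANTS = ["startup", "product_maker", "info_aggregator", "insurer"]


def assign_variant(participant_id: str, variants=VARIANTS) -> str:
    """Deterministically map a participant to one variant.

    Hashing the ID (rather than drawing a new random number on every visit)
    means a returning participant always lands in the same cohort.
    """
    digest = hashlib.sha256(participant_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def build_cohorts(participant_ids, variants=VARIANTS):
    """Group participants into one cohort per variant."""
    cohorts = {v: [] for v in variants}
    for pid in participant_ids:
        cohorts[assign_variant(pid, variants)].append(pid)
    return cohorts


if __name__ == "__main__":
    ids = [f"user-{n}" for n in range(1000)]
    for variant, members in build_cohorts(ids).items():
        print(variant, len(members))
```

With a fixed, pre-recruited panel you could equally shuffle the list once and deal it into four groups; the hash approach simply avoids storing the assignment anywhere.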

After exposing each cohort to its variant, we asked users to describe and rate the experience on preset metrics such as Net Promoter Score, ease of use, likelihood to use, and perceived monetary value. We also asked them to select, from a list, the aspects of the experience they liked and disliked.
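Net Promoter Score is conventionally computed from 0 to 10 answers as the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6), giving a value between -100 and 100. The article doesn't say exactly how its NPS figures were calculated, so this is a sketch of the standard formula; the sample responses are invented for illustration.

```python
from typing import Iterable


def net_promoter_score(ratings: Iterable[int]) -> float:
    """Standard NPS: % promoters (9-10) minus % detractors (0-6), in [-100, 100]."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count toward the total but not toward either group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Invented responses for one cohort, purely illustrative.
print(net_promoter_score([10, 9, 8, 7, 6, 5, 3, 9, 2, 7]))  # -10.0
```

Computing the score separately for each cohort is what makes the brands comparable, since every cohort saw the same screens.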

It’s important to note that validating demand with prototypes requires users to make some sort of sacrifice before affirming their desire for a given product. Therefore, the insight we generate with split testing is always framed comparatively. For example, instead of asking "Do you like this feature?" we’d ask users to "Rank the following features with regard to preference: X, Y, Z," so that choosing one option means giving up another.
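The article doesn't say how ranked answers were aggregated across a cohort. One common option, swapped in here as an assumption, is a Borda count: with k options, first place earns k-1 points and last place earns 0, so moving one feature up always costs another feature points. The option names and responses below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def borda_scores(rankings: Sequence[Sequence[str]]) -> Counter:
    """Aggregate ranked-choice answers with a Borda count.

    Each ranking lists options from most to least preferred. With k options,
    position 0 earns k-1 points and the last position earns 0.
    """
    scores = Counter()
    for ranking in rankings:
        if len(set(ranking)) != len(ranking):
            raise ValueError(f"duplicate option in ranking: {ranking}")
        k = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += k - 1 - position
    return scores


# Hypothetical answers to "Rank the following features: X, Y, Z".
responses = [["X", "Y", "Z"], ["Y", "X", "Z"], ["X", "Z", "Y"]]
print(borda_scores(responses).most_common())  # [('X', 5), ('Y', 3), ('Z', 1)]
```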

Making Sense of Value Perceptions

There’s always a risk of getting caught up in surface-level problems: common pain points that affect a lot of people but don’t yield any actionable (e.g., bottom-line-impacting) insight. Actionable insight is more like a deep well than a surface problem. Therefore, we are primarily concerned with data points that are either surprising outliers or surprisingly mundane.

Of all the brands, the health insurance provider was the least likely to cause concerns about security and trust (and by a relatively wide margin).

Respondents said they would be considerably less likely to pay for an app made by a health insurance provider than for the same app made by any of the other brands.

All apps performed miserably on Net Promoter Score (i.e., “How likely are you to tell a friend about this app?”), but the health insurance provider performed the worst.

The first point, about ease of use, is expected, since all apps had exactly the same user experience. It served, more or less, as a check that our test controls were working. But after that, the points get interesting.

Respondents weren’t less likely to try an app made by a health insurance provider than one made by a startup or a popular information aggregator. Further, the health insurance provider actually ranked as more trusted than any of the other brands. Of course, no fewer than 45% of users said trust was an issue with their version of the app, but this appears to reflect the nature of healthcare data rather than a problem with health insurance providers in particular.

That being said, the health insurance provider did perform significantly worse than the other brands on price value and Net Promoter Score.
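The article doesn't report cohort sizes or significance tests. If you want to check whether a gap between two cohorts (say, the share of users who said they'd pay for the app) is larger than chance would explain, a two-proportion z-test is one standard tool. This is a sketch using only the Python standard library; the counts are invented, and the normal approximation is unreliable for very small cohorts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Two-sided z-test for a difference between two independent proportions.

    Uses the pooled proportion for the standard error, the usual form under the
    null hypothesis that both cohorts share one true rate. Returns (z, p_value).
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("cohort sizes must be positive")
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both cohorts are all-success or all-failure")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts: respondents "likely to pay" in two cohorts of 100 each.
z, p = two_proportion_z_test(18, 100, 34, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Comparing four cohorts pairwise means running several tests, so a stricter threshold than the usual 0.05 is worth considering.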

There are, of course, numerous ways to interpret the data, and I can come up with many reasons why users would be far less likely to pay for or tell friends about an app provided by an insurance provider than one from an independent brand. For example, a user could easily assume that the insurer’s app is included with each health plan, so a friend on a different plan couldn’t simply act on a recommendation to download it. The main point, however, is that the lower NPS and price-value scores are a cause for concern but, given the data, are not the result of mistrust.

Thus, while I’d argue for continued experiments to refine and optimize the proposed digital offering, our initial split test proved that health insurance providers would not suffer more than other industry players from consumer mistrust. In fact, if the app were to be positioned as a cost-cutting platform (by making users healthier) rather than a revenue-generating source, it could be a strategic opportunity for insurance providers. And that is the type of insight prototype split testing can efficiently and consistently generate for product managers.

A Sustainable Framework

While split testing can generate user insight at any stage of the lifecycle, it becomes more effective as the product becomes more tangible. As you move from text-based value propositions to a simulated product, users’ reactions become more authentic and more closely track how the product will actually perform.

Of course, you need the right tools and services to support these best practices and methodologies. We set up experiments quickly, sourcing users and designers at scale and interpreting results right away. We use InVision and Validately to perform this type of split testing for clients, but here’s a useful roundup of some of the other tools and services the best product managers use.

Rely on split testing not just to guide your product’s direction, but also to prioritize features on your roadmap, identify key user groups, and validate competitive advantages. It’s no surprise that organizations that effectively build feedback loops into their product lifecycles, from conception to launch, are able to consistently build successful products.

About the Author(s)

Nis works at the intersection of product and content for Alpha, a platform for Fortune 500 product teams to generate user insights on demand. His articles have been published in Forbes, Content Marketing Institute, and The Next Web.

Steven tries hard to be a good father, husband, and entrepreneur. He sold his first two companies to LivingSocial and TripAdvisor. Validately, his third company, is a rapid user feedback platform whose mission is to help teams build better products. It makes it easy and affordable to test prototypes and live sites on customers or potential customers and learn about demand or usability BEFORE you code. Try it for free at Validately.com.

Comments

I liked how you guys used the exact same screens across various entities in the healthcare space. It really helps you get a clear understanding of how subjective these tests can be, but of course you are then left with the challenge of interpreting why people felt the way they did.

After reading the article I do have a question though.

Where are these people coming from in order to test the actual software? You mention 4 cohorts of targeted traffic, but the details on the audiences (or the lack thereof) play a huge role in how we can interpret this data.

Great question! Tools like Validately include integrations that enable you to source user groups. For this particular test, the only demographic constraint we applied was an American audience aged 21 to 65, to ensure that everyone was at least of reasonable healthcare-purchasing age (so the app would be relevant to them). Validately of course enables a lot more targeting than this, but for the sake of the test we kept it simple :)

It is definitely possible that zeroing in on one specific demographic could have led to entirely different results. The point of these tests is really to validate or invalidate the case for another iteration or experiment. As I said in the article, I wouldn't recommend that any company invest enormous resources in building an app based on this one test. But there's cause to at least continue testing.

Hope that answers your question!

Brian Hoadleyt

February 17, 2015

In the UK I've used Userzoom quite a lot to split test prototypes during the design process. It's a great way to get quantitative results from what would otherwise have been only a qualitative study. Typically I would conduct a qualitative user test first to iron out the more obvious issues with Customers, and then run a Userzoom test on iterated prototype variations to get a quant output. This can be accompanied by both NPS- and SUS-based questions to help drive additional metrics around whatever you are testing. Of course, post launch, I then advocate ongoing live-site usability testing to benchmark key Customer journeys, and A/B and multivariate testing to drive continuous improvements.

Spot on - you touched on a point I glossed over quickly in the article. Before doing a split test like this, we obviously had to know roughly what features would even be remotely interesting in a consumer-facing healthcare app. We found that out through one-on-one interviews and some industry surveys. Could not have done everything mentioned in the article without that initial insight.

Corey Dawson Hall

February 13, 2015

Great clickbait on the headline. I understand that healthcare is a big market right now, but please offer a bit more than just a pitch. Share an insight; it might make people trust you. But anyway, good marketing... I read it.

Corey, I am sorry you felt like the headline was "clickbait" and that the article didn't give you insights. That was not the intent. Actually, as I look at it again, I see a tactical blueprint for doing exactly what the title said, which is how to use split testing techniques to validate demand before code. These techniques apply to all industries.

Daniel

February 13, 2015

Hi Steven, thanks for the correction. I read that information on the landing page you linked in your article: "Q: How much does Validately cost? A: We charge per test*, so you can get as much feedback from as many reviewers as you want, our analytics engine will let you slice and dice the results to quickly gain insight. * You can purchase an external review panel at $10 / response. Testing on your own users or internal stakeholders incurs no additional cost."

That might have been our old pricing. Here are our current pricing plans: https://validately.com/pricing

Email me with any questions steven @ validately . com

daniel

February 13, 2015

I have to say that at Validately's price point you are much better off having a developer run the tests: it would be cheaper, you can rapidly cluster test, and you won't have to pay per test.

There are tons of tools out there that managers push for in an attempt to replace designers and developers. The truth is they cost more, actually take longer to implement, and give worse results.

Absolutely A/B test as much of your product as possible, but you're wasting time and money trying to completely automate the process, something that won't truly be achieved until artificial intelligence becomes a reality, and when that happens we can replace the manager as well as the production team.

Thanks Daniel. We try very hard to make Validately easy to use and extremely affordable. We don't charge per test or per test result, so that our customers can do exactly as you suggested, because that is a way to get statistically significant results.

Ben B

February 12, 2015

I am in the financial products space where the Net Promoter Score is a key indicator for us. We are considering launching another tier or two of our current product but are unsure what the perceived value of the offerings will be.

Which data points do you think would be most useful in helping us determine this, and what value would you place on the NPS of these experiments against our current NPS?

Thanks for your question Ben. I agree with Nis. You should try to get pre-purchases of the tiers. Ultimately, that is where the rubber hits the road. I have seen it happen over and over in both B2B and consumer products, so it is an achievable goal. Actually, getting a customer to say "No, I won't pre-purchase that tier" can be an invaluable learning experience, because the next question is "Why not?" Your GOAL is to figure out why customers won't convert to the higher tier BEFORE you build a bunch of features. If you wait until after, you might be building features that they don't value at the expense of features that they do value.

As for NPS, the key to any qualitative questioning is validating that the customer truly means what they say. We believe there is only one way to validate that the customer's answer reflects how they feel... we call it "The Cost Question." That means you need to make it cost the customer something to say "yes." It can be financial, but it can also cost the customer their Time (in the form of meeting with you and giving detailed feedback without compensation) or their Reputation (in the form of inviting others to try your product). We go into more detail here: https://validately.com/leancustomerresearch/validate-demand/

You can email me if you want to discuss how to apply these techniques to your product.

There's rarely a better experiment than one that generates purchases pre-development. Can you simply offer the ability to pre-order different tiers on a landing page, with a note that if there is insufficient demand, you will refund the money? Other than that, perhaps bring in cohorts of your most active customers and show each of them a variant to gauge interest. You'd be surprised how willing customers are to provide feedback, especially if it gives them exclusive access to future products.

With regard to data points, in my experience the best thing to do is find your happiest existing customers and ask them a list of questions. Identify the data points that most strongly correlate with their active behavior, and use those to reverse engineer a set of metrics for evaluating future product concepts. Obviously this should also be experimented with until you find a collection of data points, potentially including NPS, that accurately reflects the perceived value of an offering.