Why data made design worse and how to make it better

by · July 25, 2018

My first exposure to data-driven design was in games. I worked at Playdom, Zynga’s primary competitor, during the social gaming boom of 2009. The sophistication of our data analysis techniques and the platform supporting them played a large role in our eventual $700 million acquisition by Disney.

For most companies at the time, “analytics” just meant counting pageviews. If you were really fancy, you could track the order in which users viewed certain pages and assemble a funnel chart to quantify dropoff. Gartner’s report on the state of web analytics in 2009 describes a range of key challenges, like “how to obtain a sustainable return on investment” and “how to choose a vendor”.

“Why would we need an analyst to tell us what our hitcounter is saying?”

In contrast, social gaming powerhouses like Zynga and Playdom were custom building their own event-based analytics systems from the ground up. They tracked almost every action that players took in a game, allowing them to deeply understand their users’ needs and build features to fulfill them, rather than simply taking their best guesses.

Chapter 1: Flying high

For me, it was incredibly exciting to be on the cutting edge of analytics. For the first time, we could get real insights into players’ actions, aspirations, and motivations. The power of these new data analysis techniques seemed limitless. Zynga went from zero to a billion-dollar valuation in under 3 years. There was a digital gold rush as startups popped up left and right to bring the power of quantitative data insights to every industry imaginable.

In this brave new world, metrics were king. Why would you need a designer? Everything could be tracked, measured, tested. A good PM was one who could “move the metrics”. MBAs and management consultants were hired by the boatload. One friend told me about the time he had to talk his CEO out of firing all the designers in the company and replacing them with analysts.

Chapter 2: Back down to earth

By early 2012, Zynga had been around for about 5 years, with a peak market cap over $10 billion. The company’s successes had been repeated on a smaller scale by other strongly “data-driven” companies on Facebook and on mobile.

However, an interesting trend was beginning to occur in games, with new games like Dragonvale and Hay Day dominating the mobile charts with innovative mechanics supported by a single, unified product vision.

Products driven by a strong creative vision from their inception were consistently winning when pitted against derivative products and products developed through pure metric-driven iteration. Creative vision was a necessary prerequisite to create a product strong enough to land at the top of the charts.

And being at the top of the charts is critical — revenue on the Top Grossing Charts follows a power law, with the handful of apps at the very top of the charts making more money than all the rest of the apps put together. As Zynga’s apps slipped down the charts, their inability to adapt to this new world became apparent and their stock price fell 80%.

Data-driven design had failed, as had intuition-driven design before it. The industry needed a more fundamental shift in perspective. Good teams now design for the long term, guided by intuition but informed by data.

Personally, I like to emphasize the difference between data-driven design (relying on data to make decisions because we have no user empathy) and data-informed design (using data to understand our users, then building features to delight them).

Chapter 3: Data-driven design

When I say “data-driven design”, I’m referring to the mentality of “letting the data decide”. In this paradigm, PMs and designers surrender to the fallibility of their intuition, and thus they elect to remain agnostic, using A/B testing to continuously improve their products.

A number of companies I’ve talked to have bragged about the fact that they’ve removed intuition from the decision-making process. Particularly for companies with weaker design, it’s comforting to be able to say “We don’t have to depend on intuition because the data tells us what to do!”

But testing isn’t a magic bullet. Split tests typically gather data for a period of days or weeks. User lifetimes are typically months or years. If you’re only looking at quantitative data, it’s easy to unintentionally trade off difficult-to-measure metrics like long-term product health in exchange for easy-to-measure short-term metrics like revenue.

Treating metrics as end goals (rather than simply as indicators of good product direction) frequently results in unintended consequences and a degraded user experience.
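To make the timescale problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the spend-per-day and survival numbers are invented, and the simple "a user stays each day with a fixed probability" churn model is a simplification rather than a description of any real product. It compares a control variant against a more aggressive one over a two-week test window and over a full year.

```python
def revenue_per_user(daily_revenue, daily_survival, days):
    """Expected revenue from one new user over `days` days.

    Assumes (hypothetically) that an active user spends `daily_revenue`
    per day and stays active each day with probability `daily_survival`.
    """
    return sum(daily_revenue * daily_survival ** day for day in range(days))


# Made-up numbers for illustration only.
control = {"daily_revenue": 0.10, "daily_survival": 0.99}
aggressive = {"daily_revenue": 0.13, "daily_survival": 0.97}

for horizon in (14, 365):
    c = revenue_per_user(days=horizon, **control)
    a = revenue_per_user(days=horizon, **aggressive)
    winner = "aggressive" if a > c else "control"
    print(f"{horizon:>3} days: control ${c:.2f}, aggressive ${a:.2f} -> {winner} wins")
```

With these invented inputs, the aggressive variant wins inside the 14-day window and loses over 365 days, which is exactly the kind of result a short split test cannot see.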

Example: Aggressive paywalls

Zoosk is a dating app that built a huge userbase as a Facebook app during the heyday of data-driven design. They’re extremely aggressive with their monetization, with misleading buttons designed to constantly surprise the user with paywalls.

Oh boy, a message! Gotcha! Paywall!

A company naively focusing on revenue will naturally iterate its way to this point, experimenting with increasingly early and aggressive paywalls and discovering that the spammier the app becomes, the more money it makes.

However, while an aggressive approach can be very profitable in the short run, it quickly drives away non-payers and makes it difficult to engage new users. In the dating space, this results in a user experience that becomes worse every month for subscribers.

Sure enough, judging from AppAnnie/SensorTower estimates, Zoosk’s revenue has probably fallen about 50% since their 2014 high of $200 million.

Example: Searches per user

One of my favorite stories is from a friend who worked on improving the search feature at a major tech company. Their target metric was to increase the number of searches per user, and the most efficient way to do that was to make search results worse. My friend likes to think that his team resisted that temptation, but you can never be totally sure of these things.

Example: Tutorial completion

A standard way to measure the quality of an onboarding experience is to measure what percent of users who start a tutorial actually finish it. However, since there will always be a natural drop off between sessions or over days, one obvious way to increase tutorial throughput is to build a tutorial that attempts to teach all the features in a single session.

Sure enough, tutorial throughput goes up, but now users are getting overwhelmed and confused by the pace of exposure to new menus and features. How to help them find their way? Maybe some arrows! Big, blinking arrows telling the user exactly which button to tap, directing them into submenus 7 levels deep and then back out.

You’ll be able to do this on your own next time, right?

Arrows everywhere can boost tutorial throughput, but all the users will be tapping through on autopilot, defeating the point of having the tutorial in the first place! Excessive handholding of users increases tutorial completion (an easy-to-measure metric), but decreases learning and feelings of accomplishment (difficult-to-measure but very important metrics).

Example: Intentionally uninformative communication

“You’ve been invited to a thing! I could tell you where and when it is in the body of this email, but I’d rather force you to visit my website to spam you with ads. Oh, look at how high our DAUs are! Thanks for using Evite!”

If this email were helpful, Evite would have to find a different way to make money

Equally frustrating to users: Push notifications that purposely leave out information to force users to open the app. Users will flee to the first viable alternative that actually values the user experience.

Example: User experience

In a purely data-driven culture, justifying investment in user experience is a constant uphill battle.

Generally, fixing a minor UI issue or adding some extra juice to a button press won’t affect the metrics in any measurable way. User experience is one of those “death by 1000 cuts” things where the benefits don’t become visible until after a significant amount of work has already been put in.

As a result, it’s easy to constantly deprioritize improvements to the user experience under the argument of “why would we fix that issue if it’s not going to move the needle?”

Creating great UX requires a leap of faith, a belief that the time you’re investing is worthwhile despite what the metrics say right now.

Hearthstone is a great example. Besides being a great game, it’s full of moments of polish and delight, like the finding-opponent animation and interactive backgrounds, that are completely unnecessary from a minimum viable product perspective but absolutely critical for creating a product that feels best-in-class.

Example: Sales popups

When I was at Playdom, we would show popups when an app was first opened. They’d do things like encourage users to send invites or buy an item on sale, like this popup from Candy Crush.

Do you want revenue now or a userbase in the future?

I hate these. They degrade the user experience, frustrate the user, hurt the brand, and generally make interacting with your product a less delightful experience.

It always gave me a bit of schadenfreude to open a competitor’s game and see a sale popup for the first time, because the same pattern always repeated itself: As the weeks went by, more and more aggressive and intrusive popups would invade the user experience, right up until the game disappeared from the charts because all the users churned out.

Even retention isn’t foolproof

As a final note, while most of the examples above involve some variation on accidentally degrading retention, even optimizing for retention doesn’t prevent these mistakes from occurring if you’re optimizing over the wrong timescale or for the wrong audience of users.

Typically, companies will look at metrics like 1-day, 7-day, and 30-day retention because those numbers tend to correlate highly with user lifetimes. But focusing on cohort retention runs the risk of over-optimizing your product for the new users that you’re measuring, perhaps by over-simplifying your product, or neglecting the features loved by your elder users, or creating features that benefit new users at the expense of your existing audience.
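For readers who haven’t computed these numbers themselves, here is a rough sketch of day-N cohort retention in Python. The definition used (a user counts as retained if they were active exactly N days after installing) is one common convention and an assumption on my part, and the function name, user IDs, and dates are all hypothetical.

```python
from datetime import date, timedelta


def day_n_retention(installs, activity, n):
    """Fraction of a cohort that was active exactly `n` days after install.

    installs: dict mapping user_id -> install date
    activity: set of (user_id, date) pairs on which the user was active
    """
    if not installs:
        return 0.0
    retained = sum(
        1
        for user, installed in installs.items()
        if (user, installed + timedelta(days=n)) in activity
    )
    return retained / len(installs)


# Hypothetical cohort: four users who all installed on the same day.
day0 = date(2018, 7, 1)
installs = {"a": day0, "b": day0, "c": day0, "d": day0}
activity = {
    ("a", day0 + timedelta(days=1)),
    ("a", day0 + timedelta(days=7)),
    ("a", day0 + timedelta(days=30)),
    ("b", day0 + timedelta(days=1)),
    ("b", day0 + timedelta(days=7)),
    ("c", day0 + timedelta(days=1)),
}

for n in (1, 7, 30):
    print(f"D{n} retention: {day_n_retention(installs, activity, n):.0%}")
```

Notice what the calculation never looks at: anyone who installed more than 30 days ago. A user who has been around for a year can churn without moving any of these three numbers.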

Chapter 4: Data-informed design

In contrast to “data-driven design”, which relies on data to drive decisions, in “data-informed design” data is used to understand your users and inform your intuition. This informed intuition then guides decisions, designs, and product direction. And your intuition improves over time as you run more tests, gather more data, and speak to more users.

Purely data-driven product improvement breaks down when a product needs to get worse in order to get better. (If you’re the sort of person who likes calculus metaphors, continuous improvement gets you to a local maximum, but not to a global maximum.) Major product shifts and innovations frequently require a leap of faith, committing to a product direction with the knowledge that initial metrics may be negative for an extended period of time until the new direction gets dialed in and begins to mature.

When Facebook introduced its newsfeed, hundreds of thousands of users revolted in protest, calling for boycotts and petitioning for removal of the feature. Now we can’t imagine Facebook without it.
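If the calculus metaphor feels abstract, here is a toy sketch of it in Python. The list of "quality" scores is invented, and the greedy loop is a stand-in for pure metric-driven iteration (only ever ship the change that tests better), not a model of any real team’s process.

```python
def hill_climb(quality, start):
    """Greedy iteration: keep moving to the better neighbor until none improves."""
    position = start
    while True:
        neighbors = [p for p in (position - 1, position + 1) if 0 <= p < len(quality)]
        best = max(neighbors, key=lambda p: quality[p])
        if quality[best] <= quality[position]:
            return position  # no neighbor is better: we are on a peak
        position = best


# Hypothetical product "landscape": index = design, value = how good it is.
quality = [1, 3, 5, 4, 2, 3, 6, 9, 7]

stuck_at = hill_climb(quality, start=0)
global_peak = max(range(len(quality)), key=lambda p: quality[p])

print(f"Iterating from design 0 stops at design {stuck_at} (quality {quality[stuck_at]})")
print(f"The best design is {global_peak} (quality {quality[global_peak]})")
```

Starting from design 0, every step improves the metric until the loop stalls on the smaller peak. Reaching the higher one requires passing through designs 3 and 4, which both score worse, and that is the step an always-improve rule will never take.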

When to stop

I’m often asked, “If you know you’re just going to keep building no matter what the data says, then what’s the point in having data at all? How will we know when to kill the project?”

That’s a great question, since it’s often difficult to tell the difference between a false negative and a true negative. But there are two clear red flags to watch for: when a team loses faith in the project, and when a project stops improving. Ed Catmull cites the same criteria in Creativity, Inc. for knowing when one of Pixar’s movies is in trouble. Recognizing when a product is stuck is a challenge for any company committed to creativity and innovation, regardless of medium.

In data-informed design, learning is a continuous and parallel process. Rather than trying to design a rigorous enough test to validate/invalidate a direction at a particular moment in time, data is consistently gathered over time to measure a trajectory. If the team understands their users well, their work should show a general trend of improvement. If the product isn’t improving, or even if the product IS improving, but the metrics aren’t, then that’s a sign that a change is needed.
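One rough way to put a number on "trajectory" is to fit a trend line through a metric across recent releases instead of judging any single test. The sketch below is only an illustration: the retention figures, the target, the window size, and the minimum slope are all hypothetical values you would have to choose for your own product.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to measure a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def has_stalled(values, target, window=4, min_slope=0.005):
    """True if the recent trend is nearly flat while still below `target`."""
    recent = values[-window:]
    return trend_slope(recent) < min_slope and recent[-1] < target


# Hypothetical 7-day retention after each of six releases.
improving = [0.10, 0.12, 0.15, 0.17, 0.20, 0.22]
flattened = [0.10, 0.13, 0.15, 0.15, 0.16, 0.15]

print("improving project stalled?", has_stalled(improving, target=0.25))
print("flattened project stalled?", has_stalled(flattened, target=0.25))
```

Both made-up projects are below target, but only the second has stopped moving. A check like this is a conversation starter, not a verdict: it says nothing about whether the team has lost faith, which is the other red flag.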

Chapter 5: Rules of thumb for data-informed design

It can be hard to know how to strike the right balance between data and intuition, but I do have a few rules of thumb:

The challenge in product development is recognizing when we’re “teaching to the test”, regardless of whether it’s intentional or not. For anything that we’re measuring, I like to ask “is there a way I could move this metric in a positive way that would actually be really bad for our product long-term?” Then I ask, “is the feature I’m thinking about doing some flavor of that accidentally?”

Have a “North Star” vision

I always advocate for having a “North Star” vision. This is a product vision months or years away that you believe your users will love, based on your current best understanding of them.

Since products take a lot of iterations to get good, early product development is full of false negatives on the way to that North Star. People love to talk about the idea of “failing fast” or “invalidating an idea early”, but a lot of times that just isn’t possible. The threshold for viability in a minimum viable product isn’t always obvious, and sometimes it does just take a little more polish or a few extra features to turn the corner.

The best way to get a more trustworthy signal is to just keep building and shipping. A North Star lets you maintain your momentum during the inevitable periods of uncertainty. Over time, small sample sizes accumulate, and noise averages out. Evidence about the product direction will build with time.
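The "noise averages out" part can be sketched with the textbook standard error of a proportion. The 20% rate and the sample sizes below are invented, and pooling users across releases like this assumes the releases are similar enough to combine, which is a judgment call rather than a given.

```python
import math


def standard_error(rate, sample_size):
    """Standard error of an observed proportion (e.g. a retention rate)."""
    return math.sqrt(rate * (1 - rate) / sample_size)


# Hypothetical: 200 new users per release, measured at a 20% retention rate.
one_release = standard_error(0.20, 200)
ten_releases = standard_error(0.20, 200 * 10)

print(f"one release:  20% +/- {one_release:.1%}")
print(f"ten releases: 20% +/- {ten_releases:.1%}")
```

Ten times the data shrinks the uncertainty by a factor of about the square root of ten, so a signal that is invisible in any one release can become readable across many.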

Treat metrics as indicators/hints, not goals

It’s important to remember that metrics are leading indicators, not end goals. Similar to how taking a test prep class to improve your SAT score doesn’t actually increase your odds of college success, features that overfocus on moving metrics may not actually improve the underlying product.

The most important question that data can answer is “does the team understand the users?” If so, features will resonate and metrics will improve over time. To validate/invalidate a product direction, look at the trajectory of the metrics, not the result of any individual test.

The right time to kill a project is when the trajectory of improvement flattens out at an unacceptably low level. Generally this means that a few features have shipped and flopped, which is an indicator that there’s some kind of critical gap in the team’s understanding of their users.

This also means that it can be difficult to walk away from innovative product or feature ideas quickly, since a single failed test says little about the trajectory. This can be an unpopular opinion in circles that are dogmatic about testing, but the fact of the matter is that I have never seen the “spray and pray” approach work well when it comes to product vision.