What your team can learn from team Obama about A/B testing

Psychologist Richard E. Nisbett has made a career out of solving common problems with scientific and statistical principles. In this excerpt from his new book, Mindware: Tools For Smart Thinking, he looks at how A/B testing helped the Obama campaign blindside its Republican rivals.

Shortly after Barack Obama announced he was running for president in the fall of 2007, Google’s CEO, Eric Schmidt, interviewed him in front of a large audience of Google employees. As a joke, Schmidt’s first question was, “What is the most efficient way to sort a million 32-bit integers?” Before Schmidt could ask a real question, Obama interrupted: “Well, I think the bubble sort would be the wrong way to go,” a response that was in fact correct. Schmidt slapped his forehead in astonishment, and the room broke out in applause. Later, in the question-and-answer period, Obama assured his audience, “I am a big believer in reason and facts and evidence and science and feedback,” and promised that he would run the government accordingly.

In the audience that day was a product manager named Dan Siroker, who made a decision on the spot to go to work for Obama. “He had me at bubble sort.”

Siroker had some science to offer Obama’s campaign. He showed workers how to A/B test. When you don’t know which of two treatments or procedures is best to achieve some goal, you compare the two by flipping a coin to decide who gets treatment A and who gets treatment B. You then collect data relevant to the question you’re interested in and analyze the data by comparing the average of A with the average of B using a statistical test of some kind.
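To make the procedure concrete, here is a minimal sketch in Python. It assumes the outcome is a yes/no event such as a sign-up and uses a two-proportion z-test as the "statistical test of some kind"; other tests would serve equally well. The function names and the counts are hypothetical illustrations, not campaign data.

```python
import math
import random


def assign_groups(user_ids, seed=0):
    """Flip a fair coin for each user to decide who gets A and who gets B."""
    rng = random.Random(seed)
    return {uid: rng.choice("AB") for uid in user_ids}


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pool both groups to estimate the rate under "no real difference".
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Step 1: random assignment (hypothetical visitor ids 0..9999).
groups = assign_groups(range(10_000))

# Step 2: compare outcomes (made-up counts: 400 of 5,000 sign-ups for A,
# 470 of 5,000 for B).
z, p = two_proportion_z_test(successes_a=400, n_a=5000,
                             successes_b=470, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the gap between A and B is unlikely to be a fluke of the coin flips. It says nothing on its own about whether the gap is large enough to matter.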


By the time Dan Siroker joined the Obama campaign website, developers for Google (GOOG) and other Internet companies had for several years been testing variations of web pages online. Instead of basing decisions about web design on HiPPOs (the derisive term for the “highest-paid person’s opinion”), they were acting on incontrovertible facts about what worked best. A certain percentage of web users would be offered a home page design with lots of blue, and other users would be offered a design with lots of red. The information they sought was “percent who clicked.” Potentially every aspect of the page from color to layout to images to text would be tested simultaneously on randomly selected users. The evidence, and not the HiPPO, was the decider about what should be on the website.

The application of A/B testing to political websites was straightforward. A major question was how to design a web page that would maximize the number of e-mail addresses collected from possible donors. For example, which button would get the most sign-ups: “Learn More,” “Join Us Now,” or “Sign Up Now”? Which image would get more sign-ups: a luminous turquoise photo of Obama, a black-and-white photo of the Obama family, or a video of Obama speaking at a rally?

I’m guessing you wouldn’t have predicted that the combination of “Learn More” plus a family photo would be the most effective. And not just a little more effective. That combination produced 140 percent more donors than the least effective combination, which translates into a huge difference for donations and votes.
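The arithmetic behind a figure like "140 percent more" is worth spelling out. The sketch below uses only the button and image options named above, and its two sign-up rates are invented for illustration, since the excerpt reports the relative gain but not the underlying rates.

```python
from itertools import product

# Only the variants named in the text.
buttons = ["Learn More", "Join Us Now", "Sign Up Now"]
media = ["turquoise photo", "family photo", "rally video"]

# Testing buttons and media together means testing every pairing.
combinations = list(product(buttons, media))
print(len(combinations), "button/media combinations")


def relative_lift(best_rate, worst_rate):
    """Percent by which the best combination beats the worst."""
    return (best_rate - worst_rate) / worst_rate * 100


# Hypothetical rates: 12% of visitors sign up on the best page, 5% on the worst.
lift = relative_lift(best_rate=0.12, worst_rate=0.05)
print(f"{lift:.0f} percent more sign-ups")
```

A gain of 140 percent means the best page brings in 2.4 times as many sign-ups as the worst one, not 1.4 times.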

Website designers have learned what social psychologists discovered decades ago about their intuitions concerning human behavior in novel situations. As Siroker puts it, “Assumptions tend to be wrong.”

From 2007 on, A/B testing dictated a wide range of Obama campaign decisions. The campaign specialist and former social psychologist Todd Rogers conducted dozens of experiments for Obama. Some of the experiments were shots in the dark. Is it better for donations and for voter turnout to get a robocall from Bill Clinton or a chatty call from a volunteer? (The latter, it turns out, by a lot.) A visit from a campaign worker just before Election Day is the single most effective way yet discovered to get someone to show up at the polls.

There is now a large body of research on what works for getting out the vote. Which is more effective at getting people to the polls: telling people that turnout is expected to be light or that turnout is expected to be heavy? You might think that telling people that voting is going to be light would make them more likely to vote. A quick cost-benefit analysis shows that your vote would count for more than if turnout were heavy. But remember how susceptible people are to social influence. They want to do what other people like them are doing. If most are drinking a lot, they’ll go along; if they’re not drinking a lot, they’ll cut back. If most people are reusing their towels in their hotel room, so will they. And so telling voters there will be a heavy turnout in their precinct turns out to be much more effective than saying there will be a light turnout.

Is it effective to let people know that you know they voted in the last election—and that you’ll be checking on them after this one? People want to look good in others’ eyes—and in their own. So it’s not surprising to learn that the promise of checking up can be worth 2.5 turnout percentage points or even more. But only A/B testing could show whether the tactic of checking up would produce positive or negative results, or would have no impact at all.

In both 2008 and 2012 the Obama campaign had so many tricks up its sleeve that the Republican campaigns were blindsided. The Romney campaign in 2012 was so confident of victory that no concession speech was prepared for the candidate.

Republicans, however, are perfectly capable of playing the A/B game themselves. Indeed, already in 2006, the campaign of Governor Rick Perry of Texas had established that bang for the buck was poor for direct voter-contact mail, paid phone calls, and lawn signs. So the campaign spent no money on those things. Instead, the campaign used TV and radio spots heavily. Just which spots were the most effective was established by isolating eighteen TV markets and thirty radio stations and assigning start dates at random. Opinion polls tracked just which spots produced the biggest shifts toward Perry. The design’s randomized nature added hugely to the accuracy of the results. Campaign workers weren’t allowed to pick which market got which treatment at which time. If they had, any improved poll results could have been due to changed conditions in a given market rather than whether an ad had been placed in that market.
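The key step in a design like this is that chance, not a campaign worker, decides which market starts when. A minimal sketch of that step follows; the market names and the three start weeks are hypothetical placeholders, since the excerpt gives only the number of TV markets.

```python
import random
from collections import Counter


def assign_start_dates(markets, start_dates, seed=None):
    """Randomly assign each market a start date, keeping group sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(markets)
    rng.shuffle(shuffled)
    # Deal the shuffled markets out round-robin, like cards, so each
    # start date receives a near-equal share.
    return {market: start_dates[i % len(start_dates)]
            for i, market in enumerate(shuffled)}


# Hypothetical labels for the eighteen TV markets and three start weeks.
tv_markets = [f"tv_market_{i}" for i in range(1, 19)]
start_weeks = ["week 1", "week 2", "week 3"]

schedule = assign_start_dates(tv_markets, start_weeks, seed=42)
print(Counter(schedule.values()))
```

Because the shuffle ignores everything about a market, any local change in conditions is as likely to land in one start-date group as in another, which is what lets a shift in the polls be credited to the ads.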

A/B testing can be just as useful for business as for politics, because researchers can segment the population and assign different treatments at random. When the number of cases (N) is very large, even very small differences can be detected. And in business as in politics, a small increment can make all the difference to success.
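Why a large N helps can be shown with a few lines of arithmetic. The sketch below assumes a yes/no outcome with a baseline rate of 10 percent (an invented figure) and equal-sized groups, and it uses the usual 1.96 standard-error yardstick for a 95 percent margin. It ignores statistical power, so treat the numbers as rough guides rather than a sample-size calculation.

```python
import math


def margin_of_error(base_rate, n_per_group, z=1.96):
    """Approximate smallest A-vs-B gap distinguishable from chance."""
    # Standard error of the difference between two equal-sized groups.
    std_err = math.sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    return z * std_err


for n in (1_000, 100_000, 10_000_000):
    points = margin_of_error(0.10, n) * 100
    print(f"N = {n:>10,} per group: about {points:.3f} percentage points")
```

Each hundredfold increase in N shrinks the margin tenfold, because the margin falls with the square root of the sample size.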