On the Ethics of A/B Testing

The discussion triggered by Facebook’s mood manipulation experiment has been enlightening and frustrating at the same time. An enlightening aspect is how it has exposed divergent views on a practice called A/B testing, in which a company provides two versions of its service to randomly chosen groups of users, and then measures how the users react. A frustrating aspect has been the often-confusing arguments made about the ethics of A/B testing.

One thing that is clear is that the ethics of A/B testing are an important and interesting topic. This post is my first cut at thinking through these ethical questions. Some disclaimers: I am considering A/B testing in general, not just testing done for academic research purposes and not just one specific experiment; I am considering what is ethical rather than what is legal or what is required by somebody’s IRB; I am considering how people should act rather than observing how they do act.

Let’s start with an obvious point: Some uses of A/B testing are clearly ethical. For example, if a company wants to know which shade of blue to use in its user interface, it might use A/B testing to try a few shades and measure users’ responses. This is ethical because no user is harmed, especially if the only result is that the service better serves users.
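For concreteness, here is a minimal sketch of what a shade-of-blue test of this kind might look like in code. Everything in it is a hypothetical illustration rather than any company's actual system: the two hex colors, the experiment name, and the function names `assign_variant` and `click_rates` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical shades of blue under test (illustrative values only).
VARIANTS = ["#1a73e8", "#4285f4"]


def assign_variant(user_id: str, experiment: str = "blue-shade") -> str:
    """Place a user in one group, pseudo-randomly but repeatably.

    Hashing the experiment name together with the user id means the
    same user always sees the same shade for this experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


def click_rates(events):
    """Compute the fraction of views that led to a click, per shade.

    `events` is an iterable of (user_id, did_click) pairs, one per view.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user_id, did_click in events:
        variant = assign_variant(user_id)
        shown[variant] += 1
        clicked[variant] += int(did_click)
    return {variant: clicked[variant] / shown[variant] for variant in shown}
```

The mechanics are the same whether the variants are two shades of blue or two ways of filtering a news feed, which is part of why the ethical question has to be about what is being varied rather than about the technique itself.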

Here’s another point that should be obvious: Some uses of A/B testing are clearly unethical. Consider a study where a service falsely tells teens that their parents are dead, or a study that tries to see if a service can incite ethnic violence in a war-torn region. Both studies are unethical because they cause significant harm or risk of harm.

So the question is not whether A/B testing is ethical, but rather where we should draw the line between ethical and unethical uses. A consequence of this is that any argument that implies that A/B testing is always ethical or always unethical must be wrong.

Here’s an example argument: Company X does A/B testing all the time; this is just another type of A/B testing; therefore this is ethical. Here’s another: Company X already uses an algorithm to decide what to show to users, and that algorithm changes from time to time; this is just another change to the algorithm; therefore this is ethical. Both arguments are invalid, in the same way that it’s invalid to argue that Chef Bob often cuts things with a knife, therefore it is ethical for him to cut up anything he wants. The ethical status of the act depends on what exactly Chef Bob is cutting, what exact A/B test is being done, or what exact algorithm is being used. (At the risk of stating the obvious: the fact that these sorts of invalid arguments are made on behalf of a practice does not in itself imply that the practice is bad.)

Another argument goes like this: Everybody knows that companies do A/B tests of type X; therefore it is ethical for them to do A/B tests of type X. This is also an invalid argument, because knowledge that an act is occurring does not imply that the act is ethical.

But the “everyone knows” argument is not entirely irrelevant, because we can refine it into a more explicit argument that deserves closer consideration. This is the implied consent argument: User Bob knows that if he uses Service X he will be subject to A/B tests of Type Y; Bob chooses to use Service X; therefore Bob can be deemed to have consented to Service X performing A/B tests of Type Y on him.

Making the argument explicit in this way exposes two potential failures in the argument. First, there must be general knowledge among users that a particular type of testing will happen. “Everyone knows” is not enough, if “everyone” means everyone in the tech blogosphere, or everyone who works in the industry. Whether users understand something to be happening is an empirical question that can be answered with data; or a company can take pains to inform its users—but of course I mean actually informing users, not just providing some minimal the-information-was-available-if-you-looked notification theater.

Second, the consent here is implied rather than explicit. In practice, User Bob might not have much real choice about whether to use a service. If his employer requires him to use the service, then he would have to quit his job to avoid being subject to the A/B test, and the most we can infer from his use of the service is that he dislikes the test less than he would dislike losing his job. Similarly, Bob might feel he needs to use a service to keep tabs on his kids, to participate in a social or religious organization, or for some other reason. The law might allow a legal fiction of implied consent, but what we care about ethically is whether Bob’s act of using the service really does imply that he does not object to being a test subject.

Both of these caveats will apply differently to different users. Some users will know about a company’s practices but others will not. Some users will have a free, unconstrained choice whether to use a service but others will not. Consent can validly be inferred for some users and not others; and in general the service won’t be able to tell for which users it exists. So if a test is run on a randomly selected set of users, it’s likely that consent can be inferred for only a subset of those users.

Where does this leave us? It seems to me that where the risks are minimal, A/B testing without consent is unobjectionable, as in the shades-of-blue example. Where risks are extremely high or there are significant risks to non-participants, as in the ethnic-violence example, the test is unethical even with consent from participants. In between, there is a continuum of risk levels, and the need for consent would vary based on the risk. Higher-risk cases would merit explicit, no-strings-attached consent for a particular test. For lower-risk cases, implied consent would be sufficient, with a higher rate of user knowledge and a higher rate of unconstrained user choice required as the risk level increases.

Where exactly to draw these lines, and what processes a company should use to avoid stepping over the lines, are left as exercises for the reader.

Comments

Great blog post. I would like to pose another ethical question. Suppose a company does a study with 10k users and the experiment poses minimal risk and thus seems ethical. But then the company uses the information it learned from those 10k users and translates it into an increased profit of 10 million dollars per year. Is not compensating those users ethical?

It’s great to see the issues and invalid arguments laid out like this.

I would add that the two questions/issues you raised are ALREADY well-addressed within existing ethical frameworks for human subject research.

1) “First, there must be general knowledge among users that a particular type of testing will happen. ”
The ethical framework on obtaining informed consent is quite clear on how to do it correctly. There is even an easy-to-use checklist available. A paragraph in the terms-of-use is nowhere NEAR that level.
Similarly, there are consistent and clear rules saying when informed consent is not needed. They are not difficult to apply in this case, and in 99% of A/B testing cases are almost trivial to apply.

2) “User Bob might not have much real choice about whether to use a service”
Existing ethical frameworks are clear about consent being voluntary, and that subjects must be given the right to refuse. They also explicitly maintain that subjects must not be negatively impacted (compared to the situation where the experiment did not exist) by refusing to participate in any experiment.
The option of “not using Facebook” would negatively impact a person’s quality of life. Therefore if an A/B testing experiment can cause harm (emotional manipulation) then people must be able to opt out and still use Facebook. If it does not cause harm (tweaking ad placement), then informed consent is not required, and allowing opt-out may also not be required of Facebook.

These rules, based on the Nuremberg Code, have literally been obtained in blood and sweat. They are designed to cover much more difficult and ambiguous situations than A/B testing for web sites. Pretending that the existing ethical rules are somehow insufficient to cover the situation is ridiculous. It’s not rocket science!

Finally, it is not up to individual researchers to decide these issues by themselves. It is up to the IRB. It looks to me like the Facebook paper authors tried to bypass their IRB by claiming the dataset was pre-existing, despite being involved in the design of the experiment.

To summarize: existing rules clearly and easily cover human subject research via A/B testing. I don’t see why social networking researchers should get a pass. For example, this experiment done by Wikipedia went through normal IRB approval: dx.plos.org/10.1371/journal.pone.0034358 .

If a specific change to an algorithm introduces a credible risk of harm – such as in the “dead parents” and the “let’s incite ethnic violence” scenarios – then surely the changed algorithm would be unethical regardless of whether or not it was part of an A/B test?

I would counter this with a hypothetical question: can any website be in and of itself unethical?
I would be inclined to conclude that because a website is fundamentally no different from a document or information retrieval mechanism, it should be treated no differently than any other information retrieval and storage system.
That is to say, it should be treated by the same standards as a published encyclopedia; in essence, a website is a form of speech.

Books and speech – while potentially inflammatory, are not in and of themselves capable of doing harm or causing suffering. Surely someone may be offended or hurt by something they read or something they hear, but it is not a right of that person to only hear pleasant information. This is doubly the case when that person has made a positive effort to hear that information.

Another example – is it unethical for a comedian to alter his act slightly between shows to explore various audience reactions?

I would contend that no act of speech made to a consenting audience can possibly be unethical.
To contend otherwise opens the door for all forms of censorship and certainly runs contrary to modern concepts of human rights.

To what has the audience consented? Take the comedian example … please. (Sorry, can’t resist.) It doesn’t challenge Prof. Felten’s comments, because it’s a clear case of no harm. Still, what is the object of consent? Is the audience consenting to yet another variation of wacky humor delivered by Youngman? Or to a performance of a celebrated instance, unchanged, of Youngman’s repertoire? Instead of a comedian, what if this evolving “speech act” were a movie, subtly changed from screening to screening following its having received an Academy Award? Has the audience consented to those changes? Here’s why these absurd examples are relevant: legal issues aside, “speech acts” involving fraud or defamation are arguably unethical. But if I consent to hearing somebody defame another, you’re saying the communication is ethical. If I consent to hearing a salesperson’s fraudulent pitch — that is, I willingly listen to what the salesperson has to say — and then I rely to my detriment on the fraudulent information, I can’t accuse the person of being unethical?

Wow, has A/B testing come a long way or what? When I was first introduced to A/B testing it was nothing like what Facebook did. I think it was either this website or your sister site eff.org from whence I got my first understanding of A/B testing, where a participant was presented with two options and picked what they preferred, and perhaps iterated through a bunch of random option pairs to create a data-set of preferences, which was then aggregated with others to determine a best case scenario: say, for determining the best website design.

Perhaps A/B testing has always been a core research methodology, but I don’t see how Facebook’s game of playing with people’s emotions had anything to do with A/B testing, unless its definition has changed significantly. There is no doubt that what Facebook did was unethical, not only because of the harm and lack of consent (which apparently are the research community’s gripes); I also find it completely unethical by way of “fraud,” in that they never told anyone they planned to “manipulate” the posts people see in the first place, regardless of the reason for the manipulation. That itself is unethical IMHO; it amounts to fraud.

To make an analogy of sorts: if I walk into a grocery store expecting to find food and I am shown racks full of clothes, it is quite unethical even if the store says “yeah, we sell food here, but first you have to look at all the clothes; and no, you can’t see the food before you buy it.” That would be quite unethical, would it not?

Not that I use Facebook, but if I were to get onto Facebook based on what Facebook advertises itself as, it would be either to 1) update my friends about my life, or 2) receive updates from my friends about their lives. For Facebook to change either my posts or my friends’ posts (or even simply to display or not display some such posts of its choosing) is itself unethical, because without telling the users it fundamentally changed the entire service.

If that grocery store posted a sign that said “we are now a clothing store,” that would be okay. But Facebook didn’t post a sign saying that it would choose which posts you get to see, and which not, in the name of research (it keeps claiming that what gets shown to whom is the user’s choice, not the service provider’s). That is unethical and absolute fraud, regardless of the ethics of the psychology “A/B” testing as you call it.

Freedom to Tinker is hosted by Princeton's Center for Information Technology Policy, a research center that studies digital technologies in public life. Here you'll find comment and analysis from the digital frontier, written by the Center's faculty, students, and friends.