Big Data & Ethics

For those not familiar with the phrase, “Big Data” is used to describe the acquisition, storage and analysis of large quantities of data. The search giant Google was one of the pioneers in this area and it is developed into an industry worth billions of dollars. Big Data and its uses also raise ethical concerns.

One common use of Big Data is to analyse customer data so as to make predictions that would be useful in conducting targeted ad campaigns. Perhaps the most infamous example of this is Target’s pregnancy targeting. This Big Data adventure was a model of inductive reasoning. First, an analysis was conducted of Target customers who had signed up for Target’s new baby registry. The purchasing history of these women was analysed to find patterns of buying that corresponded to each stage of pregnancy. For example, pregnant women were found to often buy lots of unscented lotion at the start of the second trimester. Once the analysis revealed the buying patterns of pregnant women, Target then applied this information to the buying patterns of women customers. Oversimplifying things, they were essentially using an argument by analogy: inferring that hat women not known to be pregnant who had X,Y, and Z patterns were probably pregnant because women known to be pregnant had X,Y, and Z buying patterns. The women who were tagged as probably pregnant were then subject to targeted ads for baby products and this proved to be a winner for Target, other than some public relations issues.

One interesting aspect of this method is that it does not follow the usual model of predicting a person’s future buying behavior from his/her past buying behavior. An example of predicting future buying behavior based on past behavior would be predicting that I would buy Gatorade the next time I went grocery shopping because I have been bought it consistently in the past. The analysis used by Target and other companies differs from this model by making inferences about the future behavior of customers based on their similarity to customers whose past buying behavior is known. For example, a store might see shifts in someone’s buying behavior that matches other data from people starting to get into fitness and thus predict the person was getting into fitness. The store might then send the person (and others like her) targeted ads featuring Gatorade coupons because their models show that such people buy more Gatorade.

This method also has an interesting Sherlock Holmes aspect to it. The fictional detective was able to use inductive logic (although he was presented as deducing) to make impressive inferences from seemingly innocuousness bits of information. Big Data can do this in reality and make reliable inferences based on what appears to be irrelevant information. For example, likely voting behavior might be inferred from factors such as one’s preferred beverage.

Naturally, Big Data can be used to sell a wide variety of products, including politicians and ideology. It also has non-commercial applications, such a law enforcement and political uses. As such, it is hardly surprising that companies and agencies are busily gathering and analyzing data at a relentless and ever growing pace. This certainly is cause for concern.

One ethical concern is that the use of Big Data can impact the outcome of elections. For example, analyzing massive amounts of data information can be acquired that would allow ads to be effectively crafted and targeted. Given that Big Data is expensive, the data advantage would tend to go to the side with the most money, thus increasing the influence of money on the outcome of elections. Naturally, the influence of money on elections is already a moral concern. While more spending does not assure victory, there is a clear connection between spending and success. To use but one obvious example, Mitt Romney was able to beta his Republican competitors in part by being able to outlast them financially and outspend them.

In any case, Big Data adds yet another tool and expense to political campaigning, thus making it more costly for people to run for office. This, in turn, means that those running for office will need even more money than before, thus making money an even greater factor than in the past. This, obviously enough, increases the ability of those with more money to influence the candidates and the issues.

On the face of it, it would seem unreasonable to require that campaigns go without Big Data. After all, it could be argued that this would be tantamount to demanding that campaigns operate in ignorance. However, the concerns about big money buying Big Data to influence elections could be addressed by campaign finance reform, which would be another ethical issue.

Perhaps the biggest ethical concern about Big Data is the matter of privacy. First, there is the ethical worry that much of the data used in Big Data is gathered without people knowing how the data will be used (and perhaps that it is even being gathered). For example, the customers at Target seemed to be unaware that Target was gathering such data about them to be analyzed and used to target ads.

While people might know that information is being collected about them, knowing this and knowing that the data will be analyzed for various purposes are two different things. As such, it can be argued that private data is being gathered without proper informed consent and this is morally wrong.

The obvious solution is for data collectors to make it clear about what the data will be used for, thus allowing people to make an informed choice regarding their private information. Of course, one problem that will remain is that it is rather difficult to know what sort of inferences can be made from seemingly innocuous data. As such, people might think that they are not providing any private data when they are, in fact, handing over data that can be used to make inferences about private matters.

If a business claims that they would be harmed because people would not hand over such information if they knew what it would be used for, the obvious reply is that this hardly gives them the right to deceive to get what they want. However, I do not think that businesses have much to worry about—Facebook has shown that many people are quite willing to hand over private information for little or nothing in return.

A second and perhaps the most important moral concern is that Big Data provides companies and others with the means of making inferences about people that go beyond the available data and into what might be regarded as the private realm. While this sort of reasoning is classic induction, Big Data changes the game because of the massive amount of data and processing power available to make these inferences, such as whether women are pregnant or not. In short, the analysis of seemingly innocuous data can yield inferences about information that people would tend to regard as private—or at the very least, information they would not think would be appropriate for a company to know.

One obvious counter to this is to argue that privacy rights are not being violated. After all, as long as the data used does not violate the privacy of individuals, then the inferences made from this data cannot be regarded as violating people’s privacy, even if the inferences are about matters that people would regard as private (such as pregnancy). To use an analogy, if I were to spy on someone and learn from thus that she was an alcoholic, then I would be violating her privacy. However, if I inferred that she is an alcoholic from publically available information, then I might know something private about her, but I have not violated her privacy.

This counter is certainly appealing. After all, there does seem to be a meaningful and relevant distinction between directly getting private information by violating privacy and inferring private information using public (or at least legitimately provided) data. To use an analogy, if I get the secret ingredient in someone’s prize recipe by sneaking a look at the recipe, then I have acted wrongly. However, if I infer the secret ingredient by tasting the food when I am invited to dinner, then I have not acted wrongly.

A reasonable reply to this counter is that while there is a difference between making an inference that yields private data and getting the data directly, there is also the matter of intent. It is, for example, one thing to infer the secret ingredient simply by tasting it, but it is quite another to arrange to get invited to dinner specifically so I can get that secret ingredient by tasting the food. To use another example, it is one thing to infer that someone is an alcoholic, but quite another to systematically gather public data in order to determine whether or not she is an alcoholic. In the case of Big Data, there is clearly intent to infer data that customers have not already voluntarily provided. After all, if the data had been provided, there would be no need to undertake an analysis in order to get the desired information. Thus, while the means do not involve a direct violation of privacy rights, they do involve an indirect violation—at least in cases in which the data is private (or at least intended to be private).

The solution, which would probably be rather problematic to implement, would involve setting restrictions on what sort of inferences can be made from the data on the grounds that people have a right to keep that information private, even if the means used to acquire it did not involve any direct violations of privacy rights.