Facebook has always “manipulated” the results shown in its users’ News Feeds by filtering and personalizing for relevance. But this weekend, the social giant seemed to cross a line when it announced that, two years ago, it had engineered users’ emotional responses in an “emotional contagion” experiment, published in the Proceedings of the National Academy of Sciences (PNAS).

As a society, we haven’t fully established how we ought to think about data science in practice. It’s time to start hashing that out.

Before the Data Was Big…

Data, by definition, is something taken as “given,” but somehow we’ve taken for granted the terms under which we came to agree on that fact. The professional practice now called “data science” was once called business analytics. The field has rebranded as a science in the context of buzzwordy “Big Data,” but unlike practitioners in other scientific disciplines, most data scientists don’t work in academia. Instead, they’re employed in commercial or governmental settings.

The Facebook Data Science team is a prototypical data science operation. In the company’s own words, it collects, manages, and analyzes data to “drive informed decisions in areas critical to the success of the company, and conduct social science research of both internal and external interest.” Last year, for example, it studied self-censorship: cases in which users type a status update but never post it. Facebook’s involvement with data research goes beyond its in-house team. The company is actively recruiting social scientists with the promise of conducting research on “recording social interaction in real time as it occurs completely naturally.” So what does it mean for Facebook to have a Core Data Science Team that describes its work, on its own product, as data science?

Contention about just what constitutes science has been around since the start of scientific practice. By claiming that what it does is data science, Facebook benefits from the imprimatur of an established body of knowledge. It looks objective, authoritative, and legitimate, built on the backs of the scientific method and peer review. By publishing in a prestigious journal, Facebook legitimizes its data collection and analysis activities, demonstrating their contribution to scientific discourse as if to say, “this is for the good of society.”

It may be true that Facebook offers one of the largest samples of social and behavioral data ever compiled, but all of its studies, including this one on emotional contagion, describe only things that happen on Facebook. The data is structured by Facebook, entered in a status update field created by Facebook, produced by users of Facebook, analyzed by Facebook researchers, with outputs that will affect Facebook’s future News Feed filters, all to build the business of Facebook. As research, it is an over-determined and completely constructed object of study, and its outputs are not generalizable.

Ultimately, Facebook has only learned something about Facebook.

The Wide World of Corporate Applied Science

For-profit companies have long conducted applied science research. But the reaction to this study seems to suggest there is something materially different in the way we perceive commercial data science research’s impacts. Why is that?

At GE or Boeing, two long-time applied science leaders, the incentives for research scientists are the same as they are for those at Facebook. Employee-scientists at all three companies hope to produce research that directly informs product development and leads to revenue. However, the outcomes of their research are very different. When Boeing does research, it contributes to humanity’s ability to fly. When Facebook does research, it serves its own ideological agenda and perpetuates Facebooky-ness.

Facebook is now more forthright about this. In a response to the recent controversy, Facebook data scientist Adam Kramer wrote, “The goal of all of our research at Facebook is to learn how to provide a better service…We were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook. We didn’t clearly state our motivations in the paper.”

Facebook’s former head of data science Cameron Marlow offers, “Our goal is not to change the pattern of communication in society. Our goal is to understand it so we can adapt our platform to give people the experience that they want.”

But data scientists don’t just produce knowledge about observable, naturally occurring phenomena; they shape outcomes. A/B testing and routinized experimentation in real time are done on just about every major website in order to optimize for certain desired behaviors and interactions. Google designers infamously tested up to 40 shades of blue. Facebook has already experimented with the effects of social pressure in getting-out-the-vote, raising concerns about selective digital gerrymandering. What might Facebook do with its version of this research? Perhaps it could design the News Feed to show us positive posts from our friends in order to make us happier and encourage us to spend more time on the site? Or might Facebook show us more sad posts, encouraging us to spend more time on the site because we have more to complain about?
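Mechanically, an A/B test usually comes down to randomly splitting users into two groups, showing each group a different version of a page, and checking whether the difference in their behavior is larger than chance would explain. The Python sketch below is a minimal illustration under stated assumptions: a two-variant test with a yes/no outcome (for example, whether a user clicked), analyzed with a standard two-proportion z-test. The function name and the numbers are hypothetical; nothing here describes how Google or Facebook actually run their experiments.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare two conversion rates with a two-sided, two-proportion z-test.

    Hypothetical helper for illustration only; it uses the pooled-proportion
    normal approximation, which assumes reasonably large samples.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 10,000 users per arm; variant B gets 1,100 clicks vs. A's 1,000.
z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented figures, variant B's one-point lift in click rate yields a z-score of about 2.3 (a two-sided p-value of roughly 0.02), which many teams would treat as a win for B. Repeated continuously across many variants and millions of users, this simple comparison is what lets a site steer users toward the behaviors it wants.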

Should we think of commercial data science as science? When we conflate the two, we assume companies are accountable for producing generalizable knowledge and we risk according their findings undue weight and authority. Yet when we don’t, we risk absolving practitioners from the rigor and ethical review that grants authority and power to scientific knowledge.

Facebook has published a paper in an attempt to contribute to the larger body of social science knowledge. But researchers today cannot possibly replicate Facebook’s experiment without Facebook’s cooperation. The worst outcome of this debacle would be for Facebook to retreat and avoid further public relations fiascos by keeping all its data science research findings internal. Instead, if companies like Facebook, Google, and Twitter are to support an open stance toward contributing knowledge, we need researchers with non-commercial interests who can run and replicate this research outside of the platform’s influence.

Facebook sees its users not as a population of human subjects, but as a consumer public. Therefore, we—that public and those subjects—must ask the bigger questions. What are the claims that data science makes both in industry and academia? What do they say about the kinds of knowledge that our society values?

We need to be more critical of the production of data science, especially in commercial settings. The firms that use our data have asymmetric power over us. We do them a further favor when we unquestioningly accept their claims to the prestige, expertise, and authority of science.

Ultimately, society’s greatest concerns with science and technology are ethical: Do we accept or reject the means by which knowledge is produced and the ends to which it is applied? It’s a question we ask of nuclear physics and genetic modification, and one we should ask of data science.

Big Data may not be much to look at, but it can be powerful stuff. Pictured: the new National Security Agency (NSA) data center in Bluffdale, Utah. (Photo: George Frey/Getty Images)

New technologies are not all equal. Some do nothing more than add a thin extra layer to the topsoil of human behavior (e.g., Teflon and the invention of non-stick frying pans). Some technologies, however, dig deeper, uprooting the norms of human behavior and replacing them with wholly new possibilities. For the last few months I have been arguing that Big Data, the machine-based collection and analysis of astronomical quantities of information, represents such a turn. And, for the most part, I have painted this transformation in a positive light. But last week’s revelations about the NSA’s PRISM program have put the potential dangers of Big Data front and center. So, let’s take a peek at Big Data’s dark side.

The central premise of Big Data is that all the digital breadcrumbs we leave behind as we go about our everyday lives create a trail of behavior that can be followed, captured, stored and “mined” en masse, providing the miners with fundamental insights into both our personal and collective behavior.

The initial “ick” factor from Big Data is the loss of privacy, as pretty much every aspect of your life (location records via mobile phones, purchases via credit cards, interests via web-surfing behavior) has been recorded, and possibly shared, by some entity somewhere. Big Data moves from “ick” to potentially harmful when all of those breadcrumbs are fed into a machine for processing.

This is the “data-mining” part of Big Data, and it happens when algorithms are used to search for statistical correlations between one kind of behavior and another. This is where things can get really tricky and really scary.

Consider, for example, the age-old activity of securing a loan. Back in the day you went to a bank and they looked at your application, the market and your credit history. Then they said “yes” or “no.” End of story. In the world of Big Data, banks now have more ways to assess your creditworthiness.

“We feel like all data is credit data,” former Google CIO Douglas Merrill said last year in The New York Times. “We just don’t know how to use it yet.” Merrill is CEO of ZestCash, one of a host of start-up companies using information from sources such as social networks to determine the probability that an applicant will repay their loan.

Your contacts on LinkedIn can be used to assess your “character and capacity” when it comes to loans. Facebook friends can also be useful. Have rich friends? That’s good. Know some deadbeats? Not so much. Companies will argue they are only trying to sort out the good applicants from the bad. But there is also a real risk that you will be unfairly swept into an algorithm’s dead zone and disqualified from a loan, with devastating consequences for your life.

Jay Stanley of the ACLU says being judged based on the actions of others is not limited to your social networks:

Credit card companies sometimes lower a customer’s credit limit based on the repayment history of the other customers of stores where a person shops. Such “behavioral scoring” is a form of economic guilt-by-association based on making statistical inferences about a person that go far beyond anything that person can control or be aware of.
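The “behavioral scoring” Stanley describes can be made concrete with a toy model. The Python sketch below is a hypothetical illustration, not any card issuer’s actual method: it assumes a record of which customers shop at which stores and whether they defaulted, scores a cardholder by averaging the default rates of the stores where he or she shops, and halves the credit limit when that score crosses an arbitrary threshold. All names, thresholds, and data are invented. Note that the cardholder’s own repayment record never enters the calculation, which is exactly the guilt-by-association problem.

```python
from statistics import mean


def store_default_rates(records):
    """Map each store to the share of its customers who have defaulted.

    `records` is a hypothetical list of (customer_id, store, defaulted) tuples.
    """
    shoppers = {}
    defaulters = {}
    for customer, store, defaulted in records:
        shoppers.setdefault(store, set()).add(customer)
        if defaulted:
            defaulters.setdefault(store, set()).add(customer)
    return {store: len(defaulters.get(store, set())) / len(customers)
            for store, customers in shoppers.items()}


def behavioral_score(stores_visited, rates):
    """Average the default rates of the stores a cardholder shops at.

    The cardholder's own repayment history is not an input.
    """
    known = [rates[store] for store in stores_visited if store in rates]
    return mean(known) if known else 0.0


def adjusted_limit(current_limit, score, threshold=0.2, cut=0.5):
    """Hypothetical policy: cut the limit when the store-based score exceeds a threshold."""
    return current_limit * cut if score > threshold else current_limit


# Invented data: two of the three discount_mart customers defaulted.
records = [
    ("alice", "discount_mart", True),
    ("bob", "discount_mart", True),
    ("carol", "discount_mart", False),
    ("dave", "gourmet_shop", False),
]
rates = store_default_rates(records)
# A new cardholder with a spotless record who happens to shop at discount_mart:
print(adjusted_limit(5000, behavioral_score(["discount_mart"], rates)))
```

In this toy example the new cardholder’s limit drops from 5000 to 2500.0 purely because of other people’s defaults at a store they happen to share.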

The link between behavior, health and health insurance is another gray (or dark) area for Big Data. Consider the case of Walter and Paula Shelton of Gilbert, Louisiana. Back in 2008, BusinessWeek reported how the Sheltons were denied health insurance when records of their prescription drug purchases were pulled. Even though their blood pressure and antidepressant medications were for relatively minor conditions, the Sheltons had fallen into another algorithmic dead zone in which certain kinds of purchases trigger red flags that lead to denial of coverage.

Since 2008 the use of Big Data by the insurance industry has only become more entrenched. As The Wall Street Journal reports:

Companies also have started scrutinizing employees’ other behavior more discreetly. Blue Cross and Blue Shield of North Carolina recently began buying spending data on more than 3 million people in its employer group plans. If someone, say, purchases plus-size clothing, the health plan could flag him for potential obesity—and then call or send mailings offering weight-loss solutions.

Of course, no one would argue against helping folks get healthier. But with insurance costs dominating company spreadsheets, it’s not hard to imagine how that data about plus-size purchases might someday factor into employment decisions.

And then there’s the government’s use, or misuse, of Big Data. For years critics have pointed to no-fly lists as an example of where Big Data can go wrong.

No-fly lists are meant to keep people who might be terrorists off of planes. It has long been assumed that data harvesting and mining are part of the process for determining who is on a no-fly list. So far, so good.

But the stories of folks unfairly listed are manifold: everything from disabled Marine Corps veterans to (at one point) the late Sen. Ted Kennedy. Because the methods used in placing people on the list are secret, getting off the list can, according to Conor Friedersdorf of The Atlantic, be a Kafkaesque exercise in frustration.

A 2008 National Academy of Sciences report exploring the use of Big Data techniques for national security made the dangers explicit:

The rich digital record that is made of people’s lives today provides many benefits to most people in the course of everyday life. Such data may also have utility for counterterrorist and law enforcement efforts. However, the use of such data for these purposes also raises concerns about the protection of privacy and civil liberties. Improperly used, programs that do not explicitly protect the rights of innocent individuals are likely to create second-class citizens whose freedoms to travel, engage in commercial transactions, communicate, and practice certain trades will be curtailed—and under some circumstances, they could even be improperly jailed.

So where do we go from here?

From credit to health insurance to national security, the technologies of Big Data raise real concerns about far more than just privacy (though those privacy concerns are real, legitimate and pretty scary). The debate opening up before us is an essential one for a culture dominated by science and technology.

Who decides how we go forward? Who determines if a technology is adopted? Who determines when and how it will be deployed? Who has the rights to your data? Who speaks for us? How do we speak for ourselves?