Debates over information privacy are often framed as an inescapable conflict between competing interests: a lucrative or beneficial technology, as against privacy risks to consumers. Policy remedies traditionally take one of three rigid forms: a complete ban, no regulation at all, or an intermediate zone of modest notice and choice mechanisms.

We believe these approaches are unnecessarily constrained. There is often a spectrum of technology alternatives that trade off functionality and profit for consumer privacy. We term these alternatives “privacy substitutes,” and in this Essay we argue that public policy on information privacy issues can and should be a careful exercise in both selecting among, and providing incentives for, privacy substitutes.[1]

I. Disconnected Policy and Computer Science Perspectives

Policy stakeholders frequently approach information privacy through a simple interest balancing. Consumer privacy interests rest on one side of the scales, and commercial and social benefits sit atop the other.[2] Where privacy substantially tips the balance, a practice warrants prohibition; where privacy is significantly outweighed, no restrictions are appropriate. When the scales near equipoise, practices merit some (questionably effective[3]) measure of mandatory disclosure or consumer control.[4]

Computer science researchers, however, have long recognized that technology can enable tradeoffs between privacy and other interests. For most areas of technology application, there exists a spectrum of possible designs that vary in their privacy and functionality[5] characteristics. Cast in economic terms, technology enables a robust production-possibility frontier between privacy and other values, including profit and public benefit.

The precise contours of the production-possibility frontier vary by technology application area. In many areas, privacy substitutes afford a potential Pareto improvement relative to naïve or status quo designs. In some application areas, privacy substitutes even offer a strict Pareto improvement: privacy-preserving designs can provide the exact same functionality as intrusive alternatives. The following Subparts review example designs for web advertising, online identity, and transportation payment to illustrate how clever engineering can counterintuitively enable privacy tradeoffs.

A. Web Advertising

In the course of serving an advertisement, dozens of third-party websites may set or receive unique identifier cookies.[6] The technical design is roughly akin to labeling a user’s web browser with a virtual barcode, then scanning the code with every page view. All advertising operations—from selecting which ad to display through billing—can then occur on advertising company backend services. Policymakers and privacy advocates have criticized this status quo approach as invasive since it involves collecting a user’s browsing history.[7] Privacy researchers have responded with a wide range of technical designs for advertising functionality.[8]

Frequent buyer programs provide a helpful analogy. Suppose a coffee shop offers a buy-ten-get-one-free promotion. One common approach would be for the shop to provide a swipe card that keeps track of a consumer’s purchases, and dispenses rewards as earned. An alternative approach would be to issue a punch card that records the consumer’s progress towards free coffee. The shop still operates its incentive program, but note that it no longer holds a record of precisely what was bought when; the punch card keeps track of the consumer’s behavior, and it only tells the shop what it needs to know. This latter implementation roughly parallels privacy substitutes in web advertising: common elements include storing a user’s online habits within the web browser itself, as well as selectively parceling out information derived from those habits.
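The punch card design has a direct software analogue. The Python sketch below, a simplified illustration in the spirit of research proposals such as Adnostic rather than any deployed system, keeps the user's interest profile inside the browser and reveals only a coarse category to the ad server (the category names and ad inventory are hypothetical):

```python
# Hypothetical sketch: browser-side behavioral profiling for ad selection.
# The log of page visits stays on the client (the "punch card"); the ad
# server learns only a single coarse interest category per request.

from collections import Counter

class BrowserAdProfile:
    """Keeps the user's browsing-derived interests locally, in the browser."""

    def __init__(self):
        self._visits = Counter()  # never leaves the browser

    def record_visit(self, category: str) -> None:
        self._visits[category] += 1

    def ad_request(self) -> str:
        """Reveal only the top interest category, not the browsing history."""
        if not self._visits:
            return "generic"
        return self._visits.most_common(1)[0][0]

def serve_ad(category: str) -> str:
    """Ad server: selects an ad knowing only a coarse category."""
    inventory = {"coffee": "Ad: local roaster", "generic": "Ad: house brand"}
    return inventory.get(category, "Ad: house brand")

profile = BrowserAdProfile()
for page in ["coffee", "coffee", "news"]:
    profile.record_visit(page)
ad = serve_ad(profile.ad_request())  # server never saw the visit log
```

The server's view is limited to one coarse label per request; the granular visit log never leaves the client, paralleling the punch card that tells the shop only what it needs to know.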

Each design represents a point in the spectrum of possible tradeoffs between privacy—here, the information shared with advertising companies—and other commercial and public values. Moving along the spectrum toward less protective designs, proposals become easier to deploy, faster in ad delivery, and more accurate in advertisement selection and reporting, in exchange for diminished privacy guarantees.

B. Online Identity

Centralized online identity management benefits consumers through both convenience and increased security.[9] Popular implementations of these “single sign-on” or “federated identity” systems include a sharp privacy drawback, however: the identity provider learns about the consumer’s activities. By way of rough analogy: Imagine going to a bar, where the bouncer phones the state DMV to check the authenticity of your driver’s license. The bouncer gets confirmation of your identity, but the DMV learns where you are. Drawing on computer security research, Mozilla has deployed a privacy-preserving alternative, dubbed Persona. Through the use of cryptographic attestation, Persona provides centralized identity management without Mozilla learning the consumer’s online activity. In the bar analogy, instead of calling the DMV, the bouncer carefully checks the driver’s license for official and difficult-to-forge markings. The bouncer can still be sure of your identity, but the DMV does not learn of your drinking habits.
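The bar analogy corresponds to a standard cryptographic pattern: the identity provider signs a credential once, and any relying party can verify the signature offline using the provider's public key, so the provider never learns where the credential is presented. The toy Python sketch below uses textbook RSA with tiny, insecure parameters, purely for exposition; Persona's actual protocol relies on standard public-key certificates:

```python
# Toy illustration of cryptographic attestation: the identity provider signs
# a statement once; any verifier can check it offline with the provider's
# public key, so the provider never learns where the credential is used.
# Textbook RSA with tiny primes -- insecure, for exposition only.

import hashlib

# Identity provider's toy RSA key pair.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def digest(msg: bytes) -> int:
    """Hash the statement down to an integer modulo n."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def provider_sign(msg: bytes) -> int:
    """Run once by the identity provider (the 'DMV')."""
    return pow(digest(msg), d, n)

def verifier_check(msg: bytes, sig: int) -> bool:
    """Run by the relying party (the 'bouncer'); no call back to the provider."""
    return pow(sig, e, n) == digest(msg)

credential = b"user@example.org is over 21"
sig = provider_sign(credential)
assert verifier_check(credential, sig)              # valid credential accepted
assert not verifier_check(credential, (sig + 1) % n)  # tampered signature rejected
```

The verification step consults only the public key, which is the whole point: the bouncer checks the license's markings rather than phoning the DMV.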

C. Transportation Payment

Transportation fare cards and toll tags commonly embed unique identifiers, facilitating intrusive tracking of a consumer’s movements. Intuitively, the alternative privacy-preserving design would be to store the consumer’s balance on the device, but this approach is vulnerable to cards being hacked for free transportation.[10] An area of cryptography called “secure multiparty computation” provides a solution, allowing two parties to transact while only learning as much about each other as is strictly mathematically necessary to complete the transaction.[11] A secure multiparty computation approach would enable the transportation provider to reliably add and deduct credits from a card or tag—without learning which device it is transacting with or the value stored on it.
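The e-cash lineage noted in the footnotes rests on a primitive called the blind signature: the operator signs a credit token without seeing which token it is signing, so a token later presented at the fare gate cannot be linked to the card that purchased it. A toy RSA blind-signature sketch in Python (tiny, insecure parameters, purely for exposition):

```python
# Toy RSA blind signature (the building block of Chaum-style e-cash): the
# operator signs a credit token without seeing which token it signs, so
# spent tokens cannot be linked back to the card that bought them.
# Tiny insecure parameters, for exposition only.

import math

# Transit operator's toy RSA key pair.
p, q = 61, 53
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))   # operator's private key (Python 3.8+)

def blind(token: int, r: int) -> int:
    """Card: hide the token under a random blinding factor r."""
    return (token * pow(r, e, n)) % n

def operator_sign(blinded: int) -> int:
    """Operator: signs blindly; it cannot recover the token from `blinded`."""
    return pow(blinded, d, n)

def unblind(blind_sig: int, r: int) -> int:
    """Card: strip the blinding factor to obtain an ordinary signature."""
    return (blind_sig * pow(r, -1, n)) % n

def operator_verify(token: int, sig: int) -> bool:
    """At the fare gate: a valid signature proves the credit is genuine."""
    return pow(sig, e, n) == token % n

token = 1234                 # a one-time credit token chosen by the card
r = 777                      # blinding factor; must satisfy gcd(r, n) == 1
assert math.gcd(r, n) == 1
sig = unblind(operator_sign(blind(token, r)), r)
assert operator_verify(token, sig)
```

Because the operator signed only the blinded value, it can verify the token as genuine later without being able to tie it to the original purchase.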

II. Market Failures

Engineering Conventions. Information technology design traditionally emphasizes principles including simplicity, readability, modifiability, maintainability, robustness, and data hygiene. More recently, overcollection has become a common practice—designers gather information wherever feasible, since it might be handy later. Privacy substitutes often turn these norms on their head. Consider, for example, “differential privacy” techniques for protecting information within a dataset.[12] The notion is to intentionally introduce (tolerable) errors into data, a practice that cuts deeply against design intuition.[13]
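To make the idea concrete, here is a minimal sketch of differential privacy's canonical tool, the Laplace mechanism: a counting query is answered with deliberately injected noise whose scale is calibrated to a privacy parameter epsilon. The dataset and query below are hypothetical:

```python
# Minimal sketch of the Laplace mechanism from differential privacy:
# deliberately perturb a count query so that no single individual's
# presence in the dataset is revealed, at the cost of a small,
# tunable error in the answer.

import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Counting query (sensitivity 1), made epsilon-differentially private."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 29, 41, 52, 29, 60]
answer = private_count(ages, lambda a: a >= 40, epsilon=1.0)
# `answer` is close to the true count (3) but intentionally noisy.
```

Smaller values of epsilon mean noisier answers and stronger privacy, which is precisely the engineered tradeoff, and the deliberate inaccuracy, that cuts against conventional design intuition.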

Information Asymmetries. Technology organizations may not understand the privacy properties of the systems they deploy. For example, participants in online advertising frequently claim that their practices are anonymous—despite substantial computer science research to the contrary.[14] Firms may also lack the expertise to be aware of privacy substitutes; as the previous Part showed, privacy substitutes often challenge intuitions and assumptions about technical design.

Implementation and Switching Costs. The investments of labor, time, and capital associated with researching and deploying a privacy substitute may be significant. Startups may be particularly resource constrained, while mature firms face path-dependent switching costs owing to past engineering decisions.

Diminished Private Utility. Intrusive systems often outperform privacy substitutes (e.g., in speed, accuracy, and other aspects of functionality), in some cases resulting in higher private utility. Moreover, the potential for presently unknown future uses of data counsels in favor of overcollection wherever possible.

Inability to Internalize. In theory, consumers or business partners might compensate a firm for adopting privacy substitutes. In practice, however, internalizing the value of pro-privacy practices has proven challenging. Consumers are frequently unaware of the systems that they interact with, let alone the privacy properties of those systems; informing users sufficiently to exercise market pressure may be impracticable.[15] Moreover, even if a sizeable share of consumers were aware, it may be prohibitively burdensome to differentiate those consumers who are willing and able to pay for privacy. And even if those users could be identified, it may not be feasible to collect small payments from each of them. As for business partners, they too may face information asymmetries, and their incentives indirectly reflect the same lack of consumer pressure. Coordination failures compound the difficulty of monetizing privacy: without clear guidance on privacy best practices, users, businesses, and policymakers have no standard of conduct to which to request adherence.

Organizational Divides. To the extent technology firms do perceive pressure to adopt privacy substitutes, it is often from government relations, policymakers, and lawyers. In some industries the motivation will be another step removed, filtering through trade associations and lobbying groups. These nontechnical representatives often lack the expertise to propose privacy alternatives themselves or adequately solicit engineering input.[16]

Competition Barriers. Some technology sectors reflect monopolistic or oligopolistic structures. Even if users and businesses demanded improved privacy, there may be little competitive pressure to respond.

III. Policy Prescriptions

Our lead recommendation for policymakers is straightforward: understand and encourage the use of privacy substitutes through ordinary regulatory practices. When approaching a consumer privacy problem, policymakers should begin by exploring not only the relevant privacy risks and competing values, but also the space of possible privacy substitutes and their associated tradeoffs. If policymakers are sufficiently certain that socially beneficial privacy substitutes exist,[17] they should turn to conventional regulatory tools to incentivize deployment of those technologies.[18] For example, a regulatory agency might provide an enforcement safe harbor to companies that deploy sufficiently rigorous privacy substitutes.

Policymakers should also target the market failures that lead to nonadoption of privacy substitutes. Engaging directly with industry engineers, for example, may overcome organizational divides and information asymmetries. Efforts at standardization of privacy substitutes may be particularly effective; information technology is often conducive to design sharing and reuse. We are skeptical of the efficacy of consumer education efforts,[19] but informing business partners could alter incentives.

Finally, policymakers should push the envelope of privacy substitutes. Grants and competitions, for example, could drive research innovations in both academia and industry.

Conclusion

This brief Essay is intended to begin reshaping policy debates on information privacy from stark and unavoidable conflicts to creative and nuanced tradeoffs. Much more remains to be said: Can privacy substitutes also reconcile individual privacy with government intrusions (e.g., for law enforcement or intelligence)?[20] How can policymakers recognize privacy substitute pseudoscience?[21] We leave these and many more questions for another day, and part ways on this note: pundits often cavalierly posit that information technology has sounded the death knell for individual privacy. We could not disagree more. Information technology is poised to protect individual privacy—if policymakers get the incentives right.

[1] The area of computer science that we discuss is sometimes referenced as “privacy enhancing technologies” or “privacy-preserving technologies.” We use the term “privacy substitutes” for clarity and precision.

[2] See, e.g., Balancing Privacy and Innovation: Does the President’s Proposal Tip the Scale?: Hearing Before the Subcomm. on Commerce, Mfg., & Trade of the H. Comm. on Energy & Commerce, 112th Cong. 4 (2012) (statement of the Hon. Mary Bono Mack, Chairman, Subcomm. on Commerce, Mfg., & Trade) (“When it comes to the Internet, how do we—as Congress, as the administration, and as Americans—balance the need to remain innovative with the need to protect privacy?”), available at http://www.gpo.gov/fdsys/pkg/CHRG-112hhrg81441/pdf/CHRG-112hhrg81441.pdf; Fed. Trade Comm’n, Protecting Consumer Privacy in an Era of Rapid Change 36 (2012) (“Establishing consumer choice as a baseline requirement for companies that collect and use consumer data, while also identifying certain practices where choice is unnecessary, is an appropriately balanced model.”), available at http://ftc.gov/os/2012/03/120326privacyreport.pdf.

[4] We depict notice and choice as a straight line since, in many implementations, consumers are given solely binary decisions about whether to accept or reject a set of services or product features. The diagrams in this Essay attempt to illustrate our thinking; they are not intended to precisely reflect any particular privacy issue.

[11] Secure multiparty computation has been implemented in various well-known protocols. The area traces its roots to Andrew Yao’s “garbled circuit construction,” a piece of “crypto magic” dating to the early 1980s. Researchers have used secure multiparty computation to demonstrate privacy-preserving designs in myriad domains—voting, electronic health systems and personal genetics, and location-based services, to name just a few. The payment model we suggest is based on David Chaum’s “e-cash.” His company DigiCash offered essentially such a system (not just for transportation, but for all sorts of payments) in the 1990s, but it went out of business by 1998. See generally How DigiCash Blew Everything, Next Mag., Jan. 1999, available at http://cryptome.org/jya/digicrash.htm.

[16] We have observed firsthand the difficulty imposed by organizational divides in the World Wide Web Consortium’s process to standardize Do Not Track. Participants from the online advertising industry have largely been unable to engage on privacy substitutes owing to limited technical expertise, distortions in information relayed to technical staff, and inability to facilitate a direct dialog between inside and outside technical experts.

[17] Sometimes a rigorously vetted privacy substitute will be ready for deployment. Frequently, to be sure, the space of privacy substitutes will include gaps and ambiguities. But policymakers are no strangers to making decisions under uncertainty or to relying on the best available science.

[18] We caution against requiring particular technical designs. In the future, better designs may become available, or deficiencies in present designs may be uncovered. Cast in more traditional terms of regulatory discourse, this is very much an area for targeting ends, not means.

[19] See supra note 3.

[20] The congressional response to Transportation Security Administration full-body scanners might be considered an instance of a privacy substitute. Congress allowed the TSA to retain the scanners, but required a software update that eliminated intrusive imaging. 49 U.S.C. § 44901(l) (2011).

[21] For example, some technology companies are lobbying for European Union law to exempt pseudonymous data from privacy protections. See Ctr. for Democracy & Tech., CDT Position Paper on the Treatment of Pseudonymous Data Under the Proposed Data Protection Regulation (2013), available at https://www.cdt.org/files/pdfs/CDT-Pseudonymous-Data-DPR.pdf. Information privacy researchers have, however, long recognized that pseudonymous data can often be linked to an individual. See, e.g., Mayer & Mitchell, supra note 6, at 415-16.
