Big Brother’s Blind Spot

Mining the failures of surveillance tech

Netflix believes, algorithmically at least, that I am the kind of person who likes to watch “Dark TV Shows Featuring a Strong Female Lead.” This picksome genre is never one that I seek out intentionally, and I’m not sure it even represents my viewing habits. (Maybe I fell asleep watching The Killing one night?) It is an image of me that Netflix compiled from personal data it gathers, and, like a portrait taken slantwise and at a distance, much finer detail is missing. As it happens, television sometimes puts me to sleep; other times I stream a movie as I work on my laptop, and by the time I’ve finished typing and look back, the credits are rolling. Either way, the idea offered of me after my data has been mined is curiously off-base.

More than a decade ago, Netflix ushered in a cultural conversation about big data and algorithms with stunts like the Netflix Prize—an open competition to improve user rating predictions—and its eventual use of subscriber data to produce and cast the show House of Cards. Now, with Cambridge Analytica and driverless cars in the headlines, the artless future that some technology critics forecasted back then—movies cast by algorithms!—sounds quaint in comparison. For the time being, the stakes are low (rifle through streaming titles to find something good to watch), and the service declares the way it categorizes me—as a fan of the “Strong Female Lead”—rather than clandestinely populating the interface with lady detective shows. To be sure, there is plenty to criticize about its micro-targeting practices, but now that “surveillance capitalism” has eclipsed “big data” as the tech media buzzphrase of choice, at least its subscriber-based business model suggests the company has little incentive to partner with data brokers like Acxiom and Experian, to determine whether mine is a BoJack Horseman household or more apt to stream 13 Reasons Why.

Netflix is an accessible example of the gap between an algorithmically generated consumer profile and the untidy bundle of our lived experiences and preferences. The reality of living a digital life is that we’re routinely confronted with similarly less than spot-on categories: Facebook ads for products you would never buy, iPhoto tagging your house as a person’s face, false positives, false negatives, and all the outliers that might be marked as red dots on prediction models. Mix-ups like these might be laughable or bothersome; the octopus of interlinked corporate and state surveillance apparatuses has inevitable blind spots, after all. Still, I wonder if these blunders are better than the alternative: perfect, all-knowing, firing-on-all-cylinders systems of user tracking and categorization. Perhaps these mistakes are default countermeasures: Can we, as users, take shelter in the gaps of inefficacy and misclassification? Is a failed category to the benefit of the user—is it privacy, by accident?

Surveillance is “Orwellian when accurate, Kafkaesque when inaccurate,” Privacy International’s Frederike Kaltheuner told me. These systems are probabilistic, and “by definition, get things wrong sometimes,” Kaltheuner elaborated. “There is no 100 percent. Definitely not when it comes to subjective things.” As a target of surveillance and data collection, whether you are a Winston Smith or Josef K is a matter of spectrum and a dual-condition: depending on the tool, you’re either tilting one way or both, not least because even data recorded with precision can get gummed up in automated clusters and categories. In other words, even when the tech works, the data gathered can be opaque and prone to misinterpretation.

Companies generally don’t flaunt their imperfection—especially those with Orwellian services under contract—but nearly every internet user has a story about being inaccurately tagged or categorized in an absurd and irrelevant way. Kaltheuner told me she once received an advertisement from the UK government “encouraging me not to join ISIS,” after she watched hijab videos on YouTube. The ad was bigoted, and its execution was bumbling; still, to focus on the wide net cast is to sidestep the pressing issue: the UK government has no business judging a user’s YouTube history. Ethical debates about artificial intelligence tend to focus on the “micro level,” Kaltheuner said, when “sometimes the broader question is, do we want to use this in the first place?”

Mask Off

This is precisely the question taken up by software developer Nabil Hassein in “Against Black Inclusion in Facial Recognition,” an essay he wrote last year for the blog Decolonized Tech. Making a case both strategic and political, Hassein argues that technology under police control never benefits black communities and voluntary participation in these systems will backfire. Facial recognition commonly fails to detect black faces, in an example of what Hassein calls “technological bias.” Rather than working to resolve this bias, Hassein writes, we should “demand instead that police be forbidden to use such unreliable surveillance technologies.”

Hassein’s essay is in part a response to Joy Buolamwini’s influential work as founder of the Algorithmic Justice League. Buolamwini, who is also a researcher at MIT Media Lab, is concerned with the glaring racial bias expressed in computer vision training data. The open source facial recognition corpus largely comprises white faces, so the computation in practice interprets aspects of whiteness as a “face.” In a TED Talk about her project, Buolamwini, a black woman, demonstrates the consequences of this bias in real time. It is alarming to watch as the digital triangles of facial recognition software begin to scan and register her countenance on the screen only after she puts on a white mask. For his part, Hassein empathized with Buolamwini in his response, adding that “modern technology has rendered literal Frantz Fanon’s metaphor of ‘Black Skin, White Masks.’” Still, he disagrees with the broader political objective. “I have no reason to support the development or deployment of technology which makes it easier for the state to recognize and surveil members of my community. Just the opposite: by refusing to don white masks, we may be able to gain some temporary advantages by partially obscuring ourselves from the eyes of the white supremacist state.”

I asked Hassein about the responses he received after publishing the piece. One might be tempted to counter his argument with examples of those who are unfairly arrested because of face detection failure—the Josef Ks of the modern world. But for Hassein, a prison abolitionist, “there’s no right person.” And what of non-carceral or non-criminal cases—what if, say, driverless cars begin striking black pedestrians at uncommonly high rates as a consequence of this technological bias? It is not possible to “actually weigh a utilitarian case,” he told me. Surveillance technology is “deliberately so obscured and mystified and controlled by so few people, it makes it very difficult for people to imagine things being a different way.” Without knowing why this technology is deployed, or how it works, or who controls it, why would anyone presume it challenges existing power structures?

In addition to the Algorithmic Justice League, companies like Gfycat are also working to address racial bias in facial recognition training data. As Wired reported, a software engineer at the GIF-hosting start-up noticed the open source software it relied on was particularly terrible at identifying Asians—it failed to distinguish Constance Wu from Lucy Liu. To prevent it from identifying “every Asian person as Jackie Chan,” the engineer created an “Asian-detector” for the scanning process; it now classifies images of faces as Asian before setting a threshold to find a match. This was a specific fix for a problem diminishing the usability of the company’s product. Another solution to the problem of tech bias, from another angle, might be to scan fewer white faces. But the question remains: should this technology exist in the first place?

Confirmation: Bias

It is surely important to document, study, and address the biases of technology and algorithms, but some context is necessary if user privacy is the prevailing goal.

The queer community, people of color, and other marginalized groups have always been the targets of surveillance apparatuses. The modern surveillance state in America traces back at least to the U.S. occupation of the Philippines, during which new inventions like photography and telegraphy were used to amass data on Filipino leaders and civilians—including information about their properties, personal networks, and finances—to scan for signs of dissent. And we could perhaps date it even earlier: Simone Browne, author of Dark Matters: On the Surveillance of Blackness, relates eighteenth-century “lantern” laws in New York City, which demanded black persons carry lanterns if they walked unaccompanied by white persons after dark, to surveillance and stop-and-frisk practices today. During much of his near-half-century stint at the FBI, J. Edgar Hoover targeted African-Americans wholesale, including those without activist or left-liberal affiliations; and, at the time of the “Lavender Scare,” the FBI gathered files on queer persons to facilitate their dismissal from jobs in the federal work force. In the late-twentieth and twenty-first centuries, Muslim and Arab communities in America have been aggressively surveilled by the NYPD, FBI, and other intelligence-gathering bodies, as have black and brown communities that continue to be disproportionately targeted.

Today, diversity is a much-championed agenda item, and eliminating bias is a tempting proposition for some technologists. I imagine part of the appeal lies in its simplicity; addressing “bias” has a concrete aim: to become unbiased. With “bias” as a prompt, the possibility of a solution begins to sound unambiguous, even if the course of action—or its aftereffects—is dubious. “Bias” can therefore obscure structural and systematic forces, while attempts to resolve it can lead to unintended consequences. After speaking with Hassein, I looked over a number of articles about “gender bias” and “sexism” in voice recognition software; it turns out that Google and other AI product developers are less effective at capturing the higher pitch of feminine voices. As a writer who is overwhelmed with transcription work, automated error-free speech-to-text dictation sounds appealing; but as a human being in a complicated and opportunistic world, I don’t know that I want Google to improve at recognizing my voice. What will it do with the data? That I don’t feel particularly shortchanged by voice recognition technology isn’t to deny structural gender inequality in the tech industry. Rather, my parallel fear—abuse of my data—has to do with my limited consent in these matters. The importance of Hassein’s argument is that he’s highlighted how users cannot opt out of these systems, whether Orwellian, Kafkaesque, or in-between. The history of surveillance technologies only strengthens this rationale.

White, Adult, and Non-Cartoon

If we were to tally up the pros and cons of black inclusion in facial recognition training data, a bullet point for the left column might be that the community was spared from Michal Kosinski and Yilun Wang’s research. The pair included only white people among their (nonconsensual) subjects in an experiment that relied on machine learning to divine sexual orientation from nothing more than the digital image of a user’s face. Studies like this one aren’t designed to go unnoticed. When Kosinski and Wang, researchers at Stanford University, released a pre-printed version of their Journal of Personality and Social Psychology paper last September, they were met with exactly the volume and tenor of outcry their research was designed to bait.

Kosinski and Wang’s initial step was to collect publicly available images of both men and women from dating websites; next they established these users’ sexual orientations from the preferences they reported in their profiles. From there they turned to Amazon’s Mechanical Turk, a digital sweatshop that relies on underpaid “human intelligence,” to clean the data: only white, complete, adult, non-cartoon faces of either men or women, please. The experiment was conducted using a pair of images; the AI program was trained to decide which of the two users was more likely to be gay, and its results showed that the program was more “successful” than the Mechanical Turk workers given the same pair of images. The researchers also scanned Facebook profiles for men who liked things like “I love being Gay,” “Manhunt,” “Gay and Fabulous,” and “Gay Times Magazine” and achieved similar results with their images. This pernicious charade was followed by a less successful experiment in which the AI was made to select one hundred gay individuals from a sample of one thousand randomly drawn images: forty-seven of the one hundred guesses were correct, and the other fifty-three individuals were false positives. If it isn’t already obvious, this Hot or Not-style arrangement has no basis in the real world, where gay men and women (the experiment ignored bisexuals and pansexuals) hardly make up half of the general population. The research demonstrates merely that machine learning might be used to pick up patterns from these two classifications.

Greggor Mattson, a sociology professor at Oberlin, told me that if Kosinski and Wang had stopped there—that the algorithm might detect two populations and one group might be more likely to be gay or lesbian—he never would have responded. Possibly no one would have, not least because such modestly stated findings are hardly grist for the tech press, let alone the Daily Mail’s comments section. Instead the researchers doubled down in fields well outside their expertise—biology and queer theory—extrapolating wildly from their data to claim that sexual orientation is the product of prenatal hormones. Face shapes, they determine, reflect fetal androgen exposure. Mattson wrote a thorough response criticizing the study’s “confusion of sex, gender, sexuality, and cultural practices,” and its failure to distinguish between “same-sex identity, desires, and behavior.” Sexual orientation is not a fixed identity. The face “isn’t where sexuality is located. It doesn’t leave physical markers on the body,” Mattson told me, but qualities like facial expressions, facial hair and makeup, glasses, choice of lighting, and other elements associated with how someone poses for a profile image—all of these might have something to do with the results.

The key to understanding why Kosinski and Wang’s experiment failed can be found in a note buried within their paper’s methodology section: “unfortunately, we were not able to reliably identify heterosexual Facebook users.” (Evidently no one liked “Straight and Unfabulous.”) They plainly assumed that heterosexual identity is expansive and varied, whereas homosexuality is compressed and formulaic. People understand “there are a lot of different styles of being a heterosexual,” Mattson said, pointing to how much race, ethnicity, age, and region matter as factors. With this glaring “methodological” confession, Kosinski and Wang revealed they were in over their heads from the first steps of their research. A sample illustration of the instructions given to the Mechanical Turk workers who cleaned the data showed an image of Barack Obama marked by Kosinski and Wang as a “wrong non-Caucasian face Black.” Yet another image of a man was marked “wrong” because he appeared to be “Latino.” (In both examples, the images were “wrong” because of the study’s white-only emphasis.) Mattson, in his response, noticed that these two images already “muddle conceptual categories.” Barack Obama is biracial “but simply ‘Black’ by American cultural norms,” and “Latino is an ethnic category, not a racial one: many Latinos already are Caucasian.”

Kosinski, the primary spokesman for the paper, insists that his intention is to warn society about the imminent dangers of machine learning. In interviews, he explains that he did not wish AI “gaydar” into existence; rather, the technology already exists, and his research alerts gay people to “oppressive regimes” that might use it to profile citizens. This concern-trolling pose—“I’m just the messenger”—is patronizing and tone-deaf. The queer community, to begin with, already tries to monitor this oppressiveness. “We’ve always known that we are targets,” Mattson explained, telling me that for several years a variation of a statement “you don’t have my permission to use any of this for your research” has commonly appeared in the dating profiles of gay men. Regardless of its legal bearing, this disclaimer serves as an advisory within a community about the vulnerability of these platforms and the possible incursions into privacy they portend. “A lot of us are just one screenshot away,” Mattson said, “from our boss calling us in and having a really awkward conversation or [our] being terminated.”

Worstward, No

This specious methodology and its deleterious prospects reach beyond a single study. It turns out that Kosinski’s earlier research on Facebook engagement at the Psychometrics Centre of Cambridge University helped spawn Cambridge Analytica’s now infamous intervention in the 2016 election—although Kosinski himself was not a part of the company. What Kosinski shares with Cambridge Analytica is the habit of making provocative and inflated claims that serve as a smokescreen for rather inconclusive research. When you find headlines in the tech press of the style “Facebook knows more about you than your spouse,” or “social media knows you better than your best friends,” it is likely Kosinski is cited (and that the word “knows” is punching above its paygrade). Recently, and with discouraging irony, he gave a talk titled “The End of Privacy.” But I suppose that if you believe human beings are so perilously facile, uncomplicated, and legible as to have their sexual orientations gleaned from images of their faces, it’s no great leap to claim that privacy is now buried and decomposing.

Also consider that Kosinski and Wang’s study was conducted a short drive from the Palo Alto office building that was once home to Theranos, known for its false claims about blood tests, and a Caltrain ride from the former headquarters of Juicero, a company that made the squeezing of fruit into a post-human machine task. Their research is another expression of Silicon Valley’s fake-it-til-you-make-it culture of denial and opportunism. Yet the outsize hype attached to these projects breeds problems of its own. Regardless of machine-learning “gaydar’s” efficacy, James Vincent wrote at The Verge, “if people believe AI can be used to determine sexual preference, they will use it.” It follows that we’re better served by drawing attention to the error rates and uncertainties of this research than by obscuring its flaws and echoing the bluster of its pitchmen. In the end, Kosinski got it wrong: if he wanted to help the queer community fight “oppressive regimes,” he might have led the publicity drive by trumpeting his paper’s shortcomings. We know from past hokum, like the Myers-Briggs Type Indicator, handwriting analysis, and the polygraph, that the public and private sectors will bet on coin-toss odds. This technology is attractive, despite its failures, because it offers an illusion of standardization and objectivity about that which is conditional and even subjective.

If technology is routinely legitimized by delusions about its impartiality and misplaced faith in its precision, perhaps a wider public acknowledgment of its capacity to fail might slow its unrelenting advance. The failures of surveillance and classification technologies, frustrating as they might be in the moment—especially for their investors—cast doubt on the powerful, the knowledgeable, and the expert. These mistakes demonstrate that systems do not work as intended. And these founderings might also give way to conversations that take place beyond the noisy gibberish of marketing language. Considering its false advertisement, how is this technology used? What restrictions should be placed on it? Should this technology exist? At the risk of creating my own blunt “Strong Female Lead”-style categorization: I think a technology should not exist if there is no procedure to contest and amend its inevitable mistakes. And that’s just to start.

Given the choice, I think many of us—as individuals—might prefer the Kafkaesque to the Orwellian (at least until chapter ten of The Trial). Surveillance companies, however, have every incentive to sell Big Brother to governments and businesses, no matter how far their products fall short, because those customers tend to understand “Orwellian” as a good thing. The question, then, is how to harness and extend the ambiguous and unmapped territory so that we don’t find ourselves subjects of either dystopia.

Joanne McNeil writes about privacy and Internet culture. She is currently a resident at Eyebeam in New York.