Wednesday, July 30, 2014: Maximum Shelf: Dataclysm: Who We Are (When We Think No One's Looking)

Dataclysm: Who We Are (When We Think No One's Looking)

by Christian Rudder

How prejudiced is today's society? What does Facebook predict about the stability of a marriage? Where and why are gay people staying in the closet? How do political views affect romantic relationships?

Christian Rudder delved deep into the "statistical slag pits" and emerged with a bold, thought-provoking book that answers these questions and more. In Dataclysm: Who We Are (When We Think No One's Looking), he shows how technology is offering an "unprecedented sociological opportunity" and helping to transform our understanding of race, politics, sex, beauty, humor, anger and other subjects previously challenging to quantify.

We often hear about "Big Data," or large stores of information, in the context of how it might be used to entice us to purchase products we don't need or to spy on us in the name of national security. But Rudder's aim is to better understand human nature and behavior. Some 87% of the United States' population is online--working, socializing, romancing. With every click, post, Tweet and web search, "our hidden thoughts are becoming part of the world. With a little creative typing, a few workarounds, and some math, we are giving humanity's inner monologue a wider audience."

Rudder's role as Virgil through the digital world has been 10 years in the making. He is a co-founder of OkCupid, one of the largest dating websites in the world, and chief analyst of the vast repository of data the company has amassed since it launched a decade ago. As more and more information was collected from the site's millions of users, trends and patterns began to emerge.

Rudder realized that this deep, varied data set of person-to-person interaction could be used to directly examine taboos like race. "I could go and look at what actually happens when, say, 100,000 white men and 100,000 black women interact in private. The data was sitting right there on our servers," he says. Unlike surveys, in which respondents can edit their answers or even outright lie, he had the unvarnished truth at his fingertips.

In Dataclysm, Rudder combines existing work with his own original research, analyzing information from OkCupid, Google, Twitter, Facebook, Reddit, Tumblr and other websites. He reveals his findings in a series of vignettes organized into three main categories: the data of people connecting, the data of division and the data of the individual.

Rudder begins by putting hard numbers to the timeless mystery of sex appeal and what brings two people together in the first blush of attraction. (Surprising find: embrace your flaws.) From there he moves on to other topics, like written communication, demonstrating that Twitter is actually improving its users' writing ability and changing the study of language.

Next, Rudder probes society's great divides, exploring charged issues like faith, politics and race. As he discovered, the unvarnished truth isn't always pleasant. Although expressing racist views publicly is no longer considered socially acceptable, digital activity proves that in private, racism is pervasive and still an implicit factor in people's decision making.

Rudder also uses his findings to give strength and nuance to previous work and suggests ways to build on it. For example, it's not news that looks matter, particularly for women, as Naomi Wolf put forth in the bestseller The Beauty Myth. But the atomized actions of millions of online participants mean anecdotes are now bolstered by evidence. From the dating world to the workplace, Rudder illustrates how, "not unlike race, beauty is a card you're dealt, and it has huge repercussions."

In Dataclysm's third section, Rudder turns his attention to the individual, exploring how ethnic, sexual and political identity is expressed. He reveals how whites, blacks, Asians and Latinos are most and least likely to define themselves; how location shapes a person; and why data surrounding self-reported gay populations across the country has a sobering meaning.

Rudder is even-handed in exemplifying both the good and the bad taking place on the Internet, "a vibrant, brutal, loving, forgiving, deceitful, sensual, angry place" that reflects its users. Tumblr is reaching out to help those with eating disorders, while virtual lynch mobs have formed on Twitter, inciting collaborative rage with far-reaching effects.

Dataclysm covers broad territory, ranging from curiosities, like which state's residents bathe most frequently and what men and women are most eager to know about the opposite sex, to subjects with larger social ramifications. Google leads the way in using data for public good, with projects such as its flu tracker, which uses searches for remedies and symptoms to pinpoint outbreaks and alert the CDC, and Constitute, a database of hundreds of documents that emerging nations can use as a guide in designing their own constitutions.

A book based on statistics could easily be dry and boring, but not with Rudder at the helm. If numbers are the narrative, he is the consummate storyteller--smart, witty and a perceptive interpreter of the data. His pithy, conversational tone and fast-paced writing style make Dataclysm both amusing and informative. Charts and graphs appear throughout the book, each one explained in clear, colorful detail, along with pop-culture references and entertaining personal anecdotes.

The book's title is drawn from Kataklysmos, Greek for the Old Testament Flood and the origin of the English word "cataclysm," and was chosen partly in reference to the unprecedented deluge of data being collected today. Rudder concludes by ruminating on some of the challenges the data deluge is bringing with it, chiefly privacy concerns, and where we're headed from here. With every click, the floodgates will open further, strengthening the reach and power of Big Data.

"More than stretching out my arms to say This is the pinnacle, I mean to communicate the power of what's to come," says Rudder. "The cliché would be to say that this is just the tip of the iceberg, but we're not even at sea yet. In the dataclysm, the water's hardly up to our knees." Dataclysm is a valuable read for anyone who would like to know what's going on behind the scenes in cyberspace. --Shannon McKenna Schmidt

Crown, $28, hardcover, 9780385347372, September 9, 2014

Christian Rudder: "We Can Know Our Hearts Through Data"

photo: Victor G. Jeffreys II

Christian Rudder is cofounder and president of OkCupid and author of the popular blog OkTrends. He graduated from Harvard in 1998 with a degree in math and later served as creative director for SparkNotes. He has appeared on NBC's Dateline and NPR's All Things Considered, and his work has been written about in the New York Times, the New Yorker and other publications. He lives in Brooklyn, N.Y., with his wife and daughter.

Why is the data collected through websites like OkCupid, Facebook and Google more revealing than surveys or academic behavioral research? And why does it matter how people behave when they think no one is watching?

These sites, for all their ruthless efficiency as businesses, are also social science field experiments of unprecedented size. And doing field experiments one better, the research is entirely open-ended, and the subjects don't even know they're taking part. All activity, along all vectors (sex, race, politics, etc.), is captured simultaneously.

As proof of the data's value, academics are gravitating to it. Facebook and Google both have created their own in-house research teams. I have professors and PhDs contacting me at OkCupid about our data at least once a week. Every behavioral science department in the country must have someone mining Twitter.

As for the "no one is watching" thing, that advantage is best laid out by considering the opposite extreme. Reality television is people at their most watched, and least realistic. Without an explicit observer, you can hope to get the person. Otherwise, you risk getting a performance instead.

Tell us about the process of shaping and writing Dataclysm. Given the wealth of information you had available, how did you narrow it down and decide what to include in the book?

Focusing the data and my arguments was difficult. The book is essentially about the "stuff people do," which as any Seinfeld fan will tell you, isn't really a topic in the traditional sense. In order for it to cohere, I left out a lot of interesting, but tangential, stories. What made it into the book was the data and the research that best exemplified my thesis: that we can know our hearts through data.

Despite hand-wringing by some about the effect of technology on our culture, you see the Internet as "a writer's world." What does the data show about the Internet and written communication?

In short, the data shows that written communication is adapting rather than going extinct. People are undoubtedly writing more than ever, and Twitter, for example, might've created new linguistic conventions, but the writing there isn't objectively worse. In fact, the average length of a word in a Tweet is longer than in other types of writing (even magazine-level journalism). This despite the notorious 140-character limit: less space requires that the writer use every character for meaning. What seems like a constraint doubles as an inspiration. People make do. Language, too.

Dataclysm is filled with fascinating insights on topics ranging from romance to race to politics. What did you uncover that most surprised you?

I get this question all the time, and I've yet to come up with a snappy answer. I wasn't really going for the counterintuitive Holy Grail mindfreak that many similar books have chased. A lot of Dataclysm confirms or simply adds depth to what we already suspect about human nature. But it does so with unique clarity and visibility. My wife and I go scuba-diving a couple of times a year, and I'll put it like this: when you go underwater, see the fish, the shells, the coral, even the sand, running it through your fingers at the bottom of the sea, it's all very beautiful, even moving. But I wouldn't say it's at all surprising--even sitting up on the boat, you know what's down there. The surprising thing, and I've often thought this to myself floating next to some fish, is that you're able to see any of it for yourself; swimming, watching, sifting, where people aren't supposed to be.

What would you like readers to take away from Dataclysm?

A sense of a world and a discipline that is only now unfolding. The data science I discuss (and practice) in the book is only a few years old. The book is much less a retrospective than a look forward. There are many holes left to fill and many questions I can only pose: answers aren't available yet. I wanted readers to feel the spaces and the unknown and the horizon, and wonder at the frontier. --Shannon McKenna Schmidt