Frankly, the overwhelming majority of academics have ignored the data explosion caused by the digital age. The world’s most famous sex researchers stick with the tried and true. They ask a few hundred subjects about their desires; they don’t ask sites like PornHub for their data. The world’s most famous linguists analyze individual texts; they largely ignore the patterns revealed in billions of books. The methodologies taught to graduate students in psychology, political science, and sociology have been, for the most part, untouched by the digital revolution. The broad, mostly unexplored terrain opened by the data explosion has been left to a small number of forward-thinking professors, rebellious grad students, and hobbyists. That will change.

(...)

Everybody lies. People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they’re not. They say they’ll be in touch when they won’t. They say it’s not about you when it is. They say they love you when they don’t. They say they’re happy while in the dumps. They say they like women when they really like men. People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves. And they damn sure lie to surveys. Here’s my brief survey for you:

Have you ever cheated in an exam?

Have you ever fantasised about killing someone?

Were you tempted to lie?

Many people underreport embarrassing behaviours and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias.

An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behaviour, and charitable giving.

Has anything changed in 65 years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what’s embarrassing or desirable may have changed, people’s tendency to deceive pollsters remains strong. A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared with official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2% reported that they graduated with lower than a 2.5 GPA (grade point average). In reality, about 11% did. And 44% said they had donated to the university in the past year. In reality, about 28% did.

Then there’s that odd habit we sometimes have of lying to ourselves. Lying to oneself may explain why so many people say they are above average. How big is this problem? More than 40% of one company’s engineers said they are in the top 5%. More than 90% of college professors say they do above-average work. One-quarter of high school seniors think they are in the top 1% in their ability to get along with other people. If you are deluding yourself, you can’t be honest in a survey.

The more impersonal the conditions, the more honest people will be. For eliciting truthful answers, internet surveys are better than phone surveys, which are better than in-person surveys. People will admit more if they are alone than if others are in the room with them. However, on sensitive topics, every survey method will elicit substantial misreporting. People have no incentive to tell surveys the truth.

How, therefore, can we learn what our fellow humans are really thinking and doing? Big data. Certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Think of Google searches. Remember the conditions that make people more honest. Online? Check. Alone? Check. No person administering a survey? Check.

The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.

I have spent the past four years analysing anonymous Google data. The revelations have kept coming. Mental illness, human sexuality, abortion, religion, health. Not exactly small topics, and this dataset, which didn’t exist a couple of decades ago, offered surprising new perspectives on all of them. I am now convinced that Google searches are the most important dataset ever collected on the human psyche.
