[This post incorporates parts of posts from my own blog and lecture notes I circulate to my graduate students. I figured it was worth revising and posting here as a) basically none of you are my grad students or read my blog and b) I want to get Jim Manzi's opinion on it as long as I have him as a co-blogger.]

Sampling error? Omitted variable bias? Bah, that's for first-year grad students. What I find really interesting is that there are some fairly basic ways analysis can go really screwy that can't be fixed by adding more control variables, increasing your sample size, or fiddling with assumptions about the distribution of the dependent variable. I'm thinking about really scary sources of model specification problems. Or actually, not model specification in and of itself, but data collection. Your typical social science graduate curriculum talks a lot about getting standard errors right, but on a day-to-day basis most of our work goes into getting the data into the proper form, and this is also where most problems come from.

But before talking math, let's contemplate a recently overheard confession that, "Turns out those funny looking toe shoes are pretty comfortable." As someone who feels naked without footwear that involves both socks and laces, I had never given much thought to this, and to the extent that I had, I assumed wearing these things was a costly signal of geekiness. But on reflection it makes perfect sense. After all, if something as ridiculous-looking as toe shoes were not comfortable then nobody would wear them. Conversely, four-inch heels are very uncomfortable (or so I am given to understand) but many women wear them because they're attractive. So we can imagine a negative association between how attractive shoes are and how good they feel. Indeed, this describes my own collection of incredibly comfortable but informal Chucks, fairly comfortable and decent-looking dress shoes, and a second pair of dress shoes that are uncomfortable but fancy. One interpretation of this (and bear with me as I briefly sound like a critical studies type person) would be something along the lines of a sadistic gaze wherein the perceived attractiveness of a shoe is directly derived from the discomfort we imagine it imposing on its wearer. I don't doubt that people have made this argument but I don't buy it as a general argument because I can imagine shoes that are both hideous and uncomfortable --- say Crocs made of gravel and epoxy. There is no ontological reason why we can't have shoes that are both hideous and uncomfortable; rather, there is a practical reason in that nobody wears shoes that are terrible in every way, and so such shoes don't make it onto the market. That is, there is a big difference between the covariance of traits for all conceivable shoes versus the covariance of traits among those shoes that actually get bought and worn.

Now here's where we get to the math. The logician, computer scientist, and fellow UCLA faculty member Judea Pearl uses a graph-theoretic approach to logic that emphasizes using counterfactual understandings to get at the underlying structure of causation. (His magnum opus is Causality. For an introduction relevant to the social sciences see Morgan and Winship.) One of Pearl's most interesting deductions is the idea of conditioning on a collider. If whether a case is observed is a function of two variables, then conditioning on observation will induce an artifactual negative correlation between the variables. This is true even if in the broader population there is no correlation (or even a mild positive correlation) between the variables.

For instance, suppose that in a population of aspiring Hollywood actors there is no correlation between acting ability and physical attractiveness. However, assume that we generally pay a lot more attention to celebrities than to some kid who is waiting tables while going on auditions. That is, we cannot readily observe people who aspire to be actors, but only those who actually are actors. This implies that we need to understand the selection process by which people get cast into films. In the computer simulation displayed below, I generated a population of aspiring actors characterized by "body" and "mind," each of which follows a normal distribution, with these two traits being completely orthogonal to one another. Then imagine that casting directors jointly maximize talent and looks, so only the aspiring actors with the highest sum for these two traits actually get work in Hollywood. I have drawn the working actors as triangles and the failed aspirants as hollow circles. Among those actors we can readily observe there will then be a negative correlation between looks and talent, even though there is no such correlation in the grand population. If we see only the working actors without understanding the censorship process, we might think that there is some stupefaction of being ridiculously good-looking.
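The simulation is easy to reproduce. Here is a minimal sketch in Python with NumPy; the population size and the assumption that the top 10% of aspirants by body + mind get work are mine, chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two orthogonal traits, each standard normal.
body = rng.normal(size=n)  # attractiveness
mind = rng.normal(size=n)  # acting talent

# Casting directors hire the aspirants with the highest body + mind;
# here the top 10% get work (the cutoff is an assumption).
cutoff = np.quantile(body + mind, 0.90)
working = body + mind >= cutoff

r_all = np.corrcoef(body, mind)[0, 1]
r_working = np.corrcoef(body[working], mind[working])[0, 1]
print(f"all aspirants: {r_all:.2f}, working actors: {r_working:.2f}")
```

With independent standard normals and top-decile selection on the sum, the correlation among the hired actors comes out strongly negative (around -0.7) even though it is essentially zero in the full population.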

This also applies when one or both of the variables is categorical. Many prestigious colleges have policies of preferring legacy applicants. This implies that the SAT scores of legacies are lower in the freshman class even though they are higher in the applicant pool.
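The categorical case can be simulated the same way. In this sketch legacy status and SAT are independent in the applicant pool, and the admissions cutoffs, including a 100-point legacy preference, are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Legacy status and SAT are independent in the applicant pool.
legacy = rng.random(n) < 0.05        # 5% of applicants are legacies (assumed)
sat = rng.normal(1200, 150, size=n)  # same score distribution for both groups

# Admissions rule: non-legacies need a 1400, legacies get a
# 100-point preference (cutoffs invented for illustration).
admit = sat >= np.where(legacy, 1300, 1400)

gap_applicants = sat[legacy].mean() - sat[~legacy].mean()
gap_admits = sat[admit & legacy].mean() - sat[admit & ~legacy].mean()
print(f"applicant gap: {gap_applicants:.0f}, freshman gap: {gap_admits:.0f}")
```

Even though legacy applicants here score exactly like everyone else, admitted legacies average roughly 80 points below their classmates, purely because of the selection rule.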

In these examples the censorship bias implied by conditioning on a collider is fairly easy to see because we have started from the latent population (aspiring actors, college applicants) and worked our way to the observed population (working actors, college freshmen). However, the insidious thing about conditioning on a collider is that we almost always see only the observed population. This makes it easy to confuse what is actually a causal process of truncation with a more direct structure of causation, such as the idea that being attractive or a legacy somehow causes someone to be untalented or unintelligent.

Conditioning on a collider can occur any time that there is an underlying selection regime that involves either variables in the dataset or correlates of variables in the dataset. This is almost inevitable if you have built a composite dataset out of multiple constituent datasets. That is, a case appears in the sample if it meets one or more sampling criteria. This is actually a fairly common sample design, usually premised on the idea of not wanting to "miss anything" and/or wanting to increase the sample size.

Once you start looking for it you see it in a lot of studies. For instance, suppose a researcher were interested in which firms had donated to a particular PAC. The researcher might start with a basic sample like the Fortune 500 but then notice that only 5 firms had donated to the PAC. Because statistical power in analysis of a binary variable is a function of both the number of cases (higher is better) and the proportion (closer to .5 is better), the analysis would have minimal statistical power. The researcher might then add to the data all firms that donated to the PAC, regardless of whether or not they were in the 500. If the researcher were then to do a logistic regression of donating to the PAC as a function of annual revenues, the results would almost inevitably show a strong negative effect. The reason is that inclusion in the sample is defined by high revenues (the inclusion criterion for the Fortune 500) OR donating to the PAC. There are firms with low revenues that didn't donate to the PAC, lots of them in fact, but they don't appear in the dataset.
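A quick simulation shows the mechanism. Rather than fit the logistic regression, this sketch just compares median revenues of donors and non-donors, and every number in it (the donation rate, the revenue distribution) is made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Firm revenues are heavy-tailed; donating is rare and, by
# construction, unrelated to revenue (all numbers are invented).
revenue = rng.lognormal(mean=4, sigma=2, size=n)
donated = rng.random(n) < 0.001

fortune500 = revenue >= np.sort(revenue)[-500]  # 500 highest-revenue firms
in_sample = fortune500 | donated                # union of the two criteria

# Full population: donors and non-donors have similar revenues.
pop_gap = np.median(revenue[donated]) - np.median(revenue[~donated])
# Composite sample: nearly all non-donors are Fortune 500 firms,
# so donors look dramatically poorer.
sample_gap = (np.median(revenue[in_sample & donated])
              - np.median(revenue[in_sample & ~donated]))
print(f"population gap: {pop_gap:.0f}, sample gap: {sample_gap:.0f}")
```

In the full population the donor/non-donor revenue gap is negligible; in the composite sample it is enormous and negative, because the low-revenue non-donors never make it into the data.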

We can see this at work in survey data. I took the 2010 wave of the General Social Survey and pulled all 395 Republicans and GOP-leaning independents (PARTYID==4/6). For these people I compared their attitudes on marijuana (GRASS) and government redistribution of wealth (EQWLTH, which I cut to a binary with responses 1/4). Among Republicans who oppose wealth redistribution, 37% favor legalizing marijuana, as opposed to 38% among those who favor wealth redistribution. This difference of one percentage point is not even remotely statistically significant (chi2 0.08, 1 df).

OK, now wait a minute, you may be saying: he promised us negative relationships but this is no trend at all. True, but let's contrast it with the same analysis for the whole sample, regardless of party. In general, 42% of those who oppose redistribution favor legalized marijuana against 53% of those who favor redistribution. This relationship is strongly statistically significant (chi2 14.50, 1 df). So among the general population there is a positive association between marijuana legalization and wealth redistribution. Among Republicans this effect is perfectly counterbalanced by conditioning on a collider. People presumably join the GOP because they agree with it on at least some issues. Republicans who oppose both weed and redistribution we can call movement conservatives, those who oppose weed but favor redistribution we can call social conservative populists, those who favor weed but oppose redistribution we can call libertarians, and those who favor both we can call people who should probably change their party registration. This case illustrates how conditioning on a collider doesn't necessarily result in a net negative relationship but rather can partially or completely suppress an underlying general trend.
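This suppression is also easy to demonstrate by simulation. In the sketch below a latent ideology drives both attitudes (producing the positive population association), and joining the party depends directly on opposing the two policies; the thresholds and noise scales are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# A latent ideology drives both attitudes, so they correlate
# positively in the population.
liberal = rng.normal(size=n)
favor_grass = (liberal + rng.normal(size=n)) > 0
favor_redist = (liberal + rng.normal(size=n)) > 0

# Party membership is a collider: people join mainly because they
# oppose these policies (threshold and noise scale are arbitrary).
oppose = (~favor_grass).astype(int) + (~favor_redist).astype(int)
gop = oppose + 0.8 * rng.normal(size=n) > 1.5

def phi(a, b):
    """Correlation between two binary vectors."""
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]

r_pop = phi(favor_grass, favor_redist)            # clearly positive
r_gop = phi(favor_grass[gop], favor_redist[gop])  # roughly zero
print(f"population: {r_pop:.2f}, party members: {r_gop:.2f}")
```

In the full population the two attitudes correlate at about one third; among the simulated party members the correlation is close to zero, the collider having almost exactly cancelled the underlying association, just as in the GSS numbers above.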

Conversely, if you understand how this process works you can exploit it both analytically and practically. Although he doesn't express it in the language of counterfactual causality using directed acyclic graphs (and I'm not really sure why not), several of Tyler Cowen's "Six Rules for Dining Out" in this magazine (and the related book) follow this logic. Start from the assumption that many restaurants go out of business, meaning that failed ones are censored from the remaining pool of available restaurants. Now assume that the two main things that let restaurants succeed are food quality and various other things that we can collectively call atmosphere. The logic of conditioning on a collider implies that among surviving restaurants there should be a negative correlation between atmosphere and food. This implies that if you are monomaniacally focused on good food, you should follow the heuristic of avoiding fashionistas and seeking out unpopular ethnic groups, as the only way such places could possibly stay in business is if they offer good food. Conversely, if you don't have an especially refined palate and really like to be around pretty girls, you should probably follow the heuristic of "if you're going to dinner with Tyler Cowen, don't let him choose the restaurant."
