A Vow I Made

This has to do with a paper that I published in 2015 but that I
started working on in late 2012 and substantively wrote in 2013: “The
Control of Managerial Discretion: Evidence from Unionization’s Impact
on Employment Segregation.” In many ways, that
paper riffs off of work that had been published a few years earlier,
work that investigated the effect of firms’ diversity policies on
their actual workforce composition. That work is both very good and
limited: very good because it looked at longitudinal, within-firm
changes rather than just changes across a population of firms; limited
because the authors could not model and adjust for the self-selection
of firms into adopting such policies. (All good research is limited in
some way, but that’s a topic for another post.) My idea was to look at
union-representation elections, because unions impose many of the same
constraints on arbitrary management that diversity policies do, and
yet unionization isn’t self-selected by the employer. It is
self-selected by the employees, but because they do so through
elections whose vote tallies are recorded, I could focus on very
close elections and use a
regression-discontinuity design to identify the treatment effect net
of self-selection.
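
To make the design concrete, here is a minimal sketch of that logic in
Python. It is not the paper's actual code: the dataframe, column names,
bandwidth, and tie-handling rule are all hypothetical, and a real
analysis would take far more care with bandwidth choice and functional
form.

```python
# A minimal regression-discontinuity sketch (illustrative only).
# Assumes a hypothetical dataframe with the union's vote share in the
# election ("vote_share", where 0.5 is the winning threshold) and some
# workforce-composition outcome ("segregation_change").
import pandas as pd
import statsmodels.formula.api as smf

def rd_estimate(df: pd.DataFrame, bandwidth: float = 0.05) -> float:
    """Local linear RD: compare outcomes just above vs. just below the cutoff."""
    # Center the running variable at the cutoff and keep only close elections.
    df = df.assign(margin=df["vote_share"] - 0.5)
    close = df[df["margin"].abs() <= bandwidth].copy()
    # Treatment indicator: the union wins with a majority (the exact tie rule
    # here is an assumption).
    close["won"] = (close["margin"] > 0).astype(int)

    # Fit separate slopes on each side of the cutoff; the coefficient on `won`
    # is the jump at the threshold, i.e., the local treatment-effect estimate.
    model = smf.ols("segregation_change ~ won + margin + won:margin", data=close).fit()
    return model.params["won"]
```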

Let me pause and emphasize that I was expecting to find effects like
that earlier work had found. I figured I’d show similar trends, but
support them with cleaner causal identification. That’d be a real
contribution!

I was by myself on Christmas Eve 2012, and I spent most of the day
confirming I’d built the dataset correctly and then running the
analysis. Indeed, when I looked at the full dataset, I found results
that looked like those of the earlier studies!

…And the moment I started zooming in on the closer elections, all
those results went away. When you adjusted for self-selection, there
appeared to be no effects at all.

I spent much of that Christmas Day in something like panic. I had this
design for a paper, I had this thing I was going to show, and the
results were actually the opposite. I felt like I’d just wasted
several weeks–or months, if you count the time spent getting the data in
the first place. Then my father and step-mother came into town, and I
put the paper aside for a week while they visited.

Shortly after the new year, back at Stanford, I opened up the paper
and looked at what I’d written. I’d written most of the front end of
the paper already. That front end wasn’t about why it was
theoretically important that the control of managerial discretion
improved workforce diversity. Rather, it discussed this theory, but
then explained how the evidence for it had this weakness around
self-selection. Then I explained my own research design, and how it
would help with that problem.

At some point, reading this, the lightbulb went off. I’d gotten the
opposite results from what I’d expected, but I didn’t really have to
change the front of the paper at all!

That this was a revelation to me, 3.5 years into my assistant
professorship, says several things about how we were implicitly taught
to do research.

First, we were taught that good papers made a theoretical
contribution. But a theoretical contribution was almost never couched
as a contribution to an existing theory, such as better evidence for
or against it. Rather, it
was couched as a new theory. It might build theoretically on
existing work, but if it only built empirically on what was out
there, it wasn’t interesting.

Second, we were taught that replication studies were boring and
uncreative. These were the mark of a workmanlike but probably
uninspired student who couldn’t come up with their own ideas. (This
probably isn’t obvious today, as the social sciences are roiled by the
replication crisis, but when I was starting graduate school sixteen
years ago, it was assumed that replication studies would replicate.)

Third, we were taught that “You can’t learn anything from a null
result,” full stop.

Today I disagree with all of these points, which I’ll detail in a
moment; but what’s really striking here is that I’d learned about
causal identification and research design since my first classes in
graduate school. In my first research design class, we’d devoted a ton
of time to the Fundamental Problem of Causal Inference. In labor
economics, Caroline Hoxby had drilled us on the Program Evaluation
Problem and the Heckman/LaLonde debate. Chris Winship taught a whole
class on “The New Causal Analysis,” preparatory to overhauling the
core sociology methods class. (Yes, I got my PhD at MIT, but I did a
lot of coursework at Harvard Economics and Sociology.) We read Rubin
on the potential-outcomes model, Pearl on directed acyclic graphs,
Angrist on instrumental variables, Van der Klaauw on regression
discontinuity…hell, it was a paper with a regression-discontinuity
design that had prompted this freak-out!

This all points to something I almost never see talked about. Within
organizational research, and in business schools more generally, we
absorbed many of the arguments for and techniques of causal
identification without necessarily updating our assumptions about how
knowledge generation works. We had been educated in a framework that
presumed routine theory generation and predictable empirical support
for those theories. Causal identification was imported as a way to
strengthen that support, and maybe to raise the minimum bar for what
would be considered support. But little else changed. And in most
places, I think it still hasn’t.

Consider those three points:

First, today I think that there are many types of contributions that
research can make. New evidence in support of or against an existing
theory should be considered a theoretical contribution. After all,
our faith in theories is not binary. It is, basically, Bayesian. Or
at least it should be.

Second, replication studies are neither boring nor
uncreative. Indeed, one of the reasons they are not boring is that
earlier studies often cannot be replicated. But the term
“replication study” is itself over-applied. Trying to test an
existing theory with a new method, with better data, with cleaner
identification–none of these things is rote replication. Such
studies often involve considerable creativity of their own.

Third, it is correct that you cannot learn anything from a null
result in an unidentified study with observational data. But of
course you can learn from a null result in a well-designed
experiment. Even a quasi- or natural experiment’s null results can
tell you something. The experimentum crucis, since its coining
by Hooke and Newton, has been a core piece of the scientific
method. The logic as it applies here is simple: if we have a theory
that makes predictions, and if we agree in advance that a study
design is adequate for testing those predictions, then a null result
in that study should reduce our prior confidence in that theory. (I
think it’s canonical to reference the Michelson-Morley Experiments here.)
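
As a toy illustration of that update (the numbers below are invented,
purely to show the direction and rough size of the shift, not drawn
from any actual study):

```python
# Illustrative Bayesian update: how a credible null result shifts belief in a theory.
prior = 0.7            # prior confidence that the theory is true
p_null_if_true = 0.2   # an adequate study rarely returns a null if the theory holds
p_null_if_false = 0.9  # a null is very likely if the theory is false

posterior = (p_null_if_true * prior) / (
    p_null_if_true * prior + p_null_if_false * (1 - prior)
)
print(round(posterior, 2))  # ~0.34: confidence drops substantially, but not to zero
```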

Hence me, about a week after killing the results in that paper, coming
to the realization that I had learned something. I’d reproduced the
earlier results when I didn’t control for self-selection, then killed
them off when I did control for it. This shouldn’t make me despair
over the study; it should reduce my confidence in the theory.

This was the most liberating moment of my early career. I’d been
socialized to have one of two responses at this point. Having
assembled my data and found null results, I was either supposed to
abandon the project (maybe put it in a file
drawer) or continue
my “exploratory analysis” until I found out why there wasn’t an
effect–in the process finding an effect elsewhere–and “reframe” the
paper around that. (At this point someone may say that that wasn’t
what I was “supposed” to learn from my training. Maybe. But I’m a
reasonably intelligent man and a good student, and this all seemed
pretty unequivocally communicated to me. More, I’ve talked to enough
of my colleagues to know that I wasn’t the only one who imbibed these
beliefs.) But, I now realized, I didn’t have to do either of those
things. I’d found a null result, one that contradicted earlier
research, but I thought it was right. That null finding was a
contribution in its own right. Yes, the paper would be harder to
publish, probably, but that didn’t matter. I’d found something I
thought was real, and I should stand by it.

Which brings us to the vow. That day, I vowed never again to start a
project unless I thought that its question was interesting however the
answer shook out.

Perhaps this sounds banal. This, after all, is how science is
“supposed” to work. But my experience is that it still hasn’t really
sunk in in my field. When I present null-results papers, for example,
I still get suggestions of different ways to slice the data such that
I’d be more likely to find an effect, which (it usually follows) will
make the paper easier to publish. But we’re not in this business just
to publish papers, or for that matter to find effects. We’re in this
business to answer questions.