Russ Roberts interviews Josh Angrist: interesting throughout, so definitely read or listen to the podcast. Josh's thoughts on pre-commitment/pre-analysis, especially when it comes to non-experimental work: "I have mixed feelings about it, because I don't do a lot of randomized trials and I think the idea of precommitment becomes very difficult in some of the research designs that I use where you really need to see the data before you can decide how to analyze them. You're not sure what's going to work. That said, when you can precommit, that's a wonderful thing, and it produces especially convincing findings. The idea that I should show the world a mapping of all possible models and that that's the key to all good empirical work: I did disagree with that at the time and I still do… So when I talk about RD and I'm using RD, regression discontinuity, methods to estimate the effects of going to an exam school, you know the design there is that I'm comparing people above and below the test score cutoff. And if that design is convincing, it'll satisfy certain criteria, which I then owe my reader. But I certainly don't owe my readers an account of all possible strategies. I really do build it from my proposed design."

In the latest issue of the Journal of Development Effectiveness, a symposium on how evaluation is used by a variety of organizations, including BRAC, the IDB, Oxfam, the Government of South Africa, and the Millennium Challenge Corporation. The piece on the MCC has useful discussions of some of the practical issues with randomized phase-in: "the use of randomised roll-out approaches in four of the five evaluations seemed feasible at baseline, when the programme implementation schedule allowed for the necessary gap between the start of the treatment and the start of the control trainings to guarantee a sufficient exposure period for impacts on outcomes. However, in several cases and for a variety of reasons, that exposure period was reduced" … "there is also pressure to incorporate or compensate control groups. However, stakeholders should recognise that sufficient exposure to treatment, based on a well-conceived programme logic, is required for the evaluation to measure changes in outcomes. For example, the Nicaragua results suggest that the major benefits of that programme did not start accruing until the second and third years of the programme."

And Russ on experiments: "economists are typically unconvinced by so-called 'scientific experiments' using first-rate research design. It's very easy for them to say, 'Oh, the RAND study didn't look at a long enough distance; the Oregon study didn't have enough power. They didn't have a big enough sample. There were problems of selectivity.'" (of course Josh pushes back)

Josh also discusses development experiments a bit, including the microfinance work: "One of the things that's important to remember… is: one of the big roles of a social scientist is to point out what's not likely to work."

Another somewhat skeptical take on the value of pre-analysis plans, by two lab experimental economists (Coffman and Niederle): "It may also increase the temptation to use only very minor deviations from existing designs which will ease the commitment to a pre-analysis plan as it will reduce the chance to receive surprising results that may call for slightly different analyses. The costs for exploratory work will be increased relative to somewhat more derivative work. Finally, the costs may be particularly high for young researchers who may be less experienced and have a harder time to foresee the potential pitfalls of certain designs…"

As for when they should be used: "for projects that are likely to be the only test of a hypothesis (like a large field experiment, a randomized control trial in a developing country, or an expensive neuroimaging study), the gains can be enormous. We should encourage PAPs in areas where it is likely the case that every PAP will result in a published paper, and where replications may be difficult. Additionally, a researcher who foresees the need for analytical tools that bring about suspicion, like analyzing subgroups or removing outliers, may want to make a credible commitment beforehand so she can do so without being met with wariness."

The biggest problem with headlines interpreting the PLoS ONE study, of course, is the assumption that paper *replication* is equivalent to paper *production*. It is hard to dispute that, when replicating a given output, a WYSIWYG system makes the task much easier, and LaTeX is not WYSIWYG (unless one uses a front-end such as LyX). But production is a different exercise altogether: it involves thinking about what one wishes to convey. This naturally reduces error production and, I would argue, would not give rise to the sorts of "errors" identified in the study. One may even find the clutter-free form more conducive to focused writing!

Just as important, the interaction of this production process with standard LaTeX tools often yields automated output that is far less error-prone than the replication exercise would suggest: tables can be output automatically in TeX-friendly form, needing only minor editing for style; text editors pick out typos in a way that is more easily noticed by a writer than by a copyist; and references are cross-checked automatically by BibTeX, so any non-matching references are caught immediately.
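To illustrate the BibTeX cross-checking mentioned above, here is a minimal sketch (the file name `refs.bib` and the citation key `angrist2009` are illustrative, not from the study):

```latex
% main.tex -- minimal sketch of automatic reference checking with BibTeX
\documentclass{article}
\begin{document}
Angrist and Pischke \cite{angrist2009} discuss research design.
% If the key is mistyped (e.g. \cite{angirst2009}) or missing from
% refs.bib, compilation emits a "Citation `...' undefined" warning
% and prints a bold [?] in the output, so the mismatch is caught
% immediately rather than slipping into the final paper.
\bibliographystyle{plain}
\bibliography{refs} % expects an @book{angrist2009, ...} entry in refs.bib
\end{document}
```

A Word user, by contrast, typically has to eyeball the reference list against the in-text citations by hand.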

One aspect, less highlighted in the study, is that TeX's capabilities for collaborative writing are decidedly more primitive than Word's. Indeed, as a user of both, my decision in favor of TeX for research articles is premised on the greater efficiency with which one can crank out scientific output; when I am crafting a CV or a newsletter, I revert to Word, given the focus there on formatting.