Tuesday, May 26, 2009

Debating the Guidelines

In my posts so far, I have simply assumed that there exist reliable guidelines for measuring the quality of empirical work, and then I have looked at how these guidelines should be used. To understate things, this is a substantial handwave. These guidelines can be quite difficult to develop, and in this post I want to touch on some of the important issues raised by these challenges, particularly for litigation.

The litigation problem is easy to define:

1. In many cases, the scientific dispute can be dispositive.
2. Different guidelines can reach different conclusions, even when applied to the same set of studies.
3. The result of a case can thus turn on which set of guidelines is used.
4. It is hard if not impossible to assess which set of guidelines is superior.
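To make point 2 concrete, here is a minimal sketch of how two quality guidelines might pool the same studies and come out in different places. Everything in it is an assumption made for illustration: the four studies and their numbers are invented, the two guidelines (`guideline_a`, `guideline_b`) are hypothetical, and quality-weighted inverse-variance pooling is just one of several ways a guideline could feed into a combined estimate.

```python
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    effect: float            # hypothetical effect estimate (e.g. a log relative risk)
    variance: float          # hypothetical sampling variance of that estimate
    randomized: bool
    blinded: bool
    followup_complete: bool


# Invented studies, for illustration only.
STUDIES = [
    Study("S1", 0.40, 0.04, randomized=False, blinded=True, followup_complete=True),
    Study("S2", 0.30, 0.05, randomized=False, blinded=True, followup_complete=True),
    Study("S3", -0.05, 0.02, randomized=True, blinded=False, followup_complete=False),
    Study("S4", 0.05, 0.04, randomized=True, blinded=False, followup_complete=True),
]


def guideline_a(study: Study) -> float:
    """Hypothetical guideline A: only randomized designs count as quality."""
    return 1.0 if study.randomized else 0.0


def guideline_b(study: Study) -> float:
    """Hypothetical guideline B: score blinding and follow-up, ignore design."""
    return (study.blinded + study.followup_complete) / 2


def pooled_effect(studies, quality) -> float:
    """Inverse-variance pooled estimate, with each weight scaled by a quality score."""
    weights = [quality(s) / s.variance for s in studies]
    total = sum(weights)
    return sum(w * s.effect for w, s in zip(weights, studies)) / total


estimate_a = pooled_effect(STUDIES, guideline_a)
estimate_b = pooled_effect(STUDIES, guideline_b)
print(f"Guideline A pooled effect: {estimate_a:.3f}")
print(f"Guideline B pooled effect: {estimate_b:.3f}")
```

With these invented inputs, guideline A yields a pooled effect close to zero while guideline B yields a clearly positive one, even though both see exactly the same four studies. Nothing in the arithmetic says which guideline is the better one, which is point 4.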

At first blush, then, shifting from jury fact-finding to systematic reviews simply replaces one kind of arbitrariness--that arising from epistemic incompetence--with another, namely the choice of guidelines. There are, however, two responses to this, one short-run and the other long-run.

The short-run response is that this may be a short-run problem. Quality guidelines are still in their infancy, and some of the disagreement across guidelines will, I hope, disappear as the field matures. After all, many guidelines fail to agree on what constitutes quality in the first place, and quality terms are often asserted rather than empirically validated themselves. But with time, definitions of quality should converge, and already there is work on testing the relevance of particular quality items. In fact, the short-run conflict in guideline outcomes could have long-run benefits, since it highlights the issues that need attention.

But long-run problems will persist, both with regard to the definition of quality and with regard to empirical validation and disagreement among guidelines.

1. Quality. At the end of the day, quality is a normative issue. All might agree that minimizing bias is important, but there is no objective way to state how bias and efficiency should be balanced, and there may be normative disagreement about the very relevance of other quality components (such as the ethics of the study design). There are two ways to approach this problem.

First, we can think of quality's definition as a matter of law for the judge to settle. Defining quality is likely a far easier job, and one judges are far more capable of undertaking, than deciding whether a particular study meets that definition. Moreover, this is the kind of task we frequently ask judges to do. In many ways a ruling about whether evidence should be excluded from a criminal trial is a question about how our definition of quality should take into account a constellation of normative values.

Second, and more challenging, empiricists in all fields--not just epidemiology--may be forced to confront the deeper epistemic challenge laid down by Diana Petitti:

[I]f epidemiologists cannot define what
constitutes quality in non-experimental studies, how is it possible to do
studies that we all agree have merit? If meta-analysis fails because quality is
elusive, then all of non-experimental epidemiology fails for the same reason.

This is not an issue for the courts to resolve, but for the sciences. As they wrestle with this significant problem, the legal system may be able to do no better than judicial definitions. But explicit judicial definitions of quality would be a huge step forward, injecting a powerful dose of transparency into the process, and perhaps encouraging a debate within the law about what good evidence should look like (without requiring legal actors untrained in the sciences to debate whether particular projects satisfy those standards).

2. Dueling guidelines. Over time, guideline standards may converge. But at least two problems should remain, one of which cannot be eliminated. Some disagreement in how to measure particular quality elements may persist, and different analysts always run the risk of applying identical guidelines differently.

First, what happens if two sets of guidelines agree on what quality is, but (1) disagree about how to measure it, (2) each use empirically validated measures (and neither can be shown empirically to be better than the other), and (3) reach different conclusions? This could be seen as a flaw in guidelines. I see it as a core strength. Dueling validated guidelines are epistemically informative: they tell us the answer, and that answer is "we do not know." This is an answer we must become more comfortable with, an issue I will turn to in an upcoming post.

But guidelines are often developed for a particular problem, so at least in the litigation setting we won't have dueling guidelines, just the awareness that a different set of guidelines has the potential to reach a different answer. Does this eliminate the usefulness of guidelines for litigation? No. First, the development of off-the-rack quality terms and validated measures will minimize this problem. Second, it is important to avoid the utopian fallacy. Sure, different guidelines could reach different answers. But so too could different juries listening to the same experts. The real question is: will guidelines lead to the right answer more consistently? Or, perhaps: will guidelines lead to the right answer with sufficiently greater frequency to justify the impositions they put on, say, party control?

This utopian fallacy argument can be used to address the second problem as well, namely the important role of judgment in guidelines. Guidelines ultimately will include some sort of subjective element ("does the study properly control for endogeneity?" or "is the sample size sufficiently large?"), and different subjective answers can lead to different outcomes. But again, jurors and partisan experts are applying their subjective judgment, so the question is whether independent experts limited by rigorous guidelines will do a better and more consistent job at doing so. By now, it should be clear what my prior beliefs are about the answer to that question.

Evidence-based policy (EBP) is still in its infancy, and how it should fit into the legal system is a question that scholars and legal actors are only just beginning to address. It may be that EBP will have to mature more before it is ready to be incorporated into the legal system, and so we should not stop considering less extreme alternatives (such as hot-tubbing) in the interim. But at the same time, we should not look at the current limitations of EBP and despair of it ever being useful to the law. It is a growing field, one that is maturing every day. And after all, at one point in his life, Usain Bolt couldn't even walk.