[My initial commentary on Royce de Rohan Barondes's paper on error rates linked to judicial clerks criticized Barondes on a number of points, including an alleged failure to employ a fixed effects model in his estimation. Barondes has pointed out to me that although he never mentioned the words “fixed effects” in his initial paper – an omission that he has now corrected in the latest version of his paper, released this month – the software command that he employed did in fact control for judge fixed effects. While I hope to have time to review the latest version of Barondes's research in the future, I want to set the record straight vis-à-vis the original version, and accordingly I am reposting my initial critique (below), edited with overstrikes to identify the parts of that critique that are inaccurate in light of Barondes's use of a fixed effects methodology.]

In my view, Justice Scalia blundered badly last week in his concurring opinion in Baze v. Rees when he ineptly attempted to suggest that the empirical evidence supported the view that the death penalty in the U.S. has deterred murder. My coauthor Justin Wolfers and I had shown that the evidence upon which Scalia rested his beliefs did not in fact support his conclusion. The empirical debate on the death penalty perfectly illustrates how easy it is to draw erroneous conclusions from statistical data. See Donohue and Wolfers, “Uses and Abuses of Empirical Evidence in the Death Penalty Debate,” 58 Stanford Law Review 791 (2005).

Another illustration of empiricism gone astray is provided by a new working paper by Royce de Rohan Barondes, which adopts the following provocative title: "Want Your Opinions Questioned or Reversed? Hire a Yale Clerk." The man-bites-dog nature of the claim is sure to raise interest in the paper, since Yale is obviously one of the most elite law schools in the U.S., and the hardest to get into. Unfortunately, counterintuitive empirical results almost always turn out to be wrong if they are not based on an appropriate empirical methodology for the inquiry at hand. In my opinion, the methodology of the Barondes paper is flawed, and the conclusions drawn from this research are either incorrect or unfounded. My review of the Barondes paper (as well as my own personal experience with Yale Law students) affords little reason to believe that the value of a Yale Law clerk is less than the law school’s preeminent ranking would suggest.

Before turning to the problems in the paper, let me mention some good points. Barondes has collected data on judicial decisions by federal district court judges, along with information about the law clerks for these judges, and creatively thought to see if Shepard's Signals could give some inexpensive insight into the quality of the judicial decisions. This is all to the good. However, there are no shortcuts to knowledge, and I would encourage Barondes to spend more time examining the nature of the cases that are getting questioned by the Shepard's indicator (about which more will be said below). The primary problem with this study involves the nature of the econometric specification. Papers that fail adequately to address the difficult specification issues of endogeneity and omitted variable bias simply do not provide reliable estimates and therefore cannot be relied upon.

Barondes claims that he has found evidence that federal district court judicial opinions are reversed or questioned more when the authoring judge has a greater number of Yale Law clerks. The paper presents some interesting data concerning the roughly 13,000 judicial opinions written by 95 district court judges over a 56-month period, but it makes two major missteps. First, it ignores all of the recent lessons about why panel data analysis is superior to simple cross-section analysis for estimating causal effects. Second, even if the cross-sectional approach of the Barondes paper were not flawed, Barondes has overstated his results and has failed to control properly for various important factors, such as political affiliation of the judge [political affiliation is a fixed effect impounded in the conditional logit], that could well undermine his claims.

Let’s begin with the big problem of causal inference. Barondes uses “Shepard’s Signals” as a quick device for identifying something problematic in the 13,000 opinions in his data set. Barondes concludes that judges with more Yale Law clerks tended to have more of these “problematic” signals than we might expect given the high ranking of Yale Law School. Unfortunately, Barondes has failed to understand the likely causal relationships between the behavior of judges in writing decisions that are more likely to be questioned and the process of selecting clerks. The judges almost surely drive the error rate and the clerks show up as “significant” in the regression because there is small, but possibly significant relationship between the type of judge that gets reversed and those who will select (and be attractive to) Yale law students.

Simple cross-section studies of the type that Barondes provides are not well-suited to teasing out causal relationships given the underlying links when some judges are both more likely to have these ostensibly negative signals, and somewhat more likely to hire Yale Law clerks. Barondes’ error is a bit like concluding that because the death penalty is almost nonexistent in the Northeast, which has the lowest murder rate in the country, and widespread in the South, which has the highest murder rate, this means that the death penalty causes murder.

While it is true that a simple cross-section correlation of execution rates and murders by state will naively suggest that more executions lead to more murder, this regression will almost certainly generate the wrong causal answer in suggesting that the death penalty massively increases the murder rate. A better approach would control for the fact that some states (e.g., Southern states) have persistently higher murder rates and then look to see what happens to murders in those states when executions rise or fall.

Similarly, to tease out the effect of a Yale Law clerk, Barondes needs to hold the judge constant in the same way that we just said we have to hold the state constant to tease out the effect of an increase or decrease in executions. Interestingly, Barondes did have this data available, but he failed to use it. Specifically, he collected information on district court decisions written in two different 28-month time periods. A first cut at the question he wants to answer – what is the impact of having an additional Yale Law clerk on the likelihood of having an opinion questioned or reversed? – would simply compare, for each judge, the rate of negative signals in the second period minus the rate in the first period with the percentage of Yale Law clerks in the second period minus this percentage in the first period. A positive relationship might be taken as suggesting that more Yale Law clerks led to an increase in negative signals, and a negative relationship would suggest the opposite – negative signals fall as Yale Law clerks are added.
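To make the mechanics concrete, here is a minimal sketch of that first-difference comparison. All judge records below are invented for illustration; nothing comes from Barondes's data.

```python
# Hypothetical first-difference sketch of the comparison described above.
# All judge records are invented; only the mechanics matter.

def first_differences(judges):
    """Per judge: (change in negative-signal rate, change in Yale-clerk share)."""
    return [(j["neg2"] - j["neg1"], j["yale2"] - j["yale1"]) for j in judges]

def correlation(pairs):
    """Pearson correlation between the two differenced series."""
    n = len(pairs)
    ys = [p[0] for p in pairs]  # change in negative-signal rate
    xs = [p[1] for p in pairs]  # change in Yale-clerk share
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

judges = [
    {"neg1": 0.08, "neg2": 0.10, "yale1": 0.00, "yale2": 0.25},
    {"neg1": 0.06, "neg2": 0.05, "yale1": 0.00, "yale2": 0.00},
    {"neg1": 0.09, "neg2": 0.12, "yale1": 0.25, "yale2": 0.50},
]
r = correlation(first_differences(judges))
```

A positive correlation in real data would point toward the "hire a Yale clerk, get questioned" story, and a negative one toward the opposite; though, as discussed next, aging and period effects would still be confounded with any clerk effect.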

While this test would be better than what Barondes did, it is still not the best approach. To see why, note that Barondes shows that the 95 judges had more Yale Law clerks in the second period than in the first (5 percent versus 2 percent). The approach I just suggested would control for the constant or fixed effect of each judge on the likelihood of negative ratings, but it would not control for any time-varying effect. Judges were getting older over the course of the two time periods, and they were also gaining experience. Might these factors influence the rate of negative signals? They well might, suggesting that a control for the judge’s age should likely be included, which Barondes fails to do.

A 35-year-old judge at the start of the first period may be getting better (and therefore less likely in the second time period to make mistakes that could lead to a negative signal), while a much older judge may be slowing down and thus experience an increase in errors in the second time period. There also may be constant effects in each year (or each period) that make it more likely that cases written at that time will be reversed, and a control for those time fixed effects would be helpful. Only when the fixed effect of the judge, the aging effect, and the time fixed effects have all been controlled for would we expect to be able to identify the impact of having a Yale Law clerk on the rate of negative signals. Yet Barondes controls for none of these factors.

With the large jump in Yale clerks in the second period, aging or time effects (such as the switch from the Clinton to the Bush Administration, which occurs during his second time period) could well be driving up reversals, which happen to correlate with the greater number of Yale Law clerks in the second time period. Thus, Barondes’s Yale Law clerk dummy is picking up three effects – aging of the judge, change in political administration and other time period effects, and any influence of the Yale Law clerk – when he only wants to capture the last. Since Barondes also fails to correct for the dominant influence of the fixed attributes of the judge (judicial philosophy, political affiliation, or ideology), the noise in Barondes’s coefficient estimate is great relative to the signal that he hopes to capture.

For those who thirst for econometric terminology, Barondes should have run a panel data model with judge and year fixed effects to see if the presence of Yale clerks influences the negative signal rate. (For example, if the judges with no Yale clerks had a 10 percent decline in their negative signal rate across the two periods (perhaps because they gained experience and became better judges), and we saw a similar 10 percent decline in the negative signal rate of the judges who increased (or decreased) their number of Yale clerks, we would conclude that the clerks were not influencing the judges. On the other hand, if the judges who increased their hiring of Yale clerks over the period experienced a larger decline in the negative signal rate – say 15 percent – then we would conclude that having Yale clerks reduces the negative signal rate.) I would be extremely surprised if his result held up in such a model. Instead, his model simply correlates the negative signal rate with the presence of Yale clerks, which leaves us with the same problem as correlating high executions with high murder rates. We can have no confidence that a causal relationship was identified.
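The difference between the pooled and fixed-effects answers can be illustrated with a toy calculation (all numbers invented). Here judges who are intrinsically more prone to negative signals also hire more Yale clerks, and the true clerk effect is set to zero; the pooled cross-section slope nonetheless comes out strongly positive, while two-way (judge and period) demeaning, the fixed-effects estimator, recovers approximately zero.

```python
# Toy illustration (hypothetical numbers): judge-level confounding produces a
# positive pooled slope even though the true Yale-clerk effect is zero, while
# judge-and-period demeaning (the fixed-effects estimator) recovers zero.

def ols_slope(xs, ys):
    """Simple bivariate OLS slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

judge_effect = [0.05, 0.08, 0.11, 0.14]   # fixed propensity to draw negative signals
period_effect = [0.00, 0.01]              # common shift in the second period
wiggle = [[0.00, 0.01], [0.01, -0.01], [-0.01, 0.00], [0.00, -0.02]]

rows = []  # (judge, period, yale_share, neg_rate)
for i, a in enumerate(judge_effect):
    for t, b in enumerate(period_effect):
        yale = 2 * a + 0.03 * t + wiggle[i][t]  # hiring tracks the judge effect
        neg = a + b                             # NO true clerk effect
        rows.append((i, t, yale, neg))

# Pooled cross-section: regress negative rate on Yale share directly.
pooled = ols_slope([row[2] for row in rows], [row[3] for row in rows])

def demeaned(col):
    """Subtract judge means and period means, add back the grand mean."""
    nj, nt = len(judge_effect), len(period_effect)
    grand = sum(col.values()) / len(col)
    jm = [sum(col[(i, t)] for t in range(nt)) / nt for i in range(nj)]
    tm = [sum(col[(i, t)] for i in range(nj)) / nj for t in range(nt)]
    return {(i, t): col[(i, t)] - jm[i] - tm[t] + grand for (i, t) in col}

yale_w = demeaned({(row[0], row[1]): row[2] for row in rows})
neg_w = demeaned({(row[0], row[1]): row[3] for row in rows})
keys = sorted(yale_w)
within = ols_slope([yale_w[k] for k in keys], [neg_w[k] for k in keys])
```

With these invented numbers the pooled slope is large and positive, purely from the judge-level confounding, while the within (fixed-effects) slope is essentially zero.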

The probing reader might respond to this discussion by contending that it suggests that the problem is not that Yale clerks degrade the quality of judicial opinions but that judges who select Yale clerks are themselves defective. At this point another deficient aspect of the Barondes paper must be acknowledged: the paper is insufficiently nuanced in its failure to note that the term "negative" is imprecise. Indeed, the Shepard’s Signals can identify two very different phenomena. This identifier might signal errors in understanding legal doctrine or following precedent -- Barondes’s meaning -- as well as outcomes that would not necessarily be seen as pejorative. For example, a liberal judge like Judge Jon Newman in the early 1970s might have gotten lots of "negative" ratings as he was trying to push the law in a direction about which the Burger Court was skeptical. Yet Newman is universally acknowledged to be an outstanding judge (who hired many Yale clerks). Ironically, then, some of the best judges might have high negative ratings. If these are the judges who are selecting Yale Law students (as Newman for one tended to do), then the true fact would be "pioneering judges tend to select Yale Law students," not "if you want to get reversed, choose a Yale Law clerk." The story is really that certain judges choose Yale Law students, not that ill-informed law clerks are leading judges into error. If Yale clerks tend to be liberal and prefer to work for liberal judges at a time when the Supreme Court is or becomes more conservative, it is not surprising that a somewhat higher proportion of opinions by judges who have Yale clerks would be questioned by other (more conservative) judges.

Another way to highlight the dangers of the simple cross-section regression that Barondes runs is to consider a hypothetical (poorly designed) study of police effectiveness. The empiricist looks at all 911 calls and measures bad outcomes that occur during each police response. The bad researcher notices that in the calls for officer assistance in which the police commander sent the SWAT team in to deal with the problem, more people died than when the commander sent one of his school crossing guards. From this, the bad researcher then publishes a study with the headline: Want to deal with that hostage crisis? Forget the SWAT team and send Harry the crossing guard. Of course, it is the deadly situation that leads to the SWAT team being sent and to the high risk of fatalities; the SWAT team doesn't create the trouble, but is trying to deal with it. Conversely, no one is ever killed when Harry shows up because he is usually asked to go over when a cat is stuck up in a tree, not when terrorists have taken 35 people hostage and are threatening to blow them up.

Apart from its muddled causal story, the Barondes paper both oversells its own naïve findings and suffers from some other specification defects, in any event. The paper looks at the roughly 13,000 judicial decisions written by 95 federal district court judges (across two time spans of two years and four months each) who provided school information for at least one law clerk in surveys of clerks done in 1997 and 2001. Roughly 8 percent of these opinions are given a Shepard's "warning" or "questioned" indication. The author concludes that with an additional Yale Law clerk the likelihood of receiving this negative assessment will rise to 9.5 percent (based on my calculations from Barondes’s fn 48), even though a 95 percent confidence interval around this point estimate would include the possibility of a REDUCTION (rather than an increase) in the negative outcomes (because his point estimate is not statistically significant at the .05 level). In other words, the author's own finding is statistically weak.

Moreover, the advice to shun Yale Law students is unpersuasive even if the causal story were not muddled and the statistical evidence were strong rather than weak. The above estimate of a bump up in the "negative" rating assumes that you are selecting one extra Yale Law clerk and holding everything else constant, where some of the other factors in the model really can't be held constant. Specifically, if you are selecting a Yale Law clerk, it means you are not selecting some other clerk. Barondes found that the better ranked the law school from which the clerk comes, the LOWER the negative ratings. Obviously, when a judge chooses a Yale Law clerk he or she is getting a clerk from a highly ranked law school. In fact, the "highly ranked law school" effect leads to lower rates of negative indicators. In essence, the paper claims that if you could get a student from a school other than Yale THAT IS AS HIGHLY RANKED AS YALE LAW SCHOOL, then you would get the benefit of the better-school effect without the negative effect the author attributes to Yale. One could add controls for other top schools – perhaps Harvard and Stanford – and then test whether their estimated effects are different from the Yale control. Again, I doubt there will be a difference in the direction the author states. The provocative title of Barondes’s paper suggests that if you want the train to go off the tracks, hire a Yale Law clerk, but this is just silly. The estimated effect is small and not robust, even if one accepts the author's interpretation (without the qualification of the better-school effect, which swamps the estimated Yale effect).

Barondes attempts in his Table 5 to deal with the issue raised in the previous paragraph (to be technical – non-linearity in the quality-of-law-school effect biasing the estimate for the top-ranked law school), but unlike Tables 4 and 6, which show six models, the author shows only one model in Table 5 (was this because the other models went against his thesis?). The author wants to compare Chicago favorably with Yale, but eyeballing it, I suspect that in almost all the models there is no statistical difference in the estimated effects for these two schools. Moreover, one unnamed judge wrote about 1,300 decisions (!) in the 56-month study period (about 10 percent of all the opinions written by the 95 judges), and when that judge is dropped, the Yale effect goes away entirely (Table 6, column 3) or loses significance (Table 4, column 3). (Why not name the judge? That would have been one of the most interesting tidbits of the paper.) In Table 6, column 3, the positive Yale coefficient is actually smaller than the positive Chicago coefficient (although both are statistically insignificant). Again, this suggests to me that judges, not clerks, are likely driving the story.
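A simple leave-one-out check along these lines is easy to run. The counts below are invented (the real data are Barondes's); the sketch just shows how a single prolific judge can manufacture an apparent Yale gap that vanishes when that judge is dropped.

```python
# Hypothetical leave-one-judge-out check: recompute a crude Yale "effect"
# (here just a difference in negative-signal rates) with a judge excluded.
# All counts are invented for illustration.

def neg_rate_gap(opinions):
    """Difference in negative-signal rates: Yale-clerk opinions vs the rest."""
    yale = [o for o in opinions if o["yale"]]
    rest = [o for o in opinions if not o["yale"]]
    rate = lambda xs: sum(o["negative"] for o in xs) / len(xs)
    return rate(yale) - rate(rest)

opinions = (
    # Judge A: prolific outlier with Yale clerks and many negative signals
    [{"judge": "A", "yale": True, "negative": i < 30} for i in range(100)]
    # Judge B: half Yale, half not, ordinary signal rates
    + [{"judge": "B", "yale": True, "negative": i < 1} for i in range(10)]
    + [{"judge": "B", "yale": False, "negative": i < 1} for i in range(10)]
    # Judge C: no Yale clerks, ordinary signal rate
    + [{"judge": "C", "yale": False, "negative": i < 2} for i in range(20)]
)

full = neg_rate_gap(opinions)                                       # large gap
without_A = neg_rate_gap([o for o in opinions if o["judge"] != "A"])  # gap gone
```

In this toy data the full-sample gap is driven entirely by judge A; dropping that one judge erases it, which is exactly the pattern the Table 4 and Table 6 columns suggest.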

My colleague Roberta Romano notes that Barondes speculates that Yale Law clerks may know less legal doctrine because of the school’s famous emphasis on theory. But Romano points out that bar passage rates would at least give a sense of whether Yale Law students are deficient in acquiring knowledge of legal doctrine. To test this, I thought one might look at July 2007 bar passage rates by school for the single largest state. As it turns out, across all non-California law schools with at least 15 applicants, Yale had the highest bar passage rate (94.1 percent). California bar exam takers from the University of Chicago and Harvard did quite well, but their passage rates of 86 and 87 percent were clearly lower than that of Yale students. Yale Law graduates are looking better all the time!

There are some features and anomalies in the data that the author does not comment on. First, who are these 95 federal district court judges? Is there something unusual about them? Table 2 suggests that in the early year only 2 percent of their law clerks were from Yale and 1 percent were from Chicago. Four years later the percentages had risen to 5 percent Yale and 6 percent Chicago. That is a large jump and should have been examined.

Second, while the paper is not entirely clear, there may be an odd matching of clerks to opinions. Barondes doesn't know with certainty that the Yalies are the ones writing the opinions that get reversed. (To follow in the fanciful vein of the paper, perhaps I should hypothesize that the Yale Law clerks are so dazzling that the other clerks fall apart and mess up more since they know they can't compete with the very best.) Also, it appears that the author looks at judicial opinions from 9/96 - 12/98 and 9/2000 - 12/2002 and links data on the clerks working for what I suspect is 9/96 - 8/98 and 9/2000 - 8/2002. In other words, the judicial decisions in the four months after the law clerk data end are attributed to the prior law clerks (presumably on the theory that the previous clerks worked on those cases), but at the start of the clerkship period it is implicitly assumed that all judicial decisions are attributable to the current clerks. Again, no mention is made of this apparent inconsistency.

Third, roughly 15% of the time, cases receive a caution, but the author doesn't show those results. Again, one wonders if these results were dropped because the Yale effect did not appear there.

Fourth, Barondes controls for an interaction of his second time period and whether the judge is a Republican appointee, and a second interaction of the second time period and Democratic appointee. But when interaction terms are used, one must include both of the constituent terms in the model (that is, separate controls for the second time period and for political affiliation). What would be better is to have a second-period time dummy, a Republican-appointee dummy, and an interaction of these two terms. See Brambor, Thomas, William Roberts Clark and Matt Golder, “Understanding Interaction Models: Improving Empirical Analyses,” Political Analysis (2006) 14:63–82.
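For concreteness, the specification suggested above can be sketched as a design-matrix row containing both constituent dummies plus their product (variable names here are hypothetical, not from the paper):

```python
# Sketch of the specification suggested above, per Brambor, Clark & Golder:
# both constituent dummies enter alongside the interaction, never the
# interaction alone. Variable names are hypothetical.

def design_row(second_period, republican_appointee):
    """One row of the design matrix: intercept, second-period dummy,
    Republican-appointee dummy, and their interaction."""
    p2 = 1.0 if second_period else 0.0
    rep = 1.0 if republican_appointee else 0.0
    return [1.0, p2, rep, p2 * rep]
```

With all four columns present, the coefficient on the interaction measures how the second-period shift differs for Republican appointees, rather than conflating that difference with the main effects.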

In sum, I am confident that a more suitable methodology than the one employed by Barondes would reveal that Yale Law clerks are extraordinarily capable and effective public servants. All judges will likely be pleased to hire them.

1. Why does the author refer to "econometric" issues rather than statistical issues? Nothing here is econometric; this is all Stats 101 (or 301, maybe).

2. There may be some differences of opinion as to whether a judge (like the Judge Newman whom the author mentions) who is pushing the law in a way the Supreme Court doesn't want to go is acting appropriately; I am not sure he is. This issue is beyond statistical analysis.

3. It doesn't really matter whether the paper holds up statistically: it's funny and good for annoying one's Yale Law grad colleagues, and surely that is more valuable than mere statistical truth.

Your opening comment about Justice Scalia's alleged misreading of current social science data may well be correct, though as a longtime opponent of the death penalty you are hardly a disinterested observer. But this is hardly unique. When judges, be they liberal or conservative, engage in pop social science, the result is nearly always bad.

One of the implications of this observation is that judges ought not adopt an interpretive methodology that requires them to think about social science at all. Maybe something like "read the Constitution and do what it says." Under such an approach, the question of whether the death penalty deters crime, while undoubtedly interesting, is wholly irrelevant to the legal issue.

You are also almost certainly right that the study in question doesn't demonstrate that judges ought not hire Yale grads as law clerks out of fear of reversal. Indeed, any judge who did so would be tacitly admitting that his or her law clerks had excessive and improper influence. But again, you miss the main point: one ought to refrain from hiring Yale Law School graduates as law clerks because the Yale Law School has been and continues to be a pernicious influence on both the law and legal education. Hiring law clerks from other institutions undermines (to some very small extent) that pernicious influence.

One of your suggested analyses makes me very nervous. You argued that better evidence in favor of a causal relationship could be produced if the change in the negatives (between time periods) was correlated with the change in percent Yale clerks. That is not an appropriate way to do it. Change scores should never be the predicted variable. Instead, the negatives from the first time period should be included as a covariate and the negatives from the second period should be the DV. (Using the change in Yalies as a predictor is fine.)
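A minimal sketch of that preferred specification, with invented numbers: the second-period rate is the dependent variable, and the first-period rate enters as a covariate alongside the change in Yale-clerk share.

```python
# Sketch of the lagged-DV specification described above (all data invented):
# regress second-period negatives on first-period negatives plus the change
# in Yale-clerk share, rather than using the change score as the outcome.

def ols(X, y):
    """Ordinary least squares via the normal equations (Gaussian elimination)."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda rr: abs(A[rr][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for rr in range(col + 1, k):
            f = A[rr][col] / A[col][col]
            A[rr] = [a - f * c for a, c in zip(A[rr], A[col])]
            b[rr] -= f * b[col]
    beta = [0.0] * k
    for rr in reversed(range(k)):  # back substitution
        beta[rr] = (b[rr] - sum(A[rr][c] * beta[c] for c in range(rr + 1, k))) / A[rr][rr]
    return beta

neg1 = [0.06, 0.08, 0.10, 0.05, 0.09]        # first-period negative rates (covariate)
neg2 = [0.07, 0.08, 0.12, 0.05, 0.10]        # second-period negative rates (DV)
delta_yale = [0.25, 0.00, 0.50, 0.00, 0.25]  # change in Yale-clerk share (predictor)

X = [[1.0, n1, dy] for n1, dy in zip(neg1, delta_yale)]
intercept, lag_coef, yale_coef = ols(X, neg2)
```

In this invented data the second-period rate was built as exactly the first-period rate plus 0.04 times the change in Yale share, so the fit recovers a lag coefficient of 1 and a Yale coefficient of 0.04.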

With that said, and agreeing that anything approximating a time-series analysis will always produce stronger evidence of a causal relationship than a simple cross-section, please note that time series are not immune to spurious relationships. Barring (as it were) an actual experiment, you have to include everything as covariates to establish causation.

I thank you for your detailed observations on my paper, and I am hopeful that I will be able to benefit from your wisdom as I further review your lengthy post, as well as the numerous other comments I have received.

I am somewhat puzzled by part of your comments that I have been able briefly to skim. You discuss why one might include some reference to whatever corresponds, in my setting, to the geographic area in your death penalty hypothetical (the judge or something else). The Stata documentation for clogit states:

"clogit fits maximum likelihood models with a dichotomous dependent variable coded as 0/1.... Conditional logistic analysis differs from regular logistic regression in that the data are grouped and the likelihood is calculated relative to each group; i.e., a conditional likelihood is used. ...

That's what's reported in the results. Of course, with this kind of model, any variable that does not vary for the identity of the judge is collinear and cannot be included. So that would be why, for example, those models do not include dummy variables for the jurisdiction in which the judge is located or, of course, the judge identity itself.

Other commenters have requested that I simply report results showing dummy variables, one for each judge (other than one). I did that when I was at the office and posted on volokh.com the following:

There was an inquiry about the results from a more customary logit estimation using dummy variables for each judge (other than, of course, one of them). I can confirm that using a logit estimation with all the independent variables that are in model 1 in Table 4, but with 92 dummy variables added, one for each of 93 judges (2 judges' opinions being dropped because their opinions never have these adverse signals), has a parameter estimate [t-statistic] for the Yale Law School variable of: 1.64 [3.09], which is not materially different from the results reported in the paper.

I don't have the full regression results here at my home, though I would be pleased to post them were they of interest. The reason I did not report that kind of result in the paper is that the authorities I referenced indicated that would be improper. Rabe-Hesketh & Skrondal, Multilevel and Longitudinal Modeling Using Stata at 131 (2005), which has the following discussion concerning estimation of the impact of certain treatments of patients (corresponding to the judges in the paper):

"[I]t would be tempting to use fixed intercepts by including a dummy variable for each patient (and omitting the overall intercept). This would be analogous to the fixed-effects estimator of within-patient effects discussed for linear models in section 2.6.2."...

The authors, after describing a problem with doing that, say, "we can ... construct[] a likelihood that is conditional on the number of responses that take the value 1 (a sufficient statistic for the patient-specific intercept). ... In logistic regression, conditional maximum likelihood estimation is more involved and is known as conditional logistic regression. Importantly, conditional effects are estimated in conditional logistic regression ...."

So, I am having some difficulty harmonizing this discussion, which describes the estimation technique used, with your assertion, "The judges almost surely drive the error rate and the clerks show up as “significant” in the regression because there is small, but possibly significant relationship between the type of judge that gets reversed and those who will select (and be attractive to) Yale law students."

Research is part of an ongoing discussion among academics. I am happy to benefit from your wisdom in this regard. If you are interested in the results from estimating different models, I will be pleased to report the results of a couple of alternative estimations using the variables in the models in the paper (in addition, of course, to the judge identity, as that is in fact part of the syntax of the Stata estimation of the models that are reported), as long as you fully specify them and do so in the syntax of Stata commands. (Adding variables not specified in the currently reported models could not be done in short order, depending on whether the underlying data, as saved, include them.)

This post is 18 paragraphs too long. Everything after "there is small, but possibly significant relationship between the type of judge that gets reversed and those who will select (and be attractive to) Yale law students" is superfluous.

I could not tell whether the closing assertion in the "In sum" paragraph was written with tongue in cheek. If not, I have some concerns, expressed here: "Why empirical research is better at raising questions than answers --- some ruminations about ruminations about the Yale clerk study"

I am neither a lawyer nor an accomplished scholar. But as one who is beginning to stretch his empirical wings, who is forced to spend hours a day picking apart empirical research for professors, and who has a particular disdain for lawyers’ lack of statistical/economic knowledge, I must object to Professor Donohue’s numerous straw men and red herrings. While I will not take pointed issue with every point he has made, I feel I must set some records straight in case some poor law student reads this entry and garners his only impressions of empirical research from this blog. I will also throw aside the pleasant tone I normally adopt for academic discussions and instead adopt the brutish, defensive demeanor of Professor Donohue.

First to the general objections Donohue has presented. I am appalled that any person who claims to be an empiricist (and he has many credits to his name) could ever say, “I am confident that a more suitable methodology than the one employed by Barondes would reveal that Yale Law clerks are extraordinarily capable and effective public servants.” That is the epitome of naivety. So he knows that including other controls and using panel regressions would clear his precious graduates? How can he? How can anyone know what the results of modified models and techniques would be?

In the spirit of his volley at Barondes, I would levy an argument that there is a fundamental flaw in Donohue’s own research (Donohue and Levitt, The Impact of Legalized Abortion on Crime, Quarterly Journal of Economics, 2001). Their research claims that legalized abortion altered the demographic makeup of the youth of the ’70s and ’80s, which caused a reduction in crime in the ’90s. The problem is that they only showed the negative relationship between abortion and crime. No evidence was presented that the underlying demographics of our society changed, or, if they changed, that the changes were related to abortion (and they indeed argued that it did not matter whether or not they could prove a demographic change; the regressions showed the link between abortion and crime and that was all that mattered). Even cursory analysis of demographic trends shows increases in poor births and single-mother births as percentages of total births (those they argued impact crime).

But no matter how strenuous my belief that their work is incomplete, does that mean I know that inclusion of demographic trends would invalidate their results? No. And no number of real, straw or red herring arguments can change the actual, tested result (especially since that is how Donohue and Levitt responded to critics, including but not limited to, “Further Evidence that Legalized Abortion Lowered Crime: A Reply to Joyce”, Journal of Human Resources, 2004).

Second, it is the sign of a desperate person when they start throwing out everything they can think of, regardless of its actual validity (maybe one will stick?). Following Donohue’s argument, if I see that more people eat ice cream in the summer and there is more crime in the summer, does that mean there is any real relationship between ice cream consumption and crime? Maybe we should consider the temperature or the fact that school is out? There is logic to considering temperature or the school calendar. So maybe we combine them into a model. What logic says we need to include the current administration? Do appellate judges really base their opinions on the current administration? How many appellate positions were filled in 2001 and 2002 by Bush? How many changed party? How would that impact reversal rates? There may be an argument there, but since Donohue does not feel compelled to state his case for such inclusion (nor any evidence), he should not argue for its control or make grandiose claims that Barondes is inept for not controlling for it.

Third, much of his argumentation is set up so that, if Barondes did everything he asked for, the study would fail (otherwise known as a straw man). He picked contentions that he knew could not work. Let’s account for judges’ ages. Well, do we assume that judges get better as they get older, or worse? Do they become more open-minded or more conformist? Do they push the boundaries of the law more, or rein in their unbridled liberal passions? Before a variable is included as a control, one should know that there is some consistency across the sample in the behavior of that control. My assumption is that no such consistency exists, and, using Donohue’s logic, since Donohue provided no counter-evidence, I am going to assume my position is statistically correct. Also, suggesting a panel regression is foolhardy with the type of sample Barondes has, and Donohue is experienced enough (I hope) to know this. Compressing the 13,000 cases into a panel of percentage overturn rates would limit the sample to a breadth of 95 (or less) and a depth of 4 years (perhaps more if sliced into increments finer than annual). Any statistician who stayed awake for more than 5 minutes in class would know that such a small sample would provide no significance for anything if Barondes were to include all the proposed controls (his degrees of freedom would be non-existent). I suppose Donohue would prefer Barondes to waste 40 hours of his time doing such analysis, only so Donohue could spend 5 minutes writing a pithy blog entry about how he was right and the results were not significant.

I write this response not solely to defend Barondes, but to correct the asinine logic of an apparently bitter, defensive law professor. Knowing that he has spent his life promoting empirics and economic analysis in the law, I should say that I am ashamed he would knowingly stoop to such levels merely to defend his Ivy League pride. In the process he is misinforming the lawyers of tomorrow and cheapening the academic debate about legal education. This is in no way a complete rebuttal, but it is intended to counter the tone and nature of his argumentation. As a PhD candidate myself, I feel I must stand up and say enough is enough.

I am bad at posting comments, so the gist of this may have appeared previously. Nonetheless: the econometrics/statistics aside, this article points up that judges would be better off taking the best students from the spectrum of top law schools than the entire class from Yale. A middling Yalie is a Yalie nonetheless, but what has that student done to deserve the clerkship over the best students at Michigan, Penn, UVA, Duke, Cornell, Northwestern, etc.? Achievement after admission ought to count for something.

What's ironic about your post is it confirms the rumors about Yale grads.

You are unable to connect theory with reality, and as a result, you are unable to apply theory.

You suggested a number of theoretical reasons why Barondes' analysis *may* be wrong. That is a cheap attack on the author's hard work. It is inappropriate because it dissuades other authors from trying. It is inappropriate because you attempt to marginalize, with hot air, the hard work he put into collecting and analyzing data. Simply stated, there *could* be something wrong with any statistical study. An indisputable study is absolutely impossible when you are studying the real world. That's an obvious point and a poor basis for criticism.

Barondes' analysis speaks for itself. No statistician ever claims their results are indisputable, and neither did Barondes. However, it's something. I appreciate the grueling effort required to collect and summarize statistical data. I will certainly not let you brush it aside with hot air.

In summary, you provided a number of theoretical reasons why his conclusion *could* turn out to be in error. But outside of Yale, theory is not allowed to trump reality. Reality gets its say.

As a longtime and well-credentialed statistician and economist, I am truly ashamed of your post. What's amazing is that you did it without even attempting to hide your bias. Could you not at least have had someone from another school parrot this for you?

I only hope it doesn't dissuade others from putting in the hard work required to analyze and study data. Don’t worry about the windbags. We appreciate the work you did.

Dear Another Statistician, As a non-statistician, I can obviously see that not controlling for the judge is a major flaw in this paper. The paper purports to explain the effect of hiring a Yale clerk. It fails to do that if it doesn't compare the same judge with and without a Yale clerk. Fancy stats aside, that should be clear. I did find a typo, though: "then we might expect given the high ranking of Yale Law School" should be "than." You're welcome.

k, As a statistician I should point out that the procedure Barondes used does account for judge effects. The conditional logit model he used 'clusters' each judge's observations together, so the reported estimates are conditional on each individual judge. Just because the model doesn't report a coefficient for each judge doesn't mean it doesn't account for each judge.
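For readers who want to see why the per-judge intercept drops out, here is a toy numerical check (my own sketch, with made-up scores, not Barondes' data or model output): in a conditional logit, for a judge with exactly one reversal among her cases, the conditional probability that case i was the reversed one is exp(x_i'b) divided by the sum of exp(x_j'b) over her cases, so any judge-specific constant cancels from numerator and denominator.

```python
import math

def conditional_prob(scores, i):
    """P(case i is the reversed one | exactly one reversal among this judge's cases)."""
    num = math.exp(scores[i])
    den = sum(math.exp(s) for s in scores)
    return num / den

# Hypothetical linear indices x_j'b for one judge's four cases.
scores = [0.2, -0.5, 1.1, 0.0]

alpha = 3.7  # an arbitrary judge-specific intercept
shifted = [s + alpha for s in scores]

p0 = conditional_prob(scores, 2)
p1 = conditional_prob(shifted, 2)
print(round(p0, 6) == round(p1, 6))  # True: the judge effect cancels
```

This is exactly why the model needs no printed coefficient per judge: the judge effect is conditioned out of the likelihood rather than estimated.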

This is the kind of thing that frustrates me about Donohue's post. Anyone who reads it is going to think that if a statistical procedure doesn't produce output for every control variable under the sun, then it is worthless (some models include effects not listed in the output, and some controls are simply worthless and unnecessary).

If Barondes were to run a panel model, as Donohue suggests, he would need to include variables for each judge, plus interaction variables for each judge and time period (to account for the "judges change over time" argument). That would mean a sample of around 380 and at least 295 variables, before considering all the other appropriate controls. It is simply not feasible. I am sure that Barondes considered such options and appropriately disregarded them. Donohue should not mislead his readers (like k).
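As a back-of-the-envelope check (using the roughly-95-judges-by-4-years shape from my earlier comment; the specific dummy coding is my own assumption, and other codings give similarly grim counts), counting the parameters such a panel would require shows the degrees of freedom evaporating:

```python
# Rough degrees-of-freedom count for the hypothetical panel described above:
# roughly 95 judge-level units observed over 4 annual periods.
judges, periods = 95, 4

n_obs = judges * periods                     # 380 observations
judge_dummies = judges - 1                   # 94 judge fixed effects
interactions = (judges - 1) * (periods - 1)  # 282 judge-by-period terms

params = judge_dummies + interactions        # 376, before any other controls
df_residual = n_obs - params - 1             # the intercept eats one more

print(n_obs, params, df_residual)  # prints: 380 376 3
```

Three residual degrees of freedom, with zero substantive controls yet included: no estimate from such a model could be significant, which is the point.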

When I read this, I honestly forgot what blog I was visiting. I would not expect this type of entry or post at Balkinization. Halfway through the post I thought I was reading that insufferably pedantic, often pretentious blog with the green color scheme, which is written by a bunch of Ivy Leaguers. Anyway, I could not get past Donohue's ruminations about the need to factor in the judge's characteristics, specifically advanced age. It seems unlikely that a lawyer, let alone a federal judge, would disclose cognitive deterioration. Short of brain imaging, which isn't dispositive anyway, how would one go about measuring the correlation between aging and reversals or problematic judicial opinions/rulings, at least without running into the same problem of reliability that is the subject of Donohue's entire critique of this Yale law clerk study? One cannot assume that aging inherently diminishes judicial decision-making and legal analytical skills. (If so, the USSC might as well close up shop.) At the end of the day, the factors/constants/variables that Donohue anecdotally sets forth to demonstrate the unreliability of the study invite the same criticism he doles out, as one is left with a seemingly unmeasurable and unanswerable question of causation: is it age, or is it the Yale law clerk?

So Yale has a lock on judicial clerkships? And what evidence is there to demonstrate the extent to which the Justices (at SCOTUS) are actually influenced by their Yale clerks (or other clerks)? Is there a suggestion that Yale has SCOTUS by the SCROTUS? And what do originalists have to say about the role and influence of such clerks? Are the Justices that naive (or incompetent or lazy) to be taken in by their clerks who have "bubkis" for experience? And is there a suggestion that there is a continuing pipeline between the Yale clerks and the Yale Law faculty? Maybe there is need for a locksmith.