The report was released at a full-bore press conference at the National Press Club, where I got to present my findings, and it generated a modest swirl of news stories across the country. Better yet, it earned me my first and only interview on national television. Well, kinda sorta national television — MSNBC.

But looking back at the report today, re-reading the first few pages, I feel as much sheepish as proud.

I feel sheepish because the report opens with what, in the cold sobriety of hindsight, I can only describe as a naïve, hyperbolic tale about the wondrous transformation available to our nation’s juvenile justice systems if only they would adopt the handful of so-called evidence-based treatment models. (Read the opening passages of “Less Hype, More Help.”)

But reading them only heightened my sense that discussions of evidence-based juvenile justice remain, well, naïve and hyperbolic, just as I was in 2000. I know better now, and in this column I want to explain why my unqualified praise for these gold-standard models was misplaced.

Don’t get me wrong. There is an important place in juvenile justice reform for carefully crafted treatment models with hard evidence from randomized trials. And there’s an even more important place for rigorous outcomes measurement and data-driven decision making. But my suggestion that we can revolutionize juvenile justice in this country by replacing the current system with plug-and-play programs was a fantasy back in 2000. And it remains a fantasy today.

This is true for several reasons.

First, no effort to reform juvenile justice should begin with a treatment program, evidence-based or otherwise. That’s putting the cart before the horse, a loser’s bet. Juvenile justice operates as a system, not a collection of programs. While programs are important, even the best program models will come to no avail if they are embedded within dysfunctional systems prone to making bad decisions in untimely ways about how to serve and sanction court-involved young people.

The second problem is definitional: Who decides what is or isn’t evidence-based, and using what criteria? Lists identifying effective, proven or promising adolescent prevention and treatment models have proliferated rapidly in recent years, but — due to the lack of any consensus in the field on how to define “evidence-based” — they vary widely in their criteria for inclusion. Some set the standards of proof so low that many recommended models lack any reliable evidence of effectiveness.

Third, rules requiring exclusive or heavy reliance on evidence-based models necessarily exclude many home-grown or idiosyncratic strategies that are rooted in communities but lack the pedigree of a rigorous, carefully controlled evaluation study. That kind of research is expensive, beyond the means of many community agencies and grassroots organizations that have a keen interest in, and untapped capacity to support, youth in high-poverty neighborhoods where most juvenile court cases arise. Such rules can also stifle innovation, which is critically needed given the still-small array of interventions with powerful evidence of effectiveness and our still-limited knowledge of what works best for youth facing different types of risks and needs.

All of these issues pose vexing challenges to the evidence-based movement in juvenile justice. In this column, I want to focus attention on another, less appreciated problem facing the evidence-based programs movement. That is the seldom discussed fact that the research behind even the most highly-regarded intervention models isn’t nearly as strong as many assume (or allege).

Unconvincing Evidence for Prevention Models

In 2006, the editors of Youth Today (a bi-monthly newspaper on youth development now published by the Center for Sustainable Journalism, the same organization that publishes JJIE) sent me to cover a national conference about evidence-based models for reducing delinquency and adolescent substance abuse. Three years later, Youth Today hired me to write a series of columns featuring new research on what works and doesn’t work in youth development.

These assignments gave me a chance to examine the evidence on model programs more closely, not just on juvenile justice interventions but also delinquency prevention, child welfare and other children and youth programs. The more I looked, the more concerned I grew.

For instance, one of my Youth Today columns in 2010 touted a highly-regarded Australian model called “Positive Parenting Program,” or Triple P, a community-wide strategy for reducing child abuse. Unlike other community-wide approaches to reducing child abuse, which yielded “limited or no evidence of effectiveness,” Triple P showed encouraging results in a host of overseas research studies. And a 2009 evaluation in South Carolina found that counties that implemented Triple P had far better results than non-participating counties in terms of overall maltreatment rates, foster care placements and emergency room visits stemming from child maltreatment.

Swayed by these studies, I informed Youth Today readers that the Triple P model “can make a dramatic and cost-effective difference.”

Not so much, it turns out.

In 2012, an independent review of the available research found “no convincing evidence that Triple P interventions work across the whole population or that any benefits are long-term.” In most cases, the available studies were methodologically weak and involved very small samples. And all but one was authored by personnel affiliated with the Triple P model.

Then this spring, University of Cambridge criminologist Manuel Eisner published a working paper delineating “Seven Reasons to Be Skeptical” about the Triple P study in South Carolina. In 2007, two years before releasing their results, scholars working on the South Carolina evaluation publicly detailed their study design, including the sample to be studied, research protocols and outcome measures. Yet when their final paper appeared, many of these parameters had changed without explanation: a different age range of children, a different time period for comparison, a different unit of analysis. Worse yet, the final study reported on just three of the 11 outcome measures identified in the research plan, and it added a new measure that wasn’t included in the initial research plan.

These kinds of “post-hoc” changes are telltale warning signs in evaluation research, offering easy opportunity for researchers to cherry-pick the data they choose to report. Meanwhile, the Triple P study did not acknowledge any conflicts of interest, as required, even though the study’s first author was a Triple P consultant and the second author was the founder of Triple P and director of a thriving for-profit business dedicated to replicating the model internationally.
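A rough back-of-the-envelope sketch shows why selective reporting of outcomes is a warning sign. It assumes, purely for illustration, that a program has no real effect, that the outcome measures are statistically independent, and that each is tested at the conventional 0.05 significance level. Only the count of 11 measures comes from the research plan described above; the rest are assumptions, not details of the South Carolina study.

```python
# Illustrative sketch only: chance of at least one "significant" finding
# among several outcome measures when the program truly has no effect.
# Assumptions (not from the study): independent outcomes, alpha = 0.05.


def chance_of_false_positive(num_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_outcomes tests is significant by chance."""
    return 1.0 - (1.0 - alpha) ** num_outcomes


if __name__ == "__main__":
    for k in (1, 3, 11):
        print(f"{k:>2} outcome measures: {chance_of_false_positive(k):.0%}")
```

Under these assumptions, a study with 11 candidate measures has roughly a 43 percent chance of turning up at least one spuriously significant result, compared with 5 percent for a single pre-specified measure. That is why freedom to choose which outcomes to report after the fact matters so much.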

Sadly, problems with post-hoc changes, data cherry-picking and conflicts of interest are not limited to Triple P or to child abuse prevention. Over the past dozen years, serious critiques have also been published questioning the research behind several of the most widely touted school-based models for preventing delinquency, substance abuse and/or smoking.

These critiques find that scholarly papers evaluating model prevention programs — typically written by the developers and promoters of the models — have frequently employed dubious methods and selective reporting to justify positive findings.

Among the models whose research has been subjected to sharp criticism are Life Skills Training, the Seattle Social Development Model, Project Alert and the Midwest Prevention Project, all of which have been touted as proven or effective on lists of evidence-based practices maintained by Blueprints for Violence Prevention, the U.S. Department of Health and Human Services, U.S. Department of Education, National Institute on Drug Abuse and/or Substance Abuse and Mental Health Services Administration.

The model developers have ardently defended their research, of course, and raised some persuasive points. Yet, to an informed lay observer like me, the critics have the better of the argument: Troublesome methodological anomalies do seem pervasive in the research behind a number of prevention models widely recognized as evidence-based. While these kinds of anomalies do not prove intentional misconduct — unconscious bias is far more likely — the result nonetheless is research that tips the scales in favor of the models being studied and presents an unrealistic and inflated portrait of their impact. (Some of the models have since been downgraded or removed from some lists.)

Dennis Gorman, a scholar at Texas A&M who has written many studies critiquing the research on evidence-based prevention models, concludes that “Much of what goes on in the analysis of these school-based prevention programs is simply not consistent with the type of rigorous hypothesis testing that one associates with the term ‘science.’”

In fact, a few years ago, Gorman and a colleague published a paper about Drug Abuse Resistance Education (or DARE), one of the few models that has been found to be ineffective based on evaluation research. The paper found that, using statistical techniques commonly employed in studies supporting many other models, they could show that DARE, too, was evidence-based — even though any objective reading of the evidence finds that the DARE model has little or no effect on participating youth.

Questions About MST Research

Not surprisingly, questions have also been raised about the research into model programs aimed at reducing crime and delinquency among those already involved in lawbreaking behavior.

In 2005, veteran social research scholars Anthony Petrosino and Haluk Soydan examined 300 evaluation studies of intervention programs designed to reduce criminal behavior. They found that studies conducted by the developer of the model being examined showed large reductions in recidivism (average effect size of nearly one-half a standard deviation), while studies conducted by independent evaluators found an average effect size of exactly zero.
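For readers unused to effect sizes, the sketch below translates “half a standard deviation” into a more intuitive quantity. It assumes, for illustration only, that outcomes in the treated and comparison groups are normally distributed with equal spread. This calculation is a common textbook interpretation, not something taken from the Petrosino and Soydan paper.

```python
from statistics import NormalDist

# Illustrative sketch only: what a standardized effect size means in practice,
# assuming normally distributed outcomes with equal variance in both groups.


def share_of_comparison_group_outperformed(effect_size: float) -> float:
    """Fraction of the comparison group that the average treated participant outperforms."""
    return NormalDist().cdf(effect_size)


if __name__ == "__main__":
    for d in (0.0, 0.5):
        share = share_of_comparison_group_outperformed(d)
        print(f"effect size {d:.1f}: better than {share:.0%} of the comparison group")
```

Under these assumptions, an effect size of 0.5 means the average treated participant does better than about 69 percent of the comparison group, while an effect size of zero means no better than a coin flip at 50 percent.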

As Eisner puts it, “There is evidence of a worrying pattern in criminological evaluation research and that systematic bias is one possible explanation that we can’t afford to ignore.”

To date, there has been less scrutiny of the research behind the two types of models that have produced strong results in reversing delinquency and other problem behaviors among troubled adolescents: cognitive-behavioral therapy, and family-focused treatment approaches such as FFT, MTFC and MST.

So far as I’m aware, the research behind the leading cognitive behavioral therapy models has not been subjected to an exacting review regarding research methodology or potential bias introduced by model developers evaluating their own models. (As with the prevention models, developers have conducted much or most of the experimental research for several of the leading CBT models.) Likewise, I have not seen any serious methodological critiques of Functional Family Therapy or Multidimensional Treatment Foster Care.

However, Multisystemic Therapy (MST) was examined in a 2005 study by a research team led by Julia Littell, a social work scholar at Bryn Mawr College. Littell identified a number of weaknesses in the MST research, and after employing a statistical technique called meta-analysis to synthesize the results from multiple studies, she characterized the research on MST as “inconclusive.” The available evidence, she found, “does not support the hypothesis that MST is consistently more effective than usual services or other interventions for youth with social, emotional, or behavioral problems.”
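For a sense of how meta-analysis combines studies, here is a minimal sketch of one common approach, fixed-effect inverse-variance pooling. The effect sizes and standard errors are made up for illustration, and nothing here is meant to reproduce the specific methods or data Littell's team used.

```python
import math

# Illustrative sketch only: fixed-effect, inverse-variance pooling of
# study-level effect sizes. All numbers below are hypothetical.


def pool_fixed_effect(effects, std_errors):
    """Return (pooled effect, pooled standard error), weighting each study by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical studies: one small study with a large effect, two larger ones near zero.
    effects = [0.60, 0.10, -0.05]
    std_errors = [0.30, 0.10, 0.15]
    pooled, se = pool_fixed_effect(effects, std_errors)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"pooled effect: {pooled:.2f} (95% interval {low:.2f} to {high:.2f})")
```

With these made-up inputs, the small, imprecise study reporting a large effect receives little weight, and the pooled interval spans zero. That is the kind of pattern a reviewer might reasonably describe as inconclusive.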

MST developer Scott Henggeler and other scholars affiliated with MST authored a sharp rebuttal, questioning Littell’s analysis and citing their own research showing that weak results in MST are typically tied to lack of fidelity to the MST model, rather than weakness in the model itself. Indeed, studies by MST-affiliated researchers consistently show that results are far better in MST programs with high adherence to the MST model than those with lower adherence, and MST’s sponsors have developed elaborate processes to promote adherence in MST replication sites.

However, Littell remained unbowed in her criticism, countering in a follow-up essay that the treatment adherence measure employed by MST researchers to measure fidelity was ill-defined, and that the other criticisms of her study were replete with “logical and factual errors.”

Littell also noted that Henggeler and his colleagues face an enormous conflict of interest as both the promoters and evaluators of their own model. Citing publicly available data, Littell reported that MST had reaped $55 million in research grants through 2004, and that MST Services Inc., the for-profit enterprise established by Henggeler and his team to support MST replication, collected $400 to $550 in licensing, training and consulting fees for each of the 10,000 families then served by MST each year.

Time for Higher Research Standards

Lacking advanced statistical training, I am not qualified to score the debate between Littell and the MST promoters. If I were a betting man, I’d wager that if implemented carefully and targeted to youth fitting the profile for which it is intended, MST is most likely a highly-effective intervention. I’d be less confident betting that MST would outperform similar interventions without the brand name (or evidence-based) imprimatur.

Indeed, a comprehensive analysis commissioned by the Center for Juvenile Justice Reform in 2010 found that MST and Functional Family Therapy “fall well within the range of other family programs” and that “some no-name programs produced effects even larger than those found for the model programs.”

The point is not that we should abandon or turn away from the movement toward evidence-based models like MST. That would be foolhardy. The rapid proliferation of empirical evidence about what works and doesn’t in addressing delinquent behavior has been one of the most important and promising developments in juvenile justice in recent times, and the emergence and spread of carefully crafted intervention models backed by scholarly research is an entirely welcome development. Indeed, several states (like Connecticut, Florida, Louisiana and Ohio) have achieved impressive results by replicating evidence-based models on a large scale, improving youth outcomes, reducing recidivism and saving taxpayers’ money.

But continuing to ignore the valid empirical questions being raised about the research supporting these models would be equally foolhardy. To employ a baseball analogy, we’ve only reached first base in our research about evidence-based juvenile justice. We need many more models, and we need to develop a much deeper understanding of what works, when, for which youth and under what circumstances.

And, critically, we need higher standards for research. We need more transparency.

Fourteen years ago the emergence of models with any evidence of effectiveness was newsworthy, and the spread of such models was paltry. Today, these program models are household names in our field. Together, they have become a growth industry in the field, and they now consume hundreds of millions of dollars per year in state and local funds.

This isn’t little league any more.

A Love Letter to Evidence-Based Practices for Combating Juvenile Crime

(The opening passage of “Less Hype, More Help”)

Today, several of the models touted in “Less Hype, More Help” are widely known, such as Multisystemic Therapy (MST), Functional Family Therapy (FFT) and Multidimensional Treatment Foster Care (MTFC). But back then few people had heard of these approaches. Hence my temptation to trumpet them so boldly.

Or part of my temptation. I was also seduced, like many others since, by the aura of precision and certainty these evidence-based models inject into discussions of juvenile justice. My opening passage oozed with the smug certitude often bred by the imprimatur of peer-reviewed science.

I started with a question:

“What if we could take a chronic juvenile delinquent, a kid who has been arrested five, six, 10 times, and instead of sending him away for a year to juvenile prison for $40,000 or $50,000 (only to come home with a 50 to 70 percent chance of re-offending) ... what if instead of that we could keep him at home, spend less than $5,000 working with him and his family over four or five months and cut the likelihood that he’ll re-offend in half?”

And then another:

“What if, for a chronic delinquent who is just too unruly to stay with her parents, instead of sending her to a group home or youth prison we could spend just a little more to place her into a specialized foster home for six to nine months, work with the child and coach her parents and reduce the amount of time she can expect to be incarcerated by 75 days over the next two years?”

And then a third:

“What if, for chronically disobedient elementary school children, we could spend just $1,500 for a two-pronged program — video-based parenting skills training and classroom-based social competence training for the child — and reduce problem behaviors dramatically (by 30 percent or better) in 95 percent of all cases, significantly reducing the number who will be arrested later as juveniles?”

Then came the punchline:

“Well, you can stop asking, ‘What if?’ We can. We can. And we can.”

If only it were that simple.

3 thoughts on “Analysis: Holes in the Evidence for Evidence-Based”

Thanks for the article. Do any of these programs that have been operating since the late 1990s have information showing what is happening today in the lives of youth they might have connected with when those youth were 9 or 10 years old? I see far too little focus on helping youth on the journey to adulthood and jobs, and this influences the funding (or lack of funding) for programs trying to build long-term connections to youth.

What your article emphasizes to me is that the “evaluation” industry has become a multi-million dollar self-interest sector and this concerns me when too few operating dollars are flowing directly to the ground level organizations working with youth.

I agree with Judge Teske, this is a great article! Evidence-based programs are not perfect and in many ways they have not transformed the field as once hoped, but they do work well when implemented well and have overall been a positive development in our field. And, yes, a well-functioning juvenile justice system is also a necessary condition for success. You mentioned Florida as one of the Success Stories (and linked readers to EBA’s Redirection Project there), bringing up what I believe is the third necessary condition — rigorous, thoughtful, and collaborative implementation. In Florida, EBA managed the implementation of a statewide juvenile justice reform effort, in partnership with Florida’s DJJ, that focused on the rigorous implementation of evidence-based programs. (That project yielded stellar results including a 19% decrease in felony offenses in the year following treatment (compared to a matched control group) and over $170 million in cost-savings by safely diverting the highest risk youth on Probation away from residential placement.) If any of those three conditions are a ‘zero’ — the end product is (or can be) another ‘zero.’ All three together can produce a great outcome, but if any of the three are lacking, the results will not be those that are hoped for.

One final thought re your point about investing in more research: Given the reliance on public monies, states and communities should treat their investments in juvenile justice programs and services like an investment portfolio: invest heavily in what we know works (our ‘blue chips’) but – yes! – save some funds for both the emerging programs (think: ‘higher risk/higher reward’) and for residential facilities (a little like ‘hedge funds’ for the times when all else fails). We can argue about the ideal numbers/investment percentages, but our current investment portfolios are still heavily tilted toward residential facilities, and we need to find ways to shift the funds toward the programs and services that can provide ‘blue chip’ ROIs.

Great article! I have written in previous op-eds that relying on EBP only in a dysfunctional juvenile justice system will yield poor to average results. It’s like good and well-intentioned people working in a broken system: they too will risk making broken (poor) decisions. Of course, your article speaks more to this complex matter than systems alone, and you do a good job revealing this multi-dynamic problem of EBPs, but even if true to the fidelity of the program I am hard pressed to believe that a system will yield the best results if that system is not functionally operable, in more ways than one. Thank you.