Over the past thirty years, the school discipline pendulum has swung wildly from one extreme to the other, as policymakers have struggled to solve an inherently difficult problem. Today, the “zero tolerance” policies that were all the rage at the end of the last century are generally viewed as heavy-handed and blunt, removing administrator discretion and treating many different kinds of offenses as equally injurious. Yet as the tide of elite—and education reform—opinion has turned against over-suspension, the instinctive response of policymakers has once again been to tie the hands of teachers, principals, and local officials, this time with the explicit goal of reducing the use of suspensions, especially for traditionally disadvantaged groups.

Overall, we agree that suspensions are unlikely to benefit suspended students. But an important question about school discipline is also whether the push to reduce the number of suspensions is harmful to the rule-abiding majority. According to a 2004 study, 85 percent of teachers and 73 percent of parents felt the “school experience of most students suffers at the expense of a few chronic offenders.” And that was before the push to reduce suspensions. A more recent study by the Manhattan Institute’s Max Eden showed that the percentages of students and teachers in New York City reporting drug use, gang activity, and physical fights rose dramatically in the years following discipline reforms initiated by Mayor Bill de Blasio, which require that principals obtain written approval from the city before suspending a student for “uncooperative/noncompliant” or “disorderly” behavior.

To examine these matters further, we enlisted Matthew Steinberg, assistant professor of education at the University of Pennsylvania, and Johanna Lacoe, an experienced researcher at Mathematica, to conduct Fordham’s new study, The Academic and Behavioral Consequences of Discipline Policy Reform: Evidence from Philadelphia. In it, they examine outcomes in the School District of Philadelphia (SDP), which changed its code of conduct in the 2012–2013 school year to ban out-of-school suspensions (OSS) for low-level “conduct” offenses—such as profanity or failure to follow classroom rules—and reduce the length of OSS for more serious infractions.

Below is a summary of what they found:

Changes in district policy had no long-term impact on the number of low-level “conduct” suspensions, and most schools did not comply with the ban on such suspensions.

Changes in district policy were associated with improved attendance—but not improved achievement—for previously suspended students.

“Never-suspended” peers (i.e., students who didn’t receive a suspension in any of the years considered by the study) experienced worse outcomes in the most economically and academically disadvantaged schools, which were also the schools that did not (or could not) comply with the ban on conduct suspensions.

Revising the district’s code of conduct was associated with an increase in racial disproportionality at the district level.

Based on these findings, we draw three conclusions:

First, schools may respond very differently to district mandates, depending on their demographics, achievement levels, and prior suspension rates, as well as other factors bearing on policy implementation and compliance. In Philadelphia, the “toughest” schools tended to ignore the district’s ban on conduct suspensions, while the highest-achieving schools appeared unaffected—because they didn’t have any conduct suspensions to begin with. Finally, the schools in the middle stayed lukewarm, meaning they reduced their suspensions but (in most cases) didn’t eliminate them entirely. In other words, schools responded (or didn’t) much as you might expect them to, given their pre-existing challenges.

Second, top-down mandates can have unintended consequences—even when they emanate from local decision makers rather than distant state or federal governments. In Philadelphia, never-suspended students in many schools, including most schools that reduced their suspension rates, experienced a decline in academic performance relative to the most plausible comparison group. And the district-wide decline in conduct suspensions coincided with a suspicious increase in the number of minority students suspended for more serious infractions (though it is impossible to know if less serious offenses were reclassified as a result of the policy change). Clearly, these are not the responses that district leaders intended.

Third, policymakers should respect the wisdom of practitioners when it comes to school discipline. For us, the biggest lesson from Philadelphia’s experience is that “discipline reform”—however defined or conceptualized—is best initiated at the school level rather than the district level, where the law of unintended consequences is more apt to prevail. Suspensions may have costs for suspended students, but these must be balanced against the necessity of maintaining an orderly learning environment. And the individuals best positioned to make those judgment calls, and to gauge how effective future approaches to discipline may be, are those on the front lines. Teachers and administrators who are struggling to manage disorder cannot be expected to comply willingly or well with a directive that eliminates one of their most important tools.

Overall, the report’s findings speak to the stubborn realities that educators must contend with, which brings us to our final point: a plea for the removal of the rose-colored glasses that so many observers and critics seem to don when viewing school discipline. Everyone knows that changing a district’s policy on suspensions is unlikely to address the underlying issues in tough schools—or peaceful ones. So viewing this as a civil rights issue and trying to fix it with top-down decrees is impractical and potentially harmful, whether those decrees emanate from the district, the state, or the banks of the Potomac. If the goal is finding more effective ways to build a safe and strong school culture, it is far better to work with staff in high-poverty schools than to imply that they have racist tendencies and may be deliberately violating students' civil rights.

We harbor no illusions that this study will put an end to the discipline debate. But we hope it will inject a measure of nuance into the conversation—and perhaps help the discipline policy pendulum to find a more stable resting place somewhere in the vast and sensible middle ground.

Last week, an NPR affiliate threw super-cooled water on the so-called success of D.C.’s Ballou High School in graduating 64 percent of its seniors and securing a college acceptance for every senior, regardless of whether they graduated. It turns out that, although 164 students were granted diplomas, more than half of those who walked across the graduation stage tallied at least sixty days of unexcused absences, and one cap-and-gown wearer recorded more than 150.

As shocking as these seat-time revelations are, they merely add to other signs of academic struggle at Ballou. Last year, for example, only 9 percent of the school’s pupils passed the English language arts portion of D.C.’s annual standardized test. None passed the math portion.

Perhaps even worse, considering that all of these young people may now be college-bound, are the school’s woeful SAT scores. Last year, the average total score was 782 out of 1600, which falls into the 11th percentile nationwide, and 8th percentile among SAT users. That score represents the sum of the exam’s two “objective” parts—essentially reading and math—and Ballou’s scores were equally low in both, earning a 382 and 381, respectively. Neither is remotely close to the exam’s college-readiness benchmark. The College Board, which administers the test, communicates a score’s distance from that benchmark with a series of colors: green, yellow, and red. “Red” means that an SAT score is “below the benchmark by more than one year’s academic growth.” The color’s cutoff in reading and writing is 450; in math it’s 500.

Ballou is far from a unique case. Rather, it’s a recent and extreme example of a nationwide issue. “This is sad and infuriating and, as local education reporters across the country know, not at all uncommon,” tweeted Erica L. Green, an education reporter at the New York Times. Indeed, just weeks before the Ballou story broke, a report from D.C.'s neighboring Prince George's County, Maryland, found that 25 percent of that district’s high school graduates may not have met requirements.

So what in the world does a diploma now indicate? Not that students are ready for college. Not that they’ve learned as much as twelfth graders should learn. Not even, apparently, that they’ve shown up for school. Is it nothing more than a redundant certification that these graduates are old enough to finish twelfth grade and enter some version of adulthood? Has the high school diploma therefore lost all academic meaning? There’s a good chance that it has. And as the diploma loses its real meaning, the farcical meaning that schools purport to assign to it harms our most vulnerable, disadvantaged students, especially now that America's high school graduation rate is at an all-time high.

When a young man or woman who is sorely unprepared for college enters college, that person has but a faint hope of succeeding. And when they fail, they’re worse off than if they had never gone at all. At the very least, time that could’ve been spent building a career is wasted. And that cost is often compounded by serious financial debt in the form of loans for tuition, room, and board. When the person then leaves college without skills, a degree, or other credentials and struggles to find a job, interest on those loans accrues, and soon the young person is drowning in oppressive debt.

This, of course, disproportionately affects disadvantaged and low-performing students. Just 40 percent of black enrollees at four-year public universities graduate within six years, for example, 21 percentage points below their white peers, and 18 percentage points below the nationwide average. And it’s far worse for students who must take remedial courses because they aren’t prepared for college classes when they arrive on campus. Fifty percent never complete their remedial education, and fewer than 25 percent of those who attend community college earn a degree within eight years.

In short, the situation has grown absurd and untenable. We’ve simultaneously lowered the bar for high school graduation while pushing for more high school graduates to attend college. Things, therefore, must change.

First, let’s once and for all eradicate the soft bigotry of low expectations. The urge to make exceptions for struggling students while they’re in school is understandable. But sooner or later they must enter the real, unforgiving adult world. And when we tell these young men and women that they’ve developed the skills to succeed when they, in fact, have not, we’re causing serious long-term harm to the very people we’re trying to protect in the short term.

Second, stop pretending that college should be for everyone. Career and technical education is a promising, underrated, and underutilized avenue in our schools. Make more of those options available to students, and show our young people that there’s nothing wrong—and a lot right—with pursuing a vocation. Being a lawyer, for example, isn’t an inherently better profession than being an electrician or a carpenter. And, despite at least seven years of higher education, lawyers often make less money.

Third, rethink the high school diploma. Base it on demonstrated competency rather than time in school or Carnegie units compiled. Or consider, as my colleague Checker Finn has suggested, instituting a multiple-tier system in which college-bound students receive, say, “academic” diplomas, and those who are career-bound get “applied” diplomas that signal more practical things, like responsibility, reliability, or on-the-job skills. This would not be tracking by a different name; both options would have a whole lot in common, and every student would have the option to choose either at any time.

Fourth, stop holding schools accountable for easily gameable and therefore mostly meaningless metrics like graduation rates. When you condition adults’ livelihoods on whether students walk across a stage wearing a cap and gown, more and more students will do so, regardless of academic accomplishment. Focus instead on measures that are more representative of knowledge and success, like academic growth measured by high-quality assessments.

Steps like these will help eliminate the perverse incentives that culminate in outcomes like Ballou’s. American education should prepare our kids to be well-adjusted, successful adults. That goal should inform every decision and should always be considered when we examine the direct and indirect consequences of our policies. In failing at this we’ve failed our students and ultimately failed the country they will inhabit, too.

On this week's podcast, special guest Kim Smith—CEO of the National Charter Collaborative—joins Mike Petrilli and Alyssa Schwenk to discuss single-site charter school leaders of color. During the Research Minute, Amber Northern examines Raj Chetty’s new “Lost Einsteins” study, which finds that smart low-income kids are much less likely than their affluent peers to grow up to become inventors.

A recent study conducted by David Blazar of the University of Maryland examines whether teachers affect student outcomes other than test scores, including students’ self-reported behavior and happiness in class and self-efficacy in math. The study collects data from fourth- and fifth-grade teachers in four anonymous school districts in three states on the East Coast across three school years (2010–11 to 2012–13).

The analysis focuses on a subset of forty-one teachers who were part of a random assignment study in year three and a group of students (and their teachers) who completed a survey about their attitudes and behaviors during all three years. Analysts had access to student demographic and achievement data, teacher value-added data, and student survey data on three constructs: behavior in class (e.g., “My behavior in this class sometimes annoys the teacher”), self-efficacy in math (e.g., “In this class, math is too hard”), and happiness in class (e.g., “I enjoy math class this year”). As for the causal design of the study, in the spring of 2012, fourth- and fifth-grade teachers were randomly assigned to class rosters of the same grade level; participants were generalists who taught all subject areas, so their contribution to student outcomes would not be confounded with the effect of another teacher.

The findings can be boiled down to two key results. First, teachers substantially affected all three self-reported measures of student attitudes and behaviors. The largest of these effects was on students’ happiness in class, for which a 1.00 standard deviation (SD) increase in teacher effectiveness led to a roughly 0.30 SD increase in that outcome. Further, the magnitude of teacher effects on behavior in class and self-efficacy in math was generally larger than teacher effects on students’ math performance but, again, smaller than teachers’ effect on student happiness.

Second, in a different model, there’s a small but negative relationship between teacher effects on students’ math performance and teacher effects on happiness in class. Blazar suggests that “teachers who are skilled at boosting math achievement may do so in ways that make students less happy or less engaged in class.” That’s not terribly surprising considering we’ve all taken a class that taught us a lot but wasn’t the most exciting or enjoyable learning experience in the world (Mr. Vanorden’s tenth grade Geometry class comes to mind).

The study ends with a warning that student survey data on non-cognitive outcomes like these are not appropriate for official accountability systems but can certainly inform areas where teachers might need additional training or professional development. In the end, the report contributes to our knowledge of how to gather and make sense of richer measures of student outcomes like attitudes, behaviors, and engagement, in addition to test scores. It seems like the entire field is echoing the need for such measures, and thankfully we’re making some headway.

A recent study from the Education Research Alliance at Tulane University uses thirteen years of student-level data from Louisiana to examine differences in suspension rates for black and white students, as well as poor and non-poor students. Overall, it finds that black students are about twice as likely as white students to be suspended and low-income students are about 1.75 times as likely to be suspended as non-low-income students. However, as with previous studies of this topic, it is difficult to know whether (or to what extent) these gaps reflect educator bias, as opposed to differences in behavior or school culture.

According to the authors, suspensions for black students in Louisiana last an average of 0.40 days longer than suspensions for white students who commit the same type of infraction. This difference could be interpreted as evidence of bias (or at least systemic inequity). However, when comparisons are restricted to students in the same school, grade, and year, the difference between black and white students is just 0.10 days. So, at the very least, the first estimate overstates the bias exhibited by individual educators.

Further complicating matters, since suspensions for low-income students are 0.18 days longer than suspensions for their high-income peers, even treating these smaller within-school differences as evidence of bias implies that educators harbor a particular animus against the poor (which seems strange, since poverty is less visible than race). Then again, unlike other studies (which have typically found that differences between schools account for most socioeconomic and racial disparities), this one finds that “within-school differences account for at least 50 percent of the black/white and poor/non-poor gap in kindergarten and grades 5 through 12.” It’s not clear what explains this difference. But assuming it’s real, the implication is that discipline gaps are (at the very least) more “visible” to students and teachers in Louisiana than in other places.

Uncomfortable yet? Well, buckle up, because we’re just getting started.

So far, most of what I’ve written about this study could probably have been written about any number of discipline studies over the years. Consequently, in an effort to advance the conversation, the authors go on to examine fights between one black student and one white student, which in their view represent “a very particular setting” where disciplinary “disparities most likely would reflect discriminatory school discipline practices” (emphasis added). As the authors acknowledge, even this assumption is contestable. After all, it’s possible that black and white students behaved differently in these fights—and thus warranted different punishments. (Which kid started it, for example?) But, for the sake of argument, let’s assume that the authors have succeeded in putting a lower bound on the racial bias of Louisiana educators.

After controlling for student demographics, the school the students in question attend, and the number of prior fights each was involved in during the year in question (to address the possibility that students who are involved in multiple fights are disciplined differently), the authors estimate that black students receive suspensions that are an average of 0.05 days (or just 1.6 percent) longer than those handed out to the white students. In other words, they receive one additional day of suspension for every twenty interracial fights.

Frankly, if that number accurately quantifies the amount of racial bias in Louisiana, then we should all start talking about something else (like our criminal justice system). After all, only a quarter of black students in Louisiana are suspended in a given year, and less than a third of these suspensions are for violent offenses, so the “average” black student in the state is probably suspended for about one violent offense per lifetime. In other words, if we assume that the bias detected in interracial fights applies to violent offenses in general, then by the end of grade twelve the average black student in Louisiana is suspended for about 0.05 days longer than she should be on account of violent offenses.
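
For readers who want to check that back-of-the-envelope arithmetic, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a calculation taken from the study: it assumes thirteen school years (kindergarten through grade twelve), treats "less than a third" as exactly one-third (an upper bound), and counts one suspension per suspended student per year. The function and variable names are hypothetical.

```python
def lifetime_violent_suspension_bias(
    share_suspended_per_year: float,
    share_of_suspensions_violent: float,
    school_years: int,
    extra_days_per_violent_suspension: float,
) -> tuple[float, float]:
    """Return (expected violent suspensions, expected extra suspension days)
    over a school career for the "average" student."""
    # Expected violent suspensions = yearly suspension rate
    # x share of suspensions that are violent x number of school years.
    violent_suspensions = (
        share_suspended_per_year * share_of_suspensions_violent * school_years
    )
    # Apply the per-fight gap to every violent suspension (the text's assumption).
    extra_days = violent_suspensions * extra_days_per_violent_suspension
    return violent_suspensions, extra_days


# Inputs from the text: a quarter suspended per year, under a third violent,
# and a 0.05-day gap per interracial fight. Thirteen years (K-12) is assumed.
violent, extra_days = lifetime_violent_suspension_bias(0.25, 1 / 3, 13, 0.05)
print(f"violent suspensions per school career: about {violent:.2f}")
print(f"extra suspension days per school career: about {extra_days:.3f}")
```

Under these assumptions the result is roughly one violent suspension and about 0.05 extra days over a school career, in line with the figures above. Because one-third is an upper bound on the violent share, the true numbers would be somewhat lower.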

Is a student who gets in one fight going to miss out on college because she is suspended for an additional twenty-five minutes? Will the life outcomes of a student who gets in twenty fights be significantly altered by one additional day of missed classes? And what about the life outcomes of his classmates?

Of course, it’s possible that this estimate really is a “lower bound” that understates the racial bias of Louisiana educators. But even if that’s the case, how is the “true” extent of that bias to be disentangled from the other factors that determine suspension rates? And, more to the point, what is the appropriate remedy—other than a less racist society?