Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

When Measuring Achievement Gaps, Beware the Proficiency Trap

Though we can thank the No Child Left Behind Act for drawing our attention to the "achievement gap" - which is now loosely deployed to reference gaps between African-American and white/Asian, poor and advantaged, suburban and urban, or even male and female kids - it's also done us a great disservice by distorting the way that we measure, and think about, differences between groups.

There are at least two ways of thinking about the relationship between achievement and kids' life chances. The first is to consider, in absolute terms, the set of skills that students have. The second views achievement as relative. Most coveted opportunities - jobs, college admission, a good grade in a college course, or positive evaluations in the workplace - are not divvied up based on students crossing an arbitrary line of proficiency or competence. We don't give everyone a job who's passed a basic reading test, nor do we admit everyone to UC-Berkeley who's received more than a 700 on the verbal SAT. Every student in a college course at NYU can't get an A, and faculty measure students' performance against others to assign grades. In short, all of these decisions are made by comparing the performance of those in a pool, and choosing those who come out near the top.

The proficiency view, to my mind, is certainly important to consider when we are thinking about building stocks of human capital. But if we are concerned about inequality and social stratification - ensuring that, on average, every demographic and socioeconomic group is equally prepared to compete in higher education and the workplace - relative achievement measured on a continuous scale is what matters, not proficiency rates.

Which brings us to how we currently measure "achievement gaps" between social groups, and why this method is tragically flawed. For example, if you look at the NYC press release from this year's test scores, you'll see that gaps are defined as the difference between the percentage of students that are proficient in each group. If the gap in proficiency between black and white students was 29 percentage points last year in 4th grade ELA, and now is 26 percentage points, we hear that the gap has narrowed by 3 percentage points. But it's possible that the gap in the achievement that matters - the continuous measure of achievement - has actually grown.
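To see how the two measures can move in opposite directions, here is a minimal sketch. It assumes both groups' scores are normally distributed with the same spread, and every number in it (the means, the standard deviation of 35, the cut score of 650) is hypothetical rather than taken from the NYC data.

```python
from statistics import NormalDist


def proficiency_rate(mean, sd, cut):
    """Share of a normally distributed group scoring at or above the cut score."""
    return 1.0 - NormalDist(mean, sd).cdf(cut)


def gaps(high_mean, low_mean, sd, cut):
    """Return (mean gap in scale points, proficiency gap in percentage points)."""
    mean_gap = high_mean - low_mean
    prof_gap = 100 * (proficiency_rate(high_mean, sd, cut)
                      - proficiency_rate(low_mean, sd, cut))
    return mean_gap, prof_gap


# Hypothetical numbers for illustration only, not actual test results.
SD, CUT = 35, 650
year1 = gaps(high_mean=670, low_mean=642, sd=SD, cut=CUT)
year2 = gaps(high_mean=685, low_mean=655, sd=SD, cut=CUT)

for label, (mean_gap, prof_gap) in (("year 1", year1), ("year 2", year2)):
    print(f"{label}: mean gap {mean_gap} points, proficiency gap {prof_gap:.1f} pp")
```

In this made-up case both groups improve, the higher-scoring group improves by more, and yet the proficiency gap shrinks, because more of the higher-scoring group was already above the cut score.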

Let me give a brief example to illustrate. If we use the proficiency logic, the achievement gap that separates the Bronx and the affluent suburbs of Westchester is closing. And indeed on Monday, Mayor Bloomberg crowed that NYC is catching up to the suburbs. If we take a look at 7th grade math, we see that there was a 30 percentage point Westchester/Bronx gap in proficiency in 2007 (73% versus 43%), but this year, there is only a 25 percentage point gap (83% versus 58%). If we use a proficiency measure, the achievement gap has closed by 5 percentage points.

Not so fast. The achievement gap, if we measure the differences in the average student scores in Westchester and the Bronx, has actually increased in 7th grade math. The scale score gap was 28 points last year. Put differently, the average Bronx 7th grader scored at the 23rd percentile of the Westchester distribution in 2007. This year, the gap was 30 points. Now the average Bronx 7th grader has dropped to the 21st percentile of the Westchester distribution, even though the achievement gap, as measured by proficiency, is closing.
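The percentile framing above can be reproduced with a back-of-the-envelope calculation. This sketch assumes Westchester scores are roughly normal and treats the average Bronx student as sitting exactly one mean gap below the Westchester average; the post does not report a standard deviation, so `implied_sd` is a hypothetical helper that backs one out from the figures quoted.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def percentile_of_lower_mean(gap, sd):
    """Percentile the lower group's mean occupies in the higher group's distribution."""
    return 100 * STANDARD_NORMAL.cdf(-gap / sd)


def implied_sd(gap, percentile):
    """Higher-group SD that places a given mean gap at the given percentile (below 50)."""
    return -gap / STANDARD_NORMAL.inv_cdf(percentile / 100)


# Scale score gaps and percentiles as quoted in the post.
for year, gap, pct in (("2007", 28, 23), ("this year", 30, 21)):
    sd = implied_sd(gap, pct)
    print(f"{year}: gap of {gap} points at percentile {pct} implies an SD near {sd:.1f}")
```

Under that normality assumption, both years imply a Westchester standard deviation in the high 30s, so a gap that grows from 28 to 30 points moves the average Bronx student further down the Westchester distribution.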

Take-home point: when you hear about achievement gaps closing based on proficiency scores, beware of what you're being sold.

18 Comments

Have you looked at the CEP study? It seems to be doing exactly what you are suggesting (which also is a much more sensitive measure) by using changes in scale scores over time rather than the Y/N proficiency rating. They find changes at the national level. I looked at their summary of NY state--but it looked as though there wasn't much comparative information available--due to test revisions in 2004 (?) or so.

The problem with measuring relative achievement of groups on a continuous scale is those pesky, seemingly intractable, group IQ differences. (IQ being an even better predictor of academic success than SES.) As long as the group IQ differences persist (and there is no indication that they are going to disappear any time soon), it will be close to impossible to get the achievement gaps to decrease. If anything, improving instruction will serve to increase the achievement gap, if you measure proficiency as you are suggesting.

Measuring proficiency rates is far from ideal, but at least it does not suffer from the impossibility of closing the "achievement gap" inherent in the "continuous scale" method. With the proficiency rate method you can get all students to pass some minimum proficiency cut-score, thus closing the "achievement gap" and, at least, masking the IQ differential.

Your Bronx/Westchester example is not as revealing as you seem to think. You are playing with the noise in the distributions and are dealing with achievement levels (for both distributions) having high failure rates. In order to show a substantial decrease in the achievement gap (say, at about 5 points), you'd need to have very high (or very low) pass rates for both distributions. (You can simulate this by taking two normal distributions with a one standard deviation differential and playing with the cut scores from 0 to 100.)
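The simulation suggested in that parenthetical can be sketched roughly as follows. The means of 55 and 40 and the standard deviation of 15 are hypothetical choices that put the two groups one standard deviation apart on a 0 to 100 scale; they are not drawn from any real test.

```python
from statistics import NormalDist

# Hypothetical groups: means one standard deviation apart on a 0-100 scale.
SD = 15
HIGH = NormalDist(mu=55, sigma=SD)
LOW = NormalDist(mu=40, sigma=SD)


def pass_rate_gap(cut):
    """Gap in pass rates (percentage points) between the groups at a cut score."""
    return 100 * ((1 - HIGH.cdf(cut)) - (1 - LOW.cdf(cut)))


# Sweep the cut score while the underlying 15-point mean gap never changes.
for cut in range(0, 101, 10):
    print(f"cut {cut:3d}: pass-rate gap {pass_rate_gap(cut):5.1f} pp")
```

The mean gap is fixed throughout, yet the pass-rate gap is largest when the cut falls midway between the two means and shrinks toward zero as the cut moves to either extreme, where nearly everyone passes or nearly everyone fails.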

Caution is needed in evaluating any claim related to educational achievement, not just achievement based on proficiency rates.

Caution must be exercised when comparing scale scores as well. In the CEP study, it appears the researchers simply subtracted the scale scores from successive grade levels, then divided the difference by the standard deviation. Unfortunately, the Texas test is not vertically scaled, thus any straight subtraction of scale scores results in a misleading indicator of progress. For example, a student answering all questions incorrectly in grade 4 and then grade 5 in the following year can have an increase in her/his scale score.

Some quick and incomplete responses as I am doing my best to get a paper back out today:

Margo - I haven't looked at the CEP report yet, but am looking forward to when I finish this article!!

Rory and Ken - I really disagree that the achievement gap is the result of an IQ gap, if by IQ gap you mean intractable differences in the mean IQ scores between races. Where's the evidence for an intractable IQ gap by race?

Ken - I don't understand what you mean by, "the impossibility of closing the "achievement gap" inherent in the "continuous scale" method" unless this is a function of your inferences about the race/IQ relationship. We're not trying to get everyone to perform at the same level, but to get group averages to roughly the same point. Within each racial group, there will always be a distribution of achievement. Let me know if I've totally misunderstood you here...

I also do not understand your point about Bronx/Westchester. There *was* a significant decrease in the achievement gap as measured by passing rates by anyone's standards - 5 percentage points. Unfortunately, the gap has increased when we look at average scale scores - why do two indicators moving in different directions represent noise?

Ed - Great point - scale scores aren't perfect, and these tests really aren't scaled the way they should be for measuring growth across grades.

· Significant gains in IQ are found, which are largely maintained through third grade. Students entering the program with IQ's over 111 do not lose during the Follow Through years, though one might expect some repeated regression phenomena. The low-IQ children, on the other hand, display appreciable gains, even after the entry IQ has been corrected for regression artifact. Students with IQ's below 71 gain 17 points in the entering kindergarten sample and 9.4 points in the entering first-grade sample; gains for the children entering with IQ's in the 71-90 range are 15.6 and 9.2, respectively (Gersten, Becker, Heiry & White, 1984).

EW, IQ is not intractable, at least that's not how I read the various adoption and twin studies. There does appear to be a window of plasticity for young children that begins closing by about age 7 and appears to end by about age 12 when it mostly returns to the genetic baseline. (See the Colorado Adoption Project.) But this really isn't the issue. In order to fully close the achievement gap by 11th grade, you'd need to fully close the IQ gap (plus deal with any stray SES issues that may remain). I don't think we have any evidence that we are capable of doing that. Even if I conceded that the IQ gap were, say, 50% tractable (and there is no evidence that it is that tractable past age 12), we'd still be left with a large achievement gap, regardless of how we measure proficiency.

We're not trying to get everyone to perform at the same level, but to get group averages to roughly the same point.

Right. Is there a way to do this without equalizing IQ differences? SES won't do the trick, since IQ and achievement differentials persist from the bottom to the top of the SES spectrum.

My point about Bronx/WC is merely that as long as group IQ differences persist, the achievement gap will not approach closing until the Bronx student's performance hits close to 100%. Both populations have a long way to go to hit that level.

Rory, be careful with that interpretation of the change in real IQs based on achievement test results. The Follow Through students also took the Raven's Progressive Matrix test (which is highly g-loaded) and IQs did not increase significantly. A better interpretation of Follow Through is that the DI intervention was capable of clarifying instruction such that low-IQ students were capable of learning at about the same pace as the high-IQ students (they did have more instructional time and better teachers), which was faster than the learning pace of students in the control group and other experimental interventions.

The CEP study is great. But I have a simple "thought experiment." What if they did not change a word, but they had issued three different reports?

The same evidence explained in the same way would have produced a report on elementary education that was much less hopeful than reports based on State scores, but NAEP data would justify some cautious optimism.

The same words and data about 8th graders, who have spent most of their school lives under the NCLB regime, would indicate that gains were not sustained into middle school. Reading scores actually declined, and the Achievement Gap for Math actually widened.

Then the third report would have shown that NCLB has been even less successful for high school.

It would take a fourth report to address the damage that was caused by those policies.

That raises another question: what would Karl Rove do? Why do scholars with integrity, such as those who wrote the CEP report, just lay out the methodology and evidence? Why don't the accountability hawks show the same integrity? Why have they borrowed the scorched earth policies and the politics of spin of the Bush administration?

One of my worries about the emphasis on "proficiency" -- and the lack of emphasis on anything above proficiency -- is the unintended consequence of creating a two-tier, mostly segregated, educational system. Public schools teach poor kids basic skills, and parents who want more than basic skills try to figure out how to get their kids into private schools -- or, if they can, move to affluent suburbs.

Now, public schools that teach poor kids basic skills are better than public schools that don't teach poor kids basic skills. But in my district -- which has an interesting demographic mix -- there's a clear tension between the "let's make sure everyone's proficient before we think about anything else" point of view, and the "we need to make sure each kid makes a year's progress every year" point of view.

And it's pretty clear that if parents get the idea that no one at a school is interested in much besides proficiency, you start losing the proficient kids to private schools and charter schools -- which then exacerbates the social inequality that "closing the achievement gap" is supposed to end.

John, you're playing fast and loose with the data. The first cohort that began school under NCLB is in about fifth grade this year.

I don't see how middle schools could have improved up to this point since they are only just now receiving the first cohort. They're still doing remedial damage control on all the students they still receive from dysfunctional elementary schools.

The high school analysis is the same.

So exactly what damage has NCLB done, besides perhaps being ineffective?

So exactly what damage has NCLB done, besides perhaps being ineffective?

I would argue that the "nothing above proficiency means anything" mindset can do a lot of damage, and I worry about its effect on my daughter's school.

And though I know there are people who argue NCLB hasn't caused a narrowing of the curriculum, I've watched it happen with my own eyes, as our district middle schools -- both in program improvement -- drop electives in order to teach the math and English "support classes." And I was shocked to discover that the district considers middle school science to be an elective.

If NCLB was really effective at raising math and English proficiency, it might be possible to convince me that it was worth the costs. But I'm not there yet...

I'm using 2002 as the starting point because two major reports in the last week used that date. Even though 2003 would be better, I'm just following the flow of this debate, without even noticing that 2002 helps my case. (And it helps my case for reasons unrelated to NCLB; it was the economic downturn that caused the educational downturn, but to acknowledge that would be tantamount to fighting fair in an educational debate.)

So OK, just between you and me, the middle school kids spent five years under NCLB as opposed to the 4th graders spending one year under the law's testing regime. (I don't oppose much of anything of NCLB except its testing regime.) At any rate, I'm glad that proven experts have shown more evidence of good things coming out of the law in regard to elementary.

I'd like to know how this data reflects real improvements and why, but I'm not qualified to judge on that. I'll just follow the elementary debate and try to learn.

I am presenting a high school teacher's perspective on the evidence, and I have a lot of confidence in my appraisal of the evidence on secondary schools.

The bad things are the narrowing of the curriculum, the excessive test prep, etc. You know what the issues are. You know what the arguments are about the destructive elements of policies encouraged by NCLB. We can debate whether they are actually destructive, but don't do the educational equivalent of denying Newton's Laws.

I don't understand the narrowing the curriculum argument. How did you determine that pre-NCLB the amount of time devoted to non-math and reading subjects has been optimized and should be set in stone? Was it the stellar NAEP science and civics scores? Why isn't the narrowed curriculum the optimized time considering that NAEP scores haven't decreased?

John, I'm sure you understand how education works. You can't expect high schools to improve until their incoming ninth graders come in at grade level. Similarly, you can't expect middle schools to improve until their incoming sixth graders come in at grade level. This means you have to fix the elementary schools first. I think it's going to take a bit more than one cohort to accomplish that task. So blaming the failure of middle and high schools to improve on NCLB is a bit premature at this point.

I am neither denying nor admitting damaging results from NCLB; I am merely saying that you have failed to prove your point that damage has been done.

I don't understand the narrowing the curriculum argument. How did you determine that pre-NCLB the amount of time devoted to non-math and reading subjects has been optimized and should be set in stone?

This is clearly a more substantial argument than fits in blog comments, and gets to the purpose of education.

Is the purpose of education to score well on math and English standardized tests? If so then the narrowing of the curriculum isn't a problem as long as it causes scores to increase, and doesn't have nasty side effects like increasing the drop out rate. I think the jury is still out on both those, particularly at the middle school and high school level.

But though admittedly this is a value judgment, and maybe a romantic, middle class one, I don't think an education that focuses almost exclusively on math and English K-8 is likely to produce educated citizens. Nor is it likely to close the achievement gap (as opposed to the math-English proficiency gap) because parents with the means to opt out of that sort of education for something richer will do so.

Is the purpose of education to score well on math and English standardized tests?

One purpose of education is to acquire knowledge and the acquisition of that knowledge should be demonstrable on a well-designed testing instrument -- all the better if it is standardized since results would be comparable.

I don't think an education that focuses almost exclusively on math and English K-8 is likely to produce educated citizens.

Now you're attacking a strawman. Few schools focus exclusively on math and reading. Only 16% of schools made any cuts and total cuts for these schools was 52 minutes a week.

One thing that does cause large numbers of citizens to not become educated is failing to learn to read and do basic math. Apparently, it requires quite a bit of time teaching low-SES kids with low language skills to read if the criterion is the student actually being able to read when instruction is over -- a criterion that, sadly, is often not met. To the extent these kids need more instructional time learning to read, they should be given it without prejudice.

Nor is it likely to close the achievement gap (as opposed to the math-English proficiency gap) because parents with the means to opt out of that sort of education for something richer will do so.

This is a false dichotomy. Students who need additional instructional time should get it. Those that don't should be spending that time more profitably on other subject matter. There shouldn't be a need to opt out of anything, if the schools are doing their job right in the first place.

Rachel: Your comments are right-on: this is exactly what I see happening. It is a false choice to forgo quality differentiation for the lowest-common-denominator situation you described. But I also think that the problem is such that without a MASSIVE social/educational/policy transformation, I think we are stuck with this false choice.

In essence, we are trying to do in the classroom what we cannot achieve outside it: namely equality. I don't have good numbers, but my guess is that SES stratification outside the classroom, in neighborhoods, should be closely in-line with academic stratification within it.

Which generally makes differentiation that much more difficult. When you have such overwhelming and sustained numbers of children coming in to Kindergarten so far behind, with limited resources, logistics requires that you make sacrifices. The child at or above grade level is simply not going to get the same attention/exposure/etc. in an 80% ELL, low-skill population as they would in a population of 80% academic peers.

But alas, isn't this the great secret of educational policy... nay, social justice in general? What would it really look like if everyone DID start off (or at least quickly achieve) equal footing? Hey - I'm as eager as anyone to find out.

But until we get serious, crack babies and Cabernet babies just won't be able to compete.

The achievement gap is linked to the IQ gap. See the Deary study discussed on Gene Expression:

"Deary took the analysis a step further however and did a little latent variable modeling. As the IQ test had three components/subtests (verbal, nonverbal, quantitative), he correlated a latent g factor with a latent academic factor using the following subtests: English, English Literature, Math, Science, Geography, French (n=12519). The correlation between the latent factors was .81. That is: 66% of the variance in latent (general) academic achievement can be explained by latent cognitive ability---measured 5 years previously. While he hypothesizes that such things as "school ethos" and "parental support" are good areas to search for the other 34%, based on Rohde's work, it is likely going to be found in residual, first order factors (see Carroll or McGrew).

Take home message: While general cognitive ability and academic achievement are not isomorphic, the former is necessary for the latter, while the converse is not necessarily true. Spearman suggested this more than a century ago, and, to quote the last sentence in Deary's work - These data establish the validity of g for this important life outcome."
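For readers checking the arithmetic in the quoted passage, the 66% and 34% figures follow from squaring the reported correlation. A minimal sketch, where `variance_explained` is a hypothetical helper name and the only input is the .81 quoted above:

```python
def variance_explained(r):
    """Share of variance in one variable accounted for by a correlation r."""
    return r ** 2


r = 0.81  # latent correlation reported in the quoted passage
explained = variance_explained(r)
print(f"r = {r}: {explained:.1%} explained, {1 - explained:.1%} unexplained")
```

Note that this is a statement about shared variance between the two latent factors; the squared correlation by itself says nothing about which way any causal arrow runs.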