
The paper “The Problem With ‘Proficiency’: Limitations of Statistics and Policy Under No Child Left Behind” by Andrew Dean Ho, published in Educational Researcher, Vol. 37, No. 6, pp. 351–360 (August/September 2008), makes the salient point that the interpretation and use of cut scores, in particular the Percentage of Proficient Students (PPS), give inherently misleading accounts of progress and of comparisons among groups.

The problem is not caused by intentional bias on the part of psychometricians or those determining the cut scores, though Ho does imply that it could be exploited, and perhaps has been, by those aware of the problem to produce biased results.

In the following I summarize the problem he describes, and later I reference a Mathematica CDF file, which I will design and make available, that dynamically simulates the effects of cut-score placement and changes in the test-score distribution to illustrate the principles discussed.

“The Percentage of Proficient Students (PPS) is a conceptually simple score-reporting metric that became widely used under the National Assessment of Educational Progress (NAEP) in the 1990s (Rothstein, Jacobsen, & Wilder, 2006). Since 2001, PPS has been the primary metric for school accountability decisions under the No Child Left Behind (NCLB) Act. In this article, through a hierarchical argument, I demonstrate that the idea of proficiency—although benign as it represents a goal—encourages higher order interpretations about the progress of students and schools that are limiting and often inaccurate. I show that over-reliance on proficiency as a reporting metric leads to statistics and policy responses that are overly sensitive to students near the proficiency cut score….” Ho, p. 351.

Assume Cut1 is at -1.5 stdev and Cut2 at +0.5 stdev, and imagine two different scenarios. In the first, assume the 4th grade math WKCE scores fall such that Cut1 sits at the -1.5 stdev point in year 1, and in year 2 the 4th grade class scores are 0.5 stdev higher. Visually, the cut score holds its position on the horizontal axis while the whole bell curve shifts right by 0.5 stdev, so that Cut1 now strikes at the -2.0 stdev point of the shifted curve. A comparison between year 1 and year 2 would show a 4.4% improvement in PPS, as the students between those two points move into the next category. Note also that no improvement will be recognized for about 95% of the students, yet all improved by the same amount.

In the second scenario, the scores fall instead so that Cut2 sits at the +0.5 stdev point in the first year; in the second year, after the same 0.5 stdev improvement, Cut2 falls at the 0.0 stdev point. In this case the year 1 to year 2 comparison would show a 19.1% improvement in PPS, with 81% of students showing no recognized improvement.

These scenarios show that for identical real improvement (all kids improved by 0.5 stdev in both cases), the perceived gain is mediocre in scenario 1 and spectacular in scenario 2. Neither number is wrong, but both are grossly misleading. If one bases teacher evaluations or AYP decisions on either, the resulting consequences will be unjustified in both scenarios.
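The two scenarios can be checked with a few lines of Python (a stand-in here for the Mathematica simulation), assuming a standard normal score distribution and treating the cut score as fixed while the distribution shifts:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pps(cut):
    """Percentage of students scoring at or above the cut (standard normal)."""
    return 100.0 * (1.0 - phi(cut))

shift = 0.5  # every student improves by 0.5 stdev

for name, cut in [("Cut1", -1.5), ("Cut2", 0.5)]:
    year1 = pps(cut)          # cut relative to the year-1 distribution
    year2 = pps(cut - shift)  # after the shift, the cut sits 0.5 stdev lower
    print(f"{name}: year 1 = {year1:.1f}%, year 2 = {year2:.1f}%, "
          f"apparent gain = {year2 - year1:.1f}%")
# Cut1: year 1 = 93.3%, year 2 = 97.7%, apparent gain = 4.4%
# Cut2: year 1 = 30.9%, year 2 = 50.0%, apparent gain = 19.1%
```

The same 0.5 stdev improvement yields a 4.4% PPS gain at Cut1 and a 19.1% gain at Cut2, purely because of where each cut intersects the bell curve.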

The key understanding to take away from these scenarios is this: for any cut score that, like Cut2, sits to the right of the bell curve's peak (the mean, median and mode), comparisons made as the bell curve shifts smoothly right will make it seem that schooling is getting better by the year, that the schools and teachers have found the magic solution. This is a false conclusion. Once the bell curve moves to the point where the cut score lies to the left of the peak, the rate of apparent improvement will decrease, rapidly at first, then less rapidly as the bell curve shifts further right. Any conclusion that the teachers are losing their edge, that the curriculum needs to be changed, or that some heads must roll would be equally wrong. All such effects are an artifact of the cut scores' interaction with a bell curve, and nothing more.
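The rise-then-fall pattern can be seen directly from the same arithmetic. The sketch below, again assuming a standard normal distribution, tracks the apparent PPS gain from a constant 0.5 stdev yearly improvement as the successive shifts carry the peak past a single fixed cut score:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

shift = 0.5  # the same real improvement every year

# The cut starts 1.5 stdev right of the peak; each yearly shift moves
# the distribution right, so the cut sits 0.5 stdev lower each year.
cut = 1.5
for year in range(8):
    gain = 100.0 * (phi(cut) - phi(cut - shift))  # apparent PPS gain this year
    print(f"year {year}: cut at {cut:+.1f} stdev, apparent gain = {gain:.1f}%")
    cut -= shift
```

The yearly gains climb (9.2%, 15.0%, 19.1%) while the cut approaches the peak, then shrink (19.1%, 15.0%, 9.2%, 4.4%, 1.7%) once the peak has passed the cut, even though the real improvement never changes.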

The above logic applies to every cut score and every demographic subpopulation. The rates for Minimum, Basic, Proficient and Advanced will all differ, as will the rates for different ethnicities, schools and school districts. Without more information, interpretations are guaranteed to be wrong.

There is a rule here that must be observed: no statistic can be understood unless and until it is related back to the original data. That is, to make sense of any statistic, and test outcomes in particular, one needs either the full distributional information (the original scores) or basic distributional statistics, such as the count, mean, median, variance, skewness and kurtosis for each category and subgroup, that would allow each distribution to be simulated.
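As a sketch of what that minimal reporting might look like, the following Python function computes those summary statistics from a list of raw scores; the score values shown are hypothetical, not actual WKCE data:

```python
import statistics

def distribution_summary(scores):
    """Summary statistics sufficient to roughly reconstruct a score
    distribution: count, mean, median, variance, skewness, excess kurtosis."""
    n = len(scores)
    mean = statistics.fmean(scores)
    var = statistics.pvariance(scores, mu=mean)
    sd = var ** 0.5
    # Standardized third and fourth central moments.
    skew = sum((x - mean) ** 3 for x in scores) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in scores) / (n * sd ** 4) - 3.0
    return {"count": n, "mean": mean, "median": statistics.median(scores),
            "variance": var, "skewness": skew, "excess_kurtosis": kurt}

# Hypothetical raw scale scores for one subgroup:
scores = [410, 425, 430, 445, 450, 455, 460, 470, 480, 505]
print(distribution_summary(scores))
```

With these six numbers published per category and subgroup, anyone could simulate the underlying distributions instead of relying on PPS alone.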

Attached is a preliminary document called “wkce-simulation” that begins to look at the wkce distributions and cut-scores in PDF and CDF formats.

Click the following link to display the graphics in PDF format. Click here

Clicking the following link will open a new window containing the dynamic graphics discussed. The graphics require the Mathematica CDF player browser plugin to be installed. If you have not done so, go to the Downloads page to install. Then return here and follow this link. The process may take some time. Years In MMSD CDF

Comments on Years in MMSD Data

The data on which this post is based were first posted on Jim Zellmer’s School Info Systems at Madison’s Transfer Students and the Achievement Gap. The writer, Andrew Statz, Executive Director of Information Services for MMSD, concluded the results were largely ambiguous as to whether years in MMSD schools had a positive impact, and illustrated the data using stacked bar charts. I used his data to create different graphics that I could more easily understand. As a consequence, I found the results less ambiguous, as I explain below.

I have to admit that I cannot read stacked bar charts, and I cannot see patterns in them that might be present. Instead, I have represented the same data as a sequence of bar charts for each year students spent in MMSD schools displayed as percentages within the year.

In these histograms, I was able to see some non-ambiguities that I could not see in the stacked versions. I make some observations using the percentage histograms, understanding, of course, that the significance of these observations might be unstable given the relatively few students who transfer into MMSD compared with those who enter school as soon as they come of age. A further problem is that the data are not longitudinal: the students represented in each year are different, so we are not seeing how a fixed set of students progresses in their education as the years go by.

What We Would Like to See in these Histograms

Though the students depicted within each row of histograms are different students, at some risk we might assume they represent the same sample population over time. Making this assumption, the pattern we hope to see is a distribution skewed left; that is, a histogram showing few students at Minimum, somewhat more at Basic, a significantly higher percentage at Proficient, and higher still at Advanced.

Examples of the acceptable skewness are shown in the 4th Grade Reading Percent histograms for MMSD Years 3 and 4+. In both these cases, the percentages are approximately Minimum at 10%, Basic at 20%, Proficient at 30%, and Advanced at 40%.

Of course, in a more perfect world, Minimum and Basic together would total about 2%, reflecting that we may not be successful with students starting more than two (2) standard deviations below the mean; Proficient would be about 13%, reflecting that we can educate to Proficiency those students between one (1) and two (2) standard deviations below the mean; and Advanced would be about 85%, reflecting that we can educate everyone else to Advanced.
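Those target percentages correspond to the tail areas of a normal curve cut at two and one standard deviations below the mean, as a quick check confirms:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

below_minus2 = 100.0 * phi(-2.0)                    # more than 2 sd below mean
between = 100.0 * (phi(-1.0) - phi(-2.0))           # between 2 and 1 sd below
above_minus1 = 100.0 * (1.0 - phi(-1.0))            # everyone else
print(f"{below_minus2:.1f}% / {between:.1f}% / {above_minus1:.1f}%")
# 2.3% / 13.6% / 84.1%
```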

In order to demonstrate students are being well-educated in MMSD schools compared to other school districts, the overall pattern we would expect to see is students coming into the district (at MMSD Year 0) with a relatively nondescript distribution, and as the years in MMSD increase, the distribution becomes increasingly skewed left, as described above.

Comment on the Histograms

Overall Impressions

The histograms show that students at MMSD demonstrate much less progress in Math than in Reading as measured by the WKCE. This should not be surprising given the general math and science illiteracy of the US adult population (approaching 95% by some measures) and the TIMSS and PISA score comparisons of American students with international students over many years. Further, members of the administration I have talked to indicate that MMSD focuses mostly on reading and puts much less emphasis on mathematics.

The second impression is that students scoring at Minimum or Basic in Math remain in these two subgroups throughout their years at MMSD. Likewise, students scoring Proficient or Advanced in Math remain in these subgroups. Further, MMSD students seem not to be progressing from Proficient to Advanced; the percentage of students at the Advanced level generally remains lower than the percentage at Proficient. Both these observations perhaps illustrate the current and damaging attitude that in Math, you either have it or you don’t.

4th Grade WKCE Scores

In Math, we see an increasingly skewed left distribution involving Basic, Proficient and Advanced. This is good, showing that these kids are moving up the scale toward competence. However, students at Minimal remain stuck; the histograms show MMSD making little progress for kids at the Minimal competency level, and this remains true regardless of how many years the students have attended MMSD schools.

In Reading, MMSD’s data look good. However, those entering MMSD for the first time (MMSD Years = 0) in 4th grade already show a reasonably skewed left distribution, implying to me that reading is being taught relatively successfully in other districts. There is an anomaly at MMSD Years = 2. These kids entered MMSD in the 2nd grade and had only 20% scoring Advanced and 45% scoring Proficient; this is roughly reversed for the other years. These students seem to have come from a different population.

8th Grade WKCE Scores

The 8th Grade histogram sequence contains MMSD Years = 2, which are students who entered MMSD in the 6th grade, the beginning of middle school. The histogram shows an almost uniform distribution in both Math and Reading, with an ever-so-slight left skew involving Minimal, Basic and Proficient in Math. Not knowing these former 6th graders’ score distribution on entering MMSD, it is hard to determine MMSD’s contribution to these scores.

In Math, there seems to be nothing ambiguous about the trend over time spent in MMSD schools: there appears to be no MMSD contribution to how well such students perform on the WKCE. Students in the Proficient and Advanced levels fluctuate randomly between each other over Years in MMSD, and those in Minimum and Basic do likewise. What the data seem to show is that students stuck in the lower two levels stay there, and those in the upper two levels stay there, with the variation we see perhaps little more than students at the cusps between Minimum and Basic, or Proficient and Advanced, scoring just a little better or a little worse.

In Reading, there is no indication of progressive improvement to scores based on the number of years in MMSD, though the reading distributions look better than the math distributions; that is, there is a faint tendency to left skewness over time.

10th Grade WKCE Scores

In Math, except for students entering MMSD in the 9th grade (MMSD Years = 1), the distributions show right skewness, the opposite of what we want to see. We don’t see improvements until students have been in MMSD schools 5 or more years, and then the change is modest at best. Students in the Minimum and Basic groups show no consistent improvement.

In reading, we see left skewness consistently only for those in MMSD since the 3rd grade. Those students entering MMSD between the 4th grade and 8th grade don’t show progress. Those entering MMSD in the 9th grade are from a different population of students than those entering at other times.

Final Thoughts

The lack of progress in math is disturbing. Worse still, MMSD’s focus on reading alone is, I claim, a substantial cause. The way math is generally taught focuses on “genuine” problems, which heavily emphasize solving problems written in English, and so reading English is made the first goal. This is a problem. Mathematics is its own language, and the focus in teaching mathematics should be teaching that language. I contend that only basic competence in English (or any language) is required to teach and understand the language of math.

This belief is bolstered by the following. In college in the 1960s, it was expected that those majoring in the sciences or math would learn either technical French or technical German. I still have the technical German language book that allowed me to read, quite slowly, computer science research papers written in German. There was never an emphasis on learning to speak the language. A number of those in computer science studying for their PhDs at the time were learning French for the same purpose.

Finally, a number of friends and colleagues in my math and science classes were from China, Taiwan, Korea, and India. Their proficiency in English tended to be basic but they were acing these technical classes. These technical classes were taught with emphasis on the math or science, not “genuine” problems requiring English proficiency. Of course, these foreign students by necessity were the cream of their cultures, so none were your average college students.

Nonetheless, at the level of elementary, middle and high school math and science courses, whose content is within reach of almost any student, there is little evidence to suggest that this content can only be learned by focusing on “genuine” problems posed in verbose academic English. When taught by instructors who are themselves proficient in math and science, brevity of word is a plus, and use of the language and symbols of math and science will allow even those with limited English proficiency to succeed.