Cut Scores for Smarter Balanced and PARCC: Trying to Be Like NAEP

February 20, 2015

Both the Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC) are looking to the National Assessment of Educational Progress (NAEP) to determine their cut scores– the scores that separate individual student scores into artificially contrived achievement groupings.

It is important to remember that achievement categories are not natural groupings. Someone must create them and assign importance to them.

Both SBAC and PARCC once advertised their wares as “next generation assessments” that were supposed to be marvels surpassing the well-known “bubble” tests– of which NAEP is one.

However, it seems that the realities of time and money constraints have turned both SBAC and PARCC into “bubble” tests after all– and not only that, they are “bubble” tests with cut scores modeled after NAEP determinations of “proficiency.”

Of course, NAEP is also not “aligned” with the Common Core State Standards (CCSS). However, if certain percentages of student scores are “expected,” for example, to “happen” to fit NAEP’s (overdue for reconsideration) distribution of “fewer than four in ten” students scoring “proficient” in 2013, then mastery of CCSS takes a back seat to an expected scoring distribution outcome.

All this for in-consortium, state-to-state “comparisons” financed by a White House administration on its way out in 2016. A Race to the Exit.

SBAC and PARCC involvement is much more political than it is educational. There is no reason for NAEP high scorer Massachusetts to officially join PARCC other than the fact that Massachusetts Commissioner of Education Mitchell Chester chairs the PARCC governing board. PARCC is struggling, and frankly, it looks bad for the PARCC governing chair to not have his state “all in.”

That’s not how it works. In-house diagnostics are more streamlined and timely than a consortium-based, annual assessment with delayed scoring.

But back to SBAC, PARCC, and NAEP.

Each state can decide the meaning of individual student scoring outcomes on PARCC and SBAC. Again, if states model their scoring after NAEP percentage outcomes (or the percentage outcomes of any previously administered standardized test), the issue becomes less about CCSS “mastery” and more about ranking– where a student scores relative to other test takers.

The issue also becomes one of state officials’ cut score decisions as determining other outcomes– including a state’s percentage of “failing” teachers and “failing” schools.

State cut score setting on high-stakes tests is not an “objective” process. It is highly subjective, and one in which the person setting the cut score wields incredible power to punish.

When I was an undergraduate at LSU, I had one course in which the professor told us on the first day that our class’ final grades would fit the normal distribution. The course was Evolutionary Psychology. The professor said there would be a few A’s and a few F’s. Based on his determination, it did not matter who dropped the course (i.e., if those most likely to score F’s dropped) or how many of those present on the first day completed the course. It also did not matter the degree to which individuals mastered the material– if a student wanted an A, that student had to rank higher than enough others in the room. As one might expect, this made for a tense, anxious classroom atmosphere, one in which students were reluctant to help one another lest the person one helped might outscore the helper. I did not enjoy this “every man for himself” class, and I was aware that other students were concerned about my grades. (Students I did not know openly verbalized as much.)
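The forced-curve policy in that anecdote can be sketched in code. This is a hypothetical illustration– the quota percentages below are my assumptions, not the professor’s actual curve– but it captures the mechanism: grades depend only on rank, so some students fail no matter how well everyone masters the material.

```python
def curve_grades(scores):
    """Assign letter grades purely by class rank, ignoring mastery.
    Illustrative quotas: top 10% A, next 20% B, middle 40% C,
    next 20% D, bottom 10% F. Ties are broken arbitrarily."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # cumulative rank-fraction boundaries for each letter grade
    boundaries = [("A", 0.10), ("B", 0.30), ("C", 0.70), ("D", 0.90), ("F", 1.00)]
    n = len(ranked)
    grades = {}
    for i, (name, _score) in enumerate(ranked):
        frac = (i + 1) / n  # fraction of the class at or above this rank
        for letter, upper in boundaries:
            if frac <= upper:
                grades[name] = letter
                break
    return grades
```

Even a class in which every student scores above 90 still produces D’s and F’s under this scheme– which is exactly the perverse incentive against helping classmates that the anecdote describes.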

I have often thought of that Evolutionary Psychology class when considering the current atmosphere of punitive, test-score-driven, so-called “reform.”

In quite a different experience, my doctoral advisor’s policy was to adhere roughly to a ten-point scale in determining letter grades, but to look at the scores of all students to see where the top scores “broke” (i.e., where there was a notable gap in scores) in order to place the cut score between A’s and B’s. This policy often translated into an eleven- or twelve-point “A” category in which the professor considered both degree of material mastery and scoring distribution characteristics.
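That “break”-seeking policy can be sketched in a few lines. The parameters here are assumptions for illustration (a nominal ten-point cut at 90, a cut allowed to drift a couple of points lower, and a gap of three or more points counting as a “notable” break):

```python
def a_cut(scores, nominal=90, reach=2, min_gap=3):
    """Place the A/B cut score. Start from the nominal ten-point-scale
    cut, then lower it to the top of a notable gap ("break") in the
    scores, if one occurs within `reach` points below the nominal cut.
    All parameter values are illustrative assumptions."""
    uniq = sorted(set(scores), reverse=True)
    cut = nominal
    for hi, lo in zip(uniq, uniq[1:]):
        if hi < nominal - reach:
            break  # past the zone where an extended "A" is plausible
        if hi < nominal and hi - lo >= min_gap:
            cut = hi  # A's extend down to the top of the break
            break
    return cut
```

With scores of 95, 93, 89, 88, and 82, the break between 88 and 82 pulls the cut down to 88– yielding the kind of eleven- or twelve-point “A” category the post describes.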

The latter professor’s behavior encouraged both mastery of material and a cooperative learning atmosphere.

What America has with CCSS and its consortium tests, SBAC and PARCC, does not equal what the second professor above afforded his students. CCSS and its appendaged SBAC and PARCC do not encourage mastery of material and cooperative learning. No. These education-business-feeding creations encourage skills-based teaching and abundant test drill– neither of which has any influence on those cut scores. A teacher can spend every class period drilling students to take tests, but his/her efforts do not influence the setting of those power-wielding cut scores. The bureaucratic thinking that “raising” cut scores equates to “forcing” better instruction from teachers and higher levels of learning from students is unsubstantiated nonsense.

I think it’s time for America to model our standardized testing after Finland’s.

Click the link above to read Valerie Strauss’ reporting on the test administration schedule, question format, grading (first done by teachers of the students themselves), and sample questions.

(Here are a few more informative readings on education in Finland, and here is some info comparing professional salaries within Finland, though it is from 2007.)

A closing thought from Strauss’ post:

If your path brings you to learn in Finland, be prepared to engage in deep discussions about politics, religion, poverty, spiders, junk food, young people questioning authority and other topics absent from the tests you took in high school, regardless how you feel about these topics. College readiness is to be ready to deal with all aspects of the world we live in, not just those that resonate well with your own. [Emphasis added.]

Take a lesson, SBAC and PARCC.

(Thought it would be interesting to see salary relative to teaching hours. American teachers work twice as long, including “bureaucratic responsibilities” that do not exist in Finland.)

None of this would have happened if the educrats had the least amount of respect for the intellect of public school teachers. They think we’re stupid and will just roll over and accept without question what we’re told. Instead, teachers like you, Mercedes, hold them and their schemes up to public ridicule and derision. Well done! Thanks!

Don’t forget that the real culprits are our elected officials who allow these initiatives to become the law of the land, both federal and state. It’s time we hold them accountable for their complicity. Most of them are clueless about education. Yet they pass laws that are harmful to our children. It’s time we “educate” them!

This is what I know about the differences and similarities between NAEP and PARCC and SBAC, which I learned from Diane Ravitch’s book “Reign of Error”. She was on the governing board for seven years…. “Congress authorized the National Assessment of Educational Progress to administer tests to national and regional samples of students. The first assessment was administered in 1969. In 1992, NAEP began reporting the scores of states that volunteered to be assessed. Most states participated, but not all. Participation in NAEP testing did not become mandatory for all states until the passage of No Child Left Behind.”

“After No Child Left Behind, the federal government assumed a command-and-control role that was never envisioned in 1979.”

The “main NAEP” has tested samples of students in the states and in the nation every other year since 1992 or 1990, depending on the subject. The questions have gotten more challenging over time.

There is another version of NAEP that the federal government has administered since the early 1970s. This “long-term trend assessment” tests students who are ages 9, 13 and 17. It draws on a large pool of questions that have been used consistently for more than 40 years. The content seldom changes except to remove obsolete terms. The long-term trend NAEP is administered to scientific samples of students every four years.

“NAEP is central to any discussion of whether American students and the public schools they attend are doing well or badly. It is administered to a sample of students; no one knows who will take it, no one can prepare to take it, no one takes the whole test. There are no stakes attached to NAEP; no student ever gets a test score. NAEP reports the results of its assessments in two different ways.” – Diane Ravitch

Scale scores: range from 0 to 500 and describe what students know and can do.
Achievement levels: Advanced, Proficient, Basic, Below Basic. These are judgments “set by external panels that determine what students should know and be able to do. The NAEP governing board authorized the establishment of achievement levels in the early 1990s. At the time, people complained that the process was rushed and the standards were flawed. No Child Left Behind created an expectation that all students ought to be Proficient.” – Diane Ravitch

Education reformers have pointed to the scores and misinterpreted them, claiming that a large percentage of students are scoring below grade level in various subjects. They make these false claims by treating “Proficient” as equivalent to “At Grade Level,” which is incorrect.
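The arbitrariness of achievement categories runs through both the post and this comment, and it can be made concrete: given the very same distribution of scale scores, the “percent Proficient” depends entirely on where a panel places the cut. (All numbers below are hypothetical, not actual NAEP cut points.)

```python
import bisect

LEVELS = ("Below Basic", "Basic", "Proficient", "Advanced")

def level(score, cuts):
    """Map a scale score to a NAEP-style achievement label.
    `cuts` is an ascending triple: (Basic cut, Proficient cut, Advanced cut)."""
    return LEVELS[bisect.bisect_right(cuts, score)]

def pct_proficient_or_above(scores, cuts):
    """Percentage of scores at or above the Proficient cut."""
    return 100 * sum(s >= cuts[1] for s in scores) / len(scores)
```

With scores spread uniformly from 200 to 299, a Proficient cut of 260 yields 40 percent “Proficient or above”; move the cut to 275 and the same students yield 25 percent. Nothing about the students changed– only the panel’s judgment.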

This is another unusually long post.
I worked on the first and second NAEP tests in the visual arts, vintage 1970s. The subject has no deep tradition of testing, and there are serious issues in thinking through the role of talent and education in achievement. The most useful part of two rounds of NAEP tests in the visual arts was the background information, including the dismal record of access to instruction in and beyond school.

That said, I have also looked at the background on the CCSS and federal funding for PARCC and SBAC tests. USDE had a test-funding plan for the CCSS in April 2010, before the CCSS were published in June. By September, grants totaling $300 million had been awarded to the PARCC and SBAC groups to begin test development.

Then, out of the blue, the two consortia applied for supplementary funds for the curriculum work needed to do the test development. That was a huge snafu in USDE’s thinking from the get-go: just do standards and tests, nothing needed in between. In seeking more money, the grant applicants made these statements:

PARCC will “coordinate with the Smarter Balanced Assessment Consortium on… artificial intelligence scoring, setting achievement levels, and anchoring high school assessments in the knowledge and skills students need to be prepared for postsecondary education and careers” (PARCC, 2010, December, p. 3).

Sounds to me like a master plan for reporting one national score tweaked from data on two tests of the CCSS– Form A from PARCC, Form B from SBAC. Of course, their references to “achievement levels” meant they would seek comparable cut scores.

Here are some real problems not addressed.
1. The pending reauthorization of ESEA may make the tests and cut scores entirely optional or just obsolete.

2. Major changes in ESEA were introduced for RttT in 2010, including the financing of these tests, and also a change so the phrase “college- and career-ready” replaced the word “proficiency” in earlier versions of ESEA. So with or without the PARCC and SBAC tests, states are dealing with the idea of college- and career-readiness as the new term of art for high stakes. There are other changes in the federal definitions for college- and career-ready. These may settle the issue of cut scores. I doubt that NAEP cut scores will pass muster as a guide.
3. “College- and career-ready (or readiness)” means, with respect to a student, that the student is prepared for success, without remediation, in credit-bearing entry-level courses in an Institution of Higher Education (IHE) (as defined in section 101(a) of the Higher Education Act), as demonstrated by an assessment score that meets or exceeds the achievement standard for the final high school summative assessment in mathematics or English language arts. (Note the OR on the high school test, one or the other, but not necessarily both).
4. Here is a brief version of the federal definition of an “Institution of Higher Education” (For a full definition, refer to Section 101 and 102 of the Higher Education Act). An Institution of Higher Education is a school that:
—Awards a bachelor’s degree or not less than a 2 year program that provides credit towards a degree or,
—Provides not less than 1 year of training towards gainful employment or,
—Is a vocational program that provides training for gainful employment and has been in existence for at least two years. And meets all three of the following criteria:
—-Admits as regular students only persons with a high school diploma or equivalent; or
—-Admits as regular students persons who are beyond the age of compulsory school attendance
—-Is public, private, or non-profit; accredited or pre accredited; and is authorized to operate in the state.
5. One more definition, and note the last line: “Achievement standard” means the level of student achievement on summative assessments that indicates that (a) for the final high school summative assessments in mathematics or English language arts, a student is college- and career-ready…; or (b) for summative assessments in mathematics or English language arts at a grade level other than the final high school summative assessments, a student is on track to being college- and career-ready. An achievement standard must be determined using empirical evidence over time.
Source https://www2.ed.gov/legislation/FedRegister/announcements/2010-2/040910e.html

I think that this means the feds have no idea what they are doing with tests.

This seems to say that USDE will be looking at “longitudinal data” to make inferences about cut scores. The data of most relevance will be: (a) either a math test score or an ELA test score; (b) test scores from PARCC or SBAC; and (c) records of students who have taken at least one of these tests and been admitted into a post-secondary program without the need for remedial courses (presumably remedial coursework in ELA or math).

In 2011, NAEP commissioned a study intended to find out what college tests and cut scores determined entry-level students’ need for remedial/developmental courses in reading and mathematics, based on nationally representative data. The study sought evidence of the validity of statements in NAEP reports about 12th-grade students’ academic preparedness for college.

Beyond all this, the reasoning behind these tests and their uses is deeply flawed.
PARCC (following USDE specifications for lots of formative assessments) came up with a total of nine tests a year for grade 8: five in ELA, four in math. In math, four quarterly scores were supposed to be reduced to a summary score. ELA followed the same protocol, with an extra performance measure required but not counted. The same boilerplate was set forth for all grades. (I made a spreadsheet to discern these ambitions.) So the piloted tests now available are but a shadow of what might have been, and what USDE wanted.

More reasons for burying the whole of the Obama/Duncan/Gates agenda and questioning the purposes of the PARCC and SBAC tests.

I love your blog. I organize my ed research on Pinterest, but cannot pin your blog. Please consider making your blog Pinterest friendly by adding an image that’s “pinnable.” Thank you for all your dedication.