The work was supported by Taskstream, an educational technology company whose online system helps schools manage assessment, accreditation and e-portfolios. Last year Taskstream began participating in the AAC&U initiative Valid Assessment of Learning in Undergraduate Education (VALUE). Begun in 2007, VALUE provides tools to assess students' "authentic work" in order to understand how well they are progressing toward graduation-level achievement in areas that employers and faculty consider essential, such as critical thinking and working with data.

In the latest experiment, 7,215 pieces of student work produced in regular courses were uploaded to a Taskstream-built Web site. The sample approximated the participating institutions' distributions of gender, race, age and Pell vs. non-Pell status. The "artifacts" came from students who had completed at least three-quarters of their degree requirements.
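
The study's sampling approach, as described, amounts to drawing student work so that demographic strata keep roughly the same proportions they have in the institution's population. The report does not describe its actual procedure in code, so the following is only an illustrative sketch of proportional stratified sampling, with hypothetical record fields (`id`, `pell`):

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, strata_key, n, seed=0):
    """Draw a sample of size n whose strata proportions approximate
    the population's. `records` is a list of dicts; `strata_key` maps a
    record to its stratum (e.g. a (gender, race, pell) tuple).
    Illustrative only -- not the study's actual procedure."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for r in records:
        by_stratum[strata_key(r)].append(r)

    total = len(records)
    sample = []
    for stratum, members in by_stratum.items():
        # Allocate slots in proportion to the stratum's population share.
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: 30% Pell, 70% non-Pell students.
population = [{"id": i, "pell": i % 10 < 3} for i in range(1000)]
picked = stratified_sample(population, lambda r: r["pell"], n=100)
shares = Counter(r["pell"] for r in picked)  # mirrors the 30/70 split
```

The key design point is that sampling happens within each stratum, so small subgroups are neither crowded out nor overrepresented in the drawn sample.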

A portion of those artifacts was scored by 126 faculty members who were specially trained in one of three areas: written communication, critical thinking and quantitative literacy. To do their scoring, the faculty participants used rubrics developed in the VALUE program. They scored only work from students at institutions other than their own, and that work did not necessarily fall within their own disciplines.

While the researchers emphasized that the findings weren't "generalizable" across states or the country, they did conclude that "faculty from a variety of disciplines, from dozens of colleges and universities, from nine different states across the nation could assess the work students had done and evaluate it in a consistent and reliable way." As SHEEO President George Pernsteiner noted in a prepared statement, "There was no special test. There was no time away from the classroom. There was, however, a common understanding by faculty from diverse places and backgrounds of what constituted learning and whether students had demonstrated it."

The point of the pilot was to figure out whether rubric-based assessment could be taken to scale and still "produce valid findings with credible and actionable information about student learning." If so, standardized rubrics could be used to improve the design of curricula and assignments, as well as the effectiveness of programs and classes in producing the learning outcomes a college desires.

The findings for the participating schools were extensive:

The researchers concluded that a wide array of institutions can develop sampling plans that draw student work from across departments and demonstrate achievement of learning outcomes that cut across disciplines;

Faculty can use common rubrics to evaluate student work — even outside of their own subjects;

With training, instructors can produce reliable results using a rubric-based assessment approach (to verify this, more than a third of the student work was double-scored specifically to examine inter-rater reliability);

Faculty found that the VALUE rubrics were "very useful" for assessing student work and improving assignments;

A Web-based platform can serve as a useful framework for collecting student work and facilitating assessment; and

A common rubric-based assessment can generate data that provides evidence of student achievement in important learning outcomes.
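
The double-scoring mentioned above is the standard way to check inter-rater reliability: two trained raters score the same artifact, and their agreement is compared against what chance alone would produce. The report does not say which statistic the researchers used, so this sketch uses Cohen's kappa, one common chance-corrected agreement measure, with invented toy scores:

```python
from collections import Counter

def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: chance-corrected agreement between two raters.
    `scores_a` and `scores_b` are parallel lists of ratings (e.g. 0-4
    rubric levels) for the same double-scored artifacts."""
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    # Observed agreement: fraction of artifacts scored identically.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n

    # Expected chance agreement, from each rater's marginal distribution.
    dist_a = Counter(scores_a)
    dist_b = Counter(scores_b)
    expected = sum(dist_a[c] * dist_b.get(c, 0) for c in dist_a) / (n * n)

    return (observed - expected) / (1 - expected)

# Toy example: two raters double-scoring ten artifacts on a 0-4 rubric.
rater1 = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]
rater2 = [3, 2, 3, 1, 3, 2, 1, 4, 3, 2]
kappa = cohens_kappa(rater1, rater2)  # about 0.73 here
```

Kappa near 1 indicates strong agreement beyond chance; values near 0 mean the raters agree no more often than random scoring would, which is why raw percent agreement alone is not a sufficient reliability check.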

Regarding the specific learning outcomes, the study concluded that many students who had earned at least 75 percent of the credits toward their degrees still had not reached a high level of critical thinking skill. For example, on a scale of zero to four, fewer than a third of the student work products collected from four-year institutions were scored "3" or "4" in the area of "using evidence to investigate a point of view or reach a conclusion." Four in 10 samples received a "0" or "1" rating on how well students "analyzed the influence of context and assumptions" to draw conclusions.

In the area of written communication, almost half of the work collected from two-year institutions was scored either "3" or "4" on "content development" in writing. A third was scored "3" or "4" on demonstrating the use of "sources and evidence" in writing.

In the area of quantitative literacy, students' calculation skills received higher ratings at both two-year and four-year schools than their abilities to "make judgments and draw appropriate conclusions based on quantitative analysis of data." Fewer than half of the work products at four-year institutions, and a third at two-year colleges, were rated "3" or "4" on this dimension of quantitative reasoning.