Education Week staff writer Evie Blad explores some of the nonacademic issues that bear on students’ learning. Join her for insights, news, and analysis on a wide range of issues including school climate, student engagement, children’s well-being, and student behavior and discipline.

Researchers: Measures of Traits Like 'Grit' Should Not Be Used for Accountability

The debate over how to refer to so-called "noncognitive" student traits like self-control, grit, and gratitude is crowding out a more important conversation about how those traits should be measured and how to responsibly use the resulting data, two pioneering researchers in the field say.

Every existing tool for tracking the effectiveness of the growing number of efforts to build noncognitive traits (efforts known by names like social-emotional learning, character development, and 21st-century skills) has potential flaws, so such evaluations should not be used for accountability measures like school-to-school comparisons or teacher evaluations, the researchers say in an essay.

"We share this more expansive view of student competence and well-being, but we also believe that enthusiasm for these factors should be tempered with appreciation for the many limitations of currently available measures," it says.

When tracking interventions designed to boost desirable nonacademic traits and skills in students, "perfectly unbiased, unfakeable, and error-free measures are an ideal, not a reality," the essay concludes. "Instead, researchers and practitioners have at their disposal an array of measures that have distinct advantages and limitations."

There's considerable debate over what to call these traits, qualities, and skills. In their essay, Duckworth and Yeager use the term "personal qualities" as shorthand for the rather wordy "positive personal qualities other than cognitive ability that lead to student success."

But with the drive to change educational practices comes a desire to measure the effects of that change and the almost inevitable drive for accountability, and that's where things get problematic, Duckworth and Yeager say.

Methods of Measurement

The essay uses the example of measuring self-control to explore the advantages and weaknesses of three possible forms of measurement—self-report questionnaires, teacher-report questionnaires, and performance tasks.

Questionnaires of both teachers and students are perhaps the most popular existing method for measuring growth of these personal qualities in students. In recent years, districts around the country have begun tracking student responses to questions about school climate, peer behavior, and personal attitudes, often cross-referencing the results with achievement data to look for trends.

Such questionnaires have advantages: They are inexpensive and relatively easy to develop and administer.

But they also have limitations that aren't always acknowledged, Duckworth and Yeager write. The researchers say common problems like the following may lead to inaccurate results:

Students or teachers may be limited in their ability to answer questions about growth in internal traits, such as motivation.

Surveys may fail to detect smaller incremental changes.

Reference bias—the comparative examples respondents use to gauge personal growth or success in some areas—may lead to different results from similar respondents.

"Social desirability bias" may lead students to give answers they think teachers want to see to certain questions rather than answering honestly.

Teachers may make assumptions about students when evaluating their behavior. For example, if they've determined that Sally is a "good kid," then they might also determine that she has high levels of self-control, even if she hasn't demonstrated it.

Performance tasks, another way of measuring personal traits, answer some of those concerns by using consistent, carefully crafted experiments to measure students' responses and changes in those responses over time. In the case of self-control, a common performance task would be the infamous "marshmallow test," in which students are allowed to choose between eating a small pile of marshmallows now or a bigger pile of marshmallows if they wait for a period of time. The theory is that the students who delay gratification have more self-control.

But while the tasks themselves may seem like objective measures, the conclusions researchers draw from those tasks are often subjective, Duckworth and Yeager write. For example: "Is a child who refrains from playing with toys when instructed to do so exerting autonomous self-control, or does such behavior represent compliance with adult authority?"

Some other limitations the authors found with performance tasks include the following:

Created tasks may not reflect everyday life. In the case of the marshmallow test, a student seeking to resist temptation at home may cover up a bag of marshmallows with a cloth or walk away for a while to avoid eating them, which are reasonable self-control strategies, the essay says.

Other competencies, like hand-eye coordination, may affect a student's performance on an unrelated task.

The results of certain tasks may become less accurate over time as students grow accustomed to them, making it difficult to measure and track growth of certain traits.

Students may make random errors, like circling incorrect answers, when completing tasks.

So What?

Duckworth and Yeager aren't suggesting that researchers and educators abandon efforts to measure students' social, emotional, and character-based traits. Rather, caution should be used in selecting a form of measurement and recognizing its limitations, the researchers write.

Most importantly, because of those limitations, special caution should be exercised in determining how the resulting data is used, the essay says. The problems posed by reference bias in questionnaires, for example, mean that their results should not be used to compare schools or classrooms for accountability purposes, the authors write.

"Current data and theory suggest schools that promote personal qualities most ably—and raise the standards by which students and teachers at that school make comparative judgments—may show the lowest scores and be punished, whereas schools that are least effective may receive the highest scores and be rewarded for ineffectiveness," the essay says.

In other words, students in schools that do a good job emphasizing these skills may hold themselves to a higher standard, giving themselves lower scores on self-questionnaires.
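
The reference-bias effect described above can be illustrated with a small numerical sketch. This is not the authors' model: the scoring rule, the 1-to-5 scale, the school labels, and every number below are invented assumptions, chosen only to show how a higher internal standard can pull self-ratings down even when actual skill is higher.

```python
# Hypothetical illustration of reference bias. The scoring rule and all
# numbers are invented for this sketch; they do not come from the essay.

def self_rating(true_skill, reference_standard):
    """Self-rating on a 1-5 scale: the scale midpoint (3) shifted by how far
    the student's actual skill sits above or below the standard they
    compare themselves against."""
    raw = 3 + (true_skill - reference_standard)
    return max(1, min(5, raw))  # clamp to the 1-5 scale

# Assumed values: School A builds more skill but also raises the standard
# students judge themselves by; School B does neither.
schools = {
    "School A (effective program)": {"true_skill": 4.0, "reference_standard": 4.5},
    "School B (less effective program)": {"true_skill": 2.5, "reference_standard": 2.0},
}

ratings = {name: self_rating(**values) for name, values in schools.items()}

for name, rating in ratings.items():
    print(f"{name}: average self-rating {rating:.1f}")
```

Under these assumed numbers, School A's students have more of the skill but rate themselves lower than School B's students do, which is the pattern the authors warn could reward the less effective school.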

Also problematic is the practice of using measurements of noncognitive traits to diagnose or assess the needs of individual students, it says. That's because existing measures aren't nuanced enough to measure progress on an individual level.

"Without highly reliable, multimethod, multiinformant measurement batteries whose validity has been demonstrated for diagnosis, it will be difficult for a practitioner to justify the individual diagnosis of children's personal qualities, such as self-control, grit, or growth mind-set," the essay says.

The authors also urge refinement of measurement methods so that educators can gather and track more reliable data on programs designed to build these traits, regardless of what schools choose to call them.

"Given the advantages, limitations, and medium-term potential of such measures, our hope is that the broader educational community proceeds forward with both alacrity and caution, and with equal parts optimism and humility," Duckworth and Yeager wrote.
