From assessment cop to assessment advocate: On the prevention and treatment of data rage

As more institutions sink more money into myriad student success programs, the appetite for data documenting the impacts of those programs grows and grows. Assessment’s true believers – you have probably met a few of them, if you aren’t one yourself – rhapsodize about the importance of rigorously examining whether our teaching and other practices really work as well as we think they do, and then using that information for ongoing improvement.

Quantitative evidence is also what big-ticket stakeholders increasingly want to see. Deans, provosts, presidents all sit up and take notice when these numbers make their appearance, and often it’s those intricately structured charts and graphs, backed by lots of p values less than that .05 cutoff for significance, that get heads nodding and checkbooks opening. In a higher education budget environment that is, shall we say, increasingly resource-limited, programs with access to this kind of quantitative analysis have a major survival advantage.

All this adds up to an awfully tempting project for social scientists like myself, who live for turning slippery, seemingly intangible processes (like learning) into concrete conclusions. Here’s how it looks to us: We, the resident experts, will use our specialized skills to set up rigorous comparisons, apply the right statistics, and then proudly present our colleagues with solid conclusions as to whether their ideas are working or not. We’ll be heroes. Right?

Maybe, but equally likely is that your hero’s journey will come to an untimely end. Perhaps your meticulously documented findings will end up gathering dust on an administrator’s bookshelf while programs perk merrily along exactly as they did before. Or maybe people will pay attention to your findings all right, but instead of thanks and thoughtful reflection, you’ll get hurt feelings and angry pushback.

How did your treasured skills end up wasted – or openly resented? Social scientists might be inclined to see it as knee-jerk resistance to seeing reality revealed through data, or an unwillingness to accept challenges to cherished assumptions about how learning works. But there is more to it than that.

Partly, it’s a result of fundamental differences in how social scientists and academics from other disciplines view inquiry and knowledge. There’s a quote I remember from my undergraduate research methods textbook that sums up our stance: If you can’t measure something, it probably doesn’t exist. The word “probably” notwithstanding, what this means is that social scientists, for all intents and purposes, don’t believe in intangibles. Nothing can be measured perfectly, but we assume that existing things leave some trail of evidence that we can pick up on, however indirectly. Beauty, love, fleeting thoughts and feelings – these are all things that perhaps can’t be directly apprehended through the senses but can still be gauged somehow.

That’s our philosophy, and it has served us well by allowing us to plow into a lot of questions traditionally thought to be unanswerable, or answerable only in some completely subjective way. But to many teachers, intangibles don’t just exist, they’re the only outcomes that are truly worth pursuing. Throw the two mindsets together in an assessment project, and you can have an ugly collision indeed.

Other culture clashes arise from the fact that social science – like any science – is never over and done. We love the eternal pursuit of the questions raised by earlier questions, and take joy in refining and revisiting those research problems. Colleagues who want to get answers, preferably by the end of the fiscal year, do not always share this joyful spirit of endless inquiry. In real universities where people’s programs, and sometimes jobs, are on the line, hearing that all that data simply suggested lots of interesting hypotheses for future research can make your colleagues really mad.

Assessment is one of those crossroads in higher education where many tensions meet. A recent Chronicle article, “The Tyranny of Metrics,” points to the futility and exorbitant cost of keeping up an ongoing “data arms race,” and argues that the culture of measurement is leading us away from, not toward, the goals we ought to have as institutions of higher learning. Reading this latest dissection of the assessment problem, I was reminded of an older Chronicle piece titled “Who’s Assessing the Assessor’s Assessors?” That one describes the process as an absurd, infinite regress from grades as indicators of learning, to outcomes assessment, to assessment of those assessments of the outcomes. Tellingly, one of my colleagues had a yellowing copy of this article taped to his office door for the better part of a year, and pretty much every time I tried to bring up assessment of our shared program, he’d point to it.

Even I’m not immune from data rage. As the director of a large student success program (Northern Arizona University’s First Year Learning Initiative), I have projects under the assessment microscope regularly, and I sometimes feel myself getting sucked down into that vortex of “but what do we REALLY know?” For example, one thing the initiative is supposed to do is steer instructors away from straight lecture as the default use of class time. Given that many of our participating courses are the kind of large, lower-division classes where lecture tends to take over, it’s a fairly big deal that we get instructors agreeing to attempt more active learning.

This outcome is assessed, of course; besides turning in a syllabus of practice (a kind of class handbook) describing the common pedagogies for the course, faculty complete a self-report survey detailing their planned practices before and after the intervention. We’ve also surveyed their students (over a thousand of them now) to see whether their perceptions echo all those good intentions. But these measures aren’t perfect, and so, naturally, the question has been put to me: what’s really going on in those classes?

Fair enough, I thought, so at the suggestion of some of my sharp STEM colleagues, I set out to pilot test an adapted version of an intriguing protocol for gathering quantitative descriptions of what goes on over the course of a class meeting. Observers can get trained up to use this system in just a few hours, and it generates the kind of multicolored pie charts and other eye candy that I know will look wonderful in our annual report. And having a few dozen of these gathered from more classes in the program will certainly help me sleep at night, secure in the knowledge that we’re producing great impacts with the dollars invested.

But I also know that after the pie charts are served, one of assessment’s true believers will point out that these data are but a mere slice of all that is really going on in all the class time for all the courses in the whole program. What about the class meetings we didn’t observe? The instructors who opted not to participate? Can we really know what went on in every moment spent in every course, by every student in every semester, across time and space, world without end? Confronted by these questions, the social scientist in me will want to join the chorus in singing our discipline’s favorite refrain: more research is needed!

But the program director in me will say: please, no more. This initiative, like most, isn’t a controlled experiment, or anything remotely resembling one. It’s a real-world, messy project, with many facets, ins and outs and what-have-you’s. Assessing it in some way is important, but it’s inappropriate to try to force it into the mold of hypothesis testing à la your typical scholarly article in social science. A more reasonable goal is to get a reading on whether the project is headed in the right direction or the wrong one. This may look woefully weak to a classically trained social scientist’s eye, but it is less likely to end up in the land of more-research-is-needed.

Beyond simply managing expectations, I think we need a new framework for using the tools of social science to improve practice in real educational environments, something I’ll call advocacy assessment. Practitioners of advocacy assessment deliberately step out of the role of judging whether a program works or not. Instead, they begin with the assumption that programs designed by reasonably well-informed educators, with good intentions, are probably having some kind of positive effect. The exact nature and degree of those effects are what we want to establish through empirical evidence, but we’re not there to ferret out what doesn’t work.

The role of an advocacy assessor comes down to two things: First, we are there to document the good things the program is doing. As people skilled in using numbers to tell a story, we can help program leaders tell the story of their projects, connecting intentions and interventions to what happened as a result. Second, we’ll make recommendations about how to augment those impacts – for example, by identifying which parts of the program seem to be paying off the most. You’d be surprised how much more enthusiastic programs are about “closing the assessment loop” when we frame findings as opportunities for productive next steps, instead of problems to be corrected.

Another key component of advocacy assessment is collaboration, especially during the planning and interpretation stages. Guiding program leads to figure out for themselves what their most important impacts are and how to measure them can be as useful as, if not more useful than, simply confronting them after the fact with data you came up with.

For example, I once worked with the directors of an action research program in which students worked on projects ranging from community gardens to public art to immigration reform. I asked the leaders of the different projects to design their own assessments, laying out at the start what they thought were the most important indicators, and setting up plans for gathering the data themselves. It was harder and more time-consuming than having me prescribe and execute an assessment plan. But it pushed teams to take ownership of the numbers while building the capacity to be effective advocates for their own programs.

Perhaps most importantly, advocacy assessment means dropping the social scientist’s hyper-skeptical attitude. This is tough, because we social science types live for being able to debunk something that everyone assumes is effective. Being the resident debunker is tied closely to our disciplinary identity, and from a practical standpoint some skepticism is helpful. But skepticism can be too much of a good thing when it comes across as suspicion that our fellow educators are deluded or outright dishonest if those statistics don’t all come up p < .05.

In fact, excessive veneration of either-or statistical tests is something we’re now questioning within the social sciences. Psychologist Geoff Cumming is leading a “new statistics” movement that calls for us to stop pretending that we are really pulling for the null hypothesis (i.e., no effect), then grudgingly accepting that we’ve found something only when we pass the magical .05 mark. In place of this peculiar ritual, he suggests a more open-ended examination of patterns in our data, using effect sizes and confidence intervals (and the degree of overlap between treatment groups) to gauge not just whether there was some effect but also its magnitude. In this way, we can still carry out a rigorous analysis that takes into account things like random variation, without throwing away important information that we can get even from “nonsignificant” findings.
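For readers who want to see what estimation-style reporting can look like in practice, here is a minimal Python sketch, assuming a simple two-group comparison (say, redesigned versus traditionally taught sections). The function name `estimate_effect` and the scores are hypothetical, invented purely for illustration, and the interval uses a normal approximation as a simplification. It illustrates the general idea of reporting a magnitude plus its uncertainty; it is not Cumming’s own procedure or software.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def estimate_effect(treatment, comparison, confidence=0.95):
    """Summarize a two-group comparison as an estimate, not a verdict.

    Returns (mean difference, (lower, upper) interval, Cohen's d).
    Hypothetical helper written for illustration only.
    """
    n1, n2 = len(treatment), len(comparison)
    m1, m2 = mean(treatment), mean(comparison)
    s1, s2 = stdev(treatment), stdev(comparison)  # sample SDs (n - 1)
    diff = m1 - m2

    # Unpooled standard error of the difference between means
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # Normal critical value: a large-sample shortcut; a t-based
    # interval would be somewhat wider for small groups
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    interval = (diff - z * se, diff + z * se)

    # Cohen's d: the mean difference in units of the pooled SD
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = diff / pooled_sd
    return diff, interval, d


# Hypothetical exam scores, invented purely for illustration
redesigned = [78, 82, 75, 88, 80, 85, 79, 83]
traditional = [74, 79, 72, 81, 77, 80, 76, 78]

diff, (low, high), d = estimate_effect(redesigned, traditional)
print(f"Difference: {diff:.2f} points, 95% CI [{low:.2f}, {high:.2f}]")
print(f"Cohen's d: {d:.2f}")
```

The output is a size and a range rather than a yes-or-no verdict, so even an interval that happens to include zero still tells a program roughly how large its effect might plausibly be.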

While we social scientists will probably never accept the idea of intangibles, we do need to accept that many of our measures are primitive at best. Education is one of the most complex transformations people can go through, so it’s no wonder that we can’t always neatly pin down all of the impacts an educational intervention might be having.

Similarly, rolling out big changes to programs such as multi-section, multi-instructor courses cannot usually be done in a way that lends itself to controlled comparisons. When properly planned, formal redesign projects are supposed to have some mechanism for making comparisons across sections (i.e., the pilot and traditionally taught ones) or against historical data on student performance. But oftentimes the historical data aren’t there, or it’s unreasonable to leave some students in the traditional system while others get the new approach. And even when the design is perfect, the changes take place against a backdrop of other factors – changes in prerequisite courses, shifting student demographics, new instructors – that wash out whatever you were hoping to pinpoint.

The idea of data-driven course design and pedagogy is still fairly new, and over time, we educators will probably figure out better ways to walk a path between social science perfectionism and an anything-goes true believer mentality. But whatever this data-driven future looks like, changing our role from assessment cop to assessment advocate is going to have to be part of the plan.