Saturday, September 22, 2007

A Vulture of Evidence

Last week Bill Richardson wrote an opinion piece on No Child Left Behind: scrap it. His view sounds right to me. My kids go to reasonably good schools in Champaign, which suffer from none of the problems Kozol has identified in inner city schools. But even there, we’ve seen NCLB narrow the curriculum and push a teach-to-the-test approach, something my wife and I are not happy about. Yet Richardson’s pronouncement got me agonizing for another reason. The question is whether the bad outcomes associated with NCLB are a consequence of an approach to education that emphasizes measuring student performance, or whether instead the main issues with NCLB are a consequence of racial prejudice, so that an evidentiary approach to student learning can work, and work well, when there aren’t other issues that dominate and handicap it. I was reminded of discussions I had in graduate school about whether communism was inherently a bad system or whether, instead, it was the Soviet Union that was bad, while communism might flourish under other circumstances. It’s hard to cling to the theoretical possibility that such a system might work well when there is such a prominent case in point indicating otherwise.

So I began to fret about what NCLB should be telling us in Higher Ed. Mostly independent of NCLB and Richardson’s pronouncement, I’ve been trying to find my own comfort zone thinking about accreditation. The College of Business where I work is going through its year of self-study for AACSB accreditation, and at the same time my Campus is getting ready for its NCA review. So I’ve been wondering whether these efforts will create their own pernicious consequences and so should be resisted. (Or, perhaps, we should comply with the letter but tacitly resist the spirit, because overt resistance is too costly.) Within the learning technology field, I’ve heard nobody express this concern. There seems to be a lot of buy-in to the general idea of an evidentiary approach. So in this post I’m going to try to poke holes. My goal is to show the source of my discomfort so as to encourage others to ask similar questions. Some of this stuff, I believe, needs to be debated, not just taken for granted.

Here I’m further reminded of a book I read while in graduate school, The Foundations of Statistics by Leonard Savage, where in the first chapter the author asserts that sometimes it is the foundations of a discipline that are the most controversial, even for disciplines that are flourishing. The foundational question for those who advocate for an evidentiary approach is this: what exactly are we measuring? And for those who might want to follow Kozol’s lead but focus on the situation in Higher Ed, the question might be: what are the unintended consequences of the measurement efforts we do engage in, and how harmful are those consequences? Those are the core questions that guide me as I try to poke holes.

Let me begin with testing and my own memories of it. I took my last class for credit in 1979 and defended my dissertation in 1981. There are many things I learned subsequently that I have since forgotten, yet some of my memories of testing remain strong. Why is that? And is it good or bad?

Let me start with the SAT. I took it twice. The first time, I had a broken right arm, in a cast that came not quite up to the elbow. I was nervous that it would affect my ability to fill in the bubble sheets. After I got my scores I joked that it only adversely affected my score on the Verbal, not the Math. At the time, the word “aptitude” was in the SAT’s name, and much was made of the claim that the test measured aptitude. I got a 790 on the Math. I assume that means one question wrong, but I don’t really know that. My next memory is blurred among that test, the second time I took the SAT, and the GRE, but on at least one of those exams there was a question on Roman Numerals. To this day, I don’t remember which is the Roman Numeral for 500 and which is the one for 1000. I’m OK with Roman Numerals up to 100 but not beyond that. So I likely garbled the M and the D and got that question wrong. But there is no way that was an indicator of aptitude. After all, you can look these things up. I’ll get back to that in a bit.

My scores that first time were 620 V and 790 M: definite math nerd. I took it a second time, hoping that the Verbal would go up, and got 650 V and 770 M. On The West Wing, my favorite TV series the last few years, the President, who is supposed to be a genius, got 1590 total the first time and took it again! The SAT, in particular, seems to play the same role for high-testing students that tattoos play for society at large. Does it leave such a mark on students who don’t test so well? There is now a big industry devoted to getting tattoos removed. Are those who advocate for the evidentiary approach aware of the marks they may be leaving on the students from whom the evidence is collected?

Next let me talk about my last semester as an undergrad at Cornell. Having taken just one intro macro course in economics, I was advised to take the advanced graduate course in math econ; I had the math credential but no other economics. The course started out with 7 students in total, 4 grads and 3 undergrads. The grads all dropped, so for the final there were just the three of us undergrads. We had a take-home final and it was brutal. I got a 65 on it (two problems right, partial credit on a third, and the fourth wrong), and with that I was put in my place. Because much of it was over my head, I had made some major conceptual errors, such as trying to prove that a divergent series converges. I later learned I was the high scorer among the three of us. One of the other students, who has since become quite a prominent economic theorist but who was a year or two behind me at the time, got a 20 on the test. I know that during the class I had the feeling that I could do the math (in spite of the serious mistake on the final), but I had no sense at all of why any of the theorems were interesting. Absent that intuition, it was hard to be passionate about learning the method of attack for the proofs. So perhaps I didn’t work hard enough for the course. But what else might the exam measure? In many cases we talk of students being able to apply what they learn in a course to other problems they might confront outside the course. Just what other problems would I confront on this stuff? (In grad school I didn’t see this content at all, but when I came to Illinois I taught Undergrad Math Econ and used David Gale’s book, so I did ultimately return to some of this material, and the problem I then confronted was how to teach it.)

I took several exams in grad school that I recall, but I want to focus on one in particular. Northwestern is on the quarter system, and in my first quarter there I took probability from Ted Groves, a tough and rigorous course. The second quarter I took statistics from Ehud Kalai. He is a world class game theorist, but his heart wasn’t in teaching statistics. Also, my class had gotten kind of burnt out by the Paper Chase approach to instruction we experienced the first quarter. So the course was less intense than the one Groves taught. Kalai asked us if we wanted a midterm. You can guess the answer. And the in-class final was open book. During that final I recall doing a problem where I read the book to learn the relevant statistics, not as a reference for what I already knew but to learn it for the first time; then I did the problem. I got 100 on that exam. And I finished in a little more than an hour, definitely the first among my classmates to leave the room. Nobody else got 100. But I didn’t know much about statistics. I probably could have scored nearly as well if I had been given that test before the course was offered, because what I did know was how to solve math problems. I’m an ace at that. Clearly solving math problems is a correlated skill, but it is not the same as knowing statistics. In much else we teach, we really want students to develop learning-to-learn skills so they can solve problems in situ. Knowing how to do math problems is a learning-to-learn skill for taking exams that involve math. Such exams might want to test that skill to some degree. But they also want to test knowledge of the subject to which the math is applied. It’s very hard to disentangle the one from the other, at least with an open book test.

The best exams are oral. There is back and forth, and the follow-up questioning becomes situated in the previous responses. In this manner the examiner can learn quite a lot about the student’s knowledge, how robust it is, and how well the student can think through related issues based on what the student is presumed to know. In this sense every real conversation has an element of testing in it. Perhaps that is an observation to suppress. Most students who go through an oral exam (and here I’m thinking of doctoral studies) are extraordinarily nervous both beforehand and during, and may be quite self-conscious when giving their responses. That type of fear can be an inhibitor and restrict the conversation; to go deeper into the issues, the participants in the conversation should be relaxed. In the process the participants learn quite a lot about what the other is thinking. We can surmise that Aristotle understood the competencies of the young Alexander quite well.

Apart from testing and conversation, the other evidence we have of student learning comes from the works students create (their writings and, increasingly, their multimedia creations) and from the in-class presentations they give. As the latter are typically not recorded (and many courses don’t entail in-class student presentations), there has been more focus on the former. And because we now live in a digital world, the works can readily be archived for later review, scrutiny, and reflection. This is the basis for the current fascination with ePortfolios. And at this level, that is for the good.

Originally, I was quite high on ePortfolios as a concept because I thought they were associated with longitudinal assessment: measuring growth and hence measuring learning, looking for increments rather than for the snapshots of performance that most testing provides. Every eager parent looks forward to having their infant weighed and measured to make sure the child is healthy and growing properly. And most parents I know (admittedly a small sample) continue with the pencil marks on the basement wall to track the growth of their kids after those visits to the pediatrician have ended. Measuring child growth in this way is a labor of love. In primary school some of my kids’ teachers kept a (paper-based) archive of the kids’ work and used it to show the kids’ progress at parent-teacher conferences. Primary school teaching, at least in the school my boys attended, had a labor of love aspect to it for which I’m quite grateful. That changed in middle school, as the kids rotated through the subjects with different teachers and more of the work they did was regular homework.

Longitudinal growth becomes harder to measure because the rate at which students grow intellectually slows, because the teachers become less invested in the individual students, and because the bulk of what we do measure is within-course learning. At the college level, ePortfolios are not about longitudinal growth. Instead, they are about “measuring competencies” via “genuine assessment.” In other words, ePortfolios are about setting a bar and seeing whether the student has cleared it. I’ve written a prior critique of measurement of this sort, about the interrelationship between where the bar is set and what to do about imprecision in measurement, particularly as it pertains to students whose performance falls in the gray zone near the bar. I don’t see those advocating for the evidentiary approach talking about this issue much, if at all. They should. Here, I’m going to push on.

There are some disciplines where a portfolio approach has a long tradition, e.g., design and writing studies, and instructors in those fields may be drawn by instinct to ePortfolios and welcome the technology for this purpose. But in many other fields an ePortfolio approach is alien, and traditional testing is much more common. Faculty in such fields show little interest in ePortfolios or in the evidentiary approach more broadly. The push for this is coming from elsewhere.

It is coming from the Department of Education, the Spellings Commission Report on Higher Education, the various state and local governments that have become increasingly stingy about funding public higher education (this link requires access to the Chronicle), and the accrediting agencies, which are feeling the heat from these other sources. As one of my colleagues in the Finance department here put it, we’re driven by a yuppie culture, consumerist to the extreme, where the key to any financial transaction from the buyer’s point of view is understanding what you’re paying for. With college tuition so high and still rising at a hyperinflationary rate, this consumerism demands an answer. That is the core driver.

The traditional approach has been a trust model, where at each point in the hierarchy there is some activity that indirectly affects quality assurance of teaching and learning. For example, academic departments put huge effort into recruiting faculty. Recruitment is a big deal. Promotion and tenure, another big deal, gets reviewed at multiple levels: department, college, and campus. Those and salary review are the main mechanisms for quality assurance. They are all indirect. As a faculty member, I have an ethical responsibility to teach my courses so that the content is correct for the course listing and the teaching approach is suitable for students at that level. Usually there is no direct monitoring of that, only the indirect results from course evaluations, which feed into the other indirect mechanisms, and the write-ups instructors produce to document their teaching at the various junctures where they are evaluated along their career path.

The trust model was working reasonably well when I joined the faculty at Illinois in 1980. I’m not saying it was perfect, but it was reasonably effective. And confidence in public higher education was high. That confidence is waning, as evidenced by the documentary Declining by Degrees. The question is whether the decline in confidence is purely a matter of perception, or whether cracks are forming in the trust model and those cracks are largely the cause of the change in perception. On the first view, quality is largely the same as it was in 1980, but in an industry clearly subject to Baumol’s Cost Disease, rising tuition has by itself created the change in perception. Truthfully, I don’t know the answer to that question, and I’m not sure how to produce one, though I think it is a good question to ask. But returning to the start of my post, my fear about the evidentiary approach, which Bill Richardson’s editorial triggered, is that it will lead to a complete breakdown of the trust model and to worse results overall, particularly at public higher education institutions such as Illinois, creating a view of Higher Education akin to the view many now have of urban public schools. Let me explain how that might happen.

The best teaching is highly innovative, one experiment after another with the approach under constant modification. The learning of the instructor about how to teach most effectively drives the learning of the students; it serves both as inspiration and model. My friend Barbara Ganley is a great exemplar of this excellence in instruction. Innovation of this type leads to idiosyncrasy in the teaching, an idiosyncrasy we should welcome. This type of innovation can thrive under the trust model. That is not to say that the trust model drives the innovation but rather that creative instructors like Barbara can feel free to experiment.

The evidentiary approach is different. It demands a roll up or aggregation of the evidence. Goals in the course syllabus must somehow align with goals articulated by the department, which in turn must align with goals articulated by the college and so on up the hierarchy. The need for evidence that can be so aggregated might very well act as an inhibitor on instructor innovation, encouraging a more cookbook approach. (Much of Kozol’s criticism of NCLB is indeed that it has engendered a militaristic approach to instruction pushed from above that seemingly hinders the play and creativity of the students.) This would be the unintended but highly pernicious consequence of a total embrace of the evidentiary approach. We’re not there yet, but it seems to be a real possibility worth considering.

Of course it is possible to envision an alternative, more benign outcome of the evidentiary approach. The goals that get articulated could be broad enough not to inhibit good teaching at all (nor to inhibit bad teaching either). Indeed, accreditation may have only a minimal impact at a place like Illinois yet still be sufficient to alert the public to disreputable diploma mills.

Whether those represent two endpoints of a spectrum I’m not sure. It seems so to me, but so many of my colleagues are advocating for the evidentiary approach that I’m willing to admit I may be missing something. Mostly, though, I feel there is a lot of fuzzy thinking out there about measuring learning, and as a consequence not nearly enough attention paid to how the gathering of evidence might affect teaching and learning practice, as well as how it might affect the teachers and learners themselves. Further, there is the related question of whether, if the measurement is done poorly, we should be doing it at all.

At Illinois we already know what the core learning issues are. They relate to large class size, particularly during the general education part of the curriculum, and the concomitant problem that students can become anonymous and consequently disengaged from their own learning. What we need to do is promote conversation, between students and faculty and among students themselves, particularly during the general education phase. We do that with our Living and Learning Communities. It is an excellent approach, but it does not scale. We need some alternative that does the same for students who reside elsewhere. But it’s not the families of these other freshmen who are really concerned about the value they are getting. The public most upset with us is the families of students who applied to Illinois but were not admitted, especially where the parent did attend yet had a lower ACT score and GPA than the child.

Given that, accreditation is more a tax on the institution (or on the college, in the case of disciplinary accreditation like AACSB) than an opportunity to engage in needed self-reflection. As long as it is perceived as such, like most taxpayers we’ll engage in mild tax avoidance and otherwise comply.

But will that approach satisfy the consumerist demand for information? And if not, will that force an outcome like what NCLB has produced? I don’t know, but I think we should talk about it.