Rogue Data

Like most colleges, mine was born before IT became a fact of life. IT had to be grafted onto a pre-existing culture, or, more accurately, a set of micro-cultures. Different departments and support programs have their own ways of doing things; some have welcomed technology, some have grudgingly adapted, and some have shoved it over in a corner, hoping it would eventually go away.

In concrete terms, that means that we have one master system for data, based on our “live” ERP system -- that’s the system that handles student registrations and scheduling, among other things -- and a whole set of other mini-systems housed in various departments, usually running rogue Excel spreadsheets.

That’s not necessarily a bad thing, of course. Individual departments or programs have specific needs, and if they’re simply using a custom recipe to mix the same ingredients as everybody else, I have no objection. Yes, some are savvier about data analysis than others, but almost any high-level skill is unevenly distributed. I don’t see that as a crisis.

The problem is that they draw data at different times, and define terms differently. So individual programs end up working from different numbers for what should be the same facts. This leads to plenty of low-level conflict.

(My favorite was a few years ago, when an advocate for a particular intervention came to me with her own customized data on the success rates of students who tried it. She had a small sample, but the percentages were impressive. When I looked more closely at the numbers, I saw a gap, and asked about it. She replied that she didn’t count the student who dropped out to follow his girlfriend, on the grounds that it had nothing to do with her program. I nearly fell off my chair.)

Now we’re looking at corralling the various databases into a single, unified set of queries drawing from the same data at the same time. Which is simple enough conceptually, but it involves getting the folks who’ve only grudgingly made peace with Excel to start wrestling with the campuswide ERP system in a serious way. This is no small thing. And it involves having each separate program give up some control over its own data.

In a perfect world, that wouldn’t matter; if anything, it could be seen as offloading some work. But experience tells me that some folks like to use data to tell the story they want told. That’s often based on good intentions, and sometimes on local knowledge that is easily lost in an aggregation. But it’s also based on a sense of control, of filtering which numbers get out, and of not wanting to change how things are done.

That’s not necessarily an entirely bad thing, of course. I cringe when I hear people who should know better use idiotic, if technically “correct,” statistics to indict community colleges. (For example, graduation rates that count early transfers as dropouts drive me around the bend.) But I don’t think the answer to that is to hide from statistics or to cherry-pick idiosyncratic measures. The battle to have is over definitions and relevance, and that battle should be had openly. If we’re using the wrong measures to evaluate a program, that’s probably a sign of not really understanding the program; an open discussion of the measures could lead, if indirectly, to a better understanding of the program.

Wise and worldly readers, have you been through a process of corralling rogue databases? Is there anything in particular we should know?