In which a veteran of cultural studies seminars in the 1990's moves into academic administration and finds himself a married suburban father of two. Foucault, plus lawn care. Comments are welcome. Comments for general readership can be posted directly after the blog entry. For private comments, I can be reached at deandad at gmail dot com. The opinions expressed here are my own (or those of commenters), and not those of my (unnamed) employer.

Monday, November 24, 2008

Ask the Administrator: Institutional Research

An occasional correspondent writes:

Let's say you get a piece of paper with a report, pie chart, etc. on it that presents some pieces of information. Maybe it's from the registrar, saying that enrollment in elective courses is up while that in core courses is down. Maybe it's from the development office, saying that we raised 6% more than we did last year. Maybe it's from admissions, comparing numbers of applications from the last couple years. It might be good, bad, or indifferent news, counterintuitive or blindingly obvious. In any case--how do you know it's accurate? What checks are in place to verify the accuracy of information like this? Financial statements are audited every year. What about the rest of the mass of data that an institution accumulates?

Most colleges of sufficient size have something like an Office of Institutional Research. (Sometimes the office consists of just one person, but I've seen it consist of a real staff, too.) Sometimes the IR office is located in Academic Affairs, sometimes in Student Affairs, and sometimes in some other corner of the institution. (For whatever reason, I've often seen it coupled with the Foundation.)

The IR office is charged with generating data to populate various reports, both required and discretionary. The Federal government requires all kinds of data reporting, to document the use of financial aid, the direction of graduation rates, different achievement levels by race and gender, etc. The colleges don't have the option of ignoring these, at least if they want their students to be eligible for Federal financial aid. Additionally, it's not unusual for grantors to want periodic updates on issues of concern to them, and the smarter academic administrations will generate plenty of queries of their own, the better to enable data-based decision-making. (As opposed to, I guess, faith-based.)

I've had some strange experiences in dealing with IR offices. As a fan of data-based decisions, I usually get the frequent-customer discount with the IR folk. In the course of earning that discount, though, I've learned anew that data are only as good as the queries behind them.

Take a simple question, like “what's the college's retention rate?” Fall-to-Spring, or Fall-to-Fall? First-time, full-time students (the federally mandated data), or all students? Matriculated students only? What about students who transferred out after a year and are now pursuing four-year degrees? (We have a significant number of those, and they count as 'attrition' for us and 'grads' for the four-year schools. It's a persistent and annoying bit of data bias.) What about students who never intended to stay? Students who withdrew last Fall, stayed away last Spring, and returned this Fall? And at what point in the semester do we count them as attending? (We've usually used the tenth day, though any given moment is obviously imperfect.)
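To make the point concrete, here's a minimal sketch (with invented student records, not real data) of how the same roster yields different "retention rates" depending on which cohort and which term you pick:

```python
# Hypothetical student records, invented for illustration: each dict marks
# whether the student was first-time/full-time (ftft), returned the
# following Spring, and returned the following Fall.
students = [
    {"ftft": True,  "spring": True,  "fall2": True},   # stayed throughout
    {"ftft": True,  "spring": True,  "fall2": False},  # left after Spring
    {"ftft": True,  "spring": False, "fall2": False},  # left after one semester
    {"ftft": False, "spring": True,  "fall2": True},   # continuing student
    {"ftft": False, "spring": False, "fall2": False},  # transferred out
]

def retention(population, term):
    """Share of a population still enrolled in the given term."""
    return sum(s[term] for s in population) / len(population)

ftft = [s for s in students if s["ftft"]]
print(f"Fall-to-Spring, first-time full-time: {retention(ftft, 'spring'):.0%}")
print(f"Fall-to-Fall,   first-time full-time: {retention(ftft, 'fall2'):.0%}")
print(f"Fall-to-Fall,   all students:         {retention(students, 'fall2'):.0%}")
```

Same five students, three defensible "retention rates" (67%, 33%, and 40% here), before you've even decided what to do with transfers or stop-outs.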

Graduation rates are even tougher, since we don't track students once they've left. Based on feedback from some of the four-year schools around us, we know that a significant number of students who leave us early get degrees from them, but it's hard to get solid data.

Moving from institution-level data to program-level is that much worse. If a student switches majors and later graduates, should that show up as attrition for the first program? Is it a sign of an institutional failure, or is it simply something that students do? Again, defining the variables is half the battle.

In terms of information that doesn't come from the IR office, accuracy can be trickier. That's because the whole purpose of the IR office is to provide data; when other offices do it, they're doing it for a reason. Some data are relatively easy to verify, so I'd tend to believe them: number of admitted students, say, or number of donors to the foundation. Others are tougher. For example, something as seemingly straightforward as “percentage of courses taught by adjuncts” can be calculated in any number of ways. If a full-time professor teaches an extra course as an overload, and gets adjunct pay for it, does that course count as 'full-time' or 'adjunct'? Do non-credit courses count? Do remedial courses count, since they don't carry graduation credit? What about summer courses? Do you count course sections, credit hours, or student seat time? Do you count numbers of adjuncts, or the courses taught by them? (This is not a trivial distinction. Say you have two full-time faculty teaching five courses each, and four adjuncts teaching two courses each. It's true to say that you have a 5:4 ratio of full-time to adjunct courses; it's equally true to say you have a 2:1 ratio of adjunct to full-time faculty. Generally, the statistic chosen will reflect the desired point.)

Annoyingly, college ERP systems tend to be clunky enough that even well-intended people can generate terrible data, simply based on errors in how students or programs get coded in the system. I've lived through enough ERP-generated nightmares to wince at the very mention of the acronym.

Whether data can be verified is a tough question to answer across the board. Data can be false, or they can be accurate-but-misleading, or they can be ill-defined, or they can be artifacts of system errors. My rule of thumb is that the worst errors can usually be sniffed out by cross-referencing. If a given data point is a wild outlier from everything else you've seen, there's probably a reason. It's not a perfect indicator, but it has served me tolerably well.

Wise and worldly readers – any thoughts on this one?

Have a question? Ask the Administrator at deandad (at) gmail (dot) com.

The timing of this question/post is uncanny. I am in the midst of applying for my college's Director of IR position.

I subscribe to two axioms: "Numbers don't lie," and "There are three types of lies: Lies, damn lies, and statistics."

I have to largely agree with DD's assertion that the answer you get depends on the question asked and who's doing the asking. For example, a measure of a college's "enrollment" (not a department's, necessarily) can be reported two ways: contact hours and headcount. I tell the administration contact-hour numbers because they understand the concept; the media and lay public get headcount. Both reports can paint an accurate picture even if they have an inverse relationship.
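That inverse relationship is easy to see with a toy example (the numbers below are invented, not from the comment): fewer students, each carrying a heavier course load, and the two measures move in opposite directions.

```python
# Invented two-year comparison: headcount falls while total contact
# hours rise, because average load per student goes up.
last_year = {"headcount": 1000, "avg_contact_hours": 12.0}
this_year = {"headcount": 950,  "avg_contact_hours": 13.0}

def total_contact_hours(year):
    return year["headcount"] * year["avg_contact_hours"]

headcount_change = this_year["headcount"] / last_year["headcount"] - 1
hours_change = total_contact_hours(this_year) / total_contact_hours(last_year) - 1

print(f"headcount:     {headcount_change:+.1%}")   # down 5%
print(f"contact hours: {hours_change:+.1%}")       # up about 3%
```

Is enrollment "up" or "down" this year? Both, honestly, depending on which number you report.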

As far as the reliability of the numbers/outcomes goes, by all means ask questions. You needn't mine the data yourself, but I should be able to explain where the figures come from without double-talking you in order to obscure the equation.

You're right, data can be suspect even when it comes from the IR office. Sometimes it boils down to the skills of the staff, sometimes it's about their integrity.

I was a research associate in an IR office for five years, responsible for completing most state and federal reports, the university dashboard, and the main US News survey, among other things.

Every entity we reported to had its own way of defining how it wanted the data presented -- which meant I would run the data sometimes three or four different ways, and sometimes get three or four different numbers.

My supervisor was very strong on transparency and consistency in presenting data, both within and outside the university -- but we were aware of other universities who reported data that was questionable.

For example, US News is very clear about who should be included in the freshman cohort for retention and graduation purposes; if any directions were unclear, we called them and asked for clarification.

We were aware of universities who did not include their adult alternative colleges in their data (even though they met US News criteria) because the retention rates weren't as good as their 'traditional' colleges. But even though we would take a hit in the US News results, we were determined to present data that was as honest as possible.

For presenting data within the university, we kept the numbers consistent from year to year by reusing what we presented to US News or IPEDS, with footnotes on charts describing how the data was calculated.

Over the five years I was there, the office moved from being completely disrespected university-wide because of our predecessor, to a serious level of respect from every quarter in the university (and among outside entities to whom we reported) for providing consistent, accurate data.

That particular supervisor taught me a lot, and I'm forever grateful for that learning experience.

What bugs me the most about the kinds of numbers that come from this source is that I've never seen statistics done on them. Enrollment was 2450 last year and 2475 this year -- it went up! Um, maybe, but maybe not.
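Enrollment is a census rather than a sample, so classical significance tests don't strictly apply, but a rough back-of-the-envelope check (my own illustration, not the commenter's) makes the "maybe not" concrete. Treating each year's count as a Poisson draw:

```python
import math

last_year, this_year = 2450, 2475

# If each year's enrollment were a Poisson count, the standard error of
# the year-over-year difference would be roughly sqrt(n1 + n2).
diff = this_year - last_year
se = math.sqrt(last_year + this_year)
z = diff / se

# z comes out well under the conventional 1.96 cutoff, so a 25-student
# bump is indistinguishable from ordinary year-to-year noise.
print(f"difference: {diff:+d}, z = {z:.2f}")
```

Under those (debatable) assumptions, a swing of 25 students on a base of ~2450 says essentially nothing about a trend.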

We have just started using NSLC's student tracker function to find out what happens either to our grads or our students who stop out. Like all sources of data it is imperfect, but it has been really helpful in finding out if students are coming for a semester and then vanishing or if they are using us as a stepping stone.

I direct a graduate program at an R1 university. Our problems are the same as yours. The data is there, but a) it is only as good as what was entered; b) it is spread across too many distributed databases; and c) even the really good "query writers" (of which there are few) often don't know about all the fields they can query.

As one amazing example, my university's centralized databases can tell me how many people applied to a given graduate program, and how many ultimately enrolled. How about how many accepted the offer? Nope -- that info somehow is kept by individual departments.

I know Dean Dad is a CC person, but some of you may have heard about the NRC "every 10 year study" of graduate programs nationwide. In a perfect world, all that data would sit in a relational database, and just writing the queries would be tricky. In practice, it consumed hundreds of person-hours of mixing and matching reports and visually inspecting for signs of data-entry errors. Similar challenges exist when compiling data for training grants, for example.

The good news is that these experiences started a much-needed conversation about data warehousing and consolidation -- but still just a conversation.