Sunday, June 22, 2008

This sounds like a simple question. It seems to ask us to discover a set of factors that influence how well individuals within a population perform when they get to colleges and universities. And we might speculate that there is a small group of potentially relevant factors: antecedent cognitive ability, attitudes, and values; location within a set of social relations that enhance or impede successful educational performance; quality of educational resources provided in K-12. We might reason that a given individual's performance is affected by his/her ability and motivation; enhancing or inhibiting circumstances; quality of educational "treatment"; and chance events or circumstances (a lucky break, an inspiring grandfather). And by examining antecedent conditions and outcomes across a large population of people, we might expect to be able to assess the degree to which various hypothesized factors in fact lead to differences in the performance of sub-populations defined by these factors. This analysis should shed light on the question, "What factors cause differences in university success?"

Sorting this out sounds like a straightforward empirical question. Consider this hypothetical study. First, identify a cohort of high school seniors -- let's say, all the seniors in 2000 in metropolitan Boston. Suppose this is 5,000 people. (1) Measure a set of features of their situation during high school: high school performance, family situation, features of the school attended, socioeconomic status, family status, racial-ethnic status. (2) Measure a set of psychological characteristics for each individual: motivation, determination, aptitude for mathematics, ... (3) Measure college success five years after high school graduation (GPA, credit hours completed, degree attained).

Now follow these individuals for 10 years: What further education do they pursue? Do they complete post-secondary education? What is their performance in post-secondary education? What occupations and jobs do they get? What income do they achieve by age 30? How much unemployment have they experienced?

Finally, we will do some basic statistics on this data set: compute the incomes and schooling for various sub-categories; test for correlations between outcomes and antecedent conditions; etc. Are there differences in outcomes when we cross-tabulate by ABILITY or MOTIVATION? What about if we cross-tabulate by RACE or SES? This analysis may produce statements like these hypothetical findings:

People who completed high school with high performance were 2.5 times as likely to complete a college degree as those with low performance.

People whose family income was in the top quintile were 5 times as likely to complete a college degree as those from families in the bottom quintile.

The college completion rates for white students, Hispanic students, and African-American students were X, Y, and Z respectively.

High school graduates from high schools with peer counseling programs were X percent more likely to complete a bachelor's degree.

People living in single-parent households during high school had completion rates of X compared to Y for dual-parent households.
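Statements of the "N times as likely" kind come from a simple cross-tabulation: count completions within each sub-population, convert the counts to rates, and take the ratio of the rates. Here is a minimal sketch of that arithmetic in Python. The records are made up purely for illustration, and the names `records`, `completion_rates`, and `relative_likelihood` are hypothetical; nothing here is data from an actual study.

```python
from collections import Counter

# Synthetic, made-up records (NOT real data): each entry is
# (family income quintile, completed a college degree?).
records = (
    [("top", True)] * 50 + [("top", False)] * 50
    + [("bottom", True)] * 10 + [("bottom", False)] * 90
)

def completion_rates(rows):
    """Cross-tabulate by group and return the completion rate for each group."""
    totals = Counter(group for group, _ in rows)
    completed = Counter(group for group, done in rows if done)
    return {group: completed[group] / totals[group] for group in totals}

def relative_likelihood(rows, group_a, group_b):
    """How many times as likely group_a is to complete a degree as group_b."""
    rates = completion_rates(rows)
    return rates[group_a] / rates[group_b]

rates = completion_rates(records)
ratio = relative_likelihood(records, "top", "bottom")
print(rates)
print(f"top quintile is {ratio:.1f} times as likely to complete as bottom quintile")
```

The ratio summarizes an association between group membership and outcome in the sample. By itself it says nothing about why the two groups differ.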

A study along these lines provides a first indication of how some of these social characteristics may be related to performance in college. If a factor is not causally related to the outcome, then the population possessing this factor should show the same performance as the population lacking it, apart from chance variation (the null hypothesis). So if we find that differences in family structure or performance in high school are associated with differences in college performance beyond what chance would explain, then we have some reason to infer that these factors play some causal or structural role in the outcome.
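One way to make the null hypothesis concrete is a permutation test: if the factor is irrelevant, then shuffling the group labels should produce differences in completion rates as large as the observed one fairly often. The sketch below assumes a binary outcome (1 = completed, 0 = did not) and uses invented numbers; the function names, the sample data, the number of shuffles, and the random seed are all illustrative assumptions, and a permutation test is only one of several tests a researcher might choose.

```python
import random

def completion_rate(outcomes):
    """Share of individuals in a group who completed (outcomes are 0 or 1)."""
    return sum(outcomes) / len(outcomes)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often randomly relabeled groups differ at least as much
    as the observed groups do (two-sided, on the absolute rate difference)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(completion_rate(group_a) - completion_rate(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between label and outcome
        diff = abs(completion_rate(pooled[:n_a]) - completion_rate(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Synthetic outcomes (NOT real data) for people with and without some factor.
has_factor = [1] * 50 + [0] * 50
lacks_factor = [1] * 10 + [0] * 90

p_value = permutation_p_value(has_factor, lacks_factor)
print(f"estimated p-value: {p_value:.4f}")
```

A small value indicates that the difference is unlikely to be a chance artifact of how individuals happened to fall into the two groups. It does not identify the pathway by which the factor and the outcome are connected.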

However, these findings do not establish specific causal linkages among the factors. Take the hypothetical finding about family income: is this statistical discovery the result of this mechanism (greater family income provides more support for tutoring and academic support) or this mechanism (greater family income is associated with familial values that put strong emphasis on successful completion of university degree) or this mechanism (greater family income confers social advantages that make completion easier for affluent students)? In other words, the statistical discovery does not determine the nature of the causal relation between the antecedent condition and the outcome; it simply points the researcher towards investigating the concrete social mechanisms that might be at work here.

The example demonstrates an important lesson about social inquiry. Statistical study of a population can in fact point us towards some preliminary hypotheses about social causation. But these statistical discoveries are only the first step. In order to confidently assert causal relationships linking factors like income and race to educational outcomes, we need to arrive at a nuanced analysis of the social relations and institutions through which these gross factors play into individual outcomes. We need an account of the mechanisms and processes through which concrete social settings, characterized by differences in family structure, SES, race, or schools, shape the social psychology and educational opportunities that determine the ultimate outcomes of the young people who pass through them.

(A similar line of thought can be found in this posting on the problem of sorting out the data establishing correlations between race and asthma.)