Dear Education Data Geeks: Stop Obsessing Over Subgroup Analyses

One of the most recognized features of No Child Left Behind is subgroup analysis, which shifted attention from the average performance of a school to the performance of specific populations within a school, such as ethnic/racial groups, free and reduced lunch, English language learners, and those with learning challenges. This was a laudable attempt to focus performance conversations on groups that often fell through the cracks of our school systems.

Subgroup analysis continues to dominate conversations about learning data. Ask school leaders how they analyze their data, and often they describe how they compare assessment results of different demographic groups.

But from the perspective of learning analytics, I offer one small suggestion: it’s time to retire subgroup analysis.

I choose the word “retirement” because it honors the important role subgroup analysis played in building a data culture in schools. Subgroup analysis was helpful because it moved the performance conversation beyond overall averages that mask important detail. However, its impact has been limited because of three realities.

1. Subgroup Analysis Invites Discussion of Meaningless Differences

This problem has been best illustrated in medical research. In his discussion of subgroup analysis, epidemiologist Ben Goldacre highlights an article published by a group of researchers in the medical journal Circulation.

In the study, the researchers selected over 1,000 patients with coronary disease from a database and randomly assigned them to one of two treatment groups. After creating two randomized groups, the researchers didn’t offer the patients any new treatment, but did collect follow-up data on their progress to see what would happen. In their analysis, they found that the two groups did not differ significantly in their survival rates (as you would expect), but subgroup analysis revealed that a certain subgroup of coronary disease patients performed significantly better in the first treatment group than the second. Normally, this would be a key finding—were it not for the fact that there was no difference in treatment. The subgroup differences were the result of random chance.

Was this a fluke finding? Not really. This phenomenon is closely connected to the idea of “insensitivity to sample size.” Mathematically, we know that the smaller the sample, the greater the variation in the data. This means that a small collection of data is much more likely to produce an extreme average, either much higher or much lower than the rest of our data.
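This effect is easy to demonstrate with a quick simulation. The sketch below is a hypothetical illustration (the mean and standard deviation of the score distribution are made up, not drawn from any real assessment): it repeatedly samples groups of different sizes from the same population of scores and measures how widely the group averages swing.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of test scores: mean 500, standard deviation 100.
POP_MEAN, POP_SD = 500, 100

def sample_mean(n):
    """Average score of a random sample of n students."""
    return statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# For each group size, draw 1,000 random groups and measure how far
# the group averages spread around the true population mean.
spreads = {}
for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(1000)]
    spreads[n] = statistics.stdev(means)
    print(f"group size {n:4d}: spread of group averages (sd) = {spreads[n]:.1f}")
```

The smallest groups produce by far the most extreme averages; the spread shrinks roughly with the square root of the group size. A subgroup of a dozen students can post a striking average for no reason other than chance.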

We see this kind of analysis in schools every day. Analyzing data in education is often an exercise wherein we look for any differences in test scores that interest us, but too often those differences are the result of random chance rather than a meaningful pattern in the data. State test scores may drop for an entire school, but we are quick to point out that scores are up for a specific subgroup (e.g., 7th grade girls). Probability theory tells us that it is much more likely that the scores of 7th grade girls would be higher or lower than the school average due to the group’s small sample size. We compound this problem when we indiscriminately look at many different subgroups, because doing so multiplies the chances that we will identify false differences in the data.
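The compounding effect is simple arithmetic. Assuming a conventional 5% chance of flagging a spurious difference on any single comparison (the 5% threshold is an assumption for illustration, as is the independence of the comparisons), the chance of at least one false positive grows quickly with the number of subgroups examined:

```python
# Probability of at least one false positive across k independent
# comparisons, each with a 5% false-positive rate.
alpha = 0.05

chances = {k: 1 - (1 - alpha) ** k for k in (1, 5, 10, 20)}
for k, p in chances.items():
    print(f"{k:2d} subgroups examined -> {p:.0%} chance of a spurious 'finding'")
# 20 subgroups -> roughly a 64% chance of at least one false positive
```

Real subgroup comparisons are rarely fully independent, but the direction of the effect holds: the more slices of the data we inspect, the more likely at least one of them looks meaningful by chance alone.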

2. Subgroup Analysis is Difficult to Translate Into Action

Because this type of analysis can easily lead to false conclusions, it can be difficult to trust performance differences we see in subgroups. That alone should suggest retirement for subgroup analysis, but putting this (big) problem aside for a moment, let’s assume a scenario where somehow we know the differences we see in subgroup analysis represent a legitimate difference in learning outcomes. What would we do?

Our instinct is to use root cause analysis to determine why a group of students performed differently than the general population. It may be that our school system has a bias or shortcoming in its approach that makes it less effective for a particular group of students. If so, we would need to systematically examine how we offer educational services to determine if hidden biases or gaps in needs exist—not an easy thing for a school leader or teacher to do.

A more direct explanation of subgroup performance differences is that there may be starting performance differences by subgroup. Since the Coleman report (1966), we know there is a strong connection between student background (poverty) and student achievement scores. We may find that many students in a subgroup enter school with an academic deficit, a likely scenario for historically disadvantaged students. Once detected, we could provide academic intervention to those students in the subgroup to close the learning gap. Differentiating instruction based on student needs is one of the best uses of learning data; however, rather than targeting only students in a subgroup for additional help, we should be concerned with the learning gaps of all students below grade level (regardless of subgroup membership).

3. Subgroup Analysis Steals Time and Attention From Better Forms of Data Analytics

Time is precious, as educators face many competing priorities. The time they invest in data analysis must meaningfully inform educational decision making. Time spent performing and discussing subgroup analysis would be better spent analyzing data in richer, deeper ways that support learning more directly. Educators should refocus their efforts away from subgroup analysis toward forms of data analysis focused on outcomes for individual students and how their needs vary from grade-level instruction, in either content (what you teach) or form of instruction (small-group learning, individualized learning). There are many possibilities, some of which I describe elsewhere.

The problem with taking action on subgroup analysis is that it is still aggregated data (though a targeted form of aggregation). Aggregated data are difficult to act on because we educate individual students, not the average of students. Subgroup analysis, while it dives more deeply into data, stops well short of offering actionable analysis created to help specific students. Ultimately, subgroup analysis puts educators in one of two unenviable positions: find significant-looking differences that are due to chance, or find differences that may be real but difficult to act on.

Subgroup analysis is popular because it is easy to perform and explain, has a historical connection to school accountability, is familiar to most educators, and is included in many reporting suites in commercial data systems. It’s time, however, to rethink how we approach data analysis and focus our efforts on techniques better positioned to meet the needs of all students.

Nick Sheltrown is Vice President of Analytics and Accountability at National Heritage Academies