In a previous post, I noted that one of the studies often cited for the success of academic vocabulary instruction, Snow, Lawrence, and White (2009), is in fact an example of its questionable effectiveness. Here’s another such study, similarly cited as an example that academic vocabulary teaching “works.”

Townsend and Collins (2009) looked at the effects of a 30-hour academic vocabulary intervention on a small group (N = 52) of English language learners (ELLs) in middle school (age range: 11 to 15).

The intervention consisted of a five-week, after-school program that focused on “daily direct instruction and discussion of three or four target words . . . using large cards with the words, definitions, sentences containing the words, and sentences with the target words missing, as well as supporting pictures” (p. 1000). Other activities and games were also used.

Interestingly for a study of this sort, “high-interest novels were read aloud to the students . . . and students were able to choose two of the novels to keep and continue reading on their own” (p. 1000). This introduces an obvious confound, since we know that free reading itself contributes to vocabulary acquisition.

Target word knowledge was measured on the researcher-created “Measure of Academic Vocabulary” (MAV), a modified form of the Vocabulary Knowledge Scale by Paribakht and Wesche (1993). Knowledge of each word tested was scored on a scale of 0 to 5. The MAV included 10 words randomly selected from the 60 words taught (“target words”), plus 10 additional academic words that were not taught (“non-target words”). Two different versions of the MAV (Form A and Form B) were created.

The second set of 10 “non-taught” or non-target words on the MAV was meant to control for what kids may have acquired in their regular classes or their outside reading. This is an especially important control since the intervention was an after-school program, and therefore done in addition to the students’ normal English and content classes.

The researchers created two groups, both of which were given the pretest in December (MAV – Form A). Then Group 1* got the five-week intervention in January and February, and both groups were tested again (first post-test, MAV – Form B). Group 2 got the intervention in March and April, followed by both groups taking a second post-test (MAV – Form A) in late April.

Nagy and Townsend (2012), in summarizing the results of the study, state that the effect size was medium (partial eta squared = .15) on the MAV in favor of the treatment group, and that “gains were maintained in delayed post-testing” (p. 99). Both of these findings are open to question, however.

First, the effect size reported by Nagy and Townsend for the MAV was for the entire test, which as noted measured both target word knowledge and other academic words not taught. All we can reasonably conclude from the total MAV score is that kids got better at academic vocabulary, as most students appear to do without any special intervention (as did the control group in Snow, Lawrence, and White, 2009).

Second, the MAV test itself turned out to be rather seriously flawed. After initially reporting very positive results for the treatment groups, Townsend and Collins inform us that they discovered after the fact that Form A and Form B of the academic vocabulary measure differed significantly from each other in the relative difficulty of the target words and the non-target words.

Form A’s questions on the target words were significantly easier than Form B’s questions, and Form B’s questions on the non-target words were easier than Form A’s questions. The difference in difficulty was not trivial: on the pretest (Form A), the target word score for Group 1 was 17.75 (SD: 9.09) versus 9.55 (SD: 4.78) for the non-target words (d = 1.15).
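As a rough check on that effect size, Cohen’s d for the pretest gap between the two word sets can be computed with a pooled standard deviation (a sketch of my own arithmetic; the paper does not say exactly how its d was pooled):

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using a simple pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd

# Group 1 pretest: target words (17.75, SD 9.09) vs. non-target words (9.55, SD 4.78)
d = cohens_d(17.75, 9.09, 9.55, 4.78)
print(round(d, 2))  # roughly 1.13, close to the reported d = 1.15
```

The small discrepancy from the reported 1.15 presumably reflects a slightly different pooling formula; either way, a d above 1 is a very large difference in difficulty.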

Form B was used as the immediate post-test for Group 1’s treatment period, and the immediate pretest for Group 2’s treatment. If Form B differed significantly from Form A, we obviously cannot reliably know how the two groups performed in the treatment and control conditions. This would include the MAV test results that Nagy and Townsend (2012) reported on, as well as the delayed post-test results, since we don’t have reliable immediate post-test results.

In an attempt to salvage the situation, Townsend and Collins reported on the results only from the testing that used Form A: the pretest in December and the second post-test in late April. They reported the “within-group” results to determine how much growth students made on the target words versus the non-target words.

I summarize those results in Table 1, converting the raw scores into words known (out of 60 target words and 60 non-target words).** I include also the effect sizes Townsend and Collins report for each set of gains.

Table 1: Pretest and Second Posttest Results on the MAV

                         December    April     Gain    Effect Size
Group 1
  Target Words             21.30     28.20     6.90       0.42
  Non-Target Words         11.96     12.78     0.82       0.18
  Difference in gains                          6.08
Group 2
  Target Words             20.40     28.87     8.47       0.71
  Non-Target Words         11.22     15.38     4.16       0.50
  Difference in gains                          4.31

From Townsend & Collins (2009), Table 3, p. 1004

Both Group 1 and Group 2 made progress on both target words and non-target words, even though the non-target word measure was much more difficult. The researchers also reported large effect sizes for within-group gains (target versus non-target), but these don’t mean much given that the two tests differed so substantially in difficulty.

All we can really say is that kids got better on academic vocabulary from December to April, and that the gains on the easier target word measure were greater than on the non-target test.

Optimistically, we might compute the treatment effect as the difference between the target-word gain and the non-target-word gain, which was somewhere between four and six words after thirty hours of instruction, or about one new word every five hours.
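That back-of-the-envelope calculation, using the gain figures from Table 1, looks like this:

```python
HOURS_OF_INSTRUCTION = 30

# Gains in estimated words known (from Table 1)
group1_target_gain, group1_nontarget_gain = 6.90, 0.82
group2_target_gain, group2_nontarget_gain = 8.47, 4.16

# "Treatment effect" = target-word gain minus non-target-word gain
group1_effect = group1_target_gain - group1_nontarget_gain  # 6.08 words
group2_effect = group2_target_gain - group2_nontarget_gain  # 4.31 words

for effect in (group1_effect, group2_effect):
    print(f"{effect:.2f} words, or one new word per "
          f"{HOURS_OF_INSTRUCTION / effect:.1f} hours of instruction")
```

Group 1 works out to about one word per five hours, and Group 2 to about one word per seven.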

No reading comprehension tests were administered during the study, and on another test of academic and general vocabulary, the Vocabulary Levels Test, there was no significant improvement from December to April. The VLT was modified to include some of the target words, but Townsend and Collins do not report separately on those results.

The study is hardly a ringing endorsement of academic vocabulary instruction; its results should instead be seen as very meager for a great deal of effort. Like Snow and colleagues, Townsend and Collins could argue that the kids got other things from the intervention, such as exposure to more academic texts and academic language. But at minimum, a vocabulary intervention should be able to deliver meaningfully higher vocabulary scores.

(*) Townsend and Collins called their two groups “A” and “B,” but I’m using “Group 1” and “Group 2” because the researchers also refer to their test forms as “A” and “B,” and this study is confusing enough already.

(**) Each word tested on the MAV was scored from 0 to 5 points, so I divided the raw score by five. I then multiplied that result by six, since the 10 words on the test were a random sample of 60 total words. Thus we get an estimate of the total number of words known for the target and non-target words.
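In code, that conversion (a sketch of my own arithmetic, not anything the authors provide) is:

```python
def raw_to_words_known(raw_score, points_per_word=5, sample_size=10, total_words=60):
    """Convert a raw MAV score to an estimated number of words known.

    Each of the 10 sampled words is scored 0-5, so raw/5 estimates how many
    sampled words are fully known; scaling by 60/10 extrapolates to the
    full 60-word set.
    """
    words_known_in_sample = raw_score / points_per_word
    return words_known_in_sample * (total_words / sample_size)

# Group 1's pretest target-word score of 17.75 becomes:
print(round(raw_to_words_known(17.75), 2))  # 21.3, matching Table 1
```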