In my last post, I presented evidence that when high school students read books related to their school subjects for pleasure, their test scores in those subjects go up. Rolls and Rodgers (2017) have just published a wonderful study that extends this argument in a slightly different direction: reading certain kinds of fiction – specifically, science fiction and fantasy – can help you acquire a good chunk of common scientific vocabulary.

Rolls and Rodgers looked at three large corpora: academic (e.g. scientific journal articles), general fiction (novels, stories), and “science fiction/fantasy,” the last being composed of around 20 million words taken from novels, magazines, literary journals, young adult fiction, and movie scripts. Next they took a list of 318 words (technically, word families) that commonly appear in the “hard” sciences, such as biology, physics, chemistry, computer science, and so forth, compiled by Coxhead & Hirsh (2007) (PDF), and examined how frequently those word families occurred in the science fiction/fantasy texts.*

They discovered that nearly all (294) of these common scientific terms (such as degrade, module, uptake) would be met at least once by reading 1,000,000 words of sci-fi and fantasy texts. More importantly, they found that the majority of these 294 word families would occur six or more times, and that 41.5% (132) of the 318 would be met 10 or more times, giving you a very good chance at acquiring them.

Rolls and Rodgers also analyzed their data in a more sophisticated way in terms of the frequency with which words appeared in their corpus. Previous attempts in the L2 literature to estimate how many new words you can acquire by reading a certain amount of text have used what we can call an “all-or-nothing approach.” Nation’s (2014) estimates of incidental vocabulary acquisition, for example, were based on the assumption that any word that occurred at least 12 times in his corpus would be counted as “acquired.” Words that occurred fewer than 12 times would not count at all. Other researchers have set the bar lower – Cobb (2007) and McQuillan & Krashen (2008) used six repetitions, but with this same logic.

All-or-nothing estimates are convenient to calculate, but they fail to include any partial word acquisition or acquisition below the set cut-point. We know from a variety of studies that the correlation between word acquisition and frequency is strong and positive: all other things being equal, the more times you see a word in text, the more likely you are to acquire it. (All-or-nothing estimates make this same assumption, at least implicitly.) Even when a word occurs only once in the text, there is a (small) chance the reader will acquire its meaning (or some portion of its meaning).
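The all-or-nothing logic described above can be sketched in a few lines of Python. This is only an illustration: the function name `all_or_nothing` and the occurrence counts are hypothetical (the three words are examples from the Coxhead & Hirsh list, but the numbers attached to them are made up), and the cut-points of 12 and 6 are the ones mentioned above.

```python
def all_or_nothing(counts, cutoff):
    """Count the words whose number of occurrences meets the cut-point.

    Words below the cut-point contribute nothing, which is exactly
    the limitation discussed in the text.
    """
    return sum(1 for n in counts.values() if n >= cutoff)


# Hypothetical occurrence counts, NOT taken from Rolls and Rodgers's data.
counts = {"degrade": 14, "module": 7, "uptake": 3}

print(all_or_nothing(counts, cutoff=12))  # 1 (only "degrade" qualifies)
print(all_or_nothing(counts, cutoff=6))   # 2 ("degrade" and "module")
```

Note that "uptake," seen three times in this made-up example, counts for nothing under either cut-point, even though a reader would have some chance of picking it up.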

What Rolls and Rodgers did was take into account potential acquisition at all frequency levels. They took the acquisition “pick-up” rates from three previous studies (Waring & Takaki, 2003; Pigada & Schmitt, 2006; Pellicer-Sánchez & Schmitt, 2010) and mapped those rates onto their own data. For example, in one of the studies they used, Pellicer-Sánchez and Schmitt, words that appeared 5 to 8 times in the text were acquired at a rate of 45%. Rolls and Rodgers assumed a similar acquisition rate for the words in their corpus that occurred that many times. They in effect created three “profiles” of acquisition and estimated the potential word gains based upon them.
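A rough sketch of how such a profile could be applied follows. It is not Rolls and Rodgers's actual procedure or data: the only rate taken from the text is the 45% figure for words seen 5 to 8 times; the other two bands and their rates are placeholders invented purely so the code runs, and the names `PROFILE`, `pickup_rate`, and `profile_estimate` are hypothetical.

```python
# Each entry is (lowest count, highest count, pick-up rate).
# Only the 5-8 band (0.45) comes from the text; the other bands and
# rates are PLACEHOLDERS for illustration, not values from any study.
PROFILE = [
    (1, 4, 0.10),     # placeholder
    (5, 8, 0.45),     # rate cited for Pellicer-Sánchez & Schmitt (2010)
    (9, None, 0.70),  # placeholder; None means "no upper limit"
]


def pickup_rate(n, profile=PROFILE):
    """Return the pick-up rate for a word that occurs n times."""
    for low, high, rate in profile:
        if n >= low and (high is None or n <= high):
            return rate
    return 0.0  # a word that never occurs cannot be acquired


def profile_estimate(counts, profile=PROFILE):
    """Sum the pick-up rates over all words to estimate words gained."""
    return sum(pickup_rate(n, profile) for n in counts.values())


# Hypothetical occurrence counts, not real corpus data.
counts = {"degrade": 14, "module": 7, "uptake": 3}
print(round(profile_estimate(counts), 2))
```

The point of the design is that every word contributes something, in proportion to how often it is met, rather than counting only the words above a single cut-point.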

Table 1 summarizes the results of their analysis, showing the number of scientific terms you could acquire with 1,000,000 words of reading (taken from their Table 4, p. 51). Note that it would take the average ninth grader less than six months to read this amount, reading 30 minutes a day at 200 words per minute (roughly what Carver (1989) estimated to be the average ninth grader’s reading speed).
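The reading-time arithmetic can be checked quickly. The reading speed, daily minutes, and word total below come from the text; the average month length of 30.44 days is my own assumption for converting days to months.

```python
WORDS_TO_READ = 1_000_000
WORDS_PER_MINUTE = 200   # approximate ninth-grade speed cited in the text
MINUTES_PER_DAY = 30
DAYS_PER_MONTH = 30.44   # assumed average month length (365.25 / 12)

words_per_day = WORDS_PER_MINUTE * MINUTES_PER_DAY  # 6,000 words a day
days_needed = WORDS_TO_READ / words_per_day
months_needed = days_needed / DAYS_PER_MONTH

print(round(days_needed), round(months_needed, 1))  # 167 5.5
```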

Meaning recognition is typically measured with some form of multiple-choice test; meaning recall requires you to provide a definition, synonym, or translation of a word. Recall is clearly more difficult than recognition, and is the most conservative measure of word knowledge. The recognition estimates Rolls and Rodgers calculated ranged from 89 to 127 words, while recall estimates were between 30 and 121. The average across all the estimates was 83 words.

Depending on the method for counting acquired words we use, then, we can conclude that reading a million words of science fiction and fantasy will give you a good chance of acquiring somewhere between a quarter and a third of all the common scientific words on Coxhead and Hirsh’s list. That’s a pretty good return on a modest investment of time, especially one that is enjoyable.

+++++++++

Three additional (somewhat geeky) notes on this study:

(1) Some researchers might object that when technical vocabulary is used in fiction texts, it doesn’t always have the same meaning or use that it has in an academic text (Gardner, 2004). Another possible objection is that fiction does not expose students to the same “discourse” patterns used in an academic register, even if the vocabulary overlaps (Nagy & Townsend, 2012). Both of these potential complaints miss the larger point, however. Even incomplete acquisition of technical (or sub-technical) vocabulary will likely give you a considerable leg up when you later encounter those same words in academic writing. Fiction is a bridge to the destination, not the destination itself, as Krashen (2003) pointed out.

(2) There is good evidence to think that nearly all incidental vocabulary acquisition rates reported in the research so far are too conservative when it comes to free reading. That’s because the texts used in these studies are typically chosen by the researcher, not the reader. When we select our own texts, we normally have (a) some background knowledge on the topic we’re reading and/or (b) interest in the topic. Both of these influence comprehensibility and engagement, and both lead to higher vocabulary pick-up rates. More on this in a future post. The estimates Rolls and Rodgers provided are therefore also probably too low for what a reader might gain when reading science fiction for pleasure.

(3) Now some VERY geeky stuff: I’ve been playing around the past few months with a more generalizable approach to counting partial word acquisition. Instead of profiling from previous studies as Rolls and Rodgers did, we can instead start with the probability of acquiring a word that occurs only once in the text. Estimates from both first-language and second-language studies put that probability between .05 and .15, with most estimates falling on the lower end of the spectrum. We then multiply that probability by the number of repetitions, assigning each potentially unknown word a value.

For example, if the single-occurrence probability of acquisition is .05, a word that occurs 10 times is assigned the value of 0.5 (.05 * 10). Words that occur 20 or more times are considered fully “acquired” (.05 * 20 = 1.0) and are not counted for more than 1.0, similar to the cut-points used by Nation and Cobb and others. We then add up all the values of the unknown words to get an estimate of words acquired. My method assumes (incorrectly) a perfectly linear relationship between frequency and acquisition, but if it aligns with other methods of estimation, it may still be useful.
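Here is a minimal sketch of the Probability x Repetition calculation just described. The function name `expected_acquired` and the occurrence counts are hypothetical; the .05 and .10 probabilities and the cap at 1.0 follow the description above.

```python
def expected_acquired(counts, p_single):
    """Estimate words acquired under the Probability x Repetition method.

    Each word is worth p_single * (number of occurrences), capped at 1.0
    so that no single word counts as more than one acquired word.
    """
    return sum(min(1.0, p_single * n) for n in counts.values())


# Hypothetical occurrence counts, not Rolls and Rodgers's data.
counts = {"degrade": 10, "module": 20, "uptake": 3}

# At .05: 0.5 + 1.0 + 0.15
print(round(expected_acquired(counts, 0.05), 2))  # 1.65
# At .10: 1.0 + 1.0 (capped) + 0.3
print(round(expected_acquired(counts, 0.10), 2))  # 2.3
```

With real data, `counts` would hold one entry per potentially unknown word family, and the result would be the estimated number of words acquired.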

I did a very crude calculation based on Rolls and Rodgers’s data (from their Table 4), using the mid-point of the ranges they used (i.e. for the 1-5 range, I used 3; 6-9 range: 7.5, and so on) for a .05 and .10 probability. Here are my results, along with the ones Rolls and Rodgers obtained (again for 1,000,000 words read):

Rolls & Rodgers’s Profile Method: 83 words (range: 30 – 127)

.05 Probability x Repetition Method: 89 words

All-or-Nothing Method (10 repetitions): 132 words

.10 Probability x Repetition Method: 150 words

Using a Probability X Repetition approach gives us estimates within the range of both the profile and cut-point methods: .05 is nearly the same as applying the profiles from various studies to the data, while .10 is in the ballpark of the 10-repetition cut-point approach.

The above admittedly limited comparison suggests that simply multiplying the number of repetitions by the single-occurrence probability will yield results similar to other estimates.

*We refer to words that appear in one discipline or a closely related set of disciplines as technical vocabulary. Note that this is not the same as the academic vocabulary used in studies such as those we discussed previously (here and here), which focused on a set of words (sometimes called sub-technical vocabulary) that occur frequently in almost all disciplines.