Is Synthetic Phonics Instruction Working in England? (Updated)

Since originally posting this analysis in September 2017, I have shortened it for publication in Margaret M. Clark’s new edited volume, Teaching Initial Literacy: Policies, Evidence, and Ideology (2018). Here I’m posting a somewhat longer version than the one included in Clark’s book, with updates to my original post.

Is Synthetic Phonics Working in England? A Comment on the “Teaching to Teach” Literacy Report

Jeff McQuillan

Introduction

Since the publication of the “Rose Report” (Rose, 2006) more than a decade ago, schools in England have been required to use “synthetic phonics” (Wyse & Goswami, 2008) to teach children to read. Three economists (Machin, McNally, & Viarengo, 2016) analyzed data from a large sample of English schoolchildren to assess the effectiveness of a phonics pilot program and early implementation of the new phonics mandate. They concluded that synthetic phonics was indeed effective compared to other methods of teaching reading in “closing the gaps” between students who “start out with disadvantages . . . compared to others” (p. 3).

I argue that the researchers’ results do not support this claim. Both experimental studies and Machin et al.’s analysis show that phonics instruction has a modest effect on initial literacy levels, but little to no impact on reading achievement in later grades.

Analysis: “Teaching to Teach” Literacy

Machin and colleagues examined the test scores of two different cohorts, each of which contained a group of students who were taught reading with synthetic phonics and a group that was not. Machin et al. gave no information on the instruction used with the comparison groups, which may also have included phonics or some “mixed” approach.

The first cohort included students who were part of a phonics pilot study, the Early Reading Development Pilot (ERDp), conducted prior to the release of the Rose Report in 2006. The second cohort (named “CLLD,” for the “Communication, Language, and Literacy Development Programme”) was formed by the first wave of Local Education Authorities that adopted the nationwide phonics curriculum announced in 2007. These early-adopter CLLD schools were compared to those that adopted the curriculum two years later (p. 11).

Students were assessed at three different points:

“Foundation Stage” (at age 5, after one year of instruction),

“Key Stage 1” (at age 7), and

“Key Stage 2” (at age 11).

The Foundation Stage Profile (now called the “Early Years Foundation Stage Profile”) was completed by the child’s teacher, who rated students on 13 scales (Qualifications and Curriculum Authority, 2008). Four of these scales related to “communication, language, and literacy,” one of which was “linking sounds and letters.” The Key Stage 1 assessment was also scored by the child’s teacher, while Key Stage 2 was scored externally.

Effect Size Comparisons

Machin et al. reported the effect size comparisons for the phonics groups and the non-phonics groups. Effect sizes allow us to see the magnitude of the impact an intervention has on test scores in a “standardized” unit, which is normally reported as the number (or fraction thereof) of standard deviations that separate the two groups (Hunter & Schmidt, 2004). This is especially important in studies such as this one, where the very large sample sizes (approaching 500,000) can cause even small score differences to be statistically significant. Effect sizes help us determine whether those differences are meaningful in practice.
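The standardized-difference idea can be made concrete with a few lines of code. The sketch below computes Cohen’s d, the most common effect size metric, from two sets of scores; the scores and function name are made up for illustration:

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference (Cohen's d): how many pooled standard
    deviations separate the treatment and control group means."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Hypothetical test scores for two small groups (made-up numbers):
phonics_group = [2, 4, 6]
comparison_group = [1, 3, 5]
print(cohens_d(phonics_group, comparison_group))  # 0.5: a half-SD difference
```

Because the difference is expressed in standard-deviation units rather than raw score points, effect sizes from studies using different tests can be compared directly.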

Opinions differ on how to interpret effect sizes, but one widely used rule of thumb is that proposed by Cohen (1988): an effect size of .20 is “small,” .50 is “medium,” and .80 is “large.” In a review of more than 300 meta-analyses, Lipsey and Wilson (1993) found that these designations roughly corresponded to the distribution of effect sizes across a broad range of behavioral research. The U.S. Department of Education’s What Works Clearinghouse Handbook (NCES, 2014) recommended that effect sizes should be at least .25 to be considered “substantively important” for education research (p. 23).

We can also determine the practical importance of an effect size by considering what Plonsky and Oswald (2014) referred to as its “literal” or “mathematical” interpretation. If an intervention has an effect size of .10, a student in the treatment group who began at the 50th percentile would move to the 54th percentile, a difference of only four percentile points (assuming a normal distribution).1 But an effect size of .80 would move that same student to the 79th percentile, a much more substantial gain of 29 percentile points.
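The percentile arithmetic above can be verified directly. A student at the 50th percentile sits at z = 0 on a standard normal distribution; adding the effect size to that z-score and converting back to a percentile via the normal CDF gives the new standing (function names here are illustrative):

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def percentile_after(effect_size):
    """Percentile reached by a treated student who started at the 50th
    percentile (z = 0), assuming normally distributed scores."""
    return 100 * normal_cdf(effect_size)

print(round(percentile_after(0.10)))  # 54
print(round(percentile_after(0.80)))  # 79
```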

Hill, Bloom, Black, and Lipsey (2008) proposed additional benchmarks for effect size interpretation, including comparing the effect to “normative expectations for change” and to effect sizes for similar interventions. They reported that on norm-referenced, standardized reading tests, the annual mean gain in effect size is 1.52 from Kindergarten to Grade 1, 0.97 from Grade 1 to Grade 2, and 0.60 from Grade 2 to Grade 3 (Table 1, p. 173). Lipsey et al. (2012) analyzed the effect sizes from 124 studies of a variety of educational interventions and found that the average effect size was .28 for elementary school, .33 for middle school, and .23 for high school (Table 9, p. 34).

Using Hill et al.’s standards, then, an effect size of .10 should be considered small compared to year-to-year gains at this age, and would be below average compared to other teaching innovations.

Synthetic Phonics Cohort Groups

I summarize Machin et al.’s effect size data in Table 1, broken down by student characteristics, including those who have English as an Additional Language (EAL), children who were eligible for “free meals,” and both EAL and “free meals” students (all taken from their Table 6, p. 32).2 A positive effect size indicates that the phonics students scored higher than the non-phonics students.

Table 1

Effect Size Differences for Phonics and Non-Phonics Students in Machin et al. (2016)

| Group | Cohort 1: Age 5 (Foundation Stage) | Cohort 1: Age 7 (Key Stage 1) | Cohort 1: Age 11 (Key Stage 2) | Cohort 2: Age 5 (Foundation Stage) | Cohort 2: Age 7 (Key Stage 1) | Cohort 2: Age 11 (Key Stage 2) |
|---|---|---|---|---|---|---|
| Native Speakers | .225 | .052 | -.045 | .211 | .061 | .001 |
| EALs | .567 | .134 | .045 | .201 | .113 | .068 |
| Non-Free School Meals | .306 | .042 | -.061 | .182 | .104 | .042 |
| Free School Meals | .290 | .135 | .064 | .207 | .136 | .062 |
| EALs & Free School Meals | .300 | .216 | .181 | .221 | .195 | .099 |

There are two conclusions we can draw from these data, both consistent with previous, more rigorously designed studies.

Phonics Teaching Has a Moderate Impact Initially

Phonics training has a moderate impact during the first year of instruction. Effect sizes in favor of phonics were mostly in the small-to-medium range (.20 to .50) at Age 5, although in Cohort 1, the effect size for EAL students was larger (.567). Effect sizes were also somewhat higher for the pilot study schools (Cohort 1) than for the first wave of schools that implemented the new phonics methods after the Rose Report (Cohort 2).

We have no way of knowing how instruction affected children’s comprehension scores versus their performance on decoding tasks, since these are combined into a single measure. The distinction is important: experimental research has found that phonics instruction boosts scores on isolated “skills” tests, but has a much smaller impact on measures of reading comprehension (Krashen, 2009).

In Table 2, I report the effect sizes for different measures of reading from three research reviews of phonics instruction, summarizing the results of 38 comparison studies from Ehri, Nunes, Stahl, and Willows (2001), 12 studies from Torgerson, Brooks, and Hall (2006), and 11 studies from a more recent analysis, a Cochrane Systematic Review by McArthur et al. (2011). Note that some studies were used in more than one meta-analysis.

Table 2

Impact of Phonics Instruction on Literacy Assessments in Three Meta-Analyses

Phonics instruction has a medium-to-large impact on reading isolated words and pseudo-words in all three analyses, ranging from .38 to .76. But on reading comprehension tests, the effects are much smaller: .27 in Ehri et al., .24 in Torgerson et al., and .14 in McArthur et al.’s review. The estimates from Torgerson et al. and McArthur et al. were not significantly different from zero.

Any adequate analysis of the effectiveness of synthetic phonics, then, must examine the impact of instruction on comprehension tests as opposed to decoding measures.3 Machin et al.’s analysis does not.

The Effect of Phonics Instruction Fades Quickly

As can be seen in Table 1, by age 7, when most students are probably reading independently, the difference between the children taught synthetic phonics and the controls declines sharply for all groups, and is less than .20 for all comparisons except the EAL + free meals group (and for Cohort 1 only: .216). By Age 11, only one of the 10 comparisons shown in Table 1 is greater than .10, and all are under .20.4

Again, experimental studies confirm these results. The impact of phonics instruction, even on tests of phonological knowledge, tends to decline soon after the intervention is over. Suggate (2016) conducted a meta-analysis of 16 experimental studies on the long-term impact of phonics instruction. I summarize his findings in Table 3. The “Post-Test” effect size is the difference immediately after phonics instruction, and the “Delayed Post-Test” effect size is for the follow-up assessment. On average the delayed post-tests were given less than a year after the intervention ended (mean = 11.17 months). “Pre-reading” included “sub-skills” such as phonemic/phonological awareness, letter naming, and decoding. “Reading skills” included word identification, oral reading fluency, and reading accuracy scores.

Table 3

The Impact of Phonics Instruction on Immediate and Delayed Post-Test Measures of Literacy in Suggate (2016)

| Assessment | Post-Test | Delayed Post-Test |
|---|---|---|
| Overall | .29 | .07 |
| Reading Skills | .26 | .07 |
| Pre-Reading | .32 | .08 |
| Comprehension | .47 | -.10 |

It’s clear that, when measured in controlled experiments, the impact of phonics instruction all but disappears within a year or so after the instruction ends, even on tests that measure phonological awareness and decoding. The impact on reading comprehension was actually negative on the delayed post-tests, although all of the delayed effect sizes were small.

A more recent, large-scale evaluation (N = 4,500) of the Open Court reading curriculum, which includes intensive phonics instruction, similarly found no positive effects for phonics on reading scores after the first year of instruction (Vaden-Kiernan et al., 2018).

Nor are the experimental data on phonics instruction more favorable if we extend the follow-up beyond one year. Blachman, Schatschneider, Fletcher, Murray, Munger, and Vaughn (2014) (not included in Suggate’s meta-analysis) compared the reading scores of a group of students who received intensive phonics training in early elementary school to a control group more than a decade after the intervention. There were no significant differences on any of the reading comprehension measures used, even ones biased toward decoding skills, such as the Woodcock-Johnson. The study’s authors did find significant differences for a few of the decoding measures, but for all other measures, including spelling, the effect sizes were “small to negligible” (p. 53).

Conclusion: Policy Should Be Based on Best Evidence

Machin et al.’s analysis of synthetic phonics was based on a “natural experiment,” allowing them to use a very large dataset with two separate cohorts. Natural experiments can corroborate other findings, and are especially useful for studying phenomena where a true experiment is not possible.

But that is not the case for teaching methods, about which we can in fact conduct true experiments. We already have several well-designed experiments on the effects of phonics instruction. Policy decisions should be made on the strongest evidence we have, not the weakest (Garan, 2001; 2004).

In any case, the results of the “Teaching to Teach” Literacy study do not support the assertion that synthetic phonics is having a positive impact on the reading scores of primary schoolchildren in England. The evidence Machin et al. presented is consistent with experimental studies that have found intensive phonics instruction makes a modest initial impact, but has very small effects on reading achievement later on.

Footnotes

1 An effect size of .10 that was cumulative over time might be considered substantial, however (Coe, 2002).

2 The researchers use the term “non-native speakers of English,” but it appears that they are referring to data on EALs, a broader designation that includes English language learners and native bilinguals (p. 16).

3 Even the small effect sizes for reading comprehension found in all three meta-analyses are likely overstated, since several of the included phonics studies used “comprehension” assessments that are strongly influenced by decoding ability (see Keenan, Betjemann, & Olson, 2008; and Spooner, Baddeley, & Gathercole, 2004). Of the seven (combined) studies used by McArthur et al. (2011) and Torgerson et al. (2006) to estimate their reading comprehension effect sizes, only two used reading tests that are not strongly affected by decoding skills. Both of those effects were small and non-significant: Ford (2009), reported in McArthur et al. (2011), found an effect size of -.15 on the Gates-MacGinitie test; Lovett et al. (1989), reported in Torgerson et al. (2006), found .08 on the Gilmore Oral Reading Test (incorrectly listed there as the “Gray” Oral Reading Test, p. 61).

4 A separate analysis of the impact of the Phonics Screening Check (PSC) implemented in English schools in 2012 (Walker, Sainsbury, Worth, Bamforth, & Betts, 2015) found that while the number of children scoring higher on the PSC had gone up (a 16% gain in two years), scores on the Key Stage 1 assessment barely budged, with an effect size of .08 that the researchers correctly described as “not very big” (p. 27). This result is consistent with the general trend seen in experimental studies: phonics instruction has a small, initial impact on phonics tests, but little impact on reading scores later on.

References

Coe, R. (2002). It’s the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12–14 September 2002. Retrieved from: http://www.leeds.ac.uk/educol/documents/00002182.htm

Torgerson, C., Brooks, G., & Hall, J. (2006). A Systematic Review of the Research Literature on the Use of Phonics in the Teaching of Reading and Spelling. Nottingham, England: Department for Education and Skills (DfES) Publications.