Test-enhanced learning: Using retrieval practice to help students learn

What is “test-enhanced learning”?

In essence, test-enhanced learning is the idea that the process of remembering concepts or facts—retrieving them from memory—increases long-term retention of those concepts or facts. This idea, also known as the testing effect, rests on myriad studies examining the ability of various types of “tests”—prompts to promote retrieval—to promote learning when compared to studying. It is one of the most consistent findings in cognitive psychology (Roediger and Butler 2011; Roediger and Pyc 2012).

In some ways, the terms “test-enhanced learning” and the “testing effect” are misnomers, in that the use of the word “tests” calls up notions of high-stakes summative assessments. In fact, most or all studies elucidating the testing effect examine the impact of low-stakes retrieval practice on a delayed summative assessment. The “testing” that actually enhances learning is the low-stakes retrieval practice that accompanies study in these experiments.

With that caveat in mind, the testing effect can be a powerful tool to add to instructors’ teaching tool kits—and students’ learning tool kits.

In this teaching guide, we provide six observations about the effects of testing from the cognitive psychology literature, summarizing one or two key studies that led to each of these conclusions. We have chosen studies performed with undergraduates learning educationally relevant materials (e.g., text passages as opposed to word pairs). We also suggest ways to implement test-enhanced learning in your class as well as important caveats to keep in mind.

Six things research tells us about the effects of retrieval practice

The idea that active retrieval of information from memory improves memory is not a new one: William James proposed this idea in 1890, and Edwina Abbott and Arthur Gates provided support for this idea in the early part of the 20th century (James, 1890; Abbott, 1909; Gates, 1917). During the last decade, however, evidence of the benefits of testing has mounted.

In one influential study, Roediger and Karpicke investigated the effects of single versus multiple testing events on long-term retention using educationally relevant conditions (Roediger and Karpicke, 2006). Their goal was to determine if any connection existed between the number of times students were tested and the size of the testing effect. The investigators worked with undergraduates in a laboratory environment, asking them to read passages about 250 words long. The authors compared three conditions (see Figure 1): students who studied the passages four times for five minutes each (SSSS group); students who studied the passages three times and completed one recall test in which they were given a blank sheet of paper and asked to recall as much of the passage as they could (SSST group); and students who studied the passages one time and then performed the recall practice three times (STTT group). Student retention was then tested either five minutes or one week later using the same type of recall test used for retrieval practice.

Interestingly, results differed significantly depending on when the final test was performed. Students who took their final test very soon after their study period (i.e., 5 minutes) benefited from repeated studying, with the SSSS group performing best, the SSST group performing second-best, and the STTT group performing least well. This result suggests that studying is more effective when the information being learned is only needed for a short time. However, when long-term retention is the goal, testing is more effective. The researchers found that when the final test was delayed by a week the results were reversed, with the STTT group performing about 5% higher than the SSST group and about 21% higher than the SSSS group. Testing had a greater impact on long-term retention than did repeated study, and the participants who were repeatedly tested had increased retention over those who were only tested once.

The study described here is one of many making up a rich literature on the testing effect; several recent review articles provide a thorough overview of the work in this area.

... resulted in better final performance than did immediate feedback, although both conditions showed benefit over no feedback.

4. Learning is not limited to rote memory.

One concern that instructors may have with regard to using testing as a teaching and learning strategy is that it may promote rote memory. While most instructors recognize that memory plays a role in allowing students to perform well within their academic domain, they want their students to do more than simply remember and understand facts; they want them to achieve higher cognitive outcomes (Bloom, 1956). Some studies address this concern and report results suggesting that testing provides benefits beyond improving simple recall. For example, the study by Smith and Karpicke (2014) described above determined the effects of testing on students’ recall of specific facts from reading passages as well as their ability to answer questions that required inference. In these studies, the authors defined inference as drawing conclusions that were not directly stated within the passages but that could be drawn by synthesizing from multiple facts within the passage. The investigators observed that testing following reading improved students’ ability to answer both types of questions on a delayed test, thereby providing evidence that benefits of testing are not limited to answers that require only rote memory.

In a 2011 study, Karpicke and Blunt directly addressed the question of whether retrieval practice can promote students’ performance on higher order cognitive activities. They investigated the impact of retrieval practice on students’ learning of undergraduate-level science concepts, comparing the effects of retrieval practice to the elaborative study technique, concept mapping (Karpicke and Blunt, 2011). In one experiment, students studied a science text and were then assigned to one of four conditions: a study-once condition, in which they did not interact further with the concepts in the text; a repeated study condition, in which they studied the text four additional times; an elaborative study condition, in which they studied the text one additional time, were trained on concept mapping, and produced a concept map of the concepts in the text; and a retrieval practice condition, in which they completed a free recall test, followed by an additional study period and recall test (see Figure 6). All students were asked to complete a self-assessment predicting their recall within one week; students in the repeated study group predicted better recall than students in any of the other groups. Students then returned a week later for a short-answer test consisting of questions that could be answered verbatim from the text and questions that required inferences from the text.

Students in the retrieval practice condition performed significantly better on both the verbatim questions and the inference questions than students in any other group. The authors then asked whether the advantage of retrieval practice would persist if the final test consisted of a concept map. Retrieval practice produced better performance than did elaborative study using concept mapping on both types of final tests (short-answer and concept mapping). When they examined the effects on individual learners, they found that 84% (101/120) of students performed better on the final tests when they used retrieval practice as a study strategy rather than concept mapping.

5. Testing can potentiate further study.

Wissman, Rawson, and Pyc have reported work that suggests that retrieval practice over one set of material may facilitate learning of later material, which may be related or unrelated (Wissman, Rawson, and Pyc, 2011). Specifically, they investigated the use of “interim tests.” Undergraduate students were asked to read three sections of a text. In the “interim test” group, they were tested after reading each of the first two sections, specifically by typing everything they could remember about the text. After completing the interim test, they were advanced to the next section of material. The “no interim test” group read all three sections with no tests in between. Both groups were tested on Section 3 after reading it. Interestingly, the group that had completed interim tests on Sections 1 and 2 recalled about twice as many “idea units” from Section 3 as the students who did not take interim tests. This result was observed both when Sections 1, 2, and 3 were about different topics and when they were about related topics. Thus, testing may have benefits that extend beyond the target material.

6. The benefits of testing appear to extend to the classroom.

All of the reports described above focused on experiments performed in a laboratory setting. In addition, there are several studies that suggest the benefits of testing may also extend to the classroom.

In 2002, Leeming used an “exam-a-day” approach to teaching an introductory psychology course (Leeming, 2002). He found that students who completed an exam every day rather than exams that covered large blocks of material scored significantly higher on a retention test administered at the end of the semester.

Larsen, Butler, and Roediger asked whether a testing effect was observed for medical residents’ learning about status epilepticus and myasthenia gravis, two neurological disorders, at a didactic conference (Larsen et al., 2009). Specifically, residents participated in an interactive teaching session on the two topics and then were randomly divided into two groups. One group studied a review sheet on myasthenia gravis and took a test on status epilepticus, while the other group took a test on myasthenia gravis and studied a review sheet on status epilepticus. Six months later, the residents completed a test on both topics. The authors observed that the testing condition produced final test scores that averaged 13% higher than the study condition.

Lyle and Crawford examined the effects of retrieval practice on student learning in an undergraduate statistics class (Lyle and Crawford, 2011). In one section of the course, students were instructed to spend the final 5 to 10 minutes of each class period answering two to four questions that required them to retrieve information about the day’s lecture from memory. The students in this section of the course performed about 8% higher on exams over the course of the semester than students in sections that did not use the retrieval practice method, a statistically significant difference.

Other classroom studies have been published by McDaniel, Wildman, and Anderson (2012), Orr and Foster (2013), and Stanger-Hall and colleagues (2011).

Why is it effective?

Several hypotheses have been proposed to explain the effects of testing. The retrieval effort hypothesis suggests that the effort involved in retrieval provides testing benefits (Gardiner, Craik, and Bleasdale, 1973). This hypothesis predicts that tests that require production of an answer, rather than recognition of an answer, would provide greater benefit, a result that has been observed in some studies (Butler and Roediger, 2007; Pyc and Rawson, 2009) but not others (Little and Bjork, 2012; some experiments in Smith and Karpicke, 2014; some experiments in Kang, McDermott, and Roediger 2007).

Bjork and Bjork’s new theory of disuse provides an alternative hypothesis to explain the benefits of testing (Bjork and Bjork, 1992). This theory posits that memory has two components: storage strength and retrieval strength. Retrieval events improve storage strength, enhancing overall memory, and the effects are most pronounced at the point of forgetting; that is, retrieval at the point of forgetting has a greater impact on memory than repeated retrieval when retrieval strength is high. This theory aligns with experiments that demonstrate that study is as effective as or more effective than testing when the delay before a final test is very short (see, for example, Roediger and Karpicke 2006), because the very short delay between study and the final test means that retrieval strength is very high, something many students can verify from their own experience cramming. At a greater delay, however, experiences that build retrieval strength (e.g., testing) confer greater benefit than studying.

How can instructors implement test-enhanced learning in their classes?

There are many ways to take advantage of the testing effect, some during class time and some outside of class time. The following are a few suggestions.

Incorporating frequent quizzes into a class’s structure may promote student learning. These quizzes can consist of short-answer or multiple-choice questions, and can be administered online or face-to-face. Studies investigating the testing effect suggest that providing students the opportunity for retrieval practice—and ideally, providing feedback for the responses—will increase learning of targeted as well as related material.

Providing “summary points” during a class to encourage students to recall and articulate key elements of the class. Lyle and Crawford’s study examined the effects of asking students to write the main points of the day’s class during the last few minutes of a class meeting, and observed a significant effect on student recall at the end of the semester (Lyle and Crawford, 2011). Setting aside the last few minutes of a class to ask students to recall, articulate, and organize their memory of the content of the day’s class may provide significant benefits to their later memory of these topics.

Pretesting to highlight important information and instructor expectations. Elizabeth Ligon Bjork and colleagues have reported results that suggest that pretesting students’ knowledge of a subject may prime them for learning (Little and Bjork, 2011). By pretesting students prior to a unit or even a day of instruction, an instructor may help alert students both to the types of questions that they need to be able to answer and to the key concepts and facts they need to watch for during study and instruction.

Telling students about the testing effect. Instructors may be able to aid their students’ metacognitive abilities by sharing a synopsis of these observations. Telling students that frequent quizzing helps learning—and that effective quizzing can take a variety of forms—can give them a particularly helpful tool to add to their learning toolkit (Stanger-Hall et al., 2011). Adding the potential benefits of pretesting may further empower students to take control of their own learning, such as by using example exams as primers for their learning rather than simply as pre-exam checks on their knowledge.

This list is a starting point. Instructors should use the principles that underlie test-enhanced learning—frequent low-stakes opportunities for students to practice recall—to develop approaches that are well-adapted for their class and context.

What are important caveats to keep in mind?

Keep it low-stakes. The term “testing” evokes a certain response from most of us: the person being tested is being evaluated on his or her knowledge or understanding of a particular area, and will be judged right or wrong, adequate or inadequate based on the performance given. This implicit definition does not reflect the settings in which the benefits of “test-enhanced learning” have been established. In the experiments done in cognitive science laboratories, the “testing” was simply a learning activity for the students; in the language of the classroom, it could be considered a “no-stakes” formative assessment where students could evaluate their memory of a particular subject. In most of the studies ... Orr and Foster, 2013). Thus, the term retrieval practice may be a more accurate description of the activity that promoted students’ learning. Implementing approaches to test-enhanced learning in a class should therefore involve no-stakes or low-stakes scenarios in which students are engaged in a recall activity to promote their learning rather than being repeatedly subjected to high-stakes testing situations.

Share your learning objectives so that students understand their targets. It’s important to note that incorporating testing (or recall practice) as a learning tool in a class should be done in conjunction with other evidence-based teaching practices, such as sharing learning objectives with students, carefully aligning learning objectives with assessments and learning activities, and offering opportunities to practice important skills. If you want students to be able to apply their knowledge, analyze complex situations, and synthesize different points of view, be sure to let them know that retrieval practice will help them learn the basic information they need for these skills, but that retrieval alone is not sufficient.

References

Abbott EE (1909). On the analysis of the factors of recall in the learning process. Psychological Monographs, 11, 159-177.

McDaniel MA, Wildman KM, and Anderson JL (2012). Using quizzes to enhance summative-assessment performance in a web-based class: An experimental study. Journal of Applied Research in Memory and Cognition, 1, 18-26.