Abstract

Working memory (WM) is the ability to maintain and manipulate task-relevant information in the absence of sensory input. While its improvement through training is of great interest, the degree to which WM training transfers to untrained WM tasks (near transfer) and other untrained cognitive skills (far transfer) remains debated and the mechanism(s) underlying transfer are unclear. Here we hypothesized that a critical feature of dual n-back training is its reliance on maintaining relational information in WM. In experiment 1, using an individual differences approach, we found evidence that performance on an n-back task was predicted by performance on a measure of relational WM (i.e., WM for vertical spatial relationships independent of absolute spatial locations), whereas the same was not true for a complex span WM task. In experiment 2, we tested the idea that reliance on relational WM is critical to produce transfer from n-back but not complex span task training. Participants completed adaptive training on either a dual n-back task, a symmetry span task, or on a non-WM active control task. We found evidence of near transfer for the dual n-back group; however, far transfer to a measure of fluid intelligence did not emerge. Recording EEG during a separate WM transfer task, we examined group-specific, training-related changes in alpha power, which are proposed to be sensitive to WM demands and top-down modulation of WM. Results indicated that the dual n-back group showed significantly greater frontal alpha power after training compared to before training, more so than both other groups. However, we found no evidence of improvement on measures of relational WM for the dual n-back group, suggesting that near transfer may not be dependent on relational WM. These results suggest that dual n-back and complex span task training may differ in their effectiveness to elicit near transfer as well as in the underlying neural changes they facilitate.

In the past 15 years, a body of research on the effects of cognitive training has accumulated. The overarching goal of cognitive training is to improve specific or broad-ranging cognitive skills, and numerous approaches have been examined with this goal in mind, such as video game training (Green et al. 2016; Powers et al. 2013), working memory (WM) training (Au et al. 2015; Morrison and Chein 2011; Shipstead et al. 2012), meditation training (Chiesa et al. 2011; Lutz et al. 2008; Tang et al. 2015), non-invasive brain stimulation (Berryhill et al. 2014; Parkin et al. 2015), aerobic exercise (Hillman et al. 2008), and many others (for a comprehensive review see Simons et al. 2016). Of these approaches, WM training has arguably received the most attention and investigation, but it has also been surrounded by the most controversy.

The seminal study by Jaeggi et al. (2008) demonstrated that training on a WM task yielded improvement on a measure of fluid intelligence (Gf). While this result has been replicated many times (e.g., Jaeggi et al. 2014; Jausovec and Jausovec 2012; Rudebeck et al. 2012; Stephenson and Halpern 2013; for a meta-analysis see Au et al. 2015), others have failed to replicate this effect (e.g., Redick et al. 2013; Thompson et al. 2013; for a review see Shipstead et al. 2012). Specifically, this area of investigation has focused on “far” transfer, whereby transfer represents the degree of similarity between the trained task (e.g., dual n-back) and untrained tasks (e.g., matrix reasoning tasks). Some of the debate in the WM training literature has surrounded the methods used to draw conclusions about transfer. For example, many early WM training studies, like Jaeggi et al. (2008), utilized passive control groups that did not engage in any training and were simply tested on the transfer measures twice. It is now widely accepted that making any firm conclusion about transfer effects requires an active control group, which controls for expectations, contact with experimenters, and other factors (see Simons et al. 2016 for a review of best practices). Another early criticism of the cognitive training field was the issue of inadequate baselines, whereby transfer might be concluded in the face of the control group performing worse after training (for a detailed consideration of this issue see, Boot et al. 2013). Furthermore, a critical feature of effective training that has evolved with the field is the use of adaptive training programs. For example, Lövdén et al. (2010) argue that a fundamental prerequisite for successful cognitive training is a mismatch or imbalance between environmental demands and actual brain supply. To create this prolonged mismatch, training tasks must be challenging but manageable with a high degree of effort. This balance between keeping a task difficult enough to ensure the participant is not bored while easy enough to ensure the participant does not get frustrated is typically achieved through adaptive paradigms that keep the effort and feasibility level consistent for each individual participant.

Further, a source of heterogeneity in this literature that may account for some of the inconsistent results is the diverse set of WM tasks that have been used for training purposes. The most commonly used training task is a variant of the single or dual n-back task, but others have used complex span WM tasks (Kane et al. 2004; Redick et al. 2012), or commercialized programs like Cogmed (e.g., Holmes et al. 2009; Klingberg et al. 2005). Indeed, in our review of the literature, we assessed that far transfer to Gf was present most often in response to dual n-back training (Colom et al. 2013; Jaeggi et al. 2008, 2010b, 2014; Jausovec and Jausovec 2012), but there is no evidence of far transfer to Gf from complex span tasks (Chein and Morrison 2010; Richmond et al. 2011). While there are examples of dual n-back training not resulting in far transfer to Gf, this type of training has shown the most promise in effectively improving Gf with training. Therefore, we sought to identify the “active ingredient” in the dual n-back that may make it more efficacious. Specifically, we aimed to directly compare two of the most commonly used tasks for both assessment and training of WM, namely an n-back task and a complex span task. N-back and complex span tasks, while both used to measure WM, have been shown to be only weakly correlated with one another (for a meta-analysis see, Redick and Lindsey 2013). To the best of our knowledge, no previous study has directly compared n-back and complex span WM training in the same study.

Here we hypothesized that a critical difference between n-back and complex span tasks is their reliance on maintaining and updating relational information in WM (Jonides et al. 1997; Miyake et al. 2000). Matrix reasoning tasks that are typically used to assess Gf also involve extracting relational information between stimuli (Carpenter et al. 1990). Previous work from our group has demonstrated a neural dissociation between maintaining concrete and relational information in WM (Ackerman and Courtney 2012; Blacker and Courtney 2016; Blacker et al. 2016; Ikkai et al. 2014). Specifically, these studies have shown that maintaining a concrete piece of sensory information, such as a spatial location, is supported by distinct neural substrates as compared to maintaining a spatial relationship that is independent of the original sensory location. Other groups have also shown dissociable neural substrates for processing or maintaining relational information from other types of concrete stimuli (Badre 2008; Bahlmann et al. 2014; Libby et al. 2014). Based on this previous work, we hypothesized that the n-back task relies more on relational information than a complex span task does due to the constant need for spatiotemporal updating of information. It can be reasoned that n-back tasks may involve multiple forms of relational information depending on the task stimuli. For example, in a spatial n-back task, participants must remember both the spatial relationship between a current and a previously presented item and the “temporal” relationship with respect to how many items have passed between any given two stimuli. While most n-back tasks likely involve some form of relational information, here we focused on visuospatial versions that seem to have multiple relations to maintain.

We first tested this hypothesis in a cross-sectional study examining individual differences in WM for spatial relations versus locations and performance on n-back and complex span tasks, as well as a measure of Gf. Next, we sought to test whether this differential reliance on relational information could explain the differences in training efficacy found between n-back and complex span tasks in the literature by examining near and far transfer effects in groups trained on either one task or the other. Further, the current study aimed to investigate the neural underpinnings of effective WM training by comparing the effects of these two training tasks on neural activity during performance of a spatial relation and location WM task using EEG. In particular, we hypothesized that n-back training would have a greater effect on neural activity during relational WM than would complex span training.

We examined alpha power (8–13 Hz) because a body of research has emerged suggesting that oscillations in this range are involved in the maintenance of information in WM and have been linked to individual differences in WM performance (for a review see Roux and Uhlhaas 2013). Alpha band oscillations have been observed in a number of sensory modalities during WM maintenance (Haegens et al. 2010; Herrmann et al. 2004; Kaiser et al. 2007). For example, it has recently been demonstrated that the topography of alpha tracks the spatial location of to-be-remembered items in WM (Foster et al. 2016). The majority of work has focused on alpha over posterior brain regions, which is thought to reflect the inhibition of task-irrelevant brain regions in an effort to prioritize processing of task-relevant information (Jensen et al. 2002; Jokisch and Jensen 2007; Kelly et al. 2006; Klimesch et al. 2007). Moreover, WM performance has also been linked to alpha over frontal brain regions (Sadaghiani et al. 2012; Sauseng et al. 2005a, b). Specifically, there is evidence that synchronization of alpha oscillations in posterior and frontal brain regions serves to protect items currently held in WM (Bonnefond and Jensen 2012). In fact, Zanto et al. (2011) found that modulating this anterior-posterior alpha synchronization via repetitive transcranial magnetic stimulation (rTMS) impaired WM performance, which suggests a causal role for prefrontal cortex (PFC) in modulating posterior brain regions. Taken together, this previous evidence suggests that alpha may represent a putative mechanism by which WM training may exert its effect on enhanced performance.

More specifically, in two related studies, our group has found that when participants maintain a spatial relation in WM, compared to a spatial location, there is an increase in posterior alpha power (Blacker et al. 2016; Ikkai et al. 2014). This increase in posterior alpha power has been interpreted as representing suppression of sensory brain regions because the sensory information (i.e., the spatial locations) is no longer task-relevant. In addition to differences in posterior alpha, our previous work has also shown that there is greater frontal alpha power when a relationship is being maintained in WM compared to a location, as well as increased frontal-posterior phase synchrony (Blacker et al. 2016). Therefore, if dual n-back and complex span differentially rely on relational WM, and if dual n-back training strengthens relational WM, then changes in alpha power may represent a neural marker of improvement that would be greater for dual n-back training. Thus, in the current study, we investigated changes in alpha power during this same type of spatial relations and locations WM task before and after dual n-back and complex span WM training.

Here we describe the results of two experiments. Experiment 1 is a behavioral study testing our hypothesis that the n-back task relies more on relational WM as compared to complex span tasks. The results of that study supported the idea that the reliance on relational processing in the n-back task could account for differences in transfer following training that have been reported in the literature. In experiment 2, we aimed to directly compare the efficacy of dual n-back and complex span training to elicit near and far transfer and we aimed to investigate a potential neural basis of these training-induced changes in WM performance: changes in alpha power as measured by EEG.

Tasks and Procedures

Experimental stimuli were controlled by MATLAB (The MathWorks, Natick, MA) using Psychophysics Toolbox extensions (Brainard 1997; Pelli 1997), and displayed on a 21″ Apple iMac monitor. Participants were seated approximately 60 cm from the monitor.

Participants completed one session, which included performance of the following tasks in one of two orders, counterbalanced across participants: Order (1) Spatial Locations and Relations WM task, Symmetry Span, single spatial n-back task, and BOMAT; Order (2) BOMAT, Spatial Locations and Relations WM task, single spatial n-back task, Symmetry Span.

Symmetry Span

Participants completed the automated Symmetry Span task (Kane et al. 2004). Participants recalled sequences of 2–5 red square locations while performing an interleaved symmetry judgment task. The dependent measure for Symmetry Span was the partial score, which is the sum of red squares recalled in the correct location and serial order, regardless of whether the entire trial was recalled correctly. This score will henceforth be referred to as the Symmetry Span Score.

Spatial n-back

Participants were shown a sequence of spatial locations indicated by black squares and were asked to respond each time the current stimulus was in the same location as the one presented n positions back in the sequence. The black squares were presented in eight possible locations organized in an imaginary circle around fixation. The squares were presented on a gray background for 500 ms each, followed by a 2500 ms interstimulus interval (ISI). Participants were required to press the SPACE bar for targets, and their response window lasted from the onset of the stimulus until the presentation of the next stimulus (3000 ms); no response was required for non-targets. Participants were tested on 1-, 2-, 3-, 4-, and 5-back levels in that order. A level consisted of 20 + n stimuli and contained 6 targets and 14 + n non-targets each. For each n-level, we calculated d’ as a measure of sensitivity (d’ = Z(hit rate) – Z(false alarm rate)), which is a useful metric for an n-back task because it measures hit rate while penalizing for false alarms (e.g., Haatveit et al. 2010). We used the average d’ across all five levels as the dependent measure of performance.
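The sensitivity measure described above can be illustrated with a minimal Python sketch. The target and non-target counts (6 and 14 + n) follow the task description. The function names and the log-linear correction for hit or false alarm rates of exactly 0 or 1 are assumptions made for illustration only; the text does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_targets, false_alarms, n_nontargets):
    """d' = Z(hit rate) - Z(false alarm rate) for one n-back level."""
    # Assumption (not stated in the text): a log-linear correction, adding
    # 0.5 to each count and 1 to each total, keeps rates of exactly 0 or 1
    # from producing infinite z-scores.
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def mean_d_prime(levels):
    """Average d' across levels; `levels` maps n -> (hits, false_alarms).

    Each level has 6 targets and 14 + n non-targets, as in the task.
    """
    scores = [d_prime(h, 6, fa, 14 + n) for n, (h, fa) in levels.items()]
    return sum(scores) / len(scores)
```

For instance, `mean_d_prime({1: (6, 0), 2: (5, 1)})` would average the 1-back and 2-back sensitivities for a hypothetical participant.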

Bochumer Matrices Test

We assessed fluid intelligence (Gf) using the Bochumer Matrices Test (BOMAT) (Hossiep et al. 1999). The BOMAT is a matrix reasoning test that contains 29 problems and has two full versions, A and B. All participants in experiment 1 completed version B. The standard time limit of 45 min was used here. The number of correct solutions was used as the dependent measure.

Spatial Locations and Relations Task

Participants also completed a novel WM task that required participants to either maintain spatial relations or spatial locations (Fig. 1). This task was modeled after the tasks used in previous studies (Ackerman and Courtney 2012; Blacker and Courtney 2016; Ikkai et al. 2014).

Fig. 1 Trial examples for the Spatial Locations and Relations task. Under low load, Location trials required participants to imagine a line between two sample circles, hold the location of that line in memory across a delay and then decide if a test circle fell in that location or not. Under high load, Location trials required participants to maintain the locations of three circles in memory and then decide if a test circle fell in one of those locations or in a completely new location. Under low load, Relation trials required participants to maintain the vertical relationship (above/below) of two sample circles and then decide if two test circles were in the same relationship. Under high load, Relation trials required participants to maintain the three vertical relationships between three sample circles and then decide if one of those pairs was presented in the same relationship at test.

All stimuli were presented on a 50% gray background. A trial began with a 500-ms fixation cross, presented in the middle of the screen. Next, a 500-ms verbal cue indicated whether the trial would be a “Relation” or a “Location” trial. A sample array was then presented for 500 ms, which contained two or three colored circles (each subtending 0.67° × 0.49° of visual angle). The color of each circle was chosen randomly without replacement from red, green, yellow, and blue. The circles in an array were presented between 2.0 and 2.9° of visual angle apart both horizontally and vertically, but within the same quadrant of visual space. After a 2000 ms delay period, a test array was displayed for 500 ms. Participants had a maximum of 1500 ms to enter their response, which meant that the 1000 ms following the offset of the test array was also considered the response period. Finally, a 300 ms feedback display was presented where the fixation cross turned green for a correct response, red for an incorrect response, and blue if the response was slower than 1500 ms.

For both trial types, there was a low load (i.e., sample array contained two colored circles) and a high load (i.e., sample array contained three colored circles) condition.

For Location trials (Fig. 1), under low load, participants were instructed to imagine a line segment connecting the two sample circles and maintain the location of that line in memory across the delay period. These instructions were used to encourage participants to encode the exact spatial coordinates of one concrete object (i.e., the imaginary line segment), while the same number of circles (i.e., two) was presented on the screen for both Location and Relation trials. At test, participants were asked to decide whether or not the black test circle fell on that imaginary line segment. For match trials, the black test circle fell in the exact center between the two previously presented sample circles (i.e., in the center of the remembered imaginary line segment). For non-match trials, the black test circle fell between 1.6 × 2.2° and 2.1 × 2.9° of visual angle from the position of that center point in any direction between the sample circles.

For Location trials, under high load, participants were instructed to remember the absolute locations of the three sample circles. At test, participants were asked to decide whether or not the black test circle was in one of the three sample locations or in a completely new location. For match trials, the black test circle fell in the identical location as one of the sample circles and for non-match trials it fell between 1.6 × 2.2° and 2.1 × 2.9° of visual angle from any of the three sample locations.

While the low and high load Location trials consisted of different instructions to the participant, the key factor is that under low load participants were asked to maintain one spatial location (i.e., the location of the imaginary line segment) and under high load participants were asked to maintain three spatial locations (i.e., the locations of the three sample circles). The imaginary line manipulation in the low load trials was necessary to equate the number of sample circles displayed for Location and Relation trials. Further, previous work using this task has shown that the load manipulation for Location trials activates brain regions that are typically found with load increases in other types of visuospatial WM (Blacker and Courtney 2016), such as posterior parietal cortex (e.g., Todd and Marois 2004) and frontal eye fields (e.g., Leung et al. 2004).

For Relation trials (Fig. 1), under low load, participants were instructed to encode and maintain the relative vertical positions of the two sample circles (e.g., red is above blue). Upon test, participants indicated whether or not the circles in the test array had the same relative vertical positions as the sample circles. For Relation trials, under high load, participants were instructed to encode and maintain the three possible vertical relationships between the sample circles (e.g., green is above yellow, yellow is above red, red is below green). As with low load, at test, participants indicated whether or not the circles in the test array had the same relative vertical positions as the sample circles. Note, which of the three relationships was tested was unpredictable, which forced participants to maintain all three relationships during the delay period. The horizontal relationship was never task-relevant in Relation trials.
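To make the Relation trial logic concrete, the following Python sketch derives the pairwise vertical relationships from a sample array and checks a test pair against them. The function names, the coordinate convention (larger y means higher on the screen), and the example coordinates are illustrative assumptions, not details taken from the experimental code.

```python
from itertools import combinations


def vertical_relations(circles):
    """Return the vertical relation for every pair of colored circles.

    `circles` maps color -> y coordinate. Assumption: larger y means
    higher on the screen. Colors are sorted so that each pair has one
    consistent key order.
    """
    return {
        (a, b): "above" if circles[a] > circles[b] else "below"
        for a, b in combinations(sorted(circles), 2)
    }


def is_match(sample, test_pair):
    """True if the test circles keep the vertical relation held in the sample."""
    sample_rel = vertical_relations(sample)
    test_rel = vertical_relations(test_pair)
    return all(sample_rel[pair] == rel for pair, rel in test_rel.items())


# Three sample circles yield three relations to maintain under high load.
sample = {"green": 3.0, "yellow": 2.0, "red": 1.0}
relations = vertical_relations(sample)
```

Because the comparison uses only relative position, a test pair shown at new absolute locations still counts as a match when the above/below relation is preserved.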

There are a few crucial aspects of the task design worth elaborating on. First, regardless of trial type, under low load, participants were asked to encode and maintain one piece of information: either one spatial location (Location trials) or one spatial relation (Relation trials) and under high load, participants were asked to encode and maintain three pieces of information: either three spatial locations or three spatial relations. Second, trial type was pseudorandomly presented so participants could not predict what trial type they would see until the cue. Load was uncued, so the participants were unaware of the load until the sample array appeared. Third, the sample array circles were always presented in one quadrant of the display and the test array circles were always presented in the same quadrant as the sample circles, although in the Relation task the test circles were never in the exact same locations as the sample circles. For all trial types, participants pressed one button for a “match” response and another for a “non-match” response and these response key mappings were counterbalanced across participants. Participants completed a total of 256 trials.

Data Analysis

The goal of this initial experiment was to investigate individual differences in WM for spatial relations versus locations and how those differences predicted n-back, complex span, and Gf performance. To do so, we tested partial correlations between Relation WM accuracy, while controlling for Location WM accuracy, and n-back, Symmetry Span, and BOMAT. We also tested for partial correlations between Location WM accuracy, while controlling for Relation WM accuracy, and n-back, Symmetry Span, and BOMAT. For the Location and Relation task, we focused our analysis on high load trials, because previous work has shown that individual differences in WM performance are more evident when load is high (Cusack et al. 2009; Linke et al. 2011). However, we report parallel analyses for the low load condition in the Supplementary Online Material.
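As an illustration of the analysis logic, a first-order partial correlation can be computed from the three pairwise Pearson correlations. The sketch below uses only the Python standard library; the function names are hypothetical, and this is not a reproduction of the software actually used for the reported analyses.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, control):
    """Correlation between x and y, controlling for one variable.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, control)
    r_yz = pearson(y, control)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Under this scheme, `partial_corr(relation_acc, nback_dprime, location_acc)` would correspond to the association between Relation WM accuracy and n-back d’ with Location WM accuracy held constant, where the three argument names stand for hypothetical per-participant score lists.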

Given that Relation WM was significantly correlated with n-back and BOMAT performance, it would be expected that n-back and BOMAT performance would also be highly correlated. In line with previous work demonstrating the robust relationship between WM and Gf performance (e.g., Engle et al. 1999), here we did find a significant positive correlation between BOMAT score and n-back d’, R = 0.58, p = 0.003.

Discussion

The results of experiment 1 represent a proof of concept that performance on a relational WM task is predictive of performance on both an n-back task and a measure of Gf. This supports our initial hypothesis that the n-back task relies more heavily on relational processing than complex span tasks do. It also demonstrates that a measure of Gf, the BOMAT, is related to relational WM performance. Therefore, we next examined whether this reliance on relational WM could explain differences in transfer between these two tasks in a training study, as well as whether n-back and complex span training would result in significantly different changes in alpha power following training. If n-back training were to have a greater effect than complex span training on an EEG measure related to relational WM, then it would provide support for the idea that the involvement of relational WM might underlie its greater effectiveness for cognitive training and transfer.

Experiment 2: Training Study

Method

Participants

One hundred thirty-six neurologically healthy adults (age: M = 21.18, SD = 3.12; 44 males) participated for monetary compensation. All participants were recruited through flyers and online university announcements. Recruitment materials stated “Participants needed for research on cognitive training.” No participants from experiment 1 participated in the training study. Participants who completed the study received an average of $215 (SD = 11). All participants had normal or corrected-to-normal vision, and gave written informed consent approved by the Institutional Review Boards of both Johns Hopkins University and the Johns Hopkins Medical Institutions.

General Study Procedures

The general study structure is illustrated in Fig. 3. Upon enrollment in the study, participants were randomly assigned to a training group as noted above and then completed a 2.5 h pre-training assessment session. This initial session consisted of the informed consent, demographic questionnaire completion, training program installation onto their personal laptop, completion of six assessment tasks, instructions and practice trials for the EEG task, and instructions and practice trials for their respective training task. All of these tasks are described in detail below. The second session was a 2 h pre-training EEG session, which included completion of several questionnaires and a separate WM task while EEG was recorded. After the pre-training EEG session, participants were instructed to begin their at-home training sessions immediately and to complete five sessions per week for 4 weeks. Upon completion of the 20 training sessions, participants completed a 2 h post-training EEG session where they completed the same WM task while EEG was recorded. Finally, participants completed a 1.5 h post-training assessment session where they completed the six assessment tasks again and completed a study exit survey.

Fig. 3 General study procedures, including sample size and attrition rate by group

Training Groups and Tasks

Participants were randomly assigned to one of three training groups upon enrolling in the study: dual n-back training (DNBT), symmetry span training (SST), or permuted rule operations (PRO). During the initial study session, we installed MATLAB (The MathWorks, Natick, MA) and Psychtoolbox (Brainard 1997; Pelli 1997) onto each participant’s personal laptop, along with the scripts for their assigned training program. Each training session was set to last 30 min. Participants emailed their data files to the laboratory after every session.

Dual n-back Training

Similar to Jaeggi et al. (2008), one group of participants performed a dual auditory and visuospatial n-back task simultaneously (Fig. 4a). Participants saw a blue square on a gray background in one of eight spatial locations on a computer screen and at the same time heard one of eight letter sounds (i.e., consonants). These stimulus pairs were presented every 3 s. Participants were asked to judge whether each stimulus matched the location or the letter of the stimulus presented n positions earlier. Participants responded by pressing the “f” key if the location was the same as the location n-back, the “j” key if the letter was the same as the letter n-back, and both “f” and “j” if both the location and letter were the same as n-back. Participants began training at the 1-back level. A sequence of 20 + n stimuli was presented in a given block, with four location-only targets, four letter-only targets, and two dual targets in each block. The training was adaptive, and participants were given feedback on their accuracy at the end of each block. If accuracy was > 85%, the next level would increase by 1 (i.e., a 2-back), whereas if accuracy was < 70%, the next level would decrease by 1 (or stay the same if they were on a 1-back), and in all other instances, the level would remain unchanged. While experiment 1 provided evidence of a relationship between single n-back performance and Gf and Relation WM, here we used a dual n-back task for training because it is more common in the literature to use dual n-back tasks for training purposes and there is evidence that dual and single n-back task performance are roughly equally correlated with performance on measures of Gf (Jaeggi et al. 2010a, b).
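The adaptive rule for dual n-back training can be summarized in a few lines of Python. This is a minimal sketch of the thresholds stated above, not the MATLAB training script itself; the function names are hypothetical, and accuracy is assumed to be expressed as a proportion between 0 and 1, since the text does not specify how block accuracy was computed.

```python
def next_n_level(current_n, accuracy):
    """Return the n-back level for the next block.

    `accuracy` is the block accuracy as a proportion (assumption).
    Above 85% the level goes up by 1; below 70% it goes down by 1,
    but never below 1-back; otherwise it stays the same.
    """
    if accuracy > 0.85:
        return current_n + 1
    if accuracy < 0.70:
        return max(1, current_n - 1)
    return current_n


def block_length(n):
    """Number of stimulus pairs in a block at level n (20 + n)."""
    return 20 + n
```

For example, a participant scoring 90% on a 2-back block would move to a 3-back block of 23 stimulus pairs.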

Fig. 4 Task schematics for each of the three training tasks: a dual n-back training, b symmetry span training, and c permuted rule operations

Symmetry Span Training

An adaptive version of the automated symmetry span task (Kane et al. 2004) was used here, similar to that used by Richmond et al. (2011). Participants recalled sequences of red square locations within a matrix against a background symmetry judgment task (Fig. 4b). The participant’s task was to remember the order and locations of red squares presented in a 4 × 4 grid. Each red square was presented for 650 ms. Interleaved between each red square, a white 8 × 8 matrix was presented, with some squares filled in black, and participants were asked to decide if the image was symmetrical along its vertical axis. A total of 96 symmetry images were utilized (50% symmetrical, 50% asymmetrical). Participants responded by clicking a box labeled “yes” or “no” with the mouse. After all red squares and symmetry judgments were presented, a blank 4 × 4 grid was presented and the participant recalled the red square locations in the temporal order they were presented by clicking in the blank grid with the mouse. This training task was made adaptive in two ways: the number of memory items and the number of symmetry judgments following each memory item both varied with level. Training began with 2 memory items and 1 symmetry judgment following each memory item. The next level would contain 2 memory items and 2 symmetry judgments following each memory item, then 3 symmetry judgments. After 3 symmetry judgments, the number of memory items would increase to 3, but the symmetry judgments would be 1 following each memory item, then 3 memory items and 2 symmetry judgments and so on. If memory accuracy was > 85% and symmetry accuracy was > 75%, the next level would increase by 1, whereas if memory accuracy was < 70% or symmetry accuracy was < 60% the next level would decrease by 1 (or stay the same if they were on level 1), and in all other instances the level would remain unchanged.
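The level structure and adaptive rule for symmetry span training can be sketched as follows. The mapping assumes that the “and so on” pattern continues in cycles of three symmetry-judgment steps per memory load with no upper limit, which the text implies but does not state; the function names are hypothetical, and accuracies are assumed to be proportions between 0 and 1.

```python
def level_to_load(level):
    """Map a 1-based training level to (memory items, judgments per item).

    Assumption: levels cycle through 1, 2, and 3 symmetry judgments
    before the number of memory items increases by one.
    """
    items = 2 + (level - 1) // 3
    judgments = 1 + (level - 1) % 3
    return items, judgments


def next_level(level, memory_acc, symmetry_acc):
    """Apply the two-criterion adaptive rule to choose the next level."""
    if memory_acc > 0.85 and symmetry_acc > 0.75:
        return level + 1
    if memory_acc < 0.70 or symmetry_acc < 0.60:
        return max(1, level - 1)  # level 1 is the floor
    return level
```

Under these assumptions, level 4 corresponds to 3 memory items with 1 symmetry judgment after each item, matching the progression described above.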

Permuted Rule Operations

The PRO training group served as an active control group as the training did not involve a WM task, but was a cognitively challenging and adaptive training program. The task was adapted from the task used by Cole et al. (2010). Participants were presented a set of three rules and then asked to apply those rules to two word stimuli as quickly as possible (Fig. 4c). Four semantic rules, four decision rules, and four response rules were used. The semantic rule consisted of sensory semantic decisions (i.e., “is it sweet?”). The four possible semantic rules were as follows: SWEET, GREEN, SOFT, and LOUD. The decision rule specified (using logical relations) how to respond based on the semantic decision outcome(s) for each trial. The SAME rule required that the semantic answer was the same (“yes” and “yes” or “no” and “no”) for both words, the DIFFERENT rule required that the answer was different for the two words, the SECOND rule required that the answer was “yes” for the second word, while the NOT SECOND rule required that the answer was “no” for the second word. The motor response rules specified what button to press based on the decision outcome. The task instructions made explicit reference to the correct response for a “true” outcome, while participants knew (from the practice session) to use the other finger on the same hand for a “false” outcome. For all participants the “f” and “d” keys corresponded to left index and middle finger responses, respectively, whereas the “j” and “k” keys corresponded to the right index and middle finger responses, respectively. As shown in Fig. 4c as an example trial, if the participant was presented with the rules: SWEET, SAME, LEFT MIDDLE and the words were CHOCOLATE and GRASS then the participant would press the “f” key (i.e., the left index key) to indicate that the outcome was false (i.e., chocolate and grass are not the same on the attribute of sweetness).
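The way the three rule types combine on a single PRO trial can be illustrated with a short Python sketch. The semantic answer table is a hypothetical stand-in containing only the example words, and the response rule names other than LEFT MIDDLE are assumed by analogy; neither is taken from the actual task materials.

```python
SEMANTIC_RULES = ("SWEET", "GREEN", "SOFT", "LOUD")

# Hypothetical answer table; the real stimulus norms are not given in the text.
SEMANTIC_ANSWERS = {
    ("SWEET", "CHOCOLATE"): True,
    ("SWEET", "GRASS"): False,
}

# Each decision rule maps the two semantic answers to a true/false outcome.
DECISION_RULES = {
    "SAME": lambda a, b: a == b,
    "DIFFERENT": lambda a, b: a != b,
    "SECOND": lambda a, b: b,
    "NOT SECOND": lambda a, b: not b,
}

# Response rule -> (key for a "true" outcome, key for a "false" outcome).
# A "false" outcome uses the other finger on the same hand. Rule names
# other than LEFT MIDDLE are assumptions.
RESPONSE_KEYS = {
    "LEFT INDEX": ("f", "d"),
    "LEFT MIDDLE": ("d", "f"),
    "RIGHT INDEX": ("j", "k"),
    "RIGHT MIDDLE": ("k", "j"),
}


def correct_key(semantic, decision, response, word1, word2):
    """Return the key a participant should press on one trial."""
    a = SEMANTIC_ANSWERS[(semantic, word1)]
    b = SEMANTIC_ANSWERS[(semantic, word2)]
    outcome = DECISION_RULES[decision](a, b)
    true_key, false_key = RESPONSE_KEYS[response]
    return true_key if outcome else false_key


# Four rules of each type give 4 x 4 x 4 possible rule combinations.
N_COMBINATIONS = len(SEMANTIC_RULES) * len(DECISION_RULES) * len(RESPONSE_KEYS)
```

Calling `correct_key("SWEET", "SAME", "LEFT MIDDLE", "CHOCOLATE", "GRASS")` reproduces the example trial: the outcome is false, so the left index key is the correct response.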

As with DNBT and SST, the PRO training was adaptive. Participants started with only one combination of rules and with self-paced timing. Self-paced meant that participants could spend as much time as they needed encoding the rules, then they initiated the trial by pressing the space bar, which would bring up the two words they had to apply the rules to. In the beginning, participants had 1500 ms to respond. The next level would include a time-limited encoding duration of 1500 ms. As participants progressed through the levels the encoding time was shortened to 500 ms and then eventually to 300 ms. Once participants were able to perform the task with the 500 ms encoding duration, the number of rule combinations would increase. Once all 64 of the rule combinations were mixed in, the time in which the participant had to respond was shortened from 1500 ms to 1000 ms, then to 750 ms, and finally to 500 ms. These different manipulations all served to increase the difficulty of the task as participants progressed through the various levels. Participants performed 32 trials per block. If accuracy was > 85% in a given block, the level would increase by 1, whereas if accuracy was < 60% the level would decrease by 1 (or stay the same if they were on level 1), and in all other instances the level would remain unchanged.

Attrition

A total of 73 participants completed the study. Sixty-three participants failed to complete the study. Nine participants did not have a computer that would run the study-related software required for the at-home training sessions, which classified these participants as failures in pre-enrollment screening. Additional reasons for dropping out included a change in exclusion criteria status, loss of interest, being too busy with schoolwork, inability to perform above chance on the Relation/Location WM task, or simply ceasing to answer emails from the experimenters. A detailed breakdown of these reasons and the number of corresponding participants is listed in Table S1 in the Supplemental Online Material. Participants were equally likely to drop out from each of the 3 training groups (see Fig. 3). Considering individuals who dropped out either without contact or reason and those that did provide a reason for dropping out, we had an unusually high drop-out rate of 46 participants (34% of the initial sample). Given the large proportion of participants who did not provide us with a rationale for their withdrawal from the study, we can only speculate about the high attrition rate seen here. Our participants were students from the Johns Hopkins community and this population has a reputation for being very focused on coursework and academic outcomes. Our speculation is that many students opted to drop out of our time-intensive study in order to prioritize coursework.

Assessment Tasks and Procedures

Experimental stimuli were controlled by MATLAB (The MathWorks, Natick, MA) using Psychophysics Toolbox extensions (Brainard 1997; Pelli 1997), and displayed on a 21″ Apple iMac monitor. Participants were seated approximately 60 cm from the monitor.

Object n-back

Participants were shown a sequence of object stimuli and had to respond each time the current stimulus was identical in shape to the one presented n positions back in the sequence (Fig. 5a). The stimuli consisted of 10 random shapes (Vanderplas and Garvin 1959), which have been used in an n-back task previously (Jaeggi et al. 2003). The shapes were presented in one of four colors and participants were told that color was irrelevant. The timing, n-levels used, and dependent measure were identical to those reported in experiment 1 for the spatial n-back.

Operation Span

We used the Operation Span (Ospan) task as a complex span measure of WM (Kane et al. 2004). As shown in Fig. 5b, the task requires participants to recall a sequence of letter stimuli in the correct order in addition to completing a distracting processing task. We presented three sets of stimuli per set size (i.e., number of stimuli to be recalled) and the set sizes ranged from 4 to 8. The processing task involved quickly responding to whether a math problem was accurate or not (e.g., 4 + 7 = 11). The score was measured as the sum of letters recalled correctly in the correct order, regardless of whether the entire trial was recalled correctly.
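The partial scoring rule can be illustrated with a short Python sketch. It assumes that "correct order" means a letter is credited only when it is recalled in its original serial position; the function name and the input format (pairs of presented and recalled letter strings) are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def ospan_partial_score(trials: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> int:
    """Sum the letters recalled in the correct serial position across trials.

    Each trial is a (presented, recalled) pair. Credit is given per letter,
    whether or not the whole sequence was recalled perfectly.
    """
    total = 0
    for presented, recalled in trials:
        total += sum(1 for p, r in zip(presented, recalled) if p == r)
    return total
```

For instance, a perfectly recalled four-letter trial plus a five-letter trial with one wrong letter contributes 4 + 4 = 8 points under this rule.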

Task Switching

Participants completed a task switching paradigm (Fig. 5c) similar to that developed by Rogers and Monsell (1995). We chose to include this task in our assessment battery because we predicted that it would represent a near transfer measure for the PRO training group, which trained on a rapid rule learning and switching paradigm. A trial began with a 1000 ms fixation period followed by a cue presented for 200 ms. The cue read either “LETTER” or “NUMBER.” The cue was followed by a 226 ms delay period and then a centrally presented alphanumeric pair of stimuli. If cued LETTER, participants had to respond whether the letter was a vowel or consonant (e.g., left button for vowel, right button for consonant). If cued NUMBER, participants had to respond whether the number was even or odd (e.g., left button for even, right button for odd). Response mapping was counterbalanced across participants. Participants were instructed to respond as quickly as possible without losing accuracy. After a response, a 950 ms intertrial interval (ITI) was presented. Response time (RT) was the dependent measure and was considered separately for repeat trials (i.e., trials where the previous trial had the same cue) and switch trials (i.e., trials where the previous trial had the other cue). Participants completed a total of 240 trials, of which 60% were repeat trials. The critical dependent measure was switch cost (switch RT − repeat RT).
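As a rough illustration of how switch cost can be derived from a trial sequence, the Python sketch below classifies each trial by comparing its cue with the previous trial's cue and then subtracts the mean repeat RT from the mean switch RT. The trial format (dictionaries with `cue`, `rt`, and `correct` keys) is hypothetical; restricting the calculation to correct trials follows the analysis described in the Results, and skipping the first trial (which has no predecessor) is an assumption.

```python
from statistics import mean
from typing import Dict, List


def switch_cost(trials: List[Dict]) -> float:
    """Return mean switch RT minus mean repeat RT (ms) over correct trials.

    `trials` is in presentation order; each item has 'cue' (str),
    'rt' (ms), and 'correct' (bool). The first trial is unclassified.
    """
    switch_rts, repeat_rts = [], []
    for prev, cur in zip(trials, trials[1:]):
        if not cur["correct"]:
            continue  # RT is analyzed for correct trials only
        if cur["cue"] == prev["cue"]:
            repeat_rts.append(cur["rt"])
        else:
            switch_rts.append(cur["rt"])
    return mean(switch_rts) - mean(repeat_rts)
```

A larger value indicates a larger RT penalty for switching rules, so a training-related improvement appears as a decrease in this measure.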

BOMAT

Before and after training we assessed Gf using the BOMAT, as described in experiment 1. Here, all participants completed Form A before training and Form B after training.

A 20-min time limit was enforced here, which has been used previously in training studies (Jaeggi et al. 2008). The time restriction was used to avoid ceiling level performance and to keep the total testing time as short as possible for all assessments.

Additional Assessment Tasks

Participants also completed two additional tasks before and after training: a change detection task (Luck and Vogel 1997) and the Attentional Network Test (Fan et al. 2002). The task procedures and results are reported in the Supplemental Online Material. While the focus here was on near transfer and far transfer to Gf only, we report all measures in an effort to adhere to the methodological suggestions of Boot et al. (2011) for cognitive training studies.

Motivation and Engagement Measures

Participants provided self-report ratings of motivation and engagement for each task and also completed an exit survey at the end of the study, which assessed how well they liked training and whether they thought they improved on the training task and/or the assessment tasks. Additional description and results of these measures can be found in the Supplemental Online Material.

EEG Task and Procedures

Experimental stimuli were controlled by MATLAB (The MathWorks, Natick, MA) using Psychophysics Toolbox extensions (Brainard 1997; Pelli 1997), and displayed on a 19″ Dell LCD monitor. Participants were seated 92 cm away from the monitor, and given a Logitech game controller to enter responses.

Spatial Locations and Relations Task

This task was identical to the one described above in experiment 1 with the following exceptions. The fixation cross here appeared for 900 ms and the feedback display appeared for 100 ms. Also, each circle subtended 0.42° × 0.31° of visual angle. Memory array circles were presented between 1.5 and 1.8° of visual angle apart both horizontally and vertically, but again within the same quadrant of visual space. For non-match trials, the black test circle in Location trials fell between 1.1 × 1.5° and 1.4 × 1.8° of visual angle from the position of that center point in any direction between the sample circles. Finally, participants completed 88 practice trials during their initial pre-assessment visit without EEG being recorded in order to learn the task. Then during the EEG session, participants completed a total of 512 trials.

EEG Data Acquisition and Analysis

EEG Recording

EEG data were recorded at 47 sites covering the whole scalp with approximately uniform density using an elastic electrode cap referenced to the average of all electrodes during recording (WaveGuard cap with 47-channel Duke, equidistant electrode placement, layout: Advanced Neuro Technology, The Netherlands). Electrode impedance was kept below 15 kΩ. All EEG electrodes were recorded continuously in DC mode at a sampling rate of 512 Hz using an anti-aliasing filter with a 138-Hz cutoff and a high-impedance ANT WaveGuard amplifier.

Preprocessing

Data were analyzed using the Fieldtrip software package (Oostenveld et al. 2011). Data were first high-pass filtered at 0.5 Hz, and then segmented into epochs covering the time from 2.0 s before to 5.5 s after the onset of the fixation display in each trial. Independent components analysis (ICA) was performed on the epoched data, and the eye blink component was identified and removed for each participant’s data. After eye blink correction, EEG waveforms from frontal electrodes (i.e., RE1/LE1) were visually inspected to identify voltage fluctuations (i.e., fluctuations greater than 18.75 μV or less than −18.75 μV) typical of eye movements or gross motor movements. Ocular artifacts are characterized by high-amplitude (> 15 μV), step-like potentials of opposite polarity. Here we used 18.75 μV as the cutoff to be slightly more conservative, as well as to be consistent with our group’s previous EEG work with variants of this task. Our group has empirically learned that this is the most sensitive threshold for eye blink detection, based on multiple datasets using this electrode configuration and equipment. Trials containing these movement artifacts were rejected entirely. EEG data were analyzed only for correct trials.

To maintain sufficient statistical power for each session, participants with more than 40% trial rejection due to any combination of artifacts or behavioral errors were excluded (N = 11). Both pre- and post-training EEG data were excluded for those 11 participants. One additional participant’s pre-training EEG data was excluded because they failed to complete the post-training EEG session. The final sample of analyzed EEG data consisted of 2 sessions for 61 participants (DNBT: N = 20, SST: N = 22, and PRO: N = 19).

Spectral Analysis

Power spectra were calculated using a time-frequency transformation based on multiplication in the frequency domain from 1 to 30 Hz in 0.5 Hz increments, using a Hanning taper applied to short sliding time windows (Percival and Walden 1993) every 100 ms. An adaptive time window of five cycles for each frequency (ΔT = 5/f) was applied.
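The frequency-dependent window length follows directly from ΔT = 5/f. The short Python sketch below simply evaluates that expression over the analyzed frequency grid; the function and variable names are illustrative only and are not taken from the analysis code.

```python
def window_length_s(freq_hz: float, n_cycles: int = 5) -> float:
    """Length in seconds of a sliding window spanning n_cycles at freq_hz."""
    return n_cycles / freq_hz


# Frequency grid from 1 to 30 Hz in 0.5 Hz steps (59 frequencies).
freqs = [1 + 0.5 * i for i in range(59)]
window_lengths = {f: window_length_s(f) for f in freqs}
```

Within the alpha band, the window spans 625 ms at 8 Hz and roughly 385 ms at 13 Hz, so lower frequencies are estimated from longer stretches of data.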

Statistical Analysis

To obtain statistics corrected for multiple comparisons we used nonparametric permutation tests (Maris and Oostenveld 2007; Nichols and Holmes 2002). This procedure controls for type I error by calculating cluster-level statistics while randomizing trial labels at each iteration. First, spectral data from each of the 47 electrodes across the scalp were averaged over the time and frequency range of interest. Our frequency range of interest was alpha (8–13 Hz) based on previous work described above in the Introduction. Our time range of interest was the WM delay period (i.e., 1.9–3.9 s after the onset of fixation), but we excluded the first 500 ms of the delay period because this time period likely contained sensory-evoked response activity from the memory array (e.g., van Gerven et al. 2009; for a more detailed discussion of this topic, also see Bastiaansen et al. 2012). This timeframe left 1500 ms of delay period activity to analyze and given the need to have a minimum of ~ 10 cycles to achieve a reliable measure of power, we could not break this time period down any further when examining 8–13 Hz activity. Next, a t-value was calculated at each electrode. For each iteration randomizing trial labels, clusters of electrodes where the p-value was < 0.05 were identified, and their t-values were summed. The largest sum of t-values was used as the cluster-level t-statistic. This procedure was repeated 5000 times to create the null distribution. The p-value was estimated according to the proportion of the null distribution exceeding the observed cluster-level t-statistic. It is conventional with this analysis approach to report only the p-value for each cluster of electrodes.
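The logic of a cluster-based permutation test can be sketched in plain Python. This is a simplified illustration and not the Fieldtrip implementation: it compares two groups on one value per electrode (e.g., a post minus pre alpha power difference), shuffles participants between groups rather than trial labels, takes the critical t-value `t_crit` as a caller-supplied input, and uses a hypothetical `neighbors` dictionary to define which electrodes are adjacent.

```python
import random
from statistics import mean, variance


def t_values(group_a, group_b):
    """Pooled-variance independent-samples t at each electrode."""
    n_a, n_b = len(group_a), len(group_b)
    ts = []
    for e in range(len(group_a[0])):
        a = [p[e] for p in group_a]
        b = [p[e] for p in group_b]
        pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
        ts.append((mean(a) - mean(b)) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5)
    return ts


def max_cluster_stat(ts, neighbors, t_crit):
    """Largest |sum of t| over connected, same-sign, supra-threshold electrodes."""
    best, seen = 0.0, set()
    for start in range(len(ts)):
        if start in seen or abs(ts[start]) <= t_crit:
            continue
        positive = ts[start] > 0
        stack, total = [start], 0.0
        seen.add(start)
        while stack:  # depth-first walk over adjacent electrodes
            e = stack.pop()
            total += ts[e]
            for nb in neighbors[e]:
                if nb not in seen and abs(ts[nb]) > t_crit and (ts[nb] > 0) == positive:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, abs(total))
    return best


def cluster_permutation_p(group_a, group_b, neighbors, t_crit, n_perm=5000, seed=0):
    """Proportion of permutations whose max cluster stat reaches the observed one."""
    rng = random.Random(seed)
    observed = max_cluster_stat(t_values(group_a, group_b), neighbors, t_crit)
    pooled = list(group_a) + list(group_b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to groups at random
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if max_cluster_stat(t_values(perm_a, perm_b), neighbors, t_crit) >= observed:
            exceed += 1
    return exceed / n_perm
```

Because only the largest cluster statistic from each permutation enters the null distribution, a single p-value per cluster is obtained without a separate correction across the 47 electrodes.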

Here our main analysis of interest was a group (DNBT, SST, PRO) × session (pre-, post-training) interaction. This was done by first calculating the post–pre difference and then testing for differences between pairs of groups (i.e., DNBT vs. SST, DNBT vs. PRO, SST vs. PRO). Fieldtrip does not allow for a three-level interaction permutation test; therefore, we imposed a Bonferroni multiple comparison correction for the three comparisons (i.e., interaction tests with a p-value less than 0.017 were considered statistically significant). Significant group × session interactions were followed up with post hoc contrasts by group and session.

Results

Behavioral Results

For each assessment task, we were interested in whether any group improved significantly more from pre- to post-training than the other two groups. Therefore, we tested 3 (group: DNBT, SST, PRO) × 2 (session: pre, post) repeated-measures ANOVAs on the dependent measure for each assessment task. Our primary assessment measures were the object n-back task, Ospan, task switching, and the BOMAT; reliability for each of these measures can be found in Table 2.

Table 2

Test-retest reliability results for the transfer measures of interest

| Measure | Cronbach’s alpha |
| --- | --- |
| Object n-back: average d’ | 0.65 |
| Ospan: partial score | 0.88 |
| Task Switching: switch cost | 0.72 |
| BOMAT | 0.72 |

Object n-back

For the object n-back task, our dependent measure was the average d’ score across all five n-back levels. Any participant with performance > 2SD below the mean for either session was excluded from any analyses (N = 4). These exclusions resulted in a total of 69 participants remaining in the analysis. A main effect of session emerged, F(1,66) = 21.64, p < 0.001, partial η2 = 0.25, where participants improved significantly from pre- to post-training. The main effect of group also reached significance, F(2,66) = 3.87, p = 0.026, partial η2 = 0.11, with the DNBT showing the highest performance followed by SST and PRO, respectively. Critically, the group × session interaction was significant, F(2,66) = 5.28, p = 0.007, partial η2 = 0.14. Figure 6a shows the performance gains for all three training groups illustrating the largest gains for the DNBT group.
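The reported effect sizes can be cross-checked against the F-statistics using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The Python helper below is a small sketch of that check; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F-statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Session main effect and group x session interaction for the object n-back task.
session_effect = partial_eta_squared(21.64, 1, 66)
interaction_effect = partial_eta_squared(5.28, 2, 66)
```

Rounded to two decimals, these give 0.25 and 0.14, in line with the values reported for the session effect and the group × session interaction.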

Near transfer results for each training group. a Significant near transfer to the object n-back task for the DNBT group. b The data for the OSpan task were in the expected direction with numerically greater gains for the SST group, but the results did not reach significance. c Significant near transfer to the task switching paradigm for the PRO group. Error bars represent standard error of the mean. *p < 0.05, †p = 0.1

Planned contrasts demonstrated that the DNBT group showed significantly higher performance after training compared to before training, t(22) = 5.82, p < 0.001, as did the SST group, t(23) = 2.29, p = 0.031, whereas the PRO group’s performance did not differ from pre- to post-training, t(21) = 0.50, p = 0.62. Moreover, the DNBT group showed significantly more improvement from pre- to post-training than both the SST group, t(45) = 2.02, p = 0.049, and the PRO group, t(43) = 3.34, p = 0.002. The SST group did not improve more so than the PRO group, t(44) = 1.24, p = 0.22. Together, these results reveal significant near transfer for the DNBT group to the object n-back task.

Ospan

The dependent measure was the Ospan score, which is the sum of correctly recalled letters in the correct order. Two participants did not complete the pre-training Ospan task. Following previous work, we excluded any participant with accuracy on the processing task < 75% for either session (N = 13). The high rate of participants with poor accuracy was driven by several participants not understanding the task during the pre-training session due to experimenter error. Further, any participant with memory performance > 2SD below the mean for either session was also excluded from any analyses (N = 3). These exclusions resulted in a total of 55 participants remaining in the analysis. Processing task accuracy for the remaining 55 participants was 93.4% (SD = 4.9). A main effect of session emerged, F(1,52) = 29.10, p < 0.001, partial η2 = 0.36, where participants improved significantly from pre- to post-training. The main effect of group was not significant, F(2,52) = 0.47, p = 0.63. The group × session interaction did not reach significance, F(2,52) = 0.75, p = 0.48. Figure 6b shows the performance gains for all three training groups, illustrating that there was no significant difference in pre-training performance between the three groups and that the two WM training groups, SST and DNBT, did show numerically greater gains compared to the PRO group.

Task Switching

The dependent measure for Task Switching was the switch cost RT (i.e., switch trial RT − repeat trial RT), which means that greater values correspond to a larger cost in RT for switching rules. RT was only analyzed for correct trials. Three participants did not complete the pre-training Task Switching measure. Participants with task accuracy < 75% for either session (N = 5) or with mean RT > 2SD above the mean (N = 1) were excluded. These exclusions resulted in a total of 64 participants remaining in the analysis. Task accuracy for the remaining 64 participants was 95.5% (SD = 3.6). A main effect of session emerged, F(1,61) = 5.90, p = 0.018, partial η2 = 0.09, where participants improved significantly from pre- to post-training, meaning they had smaller switch costs after training compared to before training. The main effect of group was not significant, F(2,61) = 0.64, p = 0.53. Critically, the group × session interaction was significant, F(2,61) = 3.23, p = 0.046, partial η2 = 0.10. Figure 6c shows the performance gains for all three training groups, illustrating the largest gains (i.e., a decrease in switch cost) for the PRO group.

While switch cost was the primary dependent measure of interest for the task switching task, we also tested a 3 (group: DNBT, SST, PRO) × 2 (session: pre, post) repeated-measures ANOVA on task accuracy to ensure the RT results could not be explained by a difference in accuracy or a speed-accuracy tradeoff. A main effect of session emerged, F(1,61) = 11.77, p < 0.001, partial η2 = 0.16, with accuracy being higher after training compared to before training (pre-training M = 94%, post-training M = 96%). Importantly, neither the main effect of group, F(2,61) = 0.95, p = 0.39, nor the group × session interaction, F(2,61) = 0.29, p = 0.75, approached significance, which suggests that the selective improvement of the PRO group on switch cost following training cannot be explained by a speed-accuracy tradeoff.

BOMAT

The dependent measure was the total number of correctly solved problems within the time limit. No outliers were present on this measure (i.e., no score was ± 2SD from the group mean). The main effect of session was significant, F(1,70) = 28.74, p < 0.001, partial η2 = 0.29, with performance being higher after training compared to before. Neither the main effect of group, F(2,70) = 0.09, p = 0.91, nor the group × session interaction, F(2,70) = 0.77, p = 0.47, approached significance. Figure 7 illustrates the group means by session.

BOMAT performance before and after training by group showing no evidence of significant transfer for any specific group. Error bars represent standard error of the mean

Predictors of Near Transfer

While previous studies have shown near transfer, it has also been demonstrated that there are individual differences in performance on training and near transfer (Jaeggi et al. 2014). Figure 8 illustrates group average session by session performance on the three training tasks. First, we were interested in whether performance on training was correlated with the degree of near transfer. To standardize each training group’s performance, we calculated the maximum training level reached throughout the 20 sessions and then z-scored that value for each group. A correlation analysis (not corrected for multiple comparisons) indicated that maximum training level showed a trend toward being positively correlated with improvement on the object n-back task, for the DNBT group, R(47) = 0.29, p = 0.05, but not for the SST group, R(24) = 0.24, p = 0.25, or the PRO group, R(22) = 0.07, p = 0.76. Figure 9a shows that better performance during training for the WM groups was associated with greater gains on the object n-back task. Further, we found that baseline performance (i.e., pre-training) on the object n-back task was related to improvement on that task following training for our WM groups, but not the PRO group. To examine this, we performed a median split on pre-training performance on the object n-back task and then compared gain scores (post–pre) within each group (see Fig. 9b). For the DNBT and SST groups, the low WM participants saw greater gains after training compared to high WM participants, DNBT: t(21) = 3.01, p = 0.007, SST: t(22) = 3.51, p = 0.002, but this difference was not significant for the PRO group, t(20) = 1.52, p = 0.14, showing that this effect was specific to individuals who trained on a WM task.
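The two preprocessing steps used in this analysis, within-group z-scoring of the maximum training level and a median split on baseline performance, can be sketched as follows. The details are assumptions made for illustration: the sample standard deviation is used for z-scoring, and participants whose baseline score equals the median are assigned to the low group.

```python
from statistics import mean, median, stdev
from typing import List, Sequence, Tuple


def zscore_within_group(max_levels: Sequence[float]) -> List[float]:
    """Standardize one group's maximum training levels (sample SD assumed)."""
    m, s = mean(max_levels), stdev(max_levels)
    return [(v - m) / s for v in max_levels]


def median_split_gains(pre: Sequence[float],
                       post: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split gain scores (post - pre) by a median split on baseline performance.

    Returns (gains for low-baseline participants, gains for high-baseline
    participants); ties at the median go to the low group (an assumption).
    """
    cut = median(pre)
    low = [b - a for a, b in zip(pre, post) if a <= cut]
    high = [b - a for a, b in zip(pre, post) if a > cut]
    return low, high
```

The two lists returned by `median_split_gains` would then be compared within each training group, for example with an independent-samples t-test.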

Group average training performance for each group. Data is shown as the average level achieved per session. For the DNBT group, data is shown by n-level. For the SST group, every third level added an additional memory item, whereas in between levels added symmetry judgments in between memory items. For the PRO group, each level required faster encoding of the rules and then every fourth level added in a new rule, then starting with level 31 the amount of time to respond decreased. Error bars represent standard error of the mean

Predictors of near transfer to the object n-back task. a Scatterplot for each training group illustrating the relationship between standardized maximum training level reached and improvement on the object n-back task. b Baseline performance on the object n-back task was associated with greater gains after training on that same task for the WM groups but not the PRO group. Error bars represent standard error of the mean. *p < 0.05

Behavioral Results During EEG

Similar to the other transfer tasks described above, we were interested in whether any group improved significantly more after training compared to the other two groups on the Locations/Relations WM task performed during EEG recording. We analyzed the behavioral data from the 61 participants who completed the study and had sufficient EEG data for analysis (see details above in the “Method”). First, we tested a 3 (group: DNBT, SST, PRO) × 2 (session: pre, post) × 2 (load: low, high) × 2 (trial type: Location, Relation) repeated-measures ANOVA on accuracy. Significant main effects of load, F(1,58) = 639.07, p < 0.001, partial η2 = 0.92, and trial type, F(1,58) = 136.05, p < 0.001, partial η2 = 0.70, emerged, with accuracy being higher on low load and Relation trials, respectively. The session × trial type interaction approached significance, F(1,58) = 3.90, p = 0.05, partial η2 = 0.06, with greater improvement on the Relation trials after training compared to the Location trials. Further, the load × trial type interaction was also significant, F(1,58) = 264.13, p < 0.001, partial η2 = 0.82, with higher load decreasing accuracy to a greater degree for Location trials compared to Relation trials. No other main effects or interactions reached significance, all Fs ≤ 1.80, all ps ≥ 0.17. Given the lack of a significant interaction including the session and group factors, these results suggest no significant transfer to this WM task (see Fig. 10).

Behavioral accuracy data for the Relation and Location WM task that participants completed while EEG was recorded. Error bars represent standard error of the mean

EEG Results

To examine whether changes in neural activity occurred in response to training, we tested a group × session interaction on the delay period alpha (8–13 Hz) power separately for Location and Relation trials. Here we only present analyses from high load trials, but group × session interaction results for low load trials are detailed in the Supplemental Online Material. For high load Location trials (see Fig. 11a), there was a significant interaction in a group of frontal electrodes when comparing the DNBT and SST groups, p = 0.013, and also in a similar group of frontal electrodes when comparing the DNBT and PRO groups, p = 0.009. However, the group × session interaction comparing SST and PRO groups was not significant, p > 0.05. To follow up on the significant interactions, we examined spectral power differences within each testing session as well as within each group. Before training, there was no significant difference in alpha power between the DNBT group and either the SST or PRO groups. However, after training, the DNBT group showed significantly more alpha power in a group of frontal electrodes compared to the SST group, p < 0.05, and in a cluster of frontal electrodes and a separate cluster of posterior electrodes when compared to the PRO group, ps < 0.05. We also examined whether each group showed significant changes in alpha power by comparing the post- to pre-training alpha power. While the PRO and SST groups showed no significant change in alpha power, the DNBT group showed significantly more alpha power after training compared to before training, p < 0.005, in a group of frontal electrodes.

Nonparametric permutation test results for delay period alpha power (8–13 Hz). Results are shown for group × session interactions and post hoc contrasts separately for Location (a) and Relation (b) trials. See online version for color figures

For high load Relation trials (see Fig. 11b), there was a significant interaction in a group of left frontal electrodes when comparing the DNBT and PRO groups, p = 0.017, but not when comparing the DNBT and SST groups, p > 0.05. Also, the group × session interaction comparing SST and PRO groups was not significant, p > 0.05. To follow up on the significant interaction, we examined spectral power differences within each testing session as well as within each group. Before training, there was no significant difference in alpha power between the DNBT group and either the SST or PRO groups. However, after training, the DNBT group showed significantly more alpha power in a group of right frontal electrodes compared to the SST group, p < 0.05, and in a cluster of frontal electrodes and a separate cluster of posterior electrodes when compared to the PRO group, ps < 0.05. We also examined whether each group showed significant changes in alpha power from pre-training to post-training. While the PRO and SST groups showed no significant change in alpha power, the DNBT group showed significantly more alpha power after training compared to before training, p < 0.05, in a group of left frontal electrodes.

Baseline Correlations

We found no evidence in our training data that DNBT has a differential effect on relational WM. However, in experiment 2 we have a large baseline data set that could be used to attempt to replicate the initial correlational findings from experiment 1. We therefore tested the same partial correlations as in experiment 1, here with our baseline sample of 119 participants who completed both pre-training sessions (i.e., assessments and EEG). The partial correlation between Relation WM and pre-training object n-back performance while controlling for Location WM did not reach significance, p = 0.29. Likewise, there was no relationship between Location WM and n-back performance when controlling for Relation WM, p = 0.81. Similarly, there was no significant relationship between Relation WM and OSpan or BOMAT when controlling for Location WM, p = 0.16 and p = 0.64, respectively. There were also no significant relationships between Location WM and OSpan or BOMAT when controlling for Relation WM, p = 0.63 and p = 0.61, respectively.

Discussion

Experiment 2 demonstrated clear near transfer to the object n-back task for both of our WM training groups, as well as robust near transfer to a task switching paradigm for our active control group. Moreover, near transfer for our WM training groups was predicted by baseline WM level, whereby individuals with lower baseline WM improved more after training compared to their high WM counterparts. While each group improved on the BOMAT, we found no specific evidence of far transfer to Gf for any one training group. In fact, the PRO group showed the largest numerical gains on the BOMAT of the three groups. Finally, we found that the DNBT group showed significant increases in frontal alpha power compared to the other two training groups. While we did not find support for our hypothesis of specific improvement in relational WM driving the DNBT group’s transfer, these results are consistent with the idea that DNBT may be a more effective training task as compared to complex span tasks. Further implications are discussed below.

General Discussion

We initially hypothesized that a critical difference between n-back and complex span WM tasks is their reliance on the ability to extract and maintain relational information in WM. We predicted that this difference might account for differences in training efficacy seen in the literature, whereby n-back tasks seem to produce more effective far transfer. In experiment 1, we found correlational evidence in support of this hypothesis whereby performance on a relational WM task predicted performance on an n-back task and a measure of Gf, but not a complex span task. We followed up on this finding with a training study in experiment 2, which is the first study to directly compare n-back and complex span WM training in the same study. While we found strong near transfer effects, we saw no evidence of far transfer to Gf. We did not find evidence to support the notion that the effectiveness of DNBT is due to a dependence on relational WM. Further, we failed to replicate our findings from experiment 1 of a relationship between relational WM and n-back performance in the larger sample in experiment 2. While we ultimately did not find support for our initial hypothesis about the role of relational WM in n-back training, our results demonstrated that DNBT did elicit robust near transfer and significantly greater neural changes compared to the other two training groups.

Our finding of near transfer but not far transfer, even with the DNBT group, is perhaps not surprising. There have been inconsistent results in the literature surrounding far transfer, but there have been far fewer inconsistencies with respect to near transfer, i.e., training on one WM task and finding evidence of improvement on a separate, untrained WM task. For example, training on one type of n-back task transfers to improvements on untrained versions of the n-back task (Colom et al. 2013; Jaeggi et al. 2010a, b; Li et al. 2008). Similarly, training on one type of complex span WM task transfers to improvements on other complex span tasks (Chein and Morrison 2010; Richmond et al. 2011). Consistent with this previous work, here we found that DNBT resulted in robust near transfer to an untrained n-back task. Moreover, the SST group also showed significant improvement on the object n-back task, but not significantly more so than the active control group. Importantly, we found clear near transfer from our active control group to a task switching paradigm, which suggests that this group was indeed an adequate control and that our WM training groups’ transfer was specific to WM tasks. Surprisingly, we did not see significant near transfer from the SST or DNBT groups to the Ospan task even though the results were in the expected direction. There are at least two potential reasons for this null result. First, we had a particularly high number of outliers on the Ospan task as noted above in experiment 2, which may have underpowered the critical group × session interaction. Second, SST and Ospan cross modality boundaries, with symmetry span being visuospatial and Ospan involving verbal, letter stimuli. This fits with previous evidence that intra- and across-modal training effects are separable (Schneiders et al. 2011), but on the other hand, there is evidence of cross-modality transfer (Buschkuehl et al. 2014; Jaeggi et al. 2014). It is possible that if we had used another visuospatial complex span task, we might have seen more robust near transfer; however, an important direction for future training studies will be to elucidate the role of stimulus modality in transfer.

Despite the near transfer pattern seen in experiment 2, we found no evidence of far transfer to BOMAT performance. All three training groups showed significant improvement on the BOMAT after training compared to before training, but no one group showed significantly more improvement, which suggests that these increases may simply be practice effects. While we predicted far transfer a priori, our results are consistent with previous failures to find far transfer to Gf following WM training. Interestingly, two prominent examples of failure to find far transfer also failed to find any improvement from pre- to post-training on measures of Gf following training (Redick et al. 2013; Thompson et al. 2013), which is unusual given the expectation of practice effects. Here we did observe significant improvement on the BOMAT from pre- to post-training, but that pattern was present for all three training groups. In light of this observation, one possible explanation for our results may be our choice of active control group. The PRO training involved rapid rule learning and the ability to quickly encode and switch between rules. It may be possible that these active ingredients helped participants to perform better on the BOMAT after training by increasing speed of processing and/or cognitive flexibility. This idea is speculative, but future work should address the effectiveness of PRO training in eliciting transfer effects. Furthermore, the choice of appropriate active control groups is an ongoing issue that will play a key role in moving the cognitive training field forward.
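The logic of separating training-specific gains from practice effects can be illustrated with a small sketch. With only two sessions, the group × session interaction in a mixed design is equivalent to a one-way ANOVA on pre-to-post gain scores, so a pattern in which every group improves by a similar amount yields an F statistic near zero. The function names and the scores below are illustrative assumptions only; they are not the analysis code or data from this study.

```python
# Illustrative sketch only: hypothetical scores, not data from this study.
# With two sessions, testing the group x session interaction is equivalent
# to a one-way ANOVA on gain scores (post minus pre).

def gains(pre, post):
    """Per-participant gain scores (post minus pre)."""
    return [b - a for a, b in zip(pre, post)]

def one_way_f(groups):
    """F statistic of a one-way ANOVA over a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical case: all three groups improve equally (pure practice effect).
equal = [gains([10, 11, 12], [11, 13, 15]) for _ in range(3)]
# Hypothetical case: gains differ between groups (training-specific effect).
unequal = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

print(one_way_f(equal))    # identical gain distributions give F = 0
print(one_way_f(unequal))
```

Under these assumptions, a significant pre-to-post improvement within every group is fully compatible with a null interaction, which is the pattern described above for the BOMAT.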

In experiment 1, we found evidence that n-back performance was associated with relational WM performance, supporting our initial hypothesis. However, in experiment 2, neither the behavioral nor the EEG data demonstrated support for a specific role of relational WM in DNBT performance effects. We also did not observe the same relationship between relational WM and n-back performance in our larger sample in experiment 2. Here we sought to test the notion that DNBT may exert different training effects than complex span because of a greater reliance on relational WM, as was initially suggested by the results of experiment 1. This hypothesis was not supported, however, by the results of experiment 2. It may be the case that DNBT does utilize relational WM, but that additional factors also contribute to its training effectiveness, which diluted the transfer to the relational WM task. An alternative approach would be to specifically design a training paradigm that improves relational WM more directly. Future studies could test the effectiveness and transfer effects of such a training paradigm. Nonetheless, our EEG data did support the original motivation for our hypothesis, which was that DNBT seems to be more effective than other training approaches. Our results showed that only DNBT produced significant changes in alpha power, with significant increases in delay period frontal alpha power after training compared to before training for both Location and Relation trials. Neither the SST nor the PRO group showed any significant differences from pre- to post-training. Alpha activity over frontal brain regions has been interpreted as representing top-down modulation of WM (Zanto et al. 2011), which may be the mechanism by which DNBT exerts its effectiveness; however, future studies will be needed to directly test this idea. Further, given this lack of support for the role of relational WM in transfer, future studies should examine other possible factors that underlie WM training and transfer effectiveness. A recent example of such an effort provides evidence for the role of distractor filtering in improving WM capacity via training (Li et al. 2017).

Our findings with frontal alpha are consistent with the idea that training is influencing the frontoparietal network, which has also been shown in two previous studies using EEG. Jausovec and Jausovec (2012) found that WM training, on a diverse set of WM tasks, resulted in changes in alpha and theta event-related synchronization (ERS) when participants performed transfer tasks. However, in their study the EEG data were averaged across multiple transfer tasks that included measures of Gf, verbal analogy, and spatial rotation. This approach makes it difficult to attribute the changes in oscillatory synchronization to a specific aspect of training or transfer. Another study by Kundu et al. (2013) used EEG and TMS to demonstrate that transfer following WM training was supported by changes in task-related effective connectivity in frontoparietal and parieto-occipital networks that were engaged by the training and transfer tasks. While very few studies have examined changes in EEG measures due to training, there is a small body of work using fMRI that also suggests that training is associated with functional changes that have been localized to posterior parietal and prefrontal cortices (Beatty et al. 2015; Buschkuehl et al. 2014; Salminen et al. 2016; Schneiders et al. 2011; Thompson et al. 2016; Vartanian et al. 2013; for an fNIRS study, see McKendrick et al. 2014). While our results are consistent with this notion of frontoparietal involvement in dual n-back training effects, it is worth noting that our frontal alpha results are present in slightly more anterior clusters of electrodes than are typically seen. Given the nature of EEG, it is not possible to discern whether our results have distinct neural sources from previous work, but it is an interesting idea that may be relevant for future investigations working in source-space.

One interesting facet of our results is the lack of behavioral improvement on the Location and Relation WM tasks. The significant increase in frontal alpha power for the DNBT group was present despite no significant performance improvement on the tasks performed during the EEG recording. While these results may seem at odds with one another, the lack of behavioral change is also a benefit in interpreting the EEG changes, because it suggests that our results are not simply driven by differences in behavior. At least one previous study has also found a similar dissociation, with changes in neural activity following training in the absence of changes in behavior (Vartanian et al. 2013). Somewhat surprisingly, we did not find any association between the changes in alpha power and the behavioral near transfer measures. In other words, there was no correlation between transfer to the object n-back task and changes in frontal alpha power during location or relation WM for the DNBT group. An important future direction will be to measure EEG activity while participants perform near transfer tasks to potentially pinpoint the mechanism underlying near transfer.

One additional limitation of the current study was the use of individual tasks as transfer measures. There has been a push in the field to examine transfer to construct-level variables as opposed to individual tests (Chooi and Thompson 2012; Colom et al. 2013; Shipstead et al. 2012). It remains difficult to assess constructs with multiple measures and keep the testing sessions at a reasonable length. However, a critical future direction for the WM training field will be to systematically test for transfer to construct-level variables and simultaneously uncover the potential neural basis of that transfer.

In summary, we sought to find what distinguishes n-back and complex span WM training tasks and how the two types of training compare with respect to near and far transfer. Our original hypothesis that relational WM was the critical factor differentiating these two forms of WM training was ultimately not supported. However, we found clear evidence that the effects of these two types of training are indeed distinct. In addition to improving behavioral performance on a different n-back task, DNBT was found to elicit significant changes in frontal alpha power during two other substantially different WM tasks. The SST and active control groups did not demonstrate any training-related changes in EEG measures. Thus, DNBT appears to be a promising method for improving WM, but future work is needed to test alternative hypotheses about what “active ingredients” of the n-back task make it a more effective training tool in order to further improve training protocols.

Footnotes

The Symmetry Span task traditionally yields two scores: “partial” and “absolute.” Here we focused on partial scores as these have been shown to have higher internal consistency than absolute scores (e.g., Conway et al. 2005).

Variation in payment arose from participants receiving a completion bonus of $15 upon completing all study sessions and maintaining above chance level performance during training (e.g., two consecutive sessions of average performance below chance for each training task resulted in the participant not receiving the bonus). Participants were aware of the contingencies of this bonus.

Our adaptivity criteria for the DNBT are a departure from those used by Jaeggi et al. (2008) and many subsequent studies using this paradigm for training. In Jaeggi et al. (2008), with 12 targets per block (same as used here), if participants made fewer than three mistakes per modality they advanced to the next n-level. Thus, accuracy was considered separately for each modality. Here, we considered accuracy across both modalities for our cutoff values. In the present study, participants had to perform above 85% on both modalities considered together to move up an n-level. These more stringent criteria likely explain why our participants on average did not advance as far up the n-levels as in some previous studies (see Fig. 8), as it would have been more difficult to advance up a level and easier to fall back down a level with these criteria.
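The pooled-accuracy rule described in this footnote can be sketched as follows. This is a minimal illustration under stated assumptions, not the training software used in the study: only the 85% cutoff for moving up comes from the text. The cutoff for dropping a level (`down_cutoff`), the number of scored trials per modality, and all function and parameter names are hypothetical placeholders.

```python
# Minimal sketch of an adaptive n-level rule with accuracy pooled across
# both modalities. Only the 85% "move up" cutoff is taken from the text;
# down_cutoff and the trial counts are hypothetical placeholders.

def pooled_accuracy(correct_visual, correct_auditory, trials_per_modality):
    """Proportion correct across both modalities considered together."""
    return (correct_visual + correct_auditory) / (2 * trials_per_modality)

def next_n_level(n, accuracy, up_cutoff=0.85, down_cutoff=0.70, min_n=1):
    """Return the n-level for the next block given pooled accuracy."""
    if accuracy > up_cutoff:          # strictly above 85%: advance
        return n + 1
    if accuracy < down_cutoff:        # assumed cutoff: fall back one level
        return max(min_n, n - 1)
    return n                          # otherwise stay at the same level

# Hypothetical block: 20 scored trials per modality at the 2-back level.
acc = pooled_accuracy(correct_visual=18, correct_auditory=17,
                      trials_per_modality=20)
print(acc, next_n_level(2, acc))
```

Because the two modalities are pooled, strong performance in one modality cannot be evaluated independently of weak performance in the other, which is one way to see why such a rule can be harder to satisfy than a per-modality mistake count.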