A 2 x 2 ANOVA was computed with the number of correct answers on the immediate written vocabulary recognition posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 3).

There were significant main effects for written annotations and for pictorial annotations, as well as a significant interaction between the two. The pictorial and written annotations group and the written annotations group performed best, while the control group performed the poorest (Table 4).

A Tukey HSD test showed that all annotation groups performed significantly better than the control group (p<0.001) but did not differ significantly from each other.

Delayed Written Vocabulary Recognition Posttest, Study 1

A 2 x 2 ANOVA was computed with the number of correct answers on the delayed written vocabulary recognition posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 5).

There were significant main effects for written and for pictorial annotations, as well as a significant interaction between pictorial and written annotations. Mean group scores showed that the pictorial and written annotations group performed best, while the control group performed the poorest (Table 6).

A Tukey HSD test showed that the annotation groups had significantly higher scores than the control group (p<0.001). There were no statistically significant differences between the treatment groups.

Immediate Pictorial Vocabulary Recognition Posttest, Study 1

A 2 x 2 ANOVA was computed with the number of correct answers on the immediate pictorial vocabulary recognition posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 7).

There were significant main effects for pictorial and for written annotations, as well as a significant interaction between the two. The pictorial and written annotations group and the pictorial annotations group performed best, while the control group performed the poorest (Table 8).

Post hoc comparisons (Tukey HSD) of the posttest scores showed that all annotation groups had significantly higher scores than the control group (p<0.001), but that there were no statistically significant differences between the annotation groups.

Delayed Pictorial Vocabulary Recognition Posttest, Study 1

A 2 x 2 ANOVA was computed with the number of correct answers on the delayed pictorial vocabulary recognition posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 9).

There were significant main effects for pictorial and for written annotations, as well as a significant interaction between the two. Mean scores showed that the pictorial annotations group performed best, while the control group performed the poorest (Table 10).

Post hoc comparisons (Tukey HSD) showed that all annotation groups outperformed the control group (p<0.01), but did not differ significantly from each other.

Students in the pictorial and written annotations group accessed the two annotation types with comparable frequency: pictorial annotations 53% of the time, with an average of 7.60 seconds per annotation, and written annotations 47% of the time, with an average of 8.1 seconds per annotation. The single annotation groups viewed their respective annotations for nearly equal amounts of time: 11.35 seconds for pictorial and 11.51 seconds for written annotations.

In summary, all annotation groups performed significantly better than the control group on all four recognition posttests. No significant differences were found among the annotation groups.

Immediate Written Vocabulary Production Posttest, Study 2

A 2 x 2 ANOVA was computed with the number of correct answers on the immediate written vocabulary production posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 11).

Table 11. ANOVA for the Immediate Written Vocabulary Production Posttest, Study 2

Factor                               F (df)          MSE       p        η²
Written Annotations                  93.6 (1, 63)    1874.34   <0.001   .598
Pictorial Annotations                4.00 (1, 63)    80.17     <0.05    .06
Written and Pictorial Annotations    5.07 (1, 63)    101.49    <0.05    .074
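The η² values in Table 11 can be recovered from each F ratio and its degrees of freedom. The sketch below assumes the reported η² is partial η², computed as F·df_effect / (F·df_effect + df_error); the table does not name the variant, but this formula reproduces all three entries. The function name `partial_eta_squared` is illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported in Table 11.
table_11 = {
    "Written Annotations": (93.6, 1, 63),
    "Pictorial Annotations": (4.00, 1, 63),
    "Written and Pictorial Annotations": (5.07, 1, 63),
}

for factor, (f, df1, df2) in table_11.items():
    print(f"{factor}: {partial_eta_squared(f, df1, df2):.3f}")
```

The same function applied to the Table 13 ratios, for example F(1, 47) = 40.42, gives .462, .002, and .126, matching that table as well.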

There were significant main effects for written and for pictorial annotations, as well as a significant interaction between the two annotation types. The pictorial and written annotations group and the written annotations group performed best, while the control group performed the poorest (Table 12).

Table 12. Mean Scores of the Four Groups on the Immediate Written Vocabulary Production Posttest, Study 2

Group                                N    M       SD
Control                              16   3.31    1.66
Pictorial Annotations                17   8.47    3.48
Written Annotations                  18   16.33   5.29
Pictorial and Written Annotations    16   16.56   6.05
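As a consistency check, the ANOVA error term can be rebuilt from the group sizes and standard deviations in Table 12. This sketch assumes the SDs are ordinary sample SDs (n - 1 denominator) and pools them as SS_within = Σ(n - 1)·SD² over df_within = N - k. The resulting error df of 63 matches the F(1, 63) tests in Table 11, and the pooled mean square comes out at about 20, close to what one gets by dividing an MSE-column entry in Table 11 by its F ratio (1874.34 / 93.6 ≈ 20.0). The helper `pooled_within_ms` is a hypothetical name.

```python
# Group summaries from Table 12: (N, M, SD).
table_12 = {
    "Control": (16, 3.31, 1.66),
    "Pictorial Annotations": (17, 8.47, 3.48),
    "Written Annotations": (18, 16.33, 5.29),
    "Pictorial and Written Annotations": (16, 16.56, 6.05),
}


def pooled_within_ms(cells):
    """Pooled within-group mean square and its degrees of freedom.

    SS_within = sum over groups of (n - 1) * SD^2; df_within = N - k.
    """
    cells = list(cells)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in cells)
    df_within = sum(n for n, _, _ in cells) - len(cells)
    return ss_within / df_within, df_within


ms_within, df_within = pooled_within_ms(table_12.values())
print(f"MS_within = {ms_within:.2f} on {df_within} df")
```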

Post hoc comparisons (Tukey HSD) showed that the written annotations group did not differ significantly from the group with access to both annotation types, but that both of these groups performed significantly better than the pictorial annotations group (p<0.001). All annotation groups performed significantly better than the control group.

Delayed Written Vocabulary Production Posttest, Study 2

A 2 x 2 ANOVA was computed with the number of correct answers on the delayed written vocabulary production posttest as the dependent measure and the presence or absence of pictorial and written annotations as the between-subjects factors (Table 13).

Table 13. ANOVA for the Delayed Written Vocabulary Production Posttest, Study 2

Factor                               F (df)          MSE      p        η²
Written Annotations                  40.42 (1, 47)   367.4    <0.001   .462
Pictorial Annotations                0.096 (1, 47)   0.872    0.758    .002
Written and Pictorial Annotations    6.8 (1, 47)     61.83    <0.05    .126
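The p column of Table 13 can be checked against the F ratios. The upper tail of an F(df1, df2) distribution equals the regularized incomplete beta function I_x(df2/2, df1/2) with x = df2 / (df2 + df1·F). The standard-library sketch below evaluates it with a continued fraction (modified Lentz method); it is an illustrative reimplementation, and in practice `scipy.stats.f.sf(F, df1, df2)` returns the same quantity. For the pictorial main effect it gives roughly .758, an exact p rather than an upper bound.

```python
import math


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 3e-14:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_p_value(f_value: float, df_effect: int, df_error: int) -> float:
    """Upper-tail probability P(F >= f_value) for an F(df_effect, df_error) variate."""
    x = df_error / (df_error + df_effect * f_value)
    return reg_incomplete_beta(df_error / 2.0, df_effect / 2.0, x)


# F ratios and degrees of freedom as reported in Table 13.
table_13 = {
    "Written Annotations": (40.42, 1, 47),
    "Pictorial Annotations": (0.096, 1, 47),
    "Written and Pictorial Annotations": (6.8, 1, 47),
}

for factor, (f, df1, df2) in table_13.items():
    print(f"{factor}: p = {f_test_p_value(f, df1, df2):.3f}")
```

Because the reported F ratios are rounded, the recomputed p values are approximate.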

There was a significant main effect for written annotations and a significant interaction between pictorial and written annotations; the main effect for pictorial annotations was not significant. The written annotations group performed best, while the control group performed the poorest (Table 14).

Table 14. Mean Group Scores on the Delayed Written Vocabulary Production Posttest, Study 2

Group                                N    M       SD
Control                              13   2.77    1.09
Pictorial Annotations                14   5.43    2.17
Written Annotations                  13   10.31   4.03
Pictorial and Written Annotations    11   8.55    3.96
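One way to read the pattern in Table 13, a significant interaction alongside a nonsignificant pictorial main effect, is through the simple effects of pictorial annotations computed from the Table 14 cell means. The sketch below assumes unweighted cell means (each cell counts equally regardless of N); the dictionary layout and function name are illustrative.

```python
# Cell means from Table 14, keyed by
# (pictorial annotations present, written annotations present).
means = {
    (False, False): 2.77,   # Control
    (True, False): 5.43,    # Pictorial Annotations
    (False, True): 10.31,   # Written Annotations
    (True, True): 8.55,     # Pictorial and Written Annotations
}


def simple_effect_of_pictorial(cell_means, written):
    """Pictorial-present minus pictorial-absent mean at one level of written."""
    return cell_means[(True, written)] - cell_means[(False, written)]


without_written = simple_effect_of_pictorial(means, written=False)  # 5.43 - 2.77
with_written = simple_effect_of_pictorial(means, written=True)      # 8.55 - 10.31

# 2 x 2 interaction contrast: how much the pictorial effect changes
# when written annotations are also available.
interaction_contrast = with_written - without_written

# Unweighted main effect of pictorial annotations: the average simple effect.
unweighted_main_effect = (with_written + without_written) / 2

print(f"{without_written:.2f} {with_written:.2f} "
      f"{interaction_contrast:.2f} {unweighted_main_effect:.2f}")
```

Pictorial annotations are associated with higher means when written annotations are absent (+2.66) but lower means when they are present (-1.76), so the two simple effects largely cancel in the main effect (+0.45) while producing a sizable interaction contrast (-4.42).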

Post hoc comparisons (Tukey HSD) showed that the written annotations group did not differ significantly from the group with access to both annotation types, and that both of these groups had significantly higher scores than the control group (p<0.001). The pictorial annotations group did not differ significantly from either the pictorial and written annotations group or the control group.
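The pairwise comparisons above can be approximated from the summary statistics in Table 14. The study does not state which Tukey variant was used; this sketch assumes the Tukey-Kramer form for unequal group sizes, q = |M_i - M_j| / sqrt(MS_within / 2 · (1/n_i + 1/n_j)), with MS_within pooled from the group SDs. The critical value is an assumption taken from a standard studentized range table, q(.05; k = 4, df = 40) = 3.79, which is slightly conservative for the actual error df of 47; `scipy.stats.studentized_range` could supply an exact cutoff if SciPy is available.

```python
import itertools
import math

# Group summaries from Table 14: (N, M, SD).
table_14 = {
    "Control": (13, 2.77, 1.09),
    "Pictorial": (14, 5.43, 2.17),
    "Written": (13, 10.31, 4.03),
    "Pictorial+Written": (11, 8.55, 3.96),
}

# Assumption: tabled critical studentized range q(.05; k = 4, df = 40).
Q_CRIT = 3.79


def tukey_kramer_q(g1, g2, ms_within):
    """Studentized range statistic for one pair of groups with unequal n."""
    n1, m1, _ = g1
    n2, m2, _ = g2
    se = math.sqrt(ms_within / 2.0 * (1.0 / n1 + 1.0 / n2))
    return abs(m1 - m2) / se


# Pooled within-group mean square from the group SDs (df = N - k = 47).
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in table_14.values())
df_within = sum(n for n, _, _ in table_14.values()) - len(table_14)
ms_within = ss_within / df_within

results = {}
for a, b in itertools.combinations(table_14, 2):
    q = tukey_kramer_q(table_14[a], table_14[b], ms_within)
    results[(a, b)] = q
    verdict = "significant" if q > Q_CRIT else "n.s."
    print(f"{a} vs {b}: q = {q:.2f} ({verdict} at .05)")
```

Under these assumptions the sketch agrees with the comparisons reported above: the written and the pictorial and written annotations groups both exceed the control group, while the pictorial annotations group falls short of significance against both the control group and the pictorial and written annotations group (q of about 3.6 in the latter case).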

In terms of time on task, students in the pictorial and written annotations group did not access the two annotation types with equal frequency: pictorial annotations were accessed 37% of the time, with an average of 7.01 seconds per annotation, and written annotations 63% of the time, with an average of 8.23 seconds per annotation. By contrast, the single annotation groups viewed their annotations for almost equal amounts of time: 10.98 seconds for pictorial and 11.21 seconds for written annotations.
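The imbalance in the combined group can be expressed as a share of viewing time. This sketch assumes the 37% and 63% figures are shares of all annotation lookups, so weighting each type's mean seconds per lookup by its share gives the expected viewing time per lookup and the fraction of viewing time spent on each type. The variable names are illustrative.

```python
# Study 2, pictorial and written annotations group, figures from the text above.
# Assumption: the percentages are shares of all annotation lookups.
lookups = {
    "pictorial": (0.37, 7.01),  # (share of lookups, mean seconds per lookup)
    "written": (0.63, 8.23),
}

# Expected viewing time for an average lookup, weighting each type by its share.
mean_seconds_per_lookup = sum(share * secs for share, secs in lookups.values())

# Proportion of total viewing time that went to written annotations.
written_share, written_secs = lookups["written"]
written_time_share = written_share * written_secs / mean_seconds_per_lookup

print(f"{mean_seconds_per_lookup:.2f} s per lookup; "
      f"{written_time_share:.0%} of viewing time on written annotations")
```

Under that reading, an average lookup lasted about 7.78 seconds, and roughly two thirds of the group's viewing time went to written annotations.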

In summary, the control group performed the poorest on both posttests. On the immediate written vocabulary production posttest, students who accessed both annotation types or written annotations alone outperformed those without access to written annotations. On the delayed test, the written annotations group obtained the highest mean score, although it did not differ significantly from the pictorial and written annotations group; both of these groups outperformed the control group, while the pictorial annotations group did not differ significantly from the control group.

DISCUSSION

Hypotheses 1 and 2 predicted that students with access to pictorial and written annotations during an L2 listening comprehension activity would recognize more written translations and pictorial representations of keywords on the written and pictorial vocabulary recognition posttests. These two hypotheses further predicted that students who accessed written annotations would outperform those without access to such annotations on the written vocabulary recognition posttest, while students who accessed pictorial annotations would outperform those without access to such annotations on the pictorial vocabulary recognition posttest.

The results of the immediate vocabulary recognition tests did not support these hypotheses: students in all annotation groups recognized vocabulary equally well, regardless of test mode. Recognition tests have an element of guessing built into their format. Thus, previous exposure to the translation, either visually or verbally, makes selecting the correct response much easier than producing a response from memory (Clariana & Lee, 2001; Glover, 1989; McDaniel & Mason, 1985).
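To make the guessing point concrete: on a selection-format test, blind guessing alone yields an expected score of n_items / n_options. The item and option counts below are hypothetical placeholders, since this section does not report them. The second function is the classical correction-for-guessing (formula scoring) rule, R - W / (k - 1); it is shown only to illustrate the idea, and the study does not report applying it.

```python
# Hypothetical test parameters: the number of items on the recognition
# posttests and the number of options per item are not reported here.
N_ITEMS = 20     # hypothetical
N_OPTIONS = 4    # hypothetical


def expected_chance_score(n_items: int, n_options: int) -> float:
    """Expected number correct if every item is answered by blind guessing."""
    return n_items / n_options


def corrected_for_guessing(right: int, wrong: int, n_options: int) -> float:
    """Classical formula score R - W / (k - 1); omitted items count as neither."""
    return right - wrong / (n_options - 1)


print(expected_chance_score(N_ITEMS, N_OPTIONS))
print(round(corrected_for_guessing(12, 8, N_OPTIONS), 2))
```

With these placeholder values, a student who guessed on every item would still be expected to score 5 out of 20, a floor that a production test does not provide.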

Hypothesis 3 predicted that students with access to pictorial and written annotations during an L2 listening comprehension activity would recall more vocabulary on a written vocabulary posttest than those without access to both annotation types, and also that students who accessed written annotations would outperform those without access to such annotations. Results of the immediate vocabulary production test show that the pictorial and written annotations group and the written annotations group recalled more vocabulary than did those without access to written annotations. This is in line with the third hypothesis and suggests that students learned more vocabulary when the testing mode matched the annotation mode they accessed, either alone or combined with an additional annotation mode.

With regard to all three hypotheses, the control group performed the poorest because the difficulty of the aural text prevented students from building contextual knowledge, thus lessening their ability to learn vocabulary incidentally (Hulstijn, 1992; Jones, 2003; Jones & Plass, 2002). On the other hand, vocabulary acquisition was consistently strong when students had access to pictorial and written annotations, supporting the multimedia effect proposed by Mayer (2001). The ability to look up words more than once in different modalities supported inferencing and verification strategies (Grace, 1998) and reinforced learning (Chun & Plass, 1996), so that students were able to perform well on immediate tests regardless of testing mode. Additionally, students could establish direct connections between the L1 and L2 vocabulary and the corresponding images and thereby have two retrieval routes instead of one (Plass et al., 1998). However, with regard to the third hypothesis, students in the pictorial and written annotations group may have had too much information to look up and may have forgone examining both annotation types (Jones, 1995). Tracking logs in Study 2 showed that this group did not examine the two types of annotations in a balanced manner, and it subsequently scored lower on the delayed written production test than the written annotations group, although this difference was not statistically significant. Though this group initially obtained richer, partly redundant information that was immediately helpful for producing written translations, over time the retained information may have become "cluttered" and, through cognitive overload, inhibited students' ability to focus directly on the needed responses (Sweller, 1994).

Some researchers have argued that images carry a structural message that complements the language presented (Baggett, 1989; Kozma, 1991) and that the pictorial mode facilitates vocabulary learning (Kellogg & Howe, 1971; Oxford & Crookall, 1990; Underwood, 1989). This was the case in the study conducted by Jones and Plass (2002), in which students who accessed pictorial annotations alone or combined with written annotations outperformed those without access to any pictorial annotations on a written vocabulary recognition posttest. In the present study, students performed well on the recognition posttests no matter which annotation type they accessed. However, the pictorial annotations group could not produce vocabulary from memory as well as the groups that had access to written annotations, a result counter to findings that the pictorial mode of information increases the efficiency of learning (Kost, Foss & Lenzini, 1999; Oxford & Crookall, 1990; Terrell, 1986). Instead, images may have provided too much information (Sweller, 1994), whereas direct translations provided more precise information.

There are more connections in the memory representation when the input is visual. "Brown leaf" presented verbally creates the instance of "leaf" connected with the concept "brown." But showing a picture of a brown leaf causes one to create the concept of leaf connected with concepts of brown, olive, rust, burgundy, etc., not to mention its shape, size, environment, etc. In the verbal presentation there is one sure connection: leaf with brown. (Baggett, 1989, p. 119)

The richness of images may have affected students' ability to accurately translate L2 words into the L1, whereas written annotations provided precise translations of the L2 words.