Each subject is given a specific amount of time to study List A -- the list of words.

2.

After a timed delay, each subject is given a specific amount of time to write down all the words the subject can recall from the list.

3.

The subject is scored based on the number of words correctly recalled.

Repeat this process using List B -- the non-words.
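The scoring step above can be sketched in a few lines of Python. This is only an illustration: the function name `score_recall`, the sample lists, and the decision to ignore capitalization and give no credit for duplicates or misspellings are assumptions, not rules stated in the study.

```python
def score_recall(study_list, recalled):
    """Count how many distinct study-list strings the subject wrote down."""
    targets = {w.strip().lower() for w in study_list}
    answers = {w.strip().lower() for w in recalled}
    # Assumed scoring rule: duplicates and strings not on the list earn no credit.
    return len(targets & answers)

# Hypothetical example data, not taken from the study.
list_a = ["lamp", "tree", "boat", "coin"]
written = ["Tree", "coin", "coin", "ship"]
print(score_recall(list_a, written))  # 2
```

The same function would be applied unchanged to List B, so that both lists are scored by exactly the same rule.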

Because you want to use the results to make a comparison, you want the lists to be comparable. The two lists should contain the same number of character strings, and the character strings should be the same size. One way to do this is for both lists to contain "words" that are all the same length (say, four letters each); another is for both lists to contain "words" that are respectively the same lengths (say, the first "word" on each list is six letters, the next "word" is three letters, and so on).
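One possible way to check this kind of comparability is sketched below. The function name `lists_comparable` and the sample strings are hypothetical; the check simply compares the two lists position by position.

```python
def lists_comparable(list_a, list_b):
    """True if both lists have the same number of strings and
    the strings in matching positions have the same length."""
    if len(list_a) != len(list_b):
        return False
    return all(len(a) == len(b) for a, b in zip(list_a, list_b))

# Hypothetical example: six letters, then three, then five on each list.
words = ["garden", "cat", "house"]
non_words = ["plimor", "zef", "troke"]
print(lists_comparable(words, non_words))  # True
```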

Yes, the process may be biased in the way the "non-words" are created -- the person creating non-words may deliberately choose letter combinations that are exceptionally difficult to remember. Also, someone asked to perform the task twice, with List A and then List B, may perform better with List B simply because he or she has practiced the task by using List A first.

A major weakness is the selection of the groups; volunteers may be particularly eager to test their memories. Another weakness is that no participant takes both tests, so there is no possibility of directly comparing the results of the two lists for the same participant.

c.

The selection of the participants is a large source of bias and is not addressed by the design.

d.

A better design would randomly assign participants to each group and would take measurements for each list for all 16 participants.

Design 2

a.

An equal number of participants uses each list, and all participants take both tests, allowing for more direct comparison.

b.

A major weakness is that all participants use List A first. Because they use List B second, they may perform better (or worse) on it simply because of their experience with the first test, and not because of actual differences between the lists themselves.

c.

A potential source of bias is the possibility that List B becomes easier or harder as a result of List A being used first.

d.

A better design would randomly select half the participants to use List A first and half to use List B first.
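A minimal sketch of this kind of random split is shown below, assuming 16 participants identified by made-up labels `P01` to `P16`. The function name `assign_order` and the `seed` parameter (included only so that a run can be repeated) are illustrative choices.

```python
import random

def assign_order(participants, seed=None):
    """Randomly split participants into two equal halves:
    one half studies List A first, the other List B first."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A_first": sorted(shuffled[:half]),
            "B_first": sorted(shuffled[half:])}

# Hypothetical participant labels.
participants = [f"P{i:02d}" for i in range(1, 17)]
order = assign_order(participants, seed=1)
print(order["A_first"])
print(order["B_first"])
```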

Design 3

a.

Each group has an equal number of participants, and participants do not determine their own groups.

b.

A major weakness is the selection of the groups. The person conducting the study may place certain types of people in a group, either to deliberately skew the results or because of unconscious bias. Another weakness is that no participant takes both tests, so there is no possibility of directly comparing the results of the two lists for the same participant.

c.

The fact that the person conducting the study selects the groups is an unaddressed source of bias.

d.

A better design would randomly assign participants to each group and would take measurements for each list for all 16 participants.

Design 4

a.

All participants use both lists, and an equal number of participants uses each list.

b.

The major weakness is the selection of the groups. The person conducting the study may place certain types of people in a group, either to deliberately skew the results or because of unconscious bias.

c.

The fact that the person conducting the study selects the groups is an unaddressed source of bias.

d.

A better design would randomly assign participants to each group.

Design 5

a.

Groups are randomly assigned; all participants use both lists; and an equal number of participants uses each list first.

b.

There do not appear to be any major weaknesses.

c.

There are no major sources of bias.

d.

A better design might include more participants, which would increase confidence in the findings.

Design 5 does the best job of removing bias. There are still small possible sources of bias, the most apparent of which is the method of creating the two lists. A specific way of randomly generating List B would be useful, as might a specific method of finding the words used for List A.
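One possible way to generate List B at random is sketched below. It is an illustration only: the function name `make_non_words`, the `known_words` argument (a stand-in for whatever vocabulary is used to reject real words), and the choice of uniformly random lowercase letters are all assumptions. Purely random letter strings may be harder to pronounce than real words, which is itself a design question the study would need to settle.

```python
import random
import string

def make_non_words(word_list, known_words, seed=None):
    """Build one random letter string per word, matching each word's length
    and rejecting anything found in known_words or already generated."""
    rng = random.Random(seed)
    known = {w.lower() for w in known_words}
    non_words = []
    for word in word_list:
        while True:
            candidate = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(len(word))
            )
            if candidate not in known and candidate not in non_words:
                non_words.append(candidate)
                break
    return non_words

# Hypothetical List A; here it doubles as the vocabulary of real words.
list_a = ["garden", "cat", "house"]
list_b = make_non_words(list_a, known_words=list_a, seed=0)
print(list_b)
```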

There are many ways to randomly assign the 16 subjects to groups. One way is to place the 16 names on equal-size slips of paper, then draw eight of these slips from a hat. Another is to start with a list of last names of the 16 participants arranged in alphabetical order, then flip a coin for each individual. If the coin lands heads, the participant is assigned to Group 1, and if the coin lands tails, the participant is assigned to Group 2. This continues until eight subjects are assigned to a group; the remaining subjects are then assigned to the other group.
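Both methods described above can be simulated. The sketch below assumes 16 unique subject names (the labels `S01` to `S16` are made up) and uses Python's `random` module in place of the hat and the coin.

```python
import random

def draw_from_hat(names, rng):
    """Draw half of the names at random for Group 1; the rest form Group 2."""
    group1 = rng.sample(names, len(names) // 2)
    group2 = [n for n in names if n not in group1]
    return group1, group2

def coin_flip_assign(names, rng):
    """Go through the names alphabetically, flipping a coin for each one,
    until one group is full; everyone remaining joins the other group."""
    size = len(names) // 2
    group1, group2 = [], []
    for name in sorted(names):
        if len(group1) == size:
            group2.append(name)
        elif len(group2) == size:
            group1.append(name)
        elif rng.random() < 0.5:  # "heads"
            group1.append(name)
        else:                     # "tails"
            group2.append(name)
    return group1, group2

# Hypothetical subject labels.
names = [f"S{i:02d}" for i in range(1, 17)]
hat_groups = draw_from_hat(names, random.Random(0))
coin_groups = coin_flip_assign(names, random.Random(0))
print(hat_groups)
print(coin_groups)
```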

b.

If different groups read each list, variation in the data may come from one group happening, by chance, to contain the people with the best recall. If each person reads both lists, this potential source of variation and bias is removed.

Note that all measures of location (Min, Q1, Med, Q3, and Max) are higher for List A (Words) than for List B (Non-Words). The median for List A is twice the median for List B. In other words, people typically remembered twice as many "words" from List A as from List B. However, there is more variation in the number recalled correctly for List A than for List B. The range for List A is 12 (from 3 to 15), while the range for List B is 8 (from 1 to 9). The interquartile ranges for the two lists (3 and 2) are roughly equal.
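For readers who want to reproduce this kind of comparison, here is a minimal sketch of a five-number summary, together with the range and interquartile range. The scores below are hypothetical, not the study's data, and the quartile rule used (the median of the lower half and of the upper half, leaving out the middle value when the count is odd) is an assumption; other quartile conventions can give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # values below the median position
    upper = data[(n + 1) // 2 :]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical scores for illustration only.
scores = [3, 7, 8, 9, 10, 10, 12, 15]
mn, q1, med, q3, mx = five_number_summary(scores)
print("range:", mx - mn)
print("interquartile range:", q3 - q1)
```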

One telling statistic is that the median of List A (10) is higher than the maximum of List B (9). This means that at least half of the participants scored higher on List A than anyone scored on List B.

Since only one person had better recall using the list of non-words, and 15 others had better recall using the list of words, this suggests that words are substantially easier to recall than non-words.
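A count like this comes from pairing each person's two scores and looking at the sign of the difference. The sketch below uses four made-up score pairs, not the study's data; the function names are hypothetical.

```python
def paired_differences(scores_a, scores_b):
    """Difference (List A minus List B) for each participant, in order."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each participant needs a score on both lists")
    return [a - b for a, b in zip(scores_a, scores_b)]

def count_signs(differences):
    """Return (better on A, better on B, ties)."""
    better_a = sum(d > 0 for d in differences)
    better_b = sum(d < 0 for d in differences)
    ties = sum(d == 0 for d in differences)
    return better_a, better_b, ties

# Hypothetical scores for four participants.
scores_a = [10, 8, 12, 5]
scores_b = [5, 6, 4, 7]
diffs = paired_differences(scores_a, scores_b)
print(diffs)               # [5, 2, 8, -2]
print(count_signs(diffs))  # (3, 1, 0)
```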

c.

Here is the Five-Number Summary of the differences:

Min    Q1    Med    Q3    Max
-2     3     5.5    8     9

d.

Here is the box plot of the differences:

e.

These results indicate that people are better at recalling words than non-words. Note that the entire interquartile range (the "box") lies above zero, which indicates that everyone in the center 50% of participants performed better with the list of words.
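The reading of the box plot can be checked directly from the Five-Number Summary of the differences given in part (c). The short sketch below uses only those five values; the variable names are illustrative.

```python
# Five-Number Summary of the differences (List A minus List B), from part (c).
summary = {"min": -2, "q1": 3, "med": 5.5, "q3": 8, "max": 9}

value_range = summary["max"] - summary["min"]  # 9 - (-2) = 11
iqr = summary["q3"] - summary["q1"]            # 8 - 3 = 5
box_above_zero = summary["q1"] > 0             # the box starts at Q1

print(value_range, iqr, box_above_zero)  # 11 5 True
```

Because Q1 is positive, the whole box sits above zero, while the negative minimum shows that at least one participant did better with the non-words.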