This report appeared in the Journal of the Society for Psychical Research, Volume 66.3, Number 868, July 2002, and is included on this website with the kind permission of Gary Schwartz.

Accuracy and Replicability of Anomalous Information Retrieval: Replication and Extension

- Gary E. R. Schwartz, Linda G. S. Russek and Christopher Barentsen -

- Abstract -

The study investigated the ability of three research mediums to obtain information regarding the deceased loved ones of five research 'sitters' (subjects). The mediums were kept completely blind to the identity of the sitters. The mediums sat behind a floor-to-ceiling screen, with their backs to the screen, facing video cameras. The mediums were not allowed to ask any questions, and the sitters never spoke. Transcripts were made from the recordings. The sitters scored all initials, names, historical facts, personal descriptions, and temperament descriptions (n = 528 items for 15 readings) using a -3 (definite miss) to +3 (definite hit) rating scale. When the sitters rated their own readings, the average percentage of +3 scores was 40%. When the sitters rated the readings of the other sitters (control readings), the value was 25% (p < 0.03). The findings appear to confirm the hypothesis that information and energy, and potentially consciousness itself, can continue after physical death.

Introduction

The experiment reported in this paper addresses the question of replicability of anomalous information retrieval in highly skilled mediums. Its methods and findings have important implications for consciousness studies, parapsychology, and the possibility of the continuance of consciousness after physical death.

The present experiment replicated and extended Schwartz, Russek, Nelson and Barentsen (2001), which was a multi-medium, multi-sitter design with Part I, the sitter-silent condition, being conducted single-blind. In that experiment the mediums never saw the sitters or heard them speak; they were blind to the identity of the sitters, and received no visual or verbal feedback during the sitter-silent condition. Non-verbal cues were occasionally present, e.g. sighs, coughs, chair movements. The present study replicated this experiment, adding a full-length, double-sheeted floor-to-ceiling screen to rule out any possibility of visual cues.

However, as in Schwartz et al. (2001), the sitters were not blind to the mediums; they were present at their respective readings and heard the information as it was reported by the mediums. The question of whether a possible bias by the sitters when rating their own readings could serve as an explanation for the totality of the quantitative findings is addressed in the discussion section of the present paper, which also includes some qualitative data illustrating why such an explanation is inadequate.

The primary hypothesis was that sitters would rate their own readings as containing significantly more accurate information, and significantly less inaccurate information, than control readings, and that this effect would be replicated across mediums. We predicted that this effect would be observed not only for individual ratings of items, but sequences of items (termed information packets) as well.

Exploratory data analyses are also reported to stimulate novel questions for future research.

METHOD

Subjects

There were three mediums: Laurie Campbell, John Edward, and Suzane Northrop. They had collaborated in two previous multi-medium experiments in the Human Energy Systems Laboratory (Schwartz et al., 2001). In addition, Campbell has collaborated in single (e.g. Schwartz, Russek, Watson, Campbell & Smith, 1999; Schwartz & Russek, 1999) and double-medium (e.g. Schwartz & Russek, 2001) studies in the laboratory as well. These three individuals are referred to as 'research mediums' in that they are willing to participate in studies that require substantial experimental control. No evidence of fraud or cold reading has been observed with these select mediums.

At the time this paper went to press, all three mediums were professional, and one (John Edward) had a highly visible television show (Crossing Over with John Edward). However, at the time this research was conducted, Campbell had not yet begun her career as a professional medium, and John Edward did not have a television show.

There were five sitters (all female, ranging in age from twenties to fifties). All had two or more deceased loved ones with strong emotional bonds. The identities of the sitters were kept secret from the mediums (described in more detail below). Moreover, the experimenters were kept blind to most of the specific information about the deceased loved ones of the five research sitters until after the data had been collected and the scoring of the transcripts had been completed by the sitters.

Again, these individuals are regarded as 'research sitters', in that they were willing to perform careful and time-consuming scoring of the transcripts. All had undergraduate degrees. Two had masters degrees and worked as research coordinators at the University of Arizona. They understood that honesty and integrity were critical in this research. They were all willing to have their names released and be available for interviews with scientists and skeptics as well as the media (Schwartz with Simon, 2002).

DESIGN

The single-blind research design consisted of two parts:

(1) Sitter-Silent Condition

During this ten-minute period, the mediums were requested to speak out loud whatever information they received about the deceased loved ones of the sitters. The mediums were not allowed to ask any questions of the sitters. The mediums were tested simultaneously, in separate rooms, facing video cameras and back-up tape recorders; they sat with their backs to a floor-to-ceiling screen that separated them from the sitters and experimenters. The mediums could not see or hear the sitters. Hence, the mediums were blind to age, sex, appearance, non-verbal visual cues, and verbal cues.

Because the sitter-silent condition provides no verbal/semantic feedback to the mediums as well as minimal non-verbal feedback (save for possible sighs or breathing information from the sitters), the sitter-silent condition eliminates the plausibility of 'cold reading' as a probable explanation for the findings. For this reason, the paper reports the data from the sitter-silent condition. These form the most compelling evidence for anomalous information retrieval.

(2) Questioning Condition

During this ten-minute period, the mediums were allowed to ask simple yes/no questions of the sitters. However, the sitters were still not allowed to speak. Instead, the sitters nodded their heads yes or no, and the experimenters spoke yes or no. Hence, the only voices the mediums heard during the course of the experiment were the voices of the experimenters.

Schwartz was the experimenter with Edward, Russek was the experimenter with Campbell, and Barentsen was the experimenter with Northrop. Hence, a given medium heard only the voice of one experimenter during the data collection.

Sitters were sequestered in a separate room. They were escorted to a given room by the appropriate experimenter. Since there were five sitters and three mediums, there were at least two sitters waiting in the sitter room at all times.

Control for 'Hot Reading'

The order of sessions (each sitter was tested with each medium) was determined the morning of the day the data were to be collected. Hence, even if the mediums had somehow obtained prior information about the identity of the sitters (sometimes termed 'hot reading'), the information would not have been useful during the sitter-silent condition since the mediums would not know when to apply what information. However, if the mediums had somehow obtained detailed information about each of the five sitters by using investigative procedures ahead of time, and analysed the yes/no feedback received during the questioning periods, they could possibly have figured out, by the end of the fourth reading, who the last sitter would be.
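One way such a same-day session order could be generated is sketched below in Python. This is an illustration only: the paper does not say how the order was determined, so the shuffle-and-rotate scheme, the function name build_schedule, and the sitter labels are all assumptions. The sketch guarantees that each medium reads each sitter exactly once and that no sitter is with two mediums in the same time slot.

```python
import random


def build_schedule(sitters, mediums, seed=None):
    """Return a list of time slots; each slot maps medium -> sitter.

    ASSUMPTION: a shuffled cyclic rotation, not the authors' documented method.
    Requires at least as many sitters as mediums.
    """
    if len(sitters) < len(mediums):
        raise ValueError("need at least as many sitters as mediums")
    rng = random.Random(seed)
    order = list(sitters)
    rng.shuffle(order)  # the order is unknown until the schedule is drawn
    n = len(order)
    schedule = []
    for slot in range(n):
        # Each medium takes a different offset into the shuffled order,
        # so sitters within one slot are always distinct.
        schedule.append(
            {medium: order[(slot + offset) % n] for offset, medium in enumerate(mediums)}
        )
    return schedule


# Hypothetical labels; the real sitters are not identified this way in the paper.
sitters = ["S1", "S2", "S3", "S4", "S5"]
mediums = ["Medium 1", "Medium 2", "Medium 3"]
schedule = build_schedule(sitters, mediums, seed=2002)
```

With five sitters and three mediums, two sitters are unassigned in every slot, which is consistent with the text's remark that at least two sitters were waiting in the sitter room at all times.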

Scoring of the Transcripts

The fifteen audio tapes (five sitters times three mediums) were professionally transcribed. The third author extracted all initials, names, historical factors, personal descriptions, temperaments, and 'other' statements (e.g. purported opinions stated by deceased persons to the mediums), and placed them in Excel spreadsheets. The first five categories could be independently confirmed by other living relatives or friends if necessary, which makes the possibility of rater bias less likely.

The five sitters were required to score every item using a -3 to +3 scale, where -3 was a definite miss, -2 a probable miss, -1 a possible miss, 0 a maybe miss/maybe hit, +1 a possible hit, +2 a probable hit, and +3 a definite hit. Sitters were encouraged to give conservative ratings (i.e. to use less positive numbers unless they were sure of a given rating). If they did not know whether a given item was correct or not, they were to leave it blank. The three experimenters also scored the 15 readings as if the information applied to them.

Analyses were performed only on the most stringent ratings (-3's and +3's).
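The scoring scheme can be made concrete with a minimal Python sketch. The scale labels come from the text; the example ratings are invented, and the function name percent_extremes is hypothetical. Because the paper does not state whether blank items were counted in the denominator of the percentages, that choice is exposed as a parameter rather than decided here.

```python
# Rating scale as described in the text; None stands for an item left blank.
SCALE = {
    -3: "definite miss",
    -2: "probable miss",
    -1: "possible miss",
    0: "maybe miss/maybe hit",
    1: "possible hit",
    2: "probable hit",
    3: "definite hit",
}


def percent_extremes(ratings, include_blanks=True):
    """Return (percent of +3 ratings, percent of -3 ratings) for one reading.

    ASSUMPTION: whether blanks belong in the denominator is not stated in
    the paper, so the caller chooses via include_blanks.
    """
    for r in ratings:
        if r is not None and r not in SCALE:
            raise ValueError(f"rating {r!r} is outside the -3..+3 scale")
    scored = [r for r in ratings if r is not None]
    denominator = len(ratings) if include_blanks else len(scored)
    if denominator == 0:
        return 0.0, 0.0
    hits = sum(1 for r in scored if r == 3)
    misses = sum(1 for r in scored if r == -3)
    return 100.0 * hits / denominator, 100.0 * misses / denominator


# Invented ratings for one reading: 10 items, 4 definite hits,
# 2 definite misses, 2 blanks.
example = [3, 3, -3, None, 1, 3, -3, 0, 3, None]
with_blanks = percent_extremes(example, include_blanks=True)
without_blanks = percent_extremes(example, include_blanks=False)
```

Only the +3 and -3 counts enter the result, mirroring the restriction to the most stringent ratings.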

Analyses

In this paper different Figures are used to display the number of total items, percentages of -3 (definite miss) and +3 (definite hit) scores, and numbers of 'information packets' (see below), averaged in different ways to address different questions (e.g. averages for each of the five readings in total items - an exploratory analysis, or averages of -3 and +3 scores for readings versus controls - the primary hypothesis). Analyses of variance were performed when appropriate (e.g. 2 x 2 designs comparing ratings of self/experimental readings versus other/control readings by +3 versus -3 ratings) and are described in the text. Primary hypotheses and exploratory analyses are indicated in the headings. The Figure captions typically include degrees of freedom, F values, and p values.

RESULTS

Total Items Examined over Readings and Mediums (Exploratory Analyses)

Figure 1 displays the total numbers of items generated by the three mediums during the sitter-silent condition, averaged over the five readings. Since the purpose of this paper is not to focus on the individual personalities and performances of the mediums, the mediums are simply labelled 1, 2 and 3.

It can be seen that Medium 1 generated substantially more items (approximately 60 items per reading on the average), compared with mediums 2 and 3, who averaged around 25 and 20 items each session respectively. The total number of items generated during the silent periods for the three mediums over the 15 readings was 528.

Two analyses of variance were reported on the total number of items. The first examined mediums (3), the second examined readings (5).

The main effect for mediums was significant (F(2,12) = 14.07, p < 0.0007). Note that 2,12 refers to degrees of freedom, 14.07 is the F ratio, and p < 0.0007 is the probability value; this structure is used for all analyses of variance reported.
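To illustrate where the degrees of freedom come from: with three mediums and five readings each, a one-way analysis of variance has 3 - 1 = 2 degrees of freedom between groups and 15 - 3 = 12 within groups, which matches the F(2,12) reported. The Python sketch below computes such an F ratio. It assumes a simple one-way between-groups design, which the text does not spell out, and the item counts are hypothetical values chosen only to resemble the averages described for Figure 1; they are not the study's data.

```python
def one_way_anova(groups):
    """Return (F ratio, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# HYPOTHETICAL item counts per reading for the three mediums.
hypothetical_counts = [
    [70, 65, 60, 55, 50],  # Medium 1, about 60 items per reading
    [30, 28, 25, 22, 20],  # Medium 2, about 25 items per reading
    [25, 22, 20, 18, 15],  # Medium 3, about 20 items per reading
]
f_ratio, df_between, df_within = one_way_anova(hypothetical_counts)
```

The p value would then be read from the F distribution with these degrees of freedom; that step is omitted here to keep the sketch free of external libraries.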

Figure 2 displays the number of items over the course of the five readings (from the first of the day to the last), averaged over the three mediums. The analyses of variance revealed a significant main effect for readings (F(4,8) = 4.80, p < 0.0286).

The mediums predicted that they would tire over the course of the five sessions. The data are consistent with their hypothesis of an apparent fatigue effect over the course of the experiment.

When the percent number of hits (+3's) and misses (-3's) were examined comparing when the sitters rated their own three readings (called 'readings' on the graphs) versus when the sitters rated the other twelve readings (called 'controls' on the graphs), was there (1) a greater number of hits for their own readings compared to the controls, and (2) a reduced number of misses for their own readings compared to the controls? Scores of +3's and -3's were selected for analysis since they represented the most definitive ratings of the sitters.

As displayed in Figure 3, it can be seen that the percent hits (shown on the left) was higher for the actual readings (40%, circles with solid lines compared to 25%, squares with dashed lines) than for the control readings, and this pattern was reversed for the misses (29% versus 42%). The total difference was 28% (15% increase for hits and 13% decrease for misses).
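The arithmetic behind the 28% figure can be checked with a few lines of Python. The four percentages are those given in the text; the variable names are ours, and the differences are best read as percentage points.

```python
# Percentages reported in the text for +3 hits and -3 misses.
own_readings = {"hits": 40, "misses": 29}
control_readings = {"hits": 25, "misses": 42}

# Increase in definite hits for the sitters' own readings.
hit_increase = own_readings["hits"] - control_readings["hits"]
# Decrease in definite misses for the sitters' own readings.
miss_decrease = control_readings["misses"] - own_readings["misses"]
# The "total difference" is the sum of the two shifts.
total_difference = hit_increase + miss_decrease
```

Note that the total combines two separate shifts (more hits and fewer misses); it is not a single difference between two percentages.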

In terms of actual numbers of items per reading, the average number of +3 hits was 12 for the actual readings versus 6 for controls. The average number of -3 misses per reading was 7 for actual readings versus 13 for controls. Interestingly, the number of blanks (unscorable items as perceived by the sitters) was almost identical for readings and controls (6 in each case).

The experimenters also scored the entire set of 15 readings. Their averages turned out to be virtually identical to the sitters' control ratings. The experimenters obtained an average of 25% +3 hits compared with 25% +3 hits for sitters' controls; the experimenters obtained an average of 55% -3 misses compared with 42% -3 misses for sitters' controls.

In sum, increased accuracy was observed only for items obtained for the sitters' own readings. This effect was observed to various degrees across the categories of information.

The mediums varied in their percent accuracy. Medium 1 averaged 40% accuracy, Medium 2 averaged 28% accuracy, and Medium 3 averaged 54% accuracy. Since Medium 1 had more than twice as many items as either of the other two mediums, and Medium 3 had the least number of items, it appears that the mediums' accuracy is not a simple function of sheer number of items generated per se. The apparent decreased accuracy of Medium 2 represents an unanticipated and meaningful finding that is reported in the qualitative analysis section of the discussion.

Importantly, each medium showed discrimination in terms of reading versus control ratings. These findings are displayed in Figure 4. These data indicate that replication across mediums was observed in this experiment.

It can be seen that +3 hits were consistently higher for the sitters' own readings compared to control ratings (solid versus dashed lines, left box), and -3 misses were consistently lower for sitters' own readings compared with control ratings (solid versus dashed lines, right box).

When the data are averaged across the three mediums, every sitter has higher rating scores for their own readings compared with the controls.

Strings of Hits - Information Packets (Primary Hypothesis)

The ratings of individual items do not address the observation that the mediums sometimes obtained strings or chains of hits that represented themes and meaningful content. In Schwartz et al. (2001), we frequently observed sequences or patterns of accurate groups of items. For example, a medium might report "a deceased son (historical fact), the letter M (initial), tall and thin (personal appearance), shy (temperament)", and all four consecutive pieces of information were +3 accurate. We decided to examine this hypothesis systematically.

For the present experiment, a new scoring procedure was implemented where sitters and experimenters searched for three or more consecutive hits that reflected a meaningful theme or content group.
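The mechanical part of this procedure, finding runs of three or more consecutive +3 ratings, can be sketched in Python as follows. The function name candidate_packets and the example ratings are hypothetical. The sketch only locates candidate runs; whether a run reflects a meaningful theme or content group was a human judgement in the study and is not modelled here.

```python
def candidate_packets(ratings, min_length=3, hit_value=3):
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_length consecutive ratings equal to hit_value.

    Blank items (None) and any other rating break a run.
    """
    runs = []
    start = None
    for i, rating in enumerate(ratings):
        if rating == hit_value:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the reading.
    if start is not None and len(ratings) - start >= min_length:
        runs.append((start, len(ratings)))
    return runs


# Invented ratings for one reading.
example_ratings = [3, 3, 3, 1, 3, 3, -3, 3, 3, 3, 3]
packets = candidate_packets(example_ratings)
```

In this invented example, items 0 to 2 and items 7 to 10 form candidate packets, while the two consecutive hits at positions 4 and 5 are too short to qualify.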

For the experimenters, the category 'readings' referred to the five readings for the specific medium for whom they served as experimenter. The 'control' category referred to the ten readings for the other two mediums. This made it possible to look for potential subtle expression of information reflecting the experimenters' deceased loved ones entering into their respective medium's readings.

To make sure that the sitters and experimenters used the same criteria, each person (five sitters and three experimenters) had to (1) present to the entire group every possible information packet they scored in all 15 readings, and (2) justify inclusion or exclusion. The group as a whole pushed the sitters to include as few information packets as possible (to deflate their scores), and pushed the experimenters to include as many information packets as possible (to inflate their scores). Despite these efforts, the experimenters clearly detected a minimal number of possible information packets (Figure 5: dashed line, left and right squares), even during their 'readings'.

As can be seen in Figure 5, the primary effect appeared for the sitters themselves (solid line) compared with the experimenters (dashed line). Sitters detected approximately two information packets per reading on the average for their own readings (left circle) compared with approximately 0.5 information packets per reading on the average for the control readings (right circle). A 2 x 2 analysis of variance comparing readings versus controls for sitters versus experimenters revealed a significant interaction (F(1,23) = 7.88, p < 0.01).

There appeared to be an interesting trend in the experimenters' data. Given the small sample size (three experimenters), the analyses presented below should be viewed as possible suggestions for future research.

The experimenters scored an average of 0.6 information packets per reading (which turns out to be more than one information packet for every two readings) for the readings they witnessed with the mediums they supervised, versus 0.3 information packets per reading (which turns out to be less than one information packet for every three readings) for the readings of the other experimenters' mediums.

Since information packets were composed of individual items, a subsequent analysis was performed comparing the sitters' percent +3 hits and -3 misses with the experimenters' percent +3 hits and -3 misses, separately for readings and controls.

In Figure 6, the left box displays the data for the sitters (a smaller version of Figure 3 previously), and the right box displays the parallel data for the experimenters. The experimenters show the overall low hits (mid 20's range) and high misses (mid 50's range) mentioned previously.

However, when the percent scores for the controls are subtracted from the percent scores for the readings, separately for the sitters (left) and experimenters (right), it can be seen that the pattern of subtractions is similar (though clearly smaller) for the experimenters compared with the sitters. Figure 7 indicates that the solid line, for the hits, is higher than 0 for both the sitters and the experimenters, and the dashed line, for the misses, is lower than 0 for both the sitters and the experimenters.

Does this apparently replicated pattern for both the sitters and experimenters reflect a subtle scoring bias? Or, was anomalous information 'bleeding' through from the minds of the sitters and the experimenters (telepathy with the living) and/or from the 'spirit' world (the continuance of consciousness hypothesis) occurring for both sitters and experimenters? This is an important question for future research.

DISCUSSION

The present findings suggest that under controlled experimental conditions where research mediums are (1) completely blind to the identity of sitters, and (2) they are not allowed to ask any questions of the sitters (the sitter-silent condition reported in this paper), they can obtain information that is consistently and significantly scored as more accurate (higher percents of +3 hits, lower percents of -3 misses, and higher numbers of information packets) when sitters score their own readings compared with when they score the readings of other sitters (i.e. controls).

The percent accuracy scores for the readings (40% for +3 hits) are higher than the percent accuracy scores for the controls (25% for +3 hits). The 40% for +3 hits is also higher than the 29% for the -3 misses. However, these averages do not convey the richness of the actual transcripts and 'anomalies within the anomalies.' The following example illustrates some of the novel observations that emerge in this area of research. Moreover, these qualitative data address the question of the plausibility of possible rater bias as an explanation for the totality of the quantitative data observed.

For the third sitter, Medium 2 reported receiving information about a deceased grandmother (specific fact that applies to most adults, not likely affected by rater bias) who was very loving (general information that could apply to most people, and might be influenced by rater bias). However, Medium 2 also reported (1) that the grandmother brought "daisies to the sitter's mother's wedding" (a unique set of specific information that was true; the sitter later told GES that her grandmother literally wove daisies into her mother's hair at the wedding), (2) that the grandmother had two dogs, a large black poodle and a large white poodle (highly specific information that was true), and (3) that the "white one tore up the house" (unusual information that was also true). The accuracy of the information concerning the grandmother (the primary deceased person in this reading) was rated by the sitter above 70%.

Facts 1-3 cannot be plausibly explained as rater bias. To date, GES has asked audiences totaling over 2,000 people to indicate (1) if their grandmothers brought daisies to their mother's wedding (only 2 people indicated yes), (2) if their grandmothers had two large poodles, a white one and a black one (0% indicated yes), and (3) if the white one tore up the house (0%). In contrast, more than 80% indicated yes to having a grandmother who died (specific information that applies to most adults). Also, more than 80% indicated yes to their grandmother being loving (general information that also applies to most adults, and may reflect rater bias).

Each sitter was given certain unique information, from each medium, that cannot be explained as rater bias. The additional qualitative information, provided below, is an example of this kind of information.

The high score (70+%) for Sitter 3 was completely offset by Medium 2's performance for Sitter 4. During the fourth sitter's reading, Medium 2 received nothing for this sitter (i.e. 0%, which obviously lowered the average percent +3 accuracy for the entire experiment as well as Medium 2's overall accuracy score of 28%). Medium 2 reported that he could not receive any information for this sitter. Instead, Medium 2 reported receiving continued information for the previous sitter!

Medium 2 claimed that "the previous sitter's grandmother is still here." Medium 2 proposed that the previous sitter was not with another medium at that moment (it turned out he was correct), which was his explanation for why the grandmother was still in his presence. Medium 2 reported hearing the song "On the Good Ship Lollipop" and "Sabrina-The Teenage Witch" (two highly specific pieces of information).

After the fourth (and seemingly failed) reading was completed, the experimenter (GES) brought the sitter back to the room where the sitters were sequestered. To his surprise, he discovered that the previous sitter was still there!

When he questioned this sitter, she informed him (with emotion) that (1) she had curly brown hair as a child, and sang and danced Shirley Temple songs with her grandmother; however, she had to speak with her mother, after the experiment, to learn that one of the songs was "On the Good Ship Lollipop"; (2) the sitter's name was Sabrina; and (3) when she was a teenager, some children teased her about being "Sabrina-The Teenage Witch", and she went to her grandmother for solace.

Medium 2's specific prediction and information were confirmed. The only information previously known by the experimenter was that the third sitter's name was Sabrina and that she had a deceased grandmother. This is only one of numerous remarkable and convincing examples of apparent genuine anomalous information retrieval in the current experiment. None of the other four sitters scored this information as accurate for them.

The information from Sitter 3 was independently confirmed by the sitter's mother (who is a Professor at the University of Arizona). Since the quantitative data reflect scoring of the qualitative data, at least a subset of the quantitative data appear to be immune from rater bias. Hence, it is highly improbable that rater bias can explain the totality of the findings. It is also highly improbable that subtle non-verbal cues can explain the specificity of the totality of the information either.

FUTURE RESEARCH

It is possible to extend the sitter-silent paradigm to create double-blind and even triple-blind studies. As explained below, in double-blind (and triple-blind) studies, the sitters do not hear the information during the readings (their phones are on mute, on a separate telephone line); they later receive blinded transcripts to score.

A multi-center, triple-blind study has been approved by the University of Arizona Internal Review Board committee, and a successful pilot study has been conducted. Sitters are selected by their respective universities, and only the investigators in each center know the identities of their respective sitters. The mediums are located in various parts of the country. A research coordinator works with the investigator to schedule long-distance phone appointments.

At the appointed time, the research coordinator calls the investigator. The research coordinator's phone is then placed on mute, and the sitter is asked to hold the silenced phone to his or her ear for the duration of the sitter-silent reading. The medium is then called on a second line. The medium speaks into the phone for ten to fifteen minutes. The information is tape recorded.

Note that the medium never sees or hears the sitter, and the sitter never sees or hears the medium (they are each totally blind, and deaf, to the other). Also, note that although the research coordinator is not blind to the identity of the medium, he or she is blind to the identity of the sitter (hence triple-blind).

When all the data are collected, the sitters are mailed two sets of transcripts. One set is of the transcripts obtained from their personal readings with the various mediums; the other set is of someone else's set of readings with the various mediums. Hence, the sitters are blind to which readings are theirs. All items are scored. In addition, the sitters are asked to select the set they judge to be their personal readings.

If positive findings are obtained in the double-blind and triple-blind studies, the possible conventional explanations will be ruled out: 'cold' or 'hot' reading (magician tricks used by fake mediums), non-verbal cueing, rater bias (which, though possible in the previous studies, is highly improbable; consider, e.g., the highly specific grandmother information reported above), experimenter bias, and guessing.

After many investigations with Mrs Piper, William James came to the conclusion that "I should be willing now to stake as much money on Mrs. Piper's honesty as on that of anyone I know, and I am quite satisfied to leave my reputation for wisdom or folly, so far as human nature is concerned, to stand or fall by this declaration."

As a result of conducting multiple experiments with Laurie Campbell, John Edward, and Suzane Northrop (who have been putting their professional careers on the line to be investigated in a University laboratory) over the past three years, we should be willing to follow James and state that concerning their behavior in our laboratory, we are "willing now to stake as much money on Campbell, Edward and Northrop's honesty as on that of anyone we know, and we are quite satisfied to leave our reputations for wisdom or folly, so far as human nature is concerned, to stand or fall by this declaration."

The challenge for contemporary psychology, neuroscience, and consciousness studies, is to consider the implications of such findings for understanding mechanisms of consciousness and their implications for the continuance of consciousness hypothesis (Schwartz, with Simon, 2002).

ACKNOWLEDGEMENTS

We thank Zofia Weaver, Trish Robertson, Archie E. Roy, Matthew Smith, and two additional referees who remained anonymous, for their insightful suggestions for the manuscript.