* Research Nurse, Department of Anesthesia, St. Michael’s Hospital, Toronto, Ontario, Canada. † Clinical Professor of Anesthesiology, Women and Children’s Hospital of Buffalo, State University at New York, Buffalo, New York.

Education | May 2004

Development and Psychometric Evaluation of the Pediatric Anesthesia Emergence Delirium Scale

EMERGENCE delirium (ED) has been described as “a mental disturbance during the recovery from general anesthesia consisting of hallucinations, delusions and confusion manifested by moaning, restlessness, involuntary physical activity, and thrashing about in bed.”1 It has been considered a common postanesthetic problem in children and adults since 1960. 2–4 The prevalence of ED in children ranges from 25 to 80%, depending on the definition of ED used to measure this phenomenon. 5,6 ED, which usually occurs within the first 30 min after anesthesia, 7–10 has been characterized as self-limiting but of variable duration. 5,10,11 During an ED reaction, children risk injuring their surgical repair, themselves, and their caregivers. Their behavior is disruptive to the postanesthetic care unit (PACU) and often requires constant nursing supervision, which strains nursing manpower resources. 12,13 Moreover, when an ED reaction occurs, all members of the healthcare team as well as the parents express dissatisfaction with the quality of the child’s recovery. 5,14 These negative effects of ED have motivated clinicians to investigate possible etiologies and potential treatments for ED. 5–7,10,14–25 However, none of the clinical investigations has used a reliable and valid tool to measure ED. Not only does this preclude comparisons among the clinical trials, but more importantly, it raises serious questions regarding measurement error, the reliability of the measurements, and the validity of the research results. 6,26

Sixteen rating scales 3,5,6,8–10,15,16,18,20,21,26–30 and two visual analog scales that measure agitation 7,31 have been used to measure ED in young children (table 1). These scales are deficient in two main respects: scale content and psychometric evaluation. Behaviors including crying, agitation, and lack of cooperation have been included as items in these ED rating scales. However, these behaviors are not specific to ED. They may also characterize children who are in pain or who are frightened or angry during emergence from general anesthesia. Of the rating scales listed in table 1, two scales report reliability estimates, and one, the Heaman-Mattle emergence excitement scale, has undergone both a reliability and a validity assessment. However, the Heaman-Mattle scale was developed for teenagers and is inappropriate for use with preschool and school-aged children. Because the content of the scales in table 1 was considered inadequate, further assessment of the psychometric properties of any one scale was not pursued by the authors.

To date, a reliable and valid rating scale to measure ED in children does not exist. Shrout and Fleiss 32 state that “measurement error can seriously affect statistical analysis and interpretation [of data].” Therefore, to minimize measurement error in the clinical evaluation of ED in children, we sought to develop a reliable and valid rating scale to measure this phenomenon.

Materials and Methods

Methods

This study was approved by the Research Ethics Board at The Hospital for Sick Children (Toronto, Ontario, Canada), and informed written consent was obtained from the parents of all children who participated in this study. The study methods consisted of two phases: scale development and scale evaluation. Scale development involved the construction of the Pediatric Anesthesia Emergence Delirium (PAED) scale. Scale evaluation determined the scale’s reliability and validity.

Scale Development.

First, ED was defined as a disturbance in a child’s awareness of and attention to his or her environment with disorientation and perceptual alterations including hypersensitivity to stimuli and hyperactive motor behavior in the immediate postanesthesia period. This definition was predicated on the theoretical framework of delirium found in the Diagnostic and Statistical Manual of Mental Disorders. 33–36 Second, the anesthesia, nursing, and psychiatric literature was reviewed, and interviews were conducted with pediatric anesthesiologists, PACU nurses, and a pediatric psychiatrist to collate behavioral descriptions of children thought to have ED or delirium. 37 From these behavioral descriptions, six categories of ED behaviors were derived: cognitive behavior, behavioral response to environmental stimuli, behavior threatening patient safety, motor behavior, affective behavior, and vocal behavior. Guided by the definition of ED and the six behavioral categories, a list of preliminary scale items or statements that described the emergence behavior of children was compiled.

The preliminary scale items were evaluated by seven experts, including four senior pediatric PACU nurses, two pediatric anesthesiologists, and a pediatric psychiatrist, to determine their content validity. These individuals were considered experts because they had clinical expertise with the emergence behavior of children, knowledge of the conceptual framework of delirium described in the Diagnostic and Statistical Manual of Mental Disorders, or knowledge of the scale development process. 38

The content validity evaluation was a two-step process for which specific instructions were given to each expert. 39 First, each expert was asked to rate the relevance of each scale item to the definition of ED using a seven-point scale ranging from not at all relevant (score of 1) to extremely relevant (score of 7). Second, the experts were asked to determine which of the six behavioral categories of ED each item best represented. The definition of ED and the behavioral categories were given to each expert. Items deemed content-valid were then pretested on a group of 100 children. For pretesting, items were scored as they would be for the final scale using the five response options: not at all (score of 0), just a little (score of 1), quite a bit (score of 2), very much (score of 3), and extremely (score of 4). 40 Reverse scoring of items included the options not at all (4), just a little (3), quite a bit (2), very much (1), and extremely (0) and was used where applicable so that the greater the item score, the greater the degree of ED. During pretesting, each item was used by one of the authors (N. S.) to evaluate the emergence behavior of 100 children 10 min after the child awakened from anesthesia. Children were included if they were aged between 18 months and 6 yr; had an American Society of Anesthesiologists physical status class of I or II; had no known behavioral disorders; understood English; had no known contraindications to inhaled anesthetics; and were scheduled to receive sevoflurane, isoflurane, or halothane for maintenance of anesthesia for an elective outpatient surgical procedure. Children were excluded if they needed premedication, had cognitive impairment, or were at risk for malignant hyperthermia. The evaluating author (N. S.) was blinded to the type of anesthetic that the child received during surgery. The scores on each pretested scale item were analyzed (statistical item analysis) to obtain a statistical profile of each item. 39 Items with a poor statistical profile were eliminated, and those with a good profile comprised the PAED scale.

Scale Evaluation.

To determine interobserver reliability, the emergence behavior of 50 children was rated by a set of three observers using the PAED scale, 10 min after the child awakened and remained awake (did not fall back to sleep) postoperatively. Two of the three observers in each set were chosen at random. One of the authors (N. S.) was the third observer in all cases. All observers were blinded to the anesthetic agent administered during maintenance and were asked to refrain from discussing their evaluations with one another. A total of 37 observers participated, including 32 PACU nurses, 3 anesthesiologists, 1 paramedic, and the author (N. S.). To determine construct validity, five hypotheses were tested. 39

Hypothesis 2.

The PAED scale scores correlated negatively with the child’s time to awakening, defined in minutes as the time from arrival in the PACU until consciousness is sustained. 4,9,11,18

Hypothesis 3.

The PAED scale scores correlated positively with a clinical judgment score of ED measured on a seven-point scale from none (score of 1) to an extreme amount (score of 7). Each of the three observers in the reliability study completed the clinical judgment score after evaluating the child with the PAED scale.

Hypothesis 4.

The PAED scale scores correlated positively with the child’s Post Hospital Behavior Questionnaire (PHBQ) scores as evaluated by a parent on postoperative days (PODs) 2 and 7. 42–47 Parents were telephoned on the second postoperative day to answer any questions regarding the questionnaires and to remind them to return the completed questionnaires. Questionnaires were returned to the investigator in self-addressed envelopes.

Hypothesis 5.

The PAED scale scores in children who received sevoflurane were greater than in those who received halothane. 5–8,16,19,21,25,27,48 The choice of anesthetic administered was determined by the child’s attending anesthesiologist.

ROC Curve Analysis.

The sensitivity of the PAED scale was investigated using receiver operating characteristic (ROC) curve methodology. 49 A positive case of ED was defined as a child who received intravenous dimenhydrinate postoperatively in the absence of vomiting to control an ED reaction. A negative case was defined as a child who did not receive dimenhydrinate. Both morphine and dimenhydrinate were used for their sedative effects to treat children with difficult emergence behavior. However, because it was unclear whether children who were given morphine were in pain, children who were treated with morphine were excluded from the ROC analysis.

Sample Size

Scale Development.

It has been recommended that between 3 and 10 experts evaluate content validity. 50 The sample size for the item pretesting was based on enrolling five subjects for each item to be pretested. 51

Scale Evaluation.

The sample size for the interobserver reliability study was estimated using a Pearson product–moment correlation coefficient (r) of 0.75, 39 a half-width of the confidence interval (CI) of ± 0.1, and a two-tailed α of 0.05. 52 A sample size of 50 children was estimated.

The sample size for validity hypotheses 1–4 was based on the estimated maximum coefficient of r = 0.86 (or the √0.75). 39 A sample size of 47 children was estimated. 52

The sample size required to test hypothesis 5 was based on an estimate of the expected effect size. Because the PAED scale is a new measure and no data exist to compute an effect size, the effect size was estimated. Assuming a medium effect size of 0.5 between the PAED scale scores of children who received sevoflurane and those who received halothane, the sample size for each group was estimated to be 63 children. 53
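
The per-group estimate of 63 can be reproduced with the standard normal-approximation formula for comparing two means. The sketch below is illustrative only: the two-sided α of 0.05 and the power of 0.80 are assumptions (the text does not state the values used for hypothesis 5), and the function name `n_per_group` is hypothetical rather than taken from the cited reference.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-group mean comparison.

    alpha (two-sided) and power are assumed values, not reported in the text.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(n_per_group(0.5))  # 63
```

Under these assumptions a medium effect size of 0.5 gives 63 children per group, which matches the estimate quoted above.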

Statistical Analysis

Descriptive statistics were used to characterize the study sample. Age and duration of surgery were recorded as means and SDs. Type of surgery, type of inhalational anesthetic administered during surgery, and use of intraoperative narcotics were reported as proportions.

Scale Development.

An item was deemed content relevant if it was rated at 4 or greater on the seven-point scale by six of the seven experts and if it represented only one of the six ED behavioral categories. 39,50 Statistical item analysis 39,51 included compiling the frequencies of the response options for each item (endorsement frequency) and the correlations between each item (item–item correlations) and between the item’s score and the scale’s total score (the item–total correlations). Items with response options that were selected with a frequency greater than 5% and less than 95% were retained. Of these, the item set with moderate item–item correlations, item–total correlations of 0.2 or greater, and an adequate internal consistency, defined as an α coefficient of greater than 0.7 but less than 0.9, was selected as the PAED scale.
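
The three quantities named in the item analysis (endorsement frequency, item–total correlation, and the α coefficient) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and the small data set are hypothetical, the item–total correlation is left uncorrected (the item is included in the total), and α is computed with the usual Cronbach formula on sample variances.

```python
from statistics import mean, variance


def endorsement_frequencies(item_scores, options=(0, 1, 2, 3, 4)):
    """Proportion of children given each response option for one item."""
    n = len(item_scores)
    return {opt: item_scores.count(opt) / n for opt in options}


def pearson_r(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(rows):
    """Correlation of each item with the total score (rows = children, columns = items)."""
    totals = [sum(r) for r in rows]
    return [pearson_r([r[j] for r in rows], totals) for j in range(len(rows[0]))]


def cronbach_alpha(rows):
    """Internal consistency: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for five children on three items (not study data).
rows = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [4, 3, 4], [3, 4, 3]]
print(endorsement_frequencies([r[0] for r in rows]))
print(item_total_correlations(rows))
print(cronbach_alpha(rows))
```

An item set would then be screened against the thresholds stated above (item–total correlations of 0.2 or greater, α between 0.7 and 0.9).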

Scale Evaluation.

The interobserver reliability was determined using a one-way analysis of variance random-effects model and was reported as an intraclass correlation coefficient (for a single observer) with a 95% CI. 32
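
For a one-way random-effects model, the single-observer intraclass correlation coefficient described by Shrout and Fleiss is (MSB − MSW) / (MSB + (k − 1) MSW), where MSB and MSW are the between-children and within-children mean squares and k is the number of observers per child. The sketch below assumes a complete table of ratings (every child rated by the same number of observers); the function name and the example ratings are hypothetical, and the confidence interval is not computed.

```python
from statistics import mean


def icc_oneway_single(ratings):
    """ICC for a single observer under a one-way random-effects ANOVA model.

    ratings: one row per child, each row holding that child's k observer scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    # Between-children mean square (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-children mean square (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical total scores from three observers for four children (not study data).
example = [[2, 3, 2], [10, 12, 11], [6, 5, 7], [15, 14, 16]]
print(icc_oneway_single(example))
```

The one-way model fits this design because two of the three observers differed from child to child, so observer effects cannot be separated from residual error.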

To construct the ROC curve, the PAED scale scores were correlated using a Spearman (ρ) correlation coefficient with the dichotomous outcome of yes/no for treatment with dimenhydrinate. An ROC curve was generated using a nonparametric distribution assumption with the PAED scale score as the target variable and a response of yes for dimenhydrinate treatment as the positive state variable. The degree of ED increased directly with the PAED scale score. The PAED scale score that maximized the area under the curve of true positives (sensitivity) and minimized the area under the curve of false positives (1-specificity) was accepted as the cutoff point to define a case of ED that required treatment from one that did not.
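
The ROC quantities described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the data are hypothetical, the area under the curve uses the nonparametric (rank-based) estimate, and the cutoff is chosen with Youden's index (sensitivity minus false-positive rate), which is a stand-in for the selection rule described in the text.

```python
def split(scores, treated):
    """Separate scores of treated (positive) and untreated (negative) children."""
    pos = [s for s, t in zip(scores, treated) if t]
    neg = [s for s, t in zip(scores, treated) if not t]
    return pos, neg


def roc_point(scores, treated, cutoff):
    """(sensitivity, 1 - specificity) when a score >= cutoff is called positive."""
    pos, neg = split(scores, treated)
    sensitivity = sum(s >= cutoff for s in pos) / len(pos)
    false_positive_rate = sum(s >= cutoff for s in neg) / len(neg)
    return sensitivity, false_positive_rate


def auc(scores, treated):
    """Nonparametric AUC: chance a treated child outscores an untreated one (ties count 1/2)."""
    pos, neg = split(scores, treated)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, treated):
    """Cutoff maximizing Youden's index (an assumed criterion, see lead-in)."""
    def youden(c):
        sens, fpr = roc_point(scores, treated, c)
        return sens - fpr
    return max(sorted(set(scores)), key=youden)


# Hypothetical scale scores and dimenhydrinate treatment flags (not study data).
scores = [12, 10, 4, 15, 3, 8, 11, 2]
treated = [True, True, False, True, False, False, False, False]
print(roc_point(scores, treated, 10))  # (1.0, 0.2)
print(best_cutoff(scores, treated))    # 10
```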

Results

Scale Development (fig. 1)

Twenty-seven preliminary scale items were compiled (table 2). After evaluation, 21 items were deemed to be content-valid (table 2). These 21 items were pretested on 100 children, 56 males and 44 females, aged 3.7 ± 1.5 yr (tables 3 and 4), whose surgery lasted 63.2 ± 33.6 min (mean ± SD). Twenty percent of the children received an opioid intraoperatively. Five of the 21 items were deemed to have an adequate statistical profile. These items comprised the PAED scale (table 5). The internal consistency of the PAED scale was 0.89.

Fig. 1. Flow diagram outlining the steps taken to construct the Pediatric Anesthesia Emergence Delirium (PAED) scale, starting with a set of 27 preliminary scale items and ending with the final five items selected as the new scale.

The reliability of the PAED scale was evaluated in 46 of the 50 children. The interobserver reliability of the PAED scale was 0.84 (95% CI, 0.76 –0.90). Results of the construct validity hypothesis testing are as follows.

Of the 50 parents who were given the PHBQ, 38 returned both questionnaires (POD 2 and 7 assessments). Of the 38 respondents, two were excluded because there was no corresponding PAED scale score, and two were excluded because their children were admitted to hospital postoperatively. These last two children were excluded from this evaluation because of concern for confounding effects of hospitalization on the child’s behavior. A fifth child was excluded because the assessment on POD 2 was incomplete. 56

Hypothesis 5.

Seventeen children received sevoflurane for maintenance of anesthesia, and 25 children received halothane. The PAED scale scores were normally distributed in each treatment group. The average PAED scale score of children who received sevoflurane was 7.2 ± 4.5, and that of those who received halothane was 3.7 ± 2.6 (P < 0.008).

ROC Curve Analysis

Of the 100 children included in this analysis, 80 children did not receive morphine in the postoperative period. Of these, 11 received dimenhydrinate in the absence of vomiting. The area under the ROC curve generated from these data was 0.766. At a PAED scale score of 10 or greater, the true-positive rate (sensitivity) was 0.64, and the false-positive rate (1-specificity) was 0.14 (fig. 2).

Fig. 2. Receiver operating characteristic (ROC) curve for the sensitivity (true-positive rate) and 1-specificity (false-positive rate) for scores on the Pediatric Anesthesia Emergence Delirium scale. A score of 10 or greater corresponds to a sensitivity of 0.64 and a 1-specificity of 0.14.

Discussion

To minimize measurement error in the assessment of ED, clinicians require a reliable and valid measurement tool. Using a theoretical framework of delirium, we developed the PAED scale as a rating scale to measure ED in children. We conclude that the PAED scale is a reliable and valid tool based on the scale’s reliability, content, and initial construct validity profile determined in this study.

During the development of the PAED scale, ideas for scale items were collected from a variety of resources, including a review of the item content of three validated pediatric pain scales. 57–59 Because of the known difficulty in differentiating pain from ED, it was important to preclude scale items that may also reflect pain. 5,7,21 Of the three pain scales reviewed, only the Face, Legs, Activity, Cry, Consolability (FLACC) scale includes an item of consolability. 58 All three scales use an aspect of restlessness to measure pain. Accordingly, it is possible that the PAED scale items “The child is inconsolable” and “The child is restless” may reflect pain as well as ED.

We included the salient features of delirium, i.e., a disturbance in consciousness and changes in cognition and the associated features, including a disturbance in psychomotor behavior and emotion, in the genesis of the PAED scale. 36 A disturbance in consciousness includes a reduced awareness of the environment and impairment in the ability to focus, sustain, or shift attention. 36 The PAED scale’s first item, “The child makes eye contact with the caregiver,” and third item, “The child is aware of his/her surroundings,” reflect disturbances in the child’s consciousness during an ED reaction. Cognitive changes may include impairment in perception and memory and disorganized thinking patterns. Purposeful movement may be altered in a child whose thinking is disorganized. The second item on the PAED scale, “The child’s actions are purposeful,” addresses changes in the child’s cognition during an ED reaction. The inclusion of items that reflect disturbances in consciousness and cognition may be pivotal to differentiating ED from pain.

Disturbances in psychomotor behavior and emotion, which are associated features of delirium, are captured in the fourth and fifth items on the PAED scale, “The child is restless” and “The child is inconsolable,” respectively. These are the features of ED that were most commonly incorporated in previous scales. Although these last two features may reflect pain as stated earlier, it is hoped that when they are grouped with indicators of consciousness and cognition such as items 1–3 (table 5), they better reflect ED than pain. Assessing children with the PAED scale and a valid and reliable pain scale may be required to test this assumption.

Reverse scoring was required for the first three items on the PAED scale. Reverse scoring can be easily applied by having all items scored in the conventional way (as per items 4 and 5 in table 5) and then subtracting the score of the item from a value of 4. This should make the scale easy to use even in a busy clinical setting. For example, if a conventional score of 4 (extremely) was chosen for item 1, then the actual reverse score for this item would be recorded as 0 (4 − 4), which is equal to the reverse-scored value of “extremely” in table 5.
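
The subtraction rule can be written out as a short sketch. The item numbering follows the order given in the text (items 1–3 reverse scored, items 4 and 5 scored conventionally); the function names are hypothetical, and summing the five item scores into a total is assumed to be how the scale score is formed.

```python
REVERSE_SCORED = {1, 2, 3}  # eye contact, purposeful actions, awareness of surroundings
MAX_OPTION = 4              # conventional score for "extremely"


def paed_item_score(item: int, conventional: int) -> int:
    """Convert a conventionally scored response (0-4) into the recorded item score."""
    if not 0 <= conventional <= MAX_OPTION:
        raise ValueError("response must be between 0 and 4")
    if item in REVERSE_SCORED:
        return MAX_OPTION - conventional  # e.g. "extremely" (4) is recorded as 0
    return conventional


def paed_total(conventional_by_item: dict) -> int:
    """Sum of recorded item scores (assumed to give the total scale score)."""
    return sum(paed_item_score(i, s) for i, s in conventional_by_item.items())


print(paed_item_score(1, 4))                           # 0
print(paed_total({1: 0, 2: 0, 3: 0, 4: 4, 5: 4}))      # 20
```

With this convention a higher recorded score on every item indicates a greater degree of ED.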

The adjectives used for the response options were not operationally defined. This may be considered a limitation of the scale. However, large variability in the interpretation of the meaning of the response options for any item would have negatively affected the interobserver reliability coefficient. That the interobserver reliability of the PAED scale was 0.84, which exceeds the minimum acceptable reliability for a useful instrument of 0.75, suggests that the observers’ interpretations of the response options were similar enough so as to not compromise the scale’s reliability.

Whether the scores from rating scales can be considered interval data remains controversial. Unless the distribution of the scores from a rating scale is severely skewed, the data can be analyzed as if they were interval data, without introducing severe bias into the results. 39 The scores from the PAED scale were all normally distributed in this analysis.

We tested five hypotheses to explore the construct validity of the PAED scale. This is consistent with the notion that construct validity is determined by a series of converging experiments. 39 Of these five hypotheses, hypotheses 1 (age), 2 (awake time), and 5 (sevoflurane vs. halothane) supported the construct validity of the PAED scale. Hypothesis 3, which involved the clinical judgment scores, was rejected because of criterion contamination. Criterion contamination occurs when the results of one test bias the results of another. 39 This bias artificially inflates the correlation between these two tests. In this study, the observers evaluated each child with the PAED scale first and with a seven-point scale of clinical judgment second. Because of this and the high correlation between the scores on these two scales, it is unknown to what extent the PAED scale scores biased the clinical judgment scores.

Our failure to find a statistically significant relation between ED and any negative postoperative behavioral changes (validity hypothesis 4) may be attributed to the absence of a well-established theory associating these two constructs. 39

The ROC analysis predicts a score above which an episode of ED requires treatment. The sensitivity of the scale is fair, although the false-positive rate is quite low. This may be a function of the positive state response variable used in this analysis. Further attempts to determine a cutoff point are needed, using other positive state response variables, to substantiate or improve on the ROC results determined in this study.

Our results showed that the PAED scale score in children who received sevoflurane was greater than that in those who received halothane. Although the estimated sample size for this comparison was not achieved, statistical significance was achieved because the effect size measured, 1.0, was double that used in the sample size estimation.
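
The measured effect size of about 1.0 can be checked from the reported group means, SDs, and group sizes. The sketch below assumes the effect size is a standardized mean difference computed with a pooled SD (Cohen's d); the text does not state which formula was used, and the function name is hypothetical.

```python
from math import sqrt


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)


# Sevoflurane: 7.2 ± 4.5 (n = 17); halothane: 3.7 ± 2.6 (n = 25), as reported in the Results.
d = cohens_d(7.2, 4.5, 17, 3.7, 2.6, 25)
print(round(d, 1))  # 1.0
```

Under this assumption the observed difference is twice the effect size of 0.5 used for the sample size estimate.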

In conclusion, we detail the development and evaluation of a new rating scale to measure ED in children recovering from general anesthesia. Based on our results, the PAED scale is a reliable and valid measure of ED in children.

The authors thank the nurses in the Post Anesthetic Care Unit, The Hospital for Sick Children, Toronto, Ontario, Canada, for their participation in this study; David L. Streiner, Ph.D. (Professor Emeritus, Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, and Professor, Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada), Geoffrey R. Norman, Ph.D. (Professor, Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada), and Peter Szatmari, M.D. (Professor, Department of Psychiatry and Behavioural Neurosciences, Offord Centre for Child Studies, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada), for their guidance; and Zeev N. Kain, M.D. (Professor, Anesthesiology, Pediatrics and Child Psychiatry, Yale University School of Medicine, and Anesthesiologist-in-Chief, Yale New Haven Children’s Hospital, New Haven, Connecticut), and Arlette Lefebvre, M.D., D.C.P. (Psychiatrist, The Hospital for Sick Children, Toronto, and Associate Professor, Department of Psychiatry, University of Toronto), for their assistance during the scale development phase.
