Validity and Reliability of Clinical Signs in the Diagnosis of Dehydration in Children

Abstract

Objective. To determine the validity and reliability of various clinical findings in the diagnosis of dehydration in children.

Design. Prospective cohort study.

Setting. An urban pediatric hospital emergency department.

Participants. One hundred eighty-six children ranging in age from 1 month to 5 years with diarrhea, vomiting, or poor oral fluid intake, either admitted or followed as outpatients. Exclusion criteria included malnutrition, recent prior therapy at another facility, symptoms for longer than 5 days’ duration, and hyponatremia or hypernatremia.

Methods. All children were evaluated for 10 clinical signs before treatment. The diagnostic standard for dehydration was fluid deficit as determined from serial weight gain after treatment.

Main Results. Sixty-three children (34%) had dehydration, defined as a deficit of 5% or more of body weight. At this deficit, clinical signs were already apparent (median = 5). Individual findings had generally low sensitivity and high specificity, although parent report of decreased urine output was sensitive but not specific. The presence of any three or more signs had a sensitivity of 87% and specificity of 82% for detecting a deficit of 5% or more. A subset of four factors—capillary refill >2 seconds, absent tears, dry mucous membranes, and ill general appearance—predicted dehydration as well as the entire set, with the presence of any two or more of these signs indicating a deficit of at least 5%. Interobserver reliability was good to excellent for all but one of the findings studied (quality of respirations).

Conclusions. Conventionally used clinical signs of dehydration are valid and reliable; however, individual findings lack sensitivity. Diagnosis of clinically important dehydration should be based on the presence of at least three clinical findings.

Key words: dehydration, capillary refill, clinical assessment, interobserver agreement.

Children with acute gastroenteritis or other illnesses that cause vomiting, diarrhea, or poor oral fluid intake are at risk for developing dehydration. The gold standard for diagnosis of dehydration is measurement of acute weight loss. Because a patient’s true preillness weight is rarely known in the acute care setting, an estimate of the fluid deficit is made based on clinical assessment. This estimate is used to determine the need for therapy and the type of therapy to be used, and to monitor the patient’s response to treatment. Failure to recognize dehydration leads to increased morbidity and mortality, while overdiagnosis can result in overutilization of health resources.1

Conventionally used clinical diagnostic criteria for evaluating dehydration have been codified by the World Health Organization (WHO),5 in standard textbooks,6 and in a practice parameter on the management of acute gastroenteritis recently published by the American Academy of Pediatrics (AAP).7 There are important inconsistencies between sources, however.8 Moreover, the criteria have not been rigorously validated, and recent evidence has called into question their usefulness.9,10 The purpose of this investigation was to describe the performance of commonly used clinical findings in the diagnosis of dehydration in children.

METHODS

Setting and Subjects

Eligible patients were children 1 month to 5 years of age treated in a large, urban pediatric emergency department (ED) for a chief complaint of vomiting, diarrhea, or poor oral fluid intake. A convenience sample of such patients, who were seen during a 17-month period (January 1994 through May 1995) when study personnel were on duty, was enrolled prospectively. Because certain conditions can independently alter the clinical signs under study, patients meeting the following criteria were excluded: symptoms longer than 5 days’ duration, history of cardiac or renal disease or diabetes mellitus, malnutrition or failure to thrive, or treatment in the prior 12 hours at another health facility. For the same reason, whenever serum electrolytes were ordered at the discretion of the treating physician, those subjects found to have hyponatremia or hypernatremia were excluded. Children having undergone tonsillectomy in the prior 10 days were managed by the otolaryngology staff and were therefore not eligible. Finally, families of patients had to have access to telephone or beeper for follow-up. Parents of eligible patients were asked for informed consent to participate, and the study was approved by the Institutional Review Board.

Study Procedure

At entry into the ED, a clinical assessment was performed on all eligible subjects by one of the study personnel. Participating personnel included 17 ED nurses with a minimum of 4 years of pediatric experience. Whenever a second study nurse or one of the investigators was available in the ED, a second assessment was performed independently to assess interobserver reliability. The assessments were performed before any oral or intravenous rehydration therapy was administered in the ED; fluids given by the parent before the visit were not recorded.

The clinical assessment consisted of 10 commonly elicited signs of dehydration, including those originally recommended by WHO (Table 1), heart rate, and capillary refill time. For categorical variables, findings were marked as normal, moderately abnormal, or markedly abnormal. Capillary refill time at the fingertip was measured with a stopwatch using a standardized technique that included monitoring ambient temperature,11 and the mean of three readings was recorded.

Because the child’s usual, preillness weight is rarely known at the time of an ED visit, the gold standard for calculation of fluid deficit was based on weight gain following resolution of illness. Before treatment, each patient was weighed on an electronic scale using a standard protocol (infants wearing only a dry diaper; hospital gown and no shoes for older children). Children were then treated by the ED staff, without regard to participation in the study. At the time of discharge from the ED, all patients admitted to the hospital were enrolled for follow-up. Of those discharged to home, a 30% sample, randomly selected based on the last digit of the medical record number, was invited to return for follow-up. To ensure complete follow-up, patients were provided with transportation for return visits (round-trip cab trip or parking reimbursement) and a gift certificate for a meal at a fast-food restaurant, and parents were reminded of their appointment by phone the day before.

Admitted patients were weighed twice daily until discharge, using electronic scales calibrated monthly to agree within ±.5% with those used in the ED. Stable weight was reached when two consecutive weight measurements differed by <2%; the mean of these weights was the final weight. Subjects followed on an outpatient basis were weighed at the end of the ED visit, and again at a scheduled follow-up visit 48 to 72 hours later, when symptoms of vomiting, diarrhea, or poor intake had resolved according to the parent’s report. For those outpatients whose ED and follow-up weights differed by >2%, subsequent daily follow-up visits were scheduled until stable weight was achieved. The child’s fluid deficit at enrollment was calculated as the percentage difference between initial and final weights. Clinically important dehydration was defined as a fluid deficit of 5% or greater.
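The stable-weight rule and deficit calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual procedure. The text does not state which weight serves as the denominator, so the sketch assumes the <2% difference is taken relative to the earlier of the two weights and the deficit is expressed relative to the final (rehydrated) weight. The function names and the example weights are hypothetical.

```python
def stable_final_weight(weights_kg, tolerance=0.02):
    """Return the mean of the first two consecutive weights that differ by
    less than `tolerance` (as a fraction of the earlier weight), or None if
    no stable pair exists. The choice of denominator is an assumption."""
    for earlier, later in zip(weights_kg, weights_kg[1:]):
        if abs(later - earlier) / earlier < tolerance:
            return (earlier + later) / 2
    return None


def fluid_deficit_percent(initial_kg, final_kg):
    """Fluid deficit at enrollment as a percentage of the final weight
    (assumed denominator; the source says only 'percentage difference')."""
    return 100 * (final_kg - initial_kg) / final_kg


# Hypothetical example: pretreatment ED weight followed by serial weights.
initial = 9.4
serial = [9.7, 9.95, 10.05]
final = stable_final_weight(serial)
deficit = fluid_deficit_percent(initial, final)
clinically_important = deficit >= 5.0  # 5% threshold used in the study
print(f"final weight {final:.2f} kg, deficit {deficit:.1f}%")
```

In this made-up example the first pair of weights differs by more than 2%, so the second pair defines the final weight.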

To validate the use of postillness weight gain as a surrogate measure of acute weight loss, we attempted to obtain preillness weight information from the primary care provider for the first 50 subjects enrolled. If a child’s weight had been measured in the 10 days before the ED visit (5 days for children under 6 months of age), this was used as the preillness weight. For those with no very recent weight recorded, growth charts were requested. A growth curve with at least 3 data points and not crossing percentile lines was considered acceptable. The weight predicted from this curve for the date of the ED visit was then used as the preillness weight. In all cases, the preillness weight was determined by an investigator unaware of the patient’s presenting or postillness weights.

Data Analysis and Sample Size Calculation

Sensitivity and specificity for detecting a deficit of 5% or more, with 95% confidence intervals (CIs), were calculated for each of the clinical findings. The categorical variables were made dichotomous by combining the categories of moderately and markedly abnormal, because so few patients were assigned to the most extreme category. For heart rate and capillary refill time, receiver-operator characteristic (ROC) curves were constructed using different cutoff levels for normal. The level providing optimum discrimination was then used to dichotomize the variable. Tests for significance were based on the χ2 statistic for the 2-by-2 tables, with a significance level of P < .05 chosen a priori. Those findings significantly associated with the presence of dehydration by univariate analysis were entered in a logistic regression model, using the MultLR public-domain software.12
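As a rough illustration of the calculation described above, sensitivity and specificity with 95% CIs can be computed from a 2-by-2 table as follows. The source does not say which interval method was used; this sketch assumes a simple normal-approximation (Wald) interval. The counts are hypothetical, chosen only to be roughly consistent with the sensitivity and specificity reported for three or more findings, and are not taken from the study's tables.

```python
import math


def proportion_with_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval.
    The interval method is an assumption; the source does not specify one."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical 2-by-2 counts (not the study's actual data).
tp, fn = 55, 8     # dehydrated children: sign present / absent
fp, tn = 22, 101   # non-dehydrated children: sign present / absent

sens, sens_lo, sens_hi = proportion_with_ci(tp, tp + fn)
spec, spec_lo, spec_hi = proportion_with_ci(tn, tn + fp)
print(f"sensitivity {sens:.2f} ({sens_lo:.2f}, {sens_hi:.2f})")
print(f"specificity {spec:.2f} ({spec_lo:.2f}, {spec_hi:.2f})")
```

For small samples or proportions near 0 or 1, an exact or Wilson interval would usually be preferred over the Wald interval used here.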

The desired sample size—61 subjects with dehydration—was calculated to allow estimation of sensitivity to within ±.125, assuming a worst-case estimate of a sensitivity of .50.
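The sample size figure can be checked with the standard formula for estimating a proportion to a given margin, n = z²p(1 − p)/d². The sketch below assumes a two-sided 95% confidence level (z = 1.96), which the text does not state explicitly; the function name is hypothetical.

```python
def sample_size_for_proportion(p, margin, z=1.96):
    """Number of subjects needed to estimate a proportion `p` to within
    +/- `margin`, using the normal approximation (z = 1.96 is assumed)."""
    return z ** 2 * p * (1 - p) / margin ** 2


n = sample_size_for_proportion(p=0.50, margin=0.125)
print(f"required number of dehydrated subjects: about {n:.1f}")
```

Under these assumptions the formula gives roughly 61.5, in line with the stated target of 61 subjects with dehydration.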

RESULTS

Characteristics of Enrolled Patients

Two hundred twenty-five children were initially enrolled for follow-up. Of these, 116 were admitted to the hospital. Three inpatients did not have serial weights measured, and two each were excluded for hyponatremia and hypernatremia, leaving 109 inpatients for inclusion. Of the 109 patients enrolled for follow-up as outpatients, 7 were inappropriately enrolled (no phone number obtained or enrolled on days when no follow-up visits were available). Two-thirds of the outpatients chose to have cab transportation, and 60 of these 68 (88%) completed their follow-up visits. Of the 34 who chose to receive parking reimbursement, 17 (50%) returned. The overall outpatient follow-up rate was 76%, yielding 77 outpatients with complete follow-up. Seventy-one of these children had stable weight at the first return visit, 4 returned twice, and 2 required three visits.

Of the total of 186 patients included in the analysis, 55% were boys. The median age was 13 months, with 89% <36 months of age.

Validation of Gold Standard

Figure 1 shows the preillness and postillness weights for the 19 children (10 outpatients) on whom preillness weight information could be obtained. The median age of this subgroup was 29 months, with 37% <1 year of age. For 17 of these, preillness weight was predicted from growth charts. There was a near-perfect correlation between preillness weight and postillness weight, with a Pearson product-moment correlation coefficient of .9988. The slope of the regression line was 1.002 (95% CI: .979, 1.025), with an intercept of .086 (95% CI: −0.224, .396; P = .59). The intraclass correlation coefficient was .997. Mean preillness weight was 12.46 kg (SD: 5.22), while the mean postillness weight was 12.35 kg (SD: 5.17). The mean difference between the two weights was .114 kg (95% CI: −0.012, .242); this difference was not statistically significant.

As a further check on the accuracy of using postillness weight gain to determine fluid deficit, the deficit was also calculated as the percentage difference between preillness weight and presenting weight for the 19 children with preillness weight information. There was 90% overall agreement between the methods in categorizing patients as dehydrated or not, with 7 of the 19 classified as dehydrated, and 10 as well hydrated, by both methods. Of the two subjects who were classified differently by the two methods, both had a deficit close to the 5% cutoff. The mean absolute difference between fluid deficit calculated from the preillness weight and that calculated from the postillness weight was .67% of body weight (95% CI: −0.157, 1.49).

Outcomes

Sixty-three (34%) of the patients had a clinically important fluid deficit of at least 5%, while 11 (5.9%) had a deficit of 10% or more. Among the 109 inpatients, 55% were 5% or more dehydrated; the prevalence of dehydration in the outpatients was 3.9%. The median deficit among those with clinically important dehydration was 6.0% (range 5 to 14.1%), compared with a median of .5% (range 0 to 4.2%) in the group without important dehydration. The median age did not differ between the two groups.

The median time to achieve a stable weight was 24 hours (range 12 to 72 hours) in the 26 inpatients for whom this information was recorded. Only one child required 72 hours to achieve a stable weight. Of the 77 outpatients with complete follow-up, 69 had a stable weight at the first return visit, while three required a second follow-up visit and two required a total of three follow-up visits. All outpatients completed follow-up within 96 hours; only two needed longer than 72 hours.

Diagnostic Performance of Clinical Findings

All 10 clinical findings studied were significantly associated with the presence of dehydration. The sensitivity of individual findings varied from .35 to .85, and the specificity ranged from .53 to .97 (Table 2). For capillary refill and heart rate, ROC curves were constructed using different cutoff levels to define normal. The optimal cut-offs were 2 seconds for capillary refill, and 150 beats per minute for heart rate. For heart rate, a curve was also generated using different cut-offs based on age, but the area under the curve was less than that using a single cut-off. Sensitivity and specificity for all findings did not change significantly when stratified based on the presence or absence of fever.

By including all admitted patients but only a fraction of discharged patients, the sampling framework used in this study produced a study sample with a higher prevalence of dehydration than in the general ED population with symptoms of vomiting or diarrhea. Therefore, predictive values cannot be measured directly from these data. Based on a review of records of eligible patients seen during the study period, we estimated the actual prevalence of dehydration among eligible children in this setting to be approximately 10%. Using Bayes’ theorem, positive and negative predictive values were calculated for each of the clinical findings (Table 2). Predictive values for positive findings were generally low, ranging from .17 to .57, while negative predictive values were substantially greater, from .93 to .99, because of the low prevalence of dehydration.
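The Bayes' theorem adjustment described above can be sketched as follows. The example uses the sensitivity (.87) and specificity (.82) reported in the abstract for the presence of any three or more signs, together with the estimated 10% prevalence. The function name is hypothetical, and the resulting predictive values are illustrative calculations rather than figures reported in the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(sensitivity=0.87, specificity=0.82, prevalence=0.10)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
```

Because prevalence enters both terms, the same sensitivity and specificity would yield a much higher positive predictive value in a population where dehydration is more common.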

Combinations of Findings

The number of clinical findings present increased with the degree of dehydration. The median number of findings among subjects with no or mild dehydration (deficit <5%) was 1; among those with moderate dehydration (deficit 5% to 9%) it was 5; and among those with severe dehydration (deficit ≥10%) it was 8 (P < .0001, Kruskal-Wallis H test).

A multiple logistic regression model was created to identify those findings significantly associated with the presence of a fluid deficit of 5% or more when controlled for the presence of other findings. Four of the 10 findings were independently associated with dehydration: general appearance, capillary refill >2 seconds, dry mucous membranes, and reduced tears (Table 3).

These results were then used to generate two prediction models for dehydration. One model incorporated all 10 clinical findings, while the other included only the four findings with an independent association based on the multivariate model. ROC curves for each model were generated by calculating the sensitivity and specificity of the model for detecting dehydration of 5% or more at each possible threshold score, as shown in Fig 2. For the model including all findings, the presence of any three or more had the best combination of sensitivity (.87) and specificity (.82), with an overall accuracy of 84%. Two or more findings was the optimal threshold for the model with four findings, yielding a sensitivity of .79, specificity of .87, and accuracy of 85%. The discriminative ability of the two models was similar and not statistically different, as indicated by the areas under the ROC curves (.91 for the 10-item model vs .90 for the 4-item model; P > .3).

Fig 2. Receiver-operator characteristic curve for diagnosis of dehydration by different combinations of clinical findings. The number of findings represented by the optimal cutoff point for each model is indicated. The areas under the curves were very similar: .9142 for the 10-item model, and .8976 for the 4-item model (P > .3).

To predict a deficit of 10% or more, the optimal cut-off was 7 or more of the 10 findings. This threshold has a sensitivity of .82 and a specificity of .90. The presence of at least three of the subset of four clinical indicators detected a 10% deficit with a sensitivity of .82 and specificity of .83. Again, the discriminative ability of the two models was similar and not statistically different.
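The counting rule for the four-item subset can be expressed as a short sketch. The thresholds (two or more findings for a deficit of 5% or more, three or more for a deficit of 10% or more) come from the text; the field names, function names, and category labels are hypothetical. This is an illustration of the scoring logic only, not a validated clinical tool.

```python
# Hypothetical field names for the four-sign subset described in the text.
FOUR_SIGNS = (
    "ill_general_appearance",
    "capillary_refill_over_2_seconds",
    "dry_mucous_membranes",
    "absent_tears",
)


def count_signs(findings):
    """Count how many of the four signs are recorded as present.
    Signs missing from the record are treated as absent."""
    return sum(1 for sign in FOUR_SIGNS if findings.get(sign, False))


def suggested_deficit_category(findings):
    """Map the number of signs present to the deficit range it suggests,
    using the thresholds reported for the four-item model."""
    n = count_signs(findings)
    if n >= 3:
        return "deficit of 10% or more suggested"
    if n >= 2:
        return "deficit of 5% or more suggested"
    return "deficit below 5% suggested"


# Hypothetical child with dry mucous membranes and absent tears only.
child = {"dry_mucous_membranes": True, "absent_tears": True}
category = suggested_deficit_category(child)
print(category)
```

The 10-item model follows the same pattern with thresholds of three and seven findings, respectively.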

We next examined those children who were incorrectly classified by either model, approximately 15% of the total. There was no significant difference in age between false positives, false negatives, and those correctly classified. As expected, those individual findings with the lowest specificity (eg, dry mucous membranes) contributed most to overdiagnosis, and those with the lowest sensitivity (eg, capillary refill) most often led to underdiagnosis; however, no systematic pattern of specific findings was noted among the false positives or false negatives. Of the children who were incorrectly classified by the four-item model, 50% of the false positives and 50% of the false negatives would have been correctly classified by using the 10-item model instead.

Interobserver Reliability

Interobserver reliability was measured in the group of 84 subjects who had two independent assessments by different observers. Reliability for individual signs was generally good to excellent (Table 2), with κ ≥ .5 for all but one of the findings. Reliability was also determined for capillary refill time as a continuous variable using the intraclass correlation coefficient, which was .71. Agreement between observers on the presence of any three or more findings was also very good (κ = .68).
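For readers unfamiliar with the κ statistic, the sketch below shows how Cohen's κ is computed for a single dichotomous sign rated by two observers. The counts are hypothetical (they merely sum to 84, the number of doubly assessed subjects) and are not the study's data; the function name is likewise an assumption.

```python
def cohen_kappa(both_present, first_only, second_only, both_absent):
    """Cohen's kappa for two observers rating one dichotomous finding.
    Undefined (division by zero) if chance-expected agreement equals 1."""
    n = both_present + first_only + second_only + both_absent
    observed = (both_present + both_absent) / n
    first_rate = (both_present + first_only) / n    # observer 1 says "present"
    second_rate = (both_present + second_only) / n  # observer 2 says "present"
    expected = first_rate * second_rate + (1 - first_rate) * (1 - second_rate)
    return (observed - expected) / (1 - expected)


# Hypothetical agreement table for 84 doubly assessed children.
kappa = cohen_kappa(both_present=25, first_only=6, second_only=7, both_absent=46)
print(f"kappa = {kappa:.2f}")
```

κ corrects the raw proportion of agreement for the agreement expected by chance alone, which is why it is lower than the observed agreement in this example.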

DISCUSSION

Despite the enormous advances in recent decades in the treatment of dehydration, especially in the use of oral rehydration therapy (ORT), little attention has been focused on the clinical diagnosis of this condition. Existing diagnostic recommendations are largely based on opinion and experience, and have not been subject to objective validation.8 Many of the previous validation studies involve very small numbers of patients or are restricted to the evaluation of an individual sign.13 No single physical finding has been shown to be sufficiently accurate, and important limitations have been noted in the diagnostic usefulness of capillary refill time, the best studied sign of dehydration.11,18

Mackenzie et al,9 comparing clinical assessment with actual weight change in the most comprehensive prior study, showed that house officers commonly overestimated the degree of dehydration among children hospitalized for gastroenteritis. Moreover, of the 13 specific findings studied, few were significantly associated with the presence of dehydration; false positive rates for all findings were high. The authors concluded that physical findings appear at a lower fluid deficit (3% of body weight lost) than the 5% to 10% commonly cited in the literature. However, there are several important limitations to their study, including the use of relatively inexperienced observers, and the inclusion of only hospitalized, and presumably more symptomatic, patients. We have previously observed the same tendency toward overdiagnosis among more experienced physicians as well.10 The present study was designed to determine the diagnostic performance of clinical signs individually and in combination in a less selected population at risk for dehydration.

All 10 findings we studied had a significant association with the presence of dehydration. In contrast to the results of Mackenzie et al,9 both the true positive rate (sensitivity) and false positive rate for individual findings were generally quite low. This is most likely due to the inclusion of a broader spectrum of patients, including those admitted to the hospital and those discharged from the ED. In addition, the very high follow-up among the outpatients helped further reduce any potential selection bias.

The very high specificity (ie, low false positive rate) of almost all the findings was initially puzzling, given the previously demonstrated tendency toward overdiagnosis of dehydration. There are several possible explanations. First, when the prevalence of dehydration is low, even with very high specificity the absolute number of false positives will be great compared with the true positives. Therefore, the positive predictive value, or posterior probability of disease, will be low. Second, clinicians have traditionally been taught that objective signs of dehydration do not appear until the deficit exceeds 5%. Like Mackenzie et al,9 but in contrast to WHO and AAP recommendations, we found that clinical signs were already apparent with a deficit of 5%; in this study, children with 5% dehydration had a median of five findings. Third, clinicians may place greater reliance on individual findings with high sensitivity but low specificity. In particular, a parent’s report of decreased urine output was very nonspecific; diagnosis of dehydration based on this criterion alone would have a positive predictive value of only around 17% in this population. Our results suggest that diminished urination actually occurs early in the process of dehydration, which is in agreement with studies in adult volunteers.19

Although a high specificity is desirable in a population with a low prevalence of disease, the sensitivity of each of the individual signs was too low to recommend any one finding as diagnostic of dehydration. One way to boost sensitivity while maintaining a reasonable specificity is to use a parallel testing strategy: in this context, the use of several clinical criteria rather than one. In our study, the presence of any three or more findings was the optimal threshold for diagnosing a deficit of 5% or more, providing much greater sensitivity with little loss of specificity compared with a cut-off of one or two findings, or compared with any single sign in isolation. Using logistic regression modeling to eliminate redundancies, we were further able to identify a subset of four findings (capillary refill >2 seconds, dry mucous membranes, absent tears, and abnormal general appearance) that was nearly as diagnostic as the entire set. For this restricted subset of indicators, the optimal cut-offs for diagnosing a deficit of 5% or 10% are two and three findings, respectively.

Limiting the number of criteria to the most parsimonious set that provides the desired level of accuracy enhances the ease of use and generalizability of a diagnostic scheme. It is particularly important when there is the potential for substantial interobserver variability. We found a reasonably high degree of agreement for all findings. This has not been previously reported in the literature, and is perhaps surprising given the subjective nature of many of the signs elicited. However, the observers in this study were highly trained pediatric nurses and physicians. Reliability among less experienced individuals may be significantly lower. It is therefore encouraging that the agreement on whether a patient was classified as dehydrated or not, based on the presence of three or more findings, was higher than that for most of the findings considered individually.

Existing schemes for the classification of dehydration based on clinical findings have several inconsistencies. First, the percentages of body weight reduction that correspond to different degrees of dehydration vary among authors.7,8 We followed the WHO convention (Table 1), which categorizes deficits of <5%, 5% to 10%, and >10% as “no dehydration,” “some dehydration,” and “severe dehydration,” respectively,5 while the AAP practice parameter describes deficits of 3% to 5%, 6% to 9%, and ≥10% as “mild,” “moderate,” and “severe” dehydration.7 Such minor differences in the cut-off between categories should not cause substantial clinical confusion. A more serious limitation of such classifications is that they do not provide any guidance for categorizing patients with some findings suggestive of two or more different degrees of dehydration. Our system is an improvement over tabular schemes such as those above, in that the classification of a patient is based on the number of findings present.

Improved clinical assessment, and specifically the recognition that the appearance of clinical signs of dehydration occurs with a deficit of <5%, has important implications for the treatment of dehydration in children. ORT has been recommended as the treatment of first choice for children with mild or moderate dehydration, most recently in the AAP practice parameter.7 However, among the many barriers to wider use of ORT in the United States that have been identified, one is the reluctance of physicians to use ORT in children perceived to have moderate or severe dehydration.20,21 More accurate estimation of a patient’s fluid deficit than achieved by the conventionally used criteria may thus contribute to wider use of ORT and a decrease in unnecessary parenteral therapy and hospitalization.

One potential limitation of this study is the use of postillness weight gain to estimate the preillness weight loss. Although this approach has been used widely in dehydration research, it has not been previously validated.9,10,13,16,22,23 Our data demonstrate an excellent correlation between weight measured after treatment in children with acute gastrointestinal illness and the child’s preillness “well” weight. The small mean difference between preillness and postillness weights, and the positive intercept of the regression line suggest a slight bias toward underestimation of preillness weight by using postillness weight. The difference, however, was neither statistically significant nor clinically meaningful. The mean absolute difference in fluid deficit calculated using the different estimates of usual weight was only .67%, much smaller than the 5% or more usually taken as indicating mild or moderate dehydration.

Similarly, there is a small margin of error created by our method of determining final weight. Following the convention of other researchers,13 we considered posttreatment weights stable if they agreed to within ±2%, to account for differences in bowel, bladder, and stomach contents; basing the deficit on the mean of these weights leads to a margin of error of ±1% for the deficit. Our cutoff for clinically important dehydration of a 5% deficit thus actually represents a deficit of 4% to 6%. However, this is well within the range of cutoff points reported by other authors, as noted above.7,8

This study has several other limitations. Because dehydration is uncommon among children with gastroenteritis, the number of children with dehydration in this study is relatively small, leading to less precise estimates of sensitivity and specificity. This is particularly true for the more severe grades of dehydration. We did not include historical information among the criteria to be evaluated, although such factors may add to the diagnostic performance of the physical signs. In addition, clinical findings may vary depending on the nature of losses or the duration of illness. Further work may help to clarify these issues.

We recommend that existing criteria for the diagnosis of dehydration in children be modified to reflect the fact that objective signs of dehydration are apparent with a fluid deficit of <5%. Of the 10 findings studied, none is sufficiently accurate to be used in isolation. The presence of fewer than three signs corresponds with a deficit of <5%, whereas children with a deficit of 5% to 9% generally have three or more clinical findings. At least six to seven findings should be present to diagnose a deficit of 10% or more, although this recommendation is based on a limited number of subjects. It may be possible to rely on a relatively restricted subset of clinical indicators: general appearance, capillary refill, mucous membranes, and tears. Of these four findings, the presence of any two indicates a deficit of 5% or more, and three or more findings indicates a deficit of at least 10%. We are now planning future studies to better evaluate those children with severe dehydration, and to develop a valid prediction rule for dehydration incorporating historical and physical examination variables.

ACKNOWLEDGMENTS

This work was supported by a Team Grant from the Emergency Medicine Foundation and the Emergency Nursing Foundation.

The authors wish to thank the nursing and physician staffs of the Emergency Department who participated in data collection.