Purpose Previous studies have found gender bias in the global evaluations of trainees. The purpose of this study was to investigate the association of faculty's and residents' gender with the evaluation of residents' specific clinical skills, using direct observation.

Method In 2001–2002, 40 clinician–educators from 16 internal medicine residency programs viewed a series of nine scripted videotapes depicting varying levels of residents’ clinical performance in medical interviewing, physical examination, and counseling. Differences in the ratings of women versus men faculty, in relation to differences in the residents’ gender, were compared using random-effects regression analysis.

Results There were no statistically or educationally significant differences in the rating of clinical skills attributable to faculty or residents’ gender for medical interviewing, physical examination, or counseling.

Conclusions This study suggests that gender bias may be less prevalent in the current era of evaluation of clinical skills, particularly when specific skills are directly observed by faculty. Further work is needed to examine whether the findings of this study translate to the actual training setting.

Dr. Holmboe is senior vice president for quality research and academic affairs, American Board of Internal Medicine, Philadelphia, Pennsylvania, and professor adjunct, Yale University, New Haven, Connecticut.

A number of studies have assessed the relationship between trainees’ gender and their evaluation of medical knowledge and clinical competence.1–7 Some have only assessed whether cognitive or noncognitive performance between men and women trainees is different.8,9 Other studies have assessed the perception by others of women trainees, compared with men, in the areas of humanism and technical skills.9,10 Limited data exist that address potential differences in performance ratings attributable to gender bias within the evaluation process of resident trainees by faculty. In addition, some reported differences in ratings by gender may simply reflect real differences in clinical and evaluative skills not associated with gender.

A study examining the American Board of Internal Medicine (ABIM) evaluation form found that male residents, compared with female residents, received significantly higher scores from male attending physicians than from female attending physicians in many of the domains studied.11 A small, single-institution study of faculty ratings of interns did not find any differences by gender.8 A study of 14,340 U.S. and Canadian graduates taking the ABIM certifying examination found that for overall competence, male trainees were rated higher than female trainees by program directors.12 However, a more recent study did not find significant differences in competency ratings of male and female residents on evaluation forms completed at the end of a ward rotation during a randomized controlled trial of an educational intervention.13

All of the previously published studies compared global ratings of various domains of competence. These previous findings of potential rating bias by faculty based on gender are more concerning, given more recent studies that used standardized patients (SPs) for assessment.14,15 For example, Haist and colleagues14 found that fourth-year female medical students scored higher than male students in all domains tested on a clinical skills assessment. Van Zanten et al15 reported the same finding among all international medical graduates who took the Educational Commission for Foreign Medical Graduates’ clinical skills assessment, including the areas of empathy, attentiveness, and attitude. The findings from SPs are also more in line with differences noted among female and male physicians in communication behaviors with actual patients.16

Less is known about whether the gender of the trainee and/or attending is associated with the evaluation of clinical skills through direct observation by faculty. The objective of our study was to examine the potential interaction of the gender of faculty and residents on the evaluation of the clinical skills of medical interviewing, physical examination, and counseling.

Method

Participants

Forty faculty from 16 different internal medicine residency programs from the Northeast (Connecticut, Massachusetts, and Rhode Island) and Mid-Atlantic (Maryland, Virginia, and District of Columbia) regions participated in a randomized controlled trial of a faculty development course designed to improve evaluation skills in 2001–2002.16 Participating faculty from each institution were chosen by the residency program director. Five university-based programs (i.e., situated at a medical school's primary teaching hospital) and 11 university-affiliated community-based programs (i.e., not situated at a medical school and not the medical school's primary teaching hospital) participated. Program directors were encouraged to select faculty who did or could play a significant role in the program's educational and assessment activities; program directors were also encouraged to participate. Participants were informed that they would be assigned to a control or intervention group to test a faculty development intervention and that they would complete a comprehensive baseline assessment, including the rating of a series of videotaped clinical encounters. The study was approved by the Yale University human investigation committee and the Uniformed Services University of the Health Sciences institutional review board.

Assessment of clinical skills

Before participating in the faculty development sessions, all faculty observed and rated a series of nine videotaped clinical encounters that were presented in random order.

The history skill videotapes depict a Caucasian male resident evaluating a 64-year-old African American woman presenting to the emergency room with acute shortness of breath and chest pain attributable to a pulmonary embolism.

Among the history skill videotapes, the "poor" performance encounter shows the resident conducting a very doctor-centered interview using only closed-ended questions and neglecting a number of key feature history items. The "best" performance encounter shows the resident using open-ended questions and covering the key feature questions that lead to the proper diagnosis.

Faculty rated resident performance on the tapes using a modified version of the ABIM's nine-point mini-clinical evaluation exercise form.17–19 On this form, a score of one to three denotes unsatisfactory performance, four to six denotes satisfactory performance, and seven to nine denotes superior performance. Videotape scripts were written for SPs and standardized residents (SRs) to depict three levels of performance for each of three clinical skills: history taking, physical examination, and counseling. The same SR and patient portrayed the three levels of performance for each of the three clinical skills. None of the tapes was designed to be a "gold standard"; some deficiencies were depicted on each tape.

Statistical analysis

The mean differences in ratings between male and female faculty were first examined in an unadjusted analysis for each videotape, with P < .05 considered statistically significant. A random-effects regression analysis was then performed, adjusting for faculty age, years in current job, and specialty (general internist or subspecialist). The male attending served as the reference. An effect coefficient >0 denotes that female faculty were more likely to give a higher rating, and a value <0 denotes that female faculty were more likely to give a lower rating. All analyses were performed with SAS (SAS Institute, Inc., Cary, North Carolina).
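The random-effects model described above could be sketched as follows. This is a minimal illustration only, not the authors' implementation (the study used SAS): the column names, simulated ratings, and model specification are hypothetical, with a random intercept for each faculty rater and the male attending as the reference category.

```python
# Hypothetical sketch of a random-effects regression of ratings on faculty
# gender, adjusting for age, years in current job, and specialty.
# All data here are simulated; the original analysis was done in SAS.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_faculty, n_tapes = 40, 9  # 40 raters, 9 videotapes each

df = pd.DataFrame({
    "faculty_id": np.repeat(np.arange(n_faculty), n_tapes),
    "female_faculty": np.repeat(rng.integers(0, 2, n_faculty), n_tapes),
    "age": np.repeat(rng.integers(32, 60, n_faculty), n_tapes),
    "years_in_job": np.repeat(rng.integers(1, 20, n_faculty), n_tapes),
    "subspecialist": np.repeat(rng.integers(0, 2, n_faculty), n_tapes),
})
# Simulated 1-9 mini-CEX ratings with no built-in gender effect
df["rating"] = rng.integers(1, 10, len(df)).astype(float)

# Random intercept per faculty rater; male faculty (female_faculty = 0)
# is the reference, so a coefficient > 0 on female_faculty would mean
# female faculty tended to give higher ratings.
model = smf.mixedlm(
    "rating ~ female_faculty + age + years_in_job + subspecialist",
    df,
    groups=df["faculty_id"],
)
result = model.fit()
print(result.params["female_faculty"])
```

With simulated data containing no gender effect, the fitted coefficient on `female_faculty` should hover near zero, mirroring the null result reported in the study.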

Results

The demographic characteristics of faculty participants are shown in Table 1. The majority of the faculty were general internists, and nearly half (48%) were women. A total of 348 tapes were rated by the 40 faculty. Some faculty arrived late to the baseline assessment; a total of 12 faculty–videotape encounters (3%) were not rated. Table 2 displays the unadjusted mean rating score for each videotape encounter according to the gender of the faculty rater. Level one videotapes were scripted to portray the lowest level of performance for the depicted clinical skill, and level three videotapes the highest. This unadjusted analysis shows no statistically significant differences between the male and female faculty ratings for any of the clinical skills, regardless of the gender of the trainee on the videotape.

Table 3 displays the results of the random-effects regression model that adjusts for the faculty characteristics of age, program director status, length of time in job, and specialty. Using this analysis, compared with their male colleagues, female faculty rated the physical exam skills of the scripted male resident slightly lower, the interviewing skills of a different scripted male resident slightly higher, and the counseling skills of the scripted female resident slightly higher. None of these differences, however, were statistically significant.

Discussion

Our data demonstrate—in a controlled research setting investigating the observation of discrete, measurable behaviors—that there are no significant differences in the rating of clinical skills via direct observation based on the gender of the faculty or the resident. To our knowledge, this is one of the first studies to directly study the association of gender with the evaluation of clinical skills. Almost all previous studies on faculty evaluations had only examined gender as it was associated with global competency ratings, not specific, targeted competencies such as interviewing, physical examination, and counseling skills.3,8,11–13

How do these observations compare with the real-world setting of residency training? And are our findings consistent across specialties? Our cohort consisted of relatively younger clinician–educators who trained in an era of increasing enrollment of women into medical school and internal medicine residencies.20 This group of faculty, perhaps more accustomed to working with women colleagues in their formative educational years, likely had different experiences and possibly less bias than would colleagues who trained during a time or in specialties in which direct interaction with female physicians was more limited. Also, the task of rating specific clinical skills and behaviors, rather than providing a global rating of perceived competence not grounded in direct observation, may help to mitigate the effects of trainee gender bias seen in past studies with global ratings. Past studies have shown that faculty based their global ratings mostly on the personality and perceived medical knowledge of the resident and not on explicit criteria or directly observed clinical skills.21,22 Furthermore, in the videotapes, the faculty and trainees did not have a personal relationship. This may have substantially mitigated the effect of the "personality factor" seen in previous studies of global rating scales.21,22

Our finding of a lack of an educationally significant difference in ratings based on gender when directly observing residents perform clinical skills is encouraging, especially given that the videotapes were scripted to depict varying levels of performance based on the quality of the clinical skills and not gender-specific traits. Given the growing concern over the state of trainees' clinical skills, there is an urgent need for more direct observation by faculty. Furthermore, with women now constituting approximately 50% of medical school enrollment, recognizing and addressing gender bias in the evaluation of clinical competence is important.20–23 This study, along with two earlier single-institution studies, suggests that gender bias in evaluation may be lessening.8,13 However, future studies on the potential association of gender and the evaluation of clinical skills now need to move into the training setting.

Several limitations should be noted. Faculty participants were either program directors or selected by their program directors; the majority were relatively early in their careers, and most were general internists. Our results may not generalize to all faculty, particularly older faculty or subspecialty physicians. Also, we do not know whether the rating behaviors seen in this study reflect what faculty actually do when rating real residents in their own programs. Characteristics known to be different, such as female physicians’ tendency to perform longer visits and be more conversational, could have a bigger effect in the observation of trainees in actual clinical encounters.16 Finally, we did not assess the effects of ethnicity on evaluation outcomes, another important area for study.

Conclusion

To our knowledge, this is one of the few studies to investigate the potential effect of gender on the rating of trainees’ clinical skills. In a controlled setting, in which specific skills were evaluated via direct observation by faculty, we did not find any evidence of gender bias. Given the refocus on clinical skills in medical education, we need a fuller understanding of the factors that affect the quality and accuracy of evaluation based on direct observation.24,25 Future studies should examine whether gender and ethnicity bias in the ratings of trainee performance occur when working with actual patients.

23 Association of American Medical Colleges. Women in U.S. Academic Medicine Statistics and Medical School Benchmarking 2004–2005. Washington, DC: Association of American Medical Colleges; October 2005.