Introduction

We explored whether clinicians are overconfident in their judgments about the effectiveness of risk reduction measures in women with mutations in the BRCA1 gene. In this context, "overconfidence" is defined as the expression of too much certainty in subjective estimates, regardless of whether estimates are large or small.

Methods

We asked physicians to estimate the percent decrease in the lifetime probability of breast and ovarian cancer in carriers who received various prophylactic interventions. Respondents were also asked to indicate their 90% plausibility interval. Subjects were breast cancer clinicians and principal investigators on NCI-sponsored Specialized Programs of Research Excellence (SPOREs) in breast cancer at six US cancer centers.

Results

Clinicians varied widely in their estimates of effectiveness. Many had plausibility intervals that did not include the best estimate offered by other clinicians. It was not uncommon to find two clinicians whose plausibility intervals did not overlap. In addition, many clinicians expressed 90% plausibility intervals that were so narrow that they did not capture findings from large, robust studies of the effectiveness of prophylaxis. While, by definition, 10% of clinicians should have been surprised to learn that a scientific finding was outside their 90% plausibility interval, we found that 34-67% would have been surprised. This is because their plausibility intervals were too narrow.

Conclusion

We found that clinicians are overconfident in their estimates of the effectiveness of BRCA1 risk-reduction measures.

The psychological literature on probability judgment is replete with studies demonstrating that people express high levels of confidence in their fallible judgments [1]. For example, in a classic study outside the medical context, Alpert and Raiffa [2] asked Harvard Business School students to estimate obscure quantities such as the percent of their classmates who preferred bourbon to scotch, the total egg production in the US, and the 1967 toll collection of the Panama Canal in millions of dollars. Naturally, students were uncertain about these facts. The authors also asked them to specify a lower bound estimate and an upper bound estimate such that they were 98% sure that the true value was between these two extremes. If students had specified intervals that were sufficiently wide given their uncertainty, then 98% of them should have captured the true value and 2% should have been surprised upon learning that the true value was outside their interval. The authors found, however, that 42% failed to capture the true value. This is because the students offered intervals that were too narrow, an indication of overconfidence.

Someone with an appropriate level of confidence should be able to express 98% confidence intervals such that the true value tends to lie outside their intervals 2% of the time. Similarly, they should be able to offer 90% intervals such that the true value lies outside 10% of the time, or 80% confidence intervals such that the true value lies outside 20% of the time. If the true value lies outside the specified interval too often, that is evidence of overconfidence. If it lies inside too often, that is evidence of underconfidence.
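The calibration criterion described above can be written down as a short check. The following Python sketch is only an illustration and not part of the study's analysis: the function names and the `tolerance` band are assumptions introduced here, and in practice sampling variability would have to be considered before labeling anyone overconfident.

```python
def surprise_rate(intervals, true_values):
    """Fraction of (lower, upper) intervals that fail to contain the true value."""
    if len(intervals) != len(true_values):
        raise ValueError("need exactly one true value per interval")
    misses = sum(
        1
        for (lower, upper), truth in zip(intervals, true_values)
        if not (lower <= truth <= upper)
    )
    return misses / len(intervals)


def calibration_verdict(observed_surprise, nominal_confidence, tolerance=0.05):
    """Compare the observed surprise rate with the rate implied by the stated confidence.

    `tolerance` is an arbitrary illustrative band (an assumption), not a statistical test.
    """
    expected_surprise = 1.0 - nominal_confidence
    if observed_surprise > expected_surprise + tolerance:
        return "overconfident"    # true values fall outside too often
    if observed_surprise < expected_surprise - tolerance:
        return "underconfident"   # true values fall inside too often
    return "roughly calibrated"


# Alpert and Raiffa [2]: 98% intervals, yet 42% failed to capture the true value.
print(calibration_verdict(0.42, 0.98))  # overconfident
```

Note that the verdict depends only on how often the intervals miss, not on how close the best estimates are to the truth.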

Note that the degree of knowledge that a person has is not necessarily related to the appropriateness of their confidence in their knowledge. For example, someone with very little knowledge of a given fact can express an appropriate level of confidence (or lack thereof) simply by offering confidence intervals that are wide. Conversely, someone with considerable knowledge of the same fact might actually have an inappropriate level of confidence because they offer confidence intervals that are not sufficiently narrow given their expertise.

If physicians are inappropriately confident about their medical knowledge, this might have important consequences for patient welfare. An overconfident physician might dissuade patients from seeking a second opinion, or convince a risk-averse patient that the likelihood of an adverse outcome is more remote than the facts would suggest. In addition, an overconfident physician may be less likely to seek, perceive, and assimilate new information.

We sought to assess whether clinicians were overconfident in their estimates of the effectiveness of prophylactic measures for risk reduction in women with a mutation in the BRCA1 gene. That is, we did not seek to assess what clinicians know, but whether they "know what they know."

Subjects were 18 breast oncologists and principal investigators on NCI-sponsored Specialized Programs of Research Excellence (SPOREs) in breast cancer at six US cancer centers. Each completed a one-page questionnaire.

We asked clinicians to consider several interventions: bilateral prophylactic mastectomy, oophorectomy with and without estrogen replacement, tamoxifen, and certain combinations. For each prophylactic measure, they indicated the "percent decrease in the lifetime probability of developing breast cancer among 30 year old women with a BRCA1 mutation (who have completed their childbearing)." In addition to their best estimate, we asked clinicians to give their 90% plausibility interval by indicating a "lower bound estimate such that there is only a 0.05 probability that the true value is lower, and ... upper bound estimate such that there is only a 0.05 probability that the true value is higher." For each intervention we also asked them to specify their best estimate and range for ovarian cancer risk reduction.

Figure 1 depicts each clinician's best estimate and plausibility range for the breast cancer risk reduction offered by prophylactic mastectomy. Best estimates vary considerably, ranging from 45% to 98%. There was also considerable variation in the degree of expressed uncertainty. Clinician 2, for example, felt that mastectomy in carriers probably reduced risk by 85%, but might reduce risk by as little as 10% or as much as 90%. Clinician 18 expressed greater confidence, saying that mastectomy would reduce risk by 98% and that risk reduction was unlikely to be lower than 97% or higher than 99%. It is noteworthy that neither clinician's plausibility interval contains the best estimate offered by the other, and in fact the two intervals do not even overlap; yet, by definition, each clinician is 90% confident that the true value lies within their specified range. Figure 2 shows clinicians' estimates of the impact of oophorectomy on breast cancer, and Figure 3 shows their estimates of the impact of tamoxifen.
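The comparison between Clinicians 2 and 18 can be made explicit with a few lines of Python. This is a minimal sketch using only the two sets of numbers quoted above; the `Judgment` type and the helper names are hypothetical and introduced purely for illustration.

```python
from typing import NamedTuple


class Judgment(NamedTuple):
    """One clinician's answer: risk reduction in percent (hypothetical helper type)."""
    lower: float  # lower bound of the 90% plausibility interval
    best: float   # best estimate
    upper: float  # upper bound of the 90% plausibility interval


def contains(judgment, value):
    """True if `value` lies inside the clinician's plausibility interval."""
    return judgment.lower <= value <= judgment.upper


def overlap(a, b):
    """True if two plausibility intervals share at least one point."""
    return max(a.lower, b.lower) <= min(a.upper, b.upper)


# Values for prophylactic mastectomy as reported in the text.
clinician_2 = Judgment(lower=10, best=85, upper=90)
clinician_18 = Judgment(lower=97, best=98, upper=99)

print(contains(clinician_2, clinician_18.best))  # False
print(contains(clinician_18, clinician_2.best))  # False
print(overlap(clinician_2, clinician_18))        # False
```

Since the two intervals are disjoint, at most one of them can contain the true value, whatever that value is.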

We found that clinicians expressed 90% plausibility intervals that were so narrow that there was no risk reduction estimate captured by 90% of subjects, suggesting overconfidence. We also contrasted clinicians' estimates with results from large, recent, robust studies. In a retrospective cohort analysis, for example, Hartmann et al [3] found that prophylactic bilateral mastectomy reduced the risk of breast cancer by an estimated 93% in a cohort of 287 women with a strong family history of breast and/or ovarian cancer. While we expected 90% of clinicians to capture this estimate in their 90% plausibility interval, we instead found that only 39% of clinicians captured Hartmann's result. As another example, Rebbeck et al [4] examined 63 women with BRCA1 mutations and found that those who underwent prophylactic oophorectomy before age 50 and did not receive hormone replacement therapy had a 54% lower risk of breast cancer relative to those who did not undergo oophorectomy. We again expected 90% of clinicians to capture this estimate in their 90% plausibility interval, but found that only 44% of clinicians did so. Finally, the NCI National Surgical Adjuvant Breast and Bowel Project (NSABP) reported that tamoxifen reduced the incidence of breast cancer in high-risk women by 49% [5]. Only 67% of clinicians captured this estimate in their 90% intervals.
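To give a rough sense of how far these capture rates fall below the nominal 90%, the following Python sketch computes a simple binomial tail probability. It rests on several assumptions that are not part of the original analysis: the counts (7, 8, and 12 of 18) are inferred here from the reported percentages, each published result is treated as the true value, and clinicians are treated as independent, each capturing the result with probability 0.90 if well calibrated.

```python
from math import comb

N_CLINICIANS = 18        # number of respondents reported in the study
NOMINAL_COVERAGE = 0.90  # capture probability for a well-calibrated clinician


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Counts inferred from the reported percentages (assumption):
# 39% -> 7 of 18, 44% -> 8 of 18, 67% -> 12 of 18.
captured = {
    "mastectomy, Hartmann et al [3]": 7,
    "oophorectomy, Rebbeck et al [4]": 8,
    "tamoxifen, NSABP [5]": 12,
}

for study, k in captured.items():
    rate = k / N_CLINICIANS
    tail = binom_cdf(k, N_CLINICIANS, NOMINAL_COVERAGE)
    print(f"{study}: captured {rate:.0%}, "
          f"P(at most {k} of {N_CLINICIANS} capture) = {tail:.2g}")
```

Under these assumptions, even the most favorable case (12 of 18 for tamoxifen) would arise with probability below 1% if clinicians were well calibrated. The independence assumption is a strong one, since clinicians draw on a shared literature, so the figures are best read as an order-of-magnitude illustration.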

Our results reveal clear evidence of overconfidence. There was no estimate of effectiveness such that 90% of subjects captured that estimate in their 90% plausibility intervals. Clinicians systematically gave plausibility intervals that were too narrow. This was true for all interventions and both breast and ovarian cancer.
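The statement that no single estimate was captured by 90% of subjects can be checked mechanically by finding the value covered by the largest number of intervals. The Python sketch below illustrates the idea. Only the first two intervals come from the text (Clinicians 2 and 18, mastectomy); the remaining three are hypothetical values invented for the example, and `best_coverage` is a name introduced here.

```python
def best_coverage(intervals):
    """Return (value, fraction) for the value captured by the most intervals.

    For closed intervals, the maximum coverage is always attained at some
    interval's lower bound, so only those candidates need to be checked.
    """
    best_value, best_fraction = None, 0.0
    for candidate, _ in intervals:
        covered = sum(1 for lower, upper in intervals if lower <= candidate <= upper)
        fraction = covered / len(intervals)
        if fraction > best_fraction:
            best_value, best_fraction = candidate, fraction
    return best_value, best_fraction


example_intervals = [
    (10, 90),  # Clinician 2, as reported in the text
    (97, 99),  # Clinician 18, as reported in the text
    (80, 95),  # hypothetical
    (85, 95),  # hypothetical
    (90, 98),  # hypothetical
]

value, fraction = best_coverage(example_intervals)
print(value, fraction)  # 90 0.8
```

If the best achievable fraction is below 0.90, as in this toy example, then at least some of the 90% intervals must be too narrow regardless of what the true value turns out to be. This makes the check independent of any particular published study.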

In this study we did not seek to judge the knowledge of clinicians or the accuracy of their estimates. An oncologist might be less knowledgeable about mastectomy and more knowledgeable about tamoxifen, for example, while a surgeon would have the opposite expertise. This is to be expected. However, even a physician who is very uncertain could exhibit a level of confidence appropriate to their (lack of) knowledge simply by specifying a wide plausibility interval. In this study we sought to assess the appropriateness of clinicians' level of confidence in their estimates. Our results offer clear evidence that in this instance clinicians are systematically overconfident. A well-calibrated clinician should be able to delineate 90% plausibility intervals that contain the true value roughly 90% of the time. We found, however, that clinicians captured the results from rigorous studies only 39-67% of the time.

This research shows that even some of the most respected breast cancer clinicians in the United States hold conflicting opinions and are, in general, overconfident in those opinions. This lack of consensus combined with high levels of confidence may be a source of great confusion to women seeking advice on prophylaxis from two or more physicians. Physician overconfidence in the effectiveness of preventive measures serves to compound the difficulty of the decisions surrounding genetic testing.

Acknowledgements

This work was funded in part by the National Cancer Institute through a Specialized Program of Research Excellence (SPORE) grant in Breast Cancer at Duke University, P50 CA68438. We gratefully acknowledge the assistance of those breast cancer clinicians affiliated with NCI-sponsored SPOREs around the country who anonymously provided us with subjective cancer risk reduction estimates.