This article has a correction. Please see:

Abstract

Purpose: The balance of benefits and harms in prostate cancer screening has not been sufficiently characterized. We related indicators of mortality reduction and overdetection by center within the European Randomized Study of Prostate Cancer Screening (ERSPC).

Experimental Design: We analyzed the absolute mortality reduction expressed as number needed to invite (NNI = 1/absolute risk reduction; indicating how many men had to be randomized to the screening arm to avert a prostate cancer death) for screening and the absolute excess of prostate cancer detection as number needed for overdetection (NNO = 1/absolute excess incidence; indicating the number of men invited per additional prostate cancer case), and compared their relationship across the seven ERSPC centers.

Translational Relevance

Our findings indicate that in a population-based screening program, 700 to 2,000 men need to be invited for screening to avert one prostate cancer death during 13 years, depending on the screening algorithm and background risk in the population. The excess numbers of cases detected by screening are considerably larger, with one additional diagnosis per 16 to 69 invited men. If the ratio of averted prostate cancer deaths and overdiagnosed cases (the pros and cons) is largely constant, more aggressive screening is justified mainly for high-risk groups that are likely to obtain the largest benefits (very likely men with little comorbidity, those with family history) and those who do not emphasize the potential adverse effects of overdiagnosis (men who can cope with treatment-related side effects or active surveillance as a treatment strategy).

Introduction

The major adverse effect of prostate cancer screening is overdiagnosis, that is, detection of cases that would not have been diagnosed during a man's lifetime in the absence of screening. The extent of overdiagnosis has been estimated as 25% to 50% of the screen-detected cases (2, 3). Overdiagnosis is due to both substantial lead time gained by screening and increased detection of slow-growing or even nonprogressive disease. Overdiagnosis can rarely be recognized at individual level at the time of diagnosis based on pathologic and clinical presentation alone, as it also depends on a man's remaining lifespan. Therefore, the full extent of overdiagnosis can generally be shown only through comparison of screened and unscreened populations. However, modeling studies have suggested that if the long-term benefits of screening are predicted on the basis of ERSPC, they would eventually outweigh the adverse effects in terms of quality-adjusted life-years (4).

Absolute measures of effect (such as risk or rate difference) provide a more concrete indication of intervention effect, as they reflect the probability of the benefit or adverse effect, that is, the increase or decrease in frequency of occurrence (and usually depend also on the level of background risk). In contrast, relative measures (for instance rate ratio or risk ratio) do not capture this aspect and cannot be interpreted as probabilities of benefit and harm (unless combined with the underlying risk, as they express the effect as a multiple of the baseline risk). Because of the more tangible information provided by absolute effect measures, reporting guidelines for randomized trials recommend showing both types of results (5, 6). Number needed to treat is the absolute measure used for treatment trials and number needed to screen the equivalent metric for screening trials (7, 8). To factor in incomplete compliance in screening trials (particularly population-based effectiveness trials), number needed to invite (NNI) has been developed as a further indicator of absolute screening effect (9). Absolute measures of effect depend, however, not only on screening procedures but also on the duration of follow-up and the background risk in the study population (e.g., age distribution); different studies are therefore not directly comparable.
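The dependence of absolute measures on background risk can be illustrated with a minimal Python sketch. The risk ratio of 0.8 and the two baseline risks are hypothetical values chosen only for illustration, not ERSPC results, and the function names are illustrative.

```python
import math


def absolute_risk_reduction(baseline_risk: float, risk_ratio: float) -> float:
    """Risk difference implied by applying a relative effect to a baseline risk."""
    return baseline_risk - baseline_risk * risk_ratio


def number_needed(risk_difference: float) -> int:
    """Inverse of a positive risk difference, rounded up to the next integer."""
    if risk_difference <= 0:
        raise ValueError("number needed is not defined without a risk reduction")
    return math.ceil(1.0 / risk_difference)


# Hypothetical inputs: the same risk ratio in a lower-risk and a higher-risk population.
RISK_RATIO = 0.8
for label, baseline in [("lower baseline risk", 0.006), ("higher baseline risk", 0.011)]:
    arr = absolute_risk_reduction(baseline, RISK_RATIO)
    print(f"{label}: {arr * 1000:.1f} deaths averted per 1,000 men, "
          f"number needed = {number_needed(arr)}")
```

With an identical relative effect, the number needed is roughly halved in the higher-risk population, which is why a risk ratio alone cannot be read as a probability of benefit.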

Within the ERSPC, there are differences in both study populations (e.g., underlying prostate cancer incidence and mortality) and screening procedures (despite a common core protocol; e.g., in number of screening rounds). Here, we describe the differences in the absolute effect of screening, with number of men needed to invite to avoid one prostate cancer death (NNI) as an indicator of the screening benefit. For harms, we used absolute risk of overdiagnosis expressed as number needed for overdiagnosis (NNO), calculated as the inverse of the excess risk (cumulative incidence) of prostate cancers in the screening versus control arm. The balance of the benefits and harms (NNI/NNO ratio) is expressed as the number of excess cases needed to detect for averting one prostate cancer death (number needed to detect, NND). The pooled ERSPC results on the basis of the same data were published recently (1), and the present article constitutes a secondary analysis.

Materials and Methods

ERSPC is a randomized multicenter trial, with the primary aim to assess the effect of prostate-specific antigen (PSA)–based screening on prostate cancer mortality. The trial was initiated in the mid-1990s (from 1991 to 1998) in seven centers (the Netherlands, Belgium, Sweden, Finland, Italy, Spain, and Switzerland). France joined later but was excluded from this analysis due to short follow-up.

Men in the intervention arm were invited to PSA-based screening and those in the control arm received no intervention (usual care, Fig. 1). Eligible subjects were identified from population registers and individually randomized on the basis of random numbers (with 1:1 allocation, except in Finland with a fixed size of the screening arm leading to an intervention/control ratio of ∼1:1.5). Consent was obtained prior to randomization in most countries (volunteer-based efficacy trial), but randomization before consent was used in Sweden, Finland, and Italy (population-based effectiveness trial). Recruitment was completed in all centers by the end of 2003. The screen interval was 4 years, except in Sweden, where a 2-year interval was used. PSA determination with a cutoff of ≥3.0 ng/mL was the principal screening test (an ancillary test was used at 3.0–3.9 ng/mL in Finland). Screen-positive men were referred for prostate biopsy. Screening was discontinued after 3 screening rounds in most centers but continued for up to 5 rounds in the Netherlands and 10 rounds in Sweden. Institutional review board approvals were obtained in each center. The ERSPC trial is registered in Current Controlled Trials as ISRCTN49127736.

Data on deaths and emigrations were obtained from national or regional population registries. Prostate cancer cases detected outside screening (interval cases, cancers in nonparticipants and in the control arm) were identified through cancer registries. Prostate cancer deaths were ascertained in each center by causes of death committees that evaluated all deaths in men diagnosed with prostate cancer and/or with prostate cancer as a cause of death on the death certificate, with blinding to the arm of the trial and following a uniform algorithm developed for the trial (10). In Finland, death certificates were used after the initial period, on the basis of very high concordance between the committee assignments and the official causes of death (κ > 0.9; ref. 11).

The present analysis is limited to the core age group, 55 to 69 years, to improve comparability between the centers. For the same reason, follow-up was truncated at 13 years, even if data were available through 2010, potentially allowing longer follow-up in some centers. A mean length of follow-up of at least 13 years was available in all included centers except Switzerland, where it was only 10 years. The intention-to-screen principle was used in the analysis, with the compared groups defined by random allocation (regardless of compliance).

NNI to avert one prostate cancer death was calculated as the inverse of the absolute risk difference in prostate cancer mortality between the arms [NNI = 1/(Mc − Ms)] to indicate the mortality reduction by screening. As an indicator of overdiagnosis, NNO was calculated as the inverse of the risk difference in cumulative incidence of prostate cancer between the trial arms [NNO = 1/(Is − Ic)], which represents the absolute risk of overdiagnosis (12). Finally, NND as a measure of the overall impact (benefits and harms) is shown as the ratio of the reduction in prostate cancer mortality to the excess prostate cancer incidence (NNI/NNO). In the calculation of number needed indicators, the values are always rounded upward (to the next integer). For their interpretation, it is essential to appreciate that a smaller value indicates a larger effect.
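The three indicators can be written out as a short Python sketch. The function names are illustrative; the mortality risks passed to `nni` are hypothetical, whereas the 6.3% excess incidence (the Netherlands) and the NNI of 252 and NNO of 22 (Sweden) are the values reported in the Results. Rounding NND upward is assumed here to follow the same rule as for the other number needed indicators.

```python
import math


def number_needed(risk_difference: float) -> int:
    """Inverse of a positive risk difference, rounded up to the next integer."""
    if risk_difference <= 0:
        # Mirrors "not defined" for a center without a mortality reduction.
        raise ValueError("number needed is not defined for this risk difference")
    return math.ceil(1.0 / risk_difference)


def nni(mortality_control: float, mortality_screening: float) -> int:
    """NNI = 1 / (Mc - Ms): men invited per averted prostate cancer death."""
    return number_needed(mortality_control - mortality_screening)


def nno(incidence_screening: float, incidence_control: float) -> int:
    """NNO = 1 / (Is - Ic): men invited per excess prostate cancer case."""
    return number_needed(incidence_screening - incidence_control)


def nnd(nni_value: int, nno_value: int) -> int:
    """NND = NNI / NNO: excess cases detected per averted death (rounded up)."""
    return math.ceil(nni_value / nno_value)


print(number_needed(0.063))  # excess incidence of 6.3% -> 16
print(nnd(252, 22))          # NNI 252 and NNO 22 -> 12
print(nni(0.0100, 0.0061))   # hypothetical cumulative mortality risks
```

Because each indicator is the inverse of a difference, small absolute differences translate into large and unstable values, and the indicator is undefined when the difference is zero or negative.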

Linear regression with weighting by number of prostate cancer deaths was used to assess the relationship between NNI and NNO between the centers, as well as in the analysis of the relation between the characteristics of the ERSPC centers (baseline risk, number of screens, proportion of positive screening tests, and proportion of screen-positive men biopsied) and the impact measures (NNI, NNO, and NND). The assumption of normal distribution was assessed using the Shapiro–Wilk test. Poisson regression analysis was conducted with the numbers of cases as the outcome and number of men as the offset.
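As a rough illustration of a weighted regression of this kind, the sketch below fits a straight line by weighted least squares in plain Python. It is not the software or exact model specification used in the analysis: treating NNI as the outcome and NNO as the predictor is an assumption, and the center-level values and death counts are hypothetical placeholders.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b * x_i) ** 2)."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have equal length of at least 2")
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical center-level data (NOT the ERSPC values).
nno_by_center = [20.0, 25.0, 45.0, 50.0, 65.0]
nni_by_center = [300.0, 450.0, 900.0, 1700.0, 1500.0]
deaths_by_center = [120.0, 150.0, 40.0, 400.0, 60.0]  # regression weights

intercept, slope = weighted_linear_fit(nno_by_center, nni_by_center, deaths_by_center)
print(f"NNI changes by about {slope:.1f} per unit increase in NNO")
```

Weighting by deaths gives centers with more outcome events more influence on the fitted slope, so a small center with an extreme NNI moves the line less than it would in an unweighted fit.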

Results

The size of the study population (within the core age group) ranged from 2,197 in Spain to 80,379 in Finland (Table 2). The proportion of men in the screening arm attending at least once was more than 90% in the volunteer-based centers and approximately 75% in the population-based centers. The proportion of positive tests was lowest (∼11%) in Italy, with a low prostate cancer incidence, and in Finland, where an ancillary test (free/total PSA ratio with a cutoff of 0.16) was used at the PSA range of 3.0 to 3.9 ng/mL. The proportion of positive tests was highest (>20%) in the Netherlands and Switzerland, which had various side studies involving additional tests.

Numbers of men, prostate cancers and prostate cancer deaths by trial arm and center in the ERSPC

The highest cumulative incidence at 13 years in the screening arm was seen in Sweden and the Netherlands (12.5%), followed by Switzerland (11.6%; Table 3). The cumulative incidence in the control arm was highest in Sweden (7.9%), followed by Finland and Belgium (7.5%). The excess incidence (difference in cumulative incidence between the arms) ranged from 1.5% in Italy to 6.3% in the Netherlands. NNO ranged from 16 to 22 in the Netherlands, Switzerland, and Sweden (indicating one excess case detected per ∼20 screened men) to 47 to 69 in Finland, Belgium, and Italy.

Cumulative incidence of prostate cancer by arm, with excess incidence and NNO by ERSPC center

When patients treated with active surveillance were excluded, the extent of overdetection was strongly reduced in Sweden (NNO increased from 22 to 41), Finland (from 51 to 137), and Switzerland (from 18 to 34), where active surveillance was commonly used, and also to some extent in Belgium (from 47 to 62), but the difference was not substantial in the other centers (in the Netherlands from 16 to 19, Italian and Spanish results almost unaltered because of infrequent use of active surveillance).

Prostate cancer mortality within the control arm was highest in Sweden (10 per 1,000 men) followed by the Netherlands, Finland, and Belgium (∼5–7 per 1,000; Table 4). Also in the screening arm, prostate cancer mortality was highest in Sweden, Finland, and the Netherlands (5–6 per 1,000). A reduction in prostate cancer mortality was achieved in all centers except Switzerland. The absolute reduction was largest in Sweden (4 per 1,000), followed by the Netherlands and Belgium (1–2 per 1,000). The mortality reduction relative to population size (NNI) ranged from 252 in Sweden to 1,821 in Finland (not defined for Switzerland).

Prostate cancer mortality by arm, with mortality reduction and NNI by ERSPC center

The amount of mortality reduction per number of invited men (mean NNI) pooled across the centers was 855 (SD, 576; excluding Switzerland) and the excess incidence relative to population size (mean NNO) was 36 (SD, 20; including Switzerland). The variability between centers was larger in mortality impact than in excess detection (ratio of largest to smallest value, 7 for NNI vs. 4 for NNO; SD, 67% of the mean for NNI vs. 55% for NNO). When the probability of benefit (absolute reduction in prostate cancer mortality, NNI) was related to the probability of harm (excess incidence, NNO), Sweden and the Netherlands showed both the largest absolute mortality reduction and the highest absolute excess incidence (Fig. 2). The other centers (Finland, Italy, Belgium, and Spain) showed both lower mortality reduction (higher NNI) and less excess incidence (larger NNO). In Switzerland, however, a high excess incidence was not balanced by mortality reduction. The proportion of screen-positive results was significantly associated with both mortality reduction (NNI) and excess detection (NNO; P = 0.001 and r² > 0.8 for both), but other center characteristics such as median age, number of screening rounds, and baseline risk (prostate cancer incidence or mortality in the control arm) were not.
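The two variability summaries can be re-derived from the rounded figures quoted in the text with a few lines of Python. This is only an arithmetic check with illustrative function names; because it starts from rounded means, SDs, and ranges, the NNO result (about 56%) differs slightly from the reported 55%, which presumably reflects unrounded inputs.

```python
from typing import Sequence


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean


def fold_range(values: Sequence[float]) -> float:
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)


# Rounded values quoted in the text: mean and SD of NNI and NNO across centers.
print(f"NNI: SD is {coefficient_of_variation(855, 576):.0%} of the mean")
print(f"NNO: SD is {coefficient_of_variation(36, 20):.0%} of the mean")

# Extremes quoted in the text: NNI 252 to 1,821 and NNO 16 to 69.
print(f"NNI fold range: {fold_range([252, 1821]):.1f}")
print(f"NNO fold range: {fold_range([16, 69]):.1f}")
```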

The relationship between absolute prostate cancer mortality reduction, expressed as number needed to invite (NNI = 1/risk difference) and absolute excess of cumulative prostate cancer incidence, expressed as number needed for overdetection (NNO = 1/risk difference) by center in the ERSPC.

The correlation coefficient for absolute mortality impact and absolute excess incidence (NNI and NNO) was 0.76 (excluding Switzerland), and the linear regression coefficient (with weighting of centers by number of prostate cancer deaths) was 33 [95% confidence interval (CI), 5–62; P = 0.03 with R² = 0.65]. Larger mortality reduction was thus associated with larger overdetection, suggesting that intensive screening results in both larger benefits and larger adverse effects. Adjustment for age in linear regression analysis did not weaken the association between mortality reduction and excess incidence (NNI and NNO). If Finland was excluded, the association was stronger (r = 0.95, linear regression r² = 0.84, P = 0.02), whereas exclusion of the other major centers (Sweden or the Netherlands) did not substantially affect the result.

The ratio of mortality reduction to excess incidence (mean NND, calculated as NNI/NNO) for the 6 centers was 22, with a range of 12 to 36. It was smallest for Sweden (NND, 12), indicating that the number of excess cases per averted prostate cancer death was less than in other centers, and largest for Finland (NND, 36), with both smaller mortality reduction and excess incidence. Baseline risk (prostate cancer mortality in the control arm) showed little association with the ratio of mortality reduction to excess incidence (NND), nor did number of screening rounds attended, proportion of positive screening tests, or proportion of screen-positive men biopsied (for all P > 0.2, r² < 0.2 in linear regression).

Discussion

Our analysis of absolute measures of screening benefits (prostate cancer mortality reduction, NNI) and harms (excess prostate cancer incidence, NNO) shows that at 13 years of follow-up, 12 to 36 excess prostate cancers (22 on average) have to be detected to avert one death from the disease. In comparison between the ERSPC centers, a direct correlation was observed between screening benefit and harm. This suggests that with the current screening regimens, any efforts to increase the effectiveness of screening are likely to be accompanied by an unavoidable increase in the harmful effects. This was further supported by the association of the proportion of positive screening tests with both mortality reduction (NNI) and excess incidence (NNO). A likely explanation is the largely overlapping PSA distribution among fatal and overdiagnosed prostate cancer cases, that is, PSA does not allow a sharp distinction between lethal and inconsequential prostate cancer in the early preclinical phase of the disease due to a long lead time (detection several years earlier, and at a younger age, compared with clinical diagnosis) and length bias (increased detection of slowly growing cancers with a long detectable preclinical phase by screening).

Our results can be interpreted as suggesting a substantial overlap between cases that contribute to mortality reduction achievable by screening and those that would remain undetected in the absence of screening. This notion is supported by the fact that prediction of disease outcome in patients with low-risk features is highly uncertain and therefore distinguishing patients that would substantially benefit from treatment is challenging. This means that screen-detected cases include both tumors with indolent and progressive behavior, but we are currently unable to clearly discern the two. It is also consistent with trials showing only modest benefit from prostatectomy compared with expectant management in low-risk prostate cancer (13, 14) and very high cause-specific survival in patients treated with active surveillance (15), both indicating small advantage attainable by active treatment in men with a low-risk prostate cancer.

The wide variation between the ERSPC centers in the outcomes appears to reflect mainly differences in screening protocols, as risks and benefits were not clearly associated with the underlying risks between the populations. Yield of screening depends on the cutoff value for the screening test, number of screening rounds, and participation. These features are very likely to account for the differences in excess incidence and hence the extent of overdetection.

Of the various features, proportion of positive tests (reflecting PSA threshold and additional tests) was most closely associated with the absolute benefit and harm (NNI and NNO). There was less variation between the centers in the ratio of benefits and harms (NND, NNI/NNO) than in the mortality effect (3- vs. 7-fold). This also supports the interpretation that increased mortality reduction is linked to a higher rate of overdiagnosis. Sweden showed the most favorable overall impact (in terms of balance of mortality reduction to excess incidence, NND) and Finland the least favorable, despite the fact that the two centers share similar population-based approaches, with comparable participation and the highest prostate cancer mortality (both nationally and in the control arm). Possible explanations include differences in screening intensity or contamination. The 3-fold higher NND in Finland compared with Sweden may indicate that the ratio is more strongly affected by mortality reduction (NNI) than by excess incidence (NNO), as NNI showed a larger (7-fold) difference between the two countries. Likewise, substantial differences emerged between the volunteer-based efficacy trials, with both NNI and NNO at least twice as high for Belgium as for the Netherlands. This could be attributable to the very long screening interval in Belgium (7 years).

Finding an optimal balance is probably not possible simply by maximizing detection or minimizing overdiagnosis alone; it requires consideration of several aspects of the screening program. The findings indicate that an aggressive screening protocol with a large proportion of screen-positive results is likely to increase both harms and benefits. However, the favorable overall balance of effects shown for Sweden (with an NND of 12) seems to depend mainly on the large absolute mortality reduction. One explanation could be that only intensive or continuous screening can detect potentially lethal cancers at a curable stage. This likely leads also to increased diagnosis of indolent cases, whereas a less intense program yields fewer such cases but also a smaller mortality reduction.

The largest mortality reduction (smallest NNI) was seen in Sweden and the Netherlands. As they had the largest number of screening rounds, as well as the longest duration of screening (up to 20 years), this finding suggests that the 3 screening rounds used in the other centers may not be sufficient to provide a similar mortality benefit. The lack of a mortality reduction in Switzerland may reflect shorter follow-up, as the screening effect emerges only after 7 to 9 years of follow-up (16), but may also be chance fluctuation (the difference being due to only two additional deaths). The absolute mortality impact is likely to increase with additional follow-up, especially in centers continuing screening, owing to the slow progression of the disease and the substantial proportion of deaths occurring only 10 to 15 years or more after diagnosis. The extent of overdetection may, on the other hand, decrease, as the catch-up in the control arm is likely to continue for the duration of the lead time gained in the screening arm (after cessation of screening).

The results do not necessarily indicate which screening protocol is superior, because the harms and benefits are not commensurate. Most would agree that an averted prostate cancer death carries a higher impact than an overdiagnosed case. However, assessment of their relative weight is not straightforward, as it reflects a value judgment (and various preferences would likely yield contrasting evaluations of the optimal balance). One approach to improving the comparability between the different endpoints is the use of quality-adjusted or disability-adjusted life-years (QALYs and DALYs). We have previously modeled the long-term impact of screening and performed a cost-effectiveness analysis using QALYs as an indicator of the overall effects of screening (4, 17).

For reduction of overdiagnosis, multiparametric MRI in guiding prostate biopsies has shown promise in detecting primarily clinically significant prostate cancer and reducing the yield of insignificant cancers (18). Novel, powerful prognostic markers would also allow identification of cases that would not require treatment and hence reduce overtreatment. Screening is likely to provide the largest absolute benefit in the subgroups with the highest risk, and correspondingly, the balance of benefit and harm is likely disadvantageous for low-risk men. Multifactorial risk stratification may allow identifying a target population with a superior screening effect.

There are some caveats in the interpretation of the results. First, the number of observations is small: with each center as the unit of observation, the analysis rests on only 6 or 7 data points (depending on inclusion or exclusion of Switzerland and Spain). Second, not all excess incidence necessarily constitutes overdiagnosis; part of it may reflect lead time (earlier diagnosis due to screening) with subsequent catch-up by the control arm, as both absolute and relative excess incidence have decreased with the duration of follow-up (1). Furthermore, a high prostate cancer incidence in the control arm may reflect contamination (opportunistic screening), and the low excess incidence in such a context may be regarded as spurious. This is also one possible explanation for the small mortality benefit and modest excess incidence in Finland. In addition, cause of death attribution is prone to errors, which may affect the outcomes (11, 19). Overall mortality is less prone to bias, but prostate cancer is an uncommon cause of death, which precludes a reduction in overall mortality and limits the gain in life-years, as for most other screening programs (20, 21). This does not, however, preclude any impact on life-years, as modeling studies have shown up to 100 life-years saved per 1,000 screened men (17).

In conclusion, the results indicate a strong interrelation between benefits and harms of prostate cancer screening. Decision-making about prostate cancer screening needs to involve judgment of the importance of averted prostate cancer death relative to unnecessary diagnosis and harms of treatment, to gauge the tradeoffs between benefits and harms. With intensive screening, mortality reduction achievable may be larger, but the optimal balance between benefits and harms depends on weighting and valuation of the gains versus adverse effects of screening.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Disclaimer

The authors are solely responsible for the study design, collection, analysis, and interpretation of the data, writing the report and decision to submit for publication.

Other (functioned as international ERSPC co-coordinator): F.H. Schröder

Other (interpretation of the data and critical revision of the article for important intellectual content): S. Carlsson

Grant Support

The coordination of the ERSPC trial has been financially supported in part by the 6th Framework programme of the EU; European Commission Europe against Cancer programme; and an unconditional grant from Beckmann-Coulter-Hybritech. The Finnish center has been financially supported by grants from the Academy of Finland (grant #260 931); Cancer Society of Finland; Pirkanmaa Hospital District Competitive Research Funding, and Nordic Cancer Union; the Dutch Center by the Dutch Cancer Society, and the Netherlands Organisation for Health Research and Development (ZonMW); Sweden by Abbott Pharmaceuticals; Af Jochnick's Foundation; Catarina and Sven Hagstroms Family Foundation, Gunvor and Ivan Svensson's Foundation; Johanniterorden; King Gustav V Jubilée Clinic Cancer Research Foundation; Sahlgrenska University Hospital; Schering Plough; Swedish Cancer Society; and Wallac Oy; Belgium by Flemish Ministry of Welfare, Public Health and Family; Province and City of Antwerp; Public Centre for Social Welfare Antwerp; Italy by Italian League for the Fight against Cancer (LILT); Italian Association for Cancer Research (AIRC); and National Research Council (CNR), Tuscany Region; Switzerland by The Horten Foundation; Aargau Cancer League; Swiss Cancer League; Health Department of Canton Aargau; Prostate Cancer Research Foundation; Baugarten Foundation; and the Messerli Foundation; and Spain by The Spanish “Fondo de Investigación Sanitaria.”

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.