This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 Unported License http://creativecommons.org/licenses/by-nc/4.0/, which permits unrestricted non-commercial reuse, provided the original author and source are credited.

Conclusion:

The low variability of the DIPL volume on T2W MRI between observers and its agreement with histology indicate its suitability for delineation of gross tumour volume for radiotherapy planning. The volume of cellular tumour represented by DW-MRI is greater than the vascular (DCE) abnormality; ratios of both to T2W volume are independent of Gleason score.

Advances in knowledge:

(1) Manual volume measurement of tumour is reproducible within 1cm3 between observers on all sequences, confirming suitability across observers for radiotherapy planning. (2) Volumes derived on T2W MRI most accurately represent in vivo lesion volumes. (3) The proportion of cellular (DW-MRI) or vascular (DCE-MRI) volume to morphological (T2W MRI) volume is not affected by Gleason score.

INTRODUCTION

The soft-tissue contrast on T2 weighted (T2W) MRI is preferred over X-ray CT for prostate tumour identification, staging1–4 and defining the dominant intraprostatic lesion (DIPL).5 Furthermore, additional information available from diffusion-weighted (DW)-MRI and dynamic contrast enhanced (DCE)-MRI techniques, collectively termed multiparametric (mp)MRI, may be exploited to improve sensitivity and specificity for tumour identification over T2W imaging alone.6 An accurate definition of gross tumour volume (GTV) derived from these images is essential in planning radiation therapy,7 particularly when giving boost doses to the DIPL:8 overestimation of the GTV increases the risk of radiation-induced complications to organs at risk such as the rectal wall, and underestimation reduces the long-term efficacy of treatment.9 However, as there is increasing evidence that the volumes defined on individual mpMRI sequences are significantly different from each other10 and depend on underlying histology,11,12 the optimal sequence on which to outline the GTV remains to be established.

Traditionally, tumour outlines are carried out on T2W images for radiation therapy planning. Although this involves simultaneous viewing of all mpMR images,13 the specific and independent influence of the DW-MRI- and DCE-MRI-identified tumour on the morphological (T2W) outlines, which may vary with Gleason grade, has not been documented. A recent large study showed that the maximum volume measured on mpMRI correlated best with histology.14 The purpose of this study therefore was to establish the interobserver reproducibility of prostate tumour volumetry on individual sequences obtained from mpMRI, validate the measurements against histology and determine whether the proportion of cellular (DW-MRI) or vascular (DCE-MRI) volume to morphological (T2W MRI) volume reflects the Gleason score.

METHODS AND MATERIALS

Patients

Imaging data were obtained from 41 males with prostate cancer (mean age 66.7±7.6 years, prostate-specific antigen range 3.0–32.0ngml−1, clinical grade T1–T3, Gleason grade 6–8) who had been enrolled consecutively in 2 unrelated prospective studies approved by the local institutional review board and had given written consent for use of their data. Acquired images were therefore analyzed retrospectively. All patients had mpMRI with positive histology on a standardized 8–10 core randomly sampled transrectal ultrasound-guided biopsy performed between 4 and 12 weeks previously (median 85 days, range 8–231 days). All patients were treatment naïve at the time of scanning. The first 20 patients (Cohort 1) were treated with radical prostatectomy and the latter 21 patients (Cohort 2) underwent radiation therapy with dose boosting to the DIPL. In Cohort 1, mpMRI was performed a mean of 16.7 days (median 12 days, range 1–54 days) prior to prostatectomy. In Cohort 1, 3 patients were Gleason grade 3+3, 12 patients were 3+4 and 5 patients were 4+3 or greater. In Cohort 2, 5 patients were Gleason grade 3+3, 10 patients were 3+4 and 6 patients were 4+3 or greater.

Image acquisition

All imaging was performed with an endorectal coil. Cohort 1 was studied at 1.5T and 55ml of room air was used for inflation of the balloon. Cohort 2 was studied at 3.0T and the balloon was filled with 60ml of perfluorocarbon to reduce susceptibility artefact. Hyoscine butyl bromide 20mg was administered intramuscularly in all cases. T2W images were obtained in three planes orthogonal to the prostate at both field strengths supplemented by position-matched DW- and DCE-MRI sequences in the axial plane.

An external pelvic phased-array coil was used to acquire axial T1 weighted and T2W images through the pelvis to assess lymph node status as part of the routine clinical examination at both 1.5T and 3.0T, but these images did not form part of the evaluation in this study.

Image analysis

Anonymized images were analyzed on dedicated reporting workstations. Axial T2W images, isotropic apparent diffusion coefficient (ADC) maps calculated from monoexponential fit of the DW data from all b-values and greyscale DCE images at peak contrast enhancement (range 58.5–62.5s) had manual regions of interest (ROIs) drawn around the DIPL on a two-dimensional slice-by-slice basis. The DIPL was defined as the largest visible low-signal intensity lesion on T2W images with a corresponding subjective ADC reduction on DW-MRI from an octant with a positive biopsy. In Cohort 1, the location of the DIPL was subsequently confirmed on the prostatectomy specimen. Smaller secondary lesions were ignored as these were not targets for dose boosting. Outlining was performed by free-hand drawing using a mouse-controlled cursor; margin recognition was based on the subjective assessment of the imaging features for each sequence according to current European Society of UroRadiology mpMRI guidelines.15 T2W images were assessed for regions of well-defined low T2W signal, ADC maps for regions of restricted diffusion and DCE sequences for regions of brisk contrast uptake and early washout (Figure 1). ROI delineation on each sequence was performed separately on a different occasion at least a week later to minimize possible memorization of tumour margins.

GTVs were calculated by multiplying stacked ROI areas generated by the workstation software by sequence-specific slice thickness. A radiologist with 4 years' prostate mpMRI experience performed all the DIPL ROI assessments. In addition, a second observer with 20 years' prostate MRI experience, blinded to the first observer ROIs, repeated identical assessments. Both observers were also blinded to the histopathological data.
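The stacked-area calculation described above can be sketched as follows. This is a minimal illustration, not the workstation software's implementation: the function name, the per-slice areas and the 3-mm slice thickness are hypothetical values chosen only for the example.

```python
def gtv_cm3(roi_areas_mm2, slice_thickness_mm):
    """Sum per-slice ROI areas (mm^2), multiply by the slice thickness (mm)
    and convert the result from mm^3 to cm^3."""
    total_area_mm2 = sum(roi_areas_mm2)
    return total_area_mm2 * slice_thickness_mm / 1000.0


# Hypothetical ROI areas for a lesion outlined on four contiguous slices.
example_areas = [85.0, 140.0, 120.0, 55.0]
# Hypothetical sequence-specific slice thickness (not taken from the paper).
example_volume = gtv_cm3(example_areas, slice_thickness_mm=3.0)
print(f"GTV = {example_volume:.2f} cm3")
```

Because the slice thickness is sequence specific, the same function would be called with a different thickness for each of the T2W, DW-MRI and DCE-MRI outlines.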

Histopathological analysis

All patients in Cohort 1 underwent prostatectomy. The prostate was sectioned at 4-mm intervals in a plane perpendicular to the gland's posterior surface using a specially devised slicer to ensure accuracy of slicing.16 Formalin-fixed and paraffin wax-embedded whole-mount histopathological slides were prepared. The slicing axis matched the axial image acquisition angle so that stained sections from the embedded slices matched the imaging slices closely. Although the slice thickness did not match the imaging slice thickness, the segmentation of the whole tumour volume on both imaging and histology meant that slice-by-slice correlation of imaging with histology was not required. Tumour volumes of the DIPL were demarcated by a specialist histopathologist (Figure 1). The whole-mount slides were subsequently overlaid with a 1×1-mm translucent grid sheet and photographed over a light source. Histopathological tumour volumes of DIPLs were calculated by manual counting of overlying 1-mm2 grid squares, multiplied by the histological slice thickness.
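The grid-counting estimate of histological volume can be sketched in the same way. The square counts below are hypothetical, and the sketch assumes that the histological slice thickness equals the 4-mm sectioning interval and that no shrinkage correction is applied.

```python
def histology_volume_cm3(squares_per_slide, slice_thickness_mm=4.0,
                         square_area_mm2=1.0):
    """Convert per-slide counts of grid squares overlying tumour into a volume.

    Each counted square contributes square_area_mm2 of tumour area; the summed
    area is multiplied by the slice thickness and converted from mm^3 to cm^3.
    """
    total_area_mm2 = sum(squares_per_slide) * square_area_mm2
    return total_area_mm2 * slice_thickness_mm / 1000.0


# Hypothetical counts of 1-mm2 squares on four consecutive whole-mount slides.
example_counts = [40, 95, 70, 25]
example_histology_volume = histology_volume_cm3(example_counts)
print(f"Histological volume = {example_histology_volume:.2f} cm3")
```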

Statistical analysis

Differences between the two observers for each sequence, and between each sequence and histology, were assessed using Bland–Altman plots and limits of agreement. Agreement was also assessed using Pearson's correlation coefficient.
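For readers unfamiliar with these measures, the sketch below shows one common way of computing the Bland–Altman bias, the 95% limits of agreement (bias ± 1.96 standard deviations of the paired differences) and Pearson's correlation coefficient. The volumes are hypothetical, and the use of the sample standard deviation and the 1.96 multiplier are assumptions, since the exact settings used in the statistical software are not stated.

```python
import math
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical tumour volumes (cm3) outlined by two observers.
observer_1 = [0.5, 1.2, 2.4, 3.1, 4.8, 7.0]
observer_2 = [0.4, 1.0, 2.6, 2.9, 4.5, 7.2]

bias, lower, upper = bland_altman(observer_1, observer_2)
r = pearson_r(observer_1, observer_2)
print(f"bias = {bias:.2f} cm3, limits of agreement = {lower:.2f} to {upper:.2f} cm3")
print(f"Pearson r = {r:.3f}")
```

The two measures answer different questions: a high correlation coefficient shows that the volumes rise together, whereas the limits of agreement show how far apart two observers may be for an individual lesion.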

Analysis of variance (ANOVA) was used to assess intersequence volume differences as well as differences in relative volumes between sequences across the three Gleason grade categories (3+3, 3+4, and 4+3 or greater). Paired t-tests were used to detect significant differences between mpMRI volumes and histology for both observers.
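The test statistics underlying these comparisons can be sketched as follows. This is an illustration only: the analysis reported here was run in SPSS, the sketch assumes a simple one-way (between-groups) ANOVA, and the ratio values are hypothetical. Converting the statistics to p-values requires the F and t distributions, for example via `scipy.stats.f_oneway` and `scipy.stats.ttest_rel`, which are not used here so that the example needs only the standard library.

```python
import math
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def paired_t(a, b):
    """Return (t, degrees of freedom) for a paired t-test of a against b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical DW-MRI to T2W volume ratios (%) in three Gleason categories.
ratios_by_gleason = [
    [70.0, 78.0, 81.0],        # 3+3
    [65.0, 74.0, 80.0, 69.0],  # 3+4
    [72.0, 77.0, 83.0],        # 4+3 or greater
]
f_stat, df_between, df_within = one_way_anova_f(ratios_by_gleason)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")

# Hypothetical paired MRI and histological volumes (cm3) for five patients.
mri_volumes = [0.6, 1.4, 2.1, 3.0, 4.9]
histology_volumes = [0.5, 1.1, 2.0, 2.4, 4.7]
t_stat, dof = paired_t(mri_volumes, histology_volumes)
print(f"t({dof}) = {t_stat:.2f}")
```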

A p-value of <0.05 was taken to be significant in all statistical tests. Analysis was performed in Microsoft Excel® (Microsoft, Redmond, WA) and SPSS® v. 23 (IBM Corp., New York, NY; formerly SPSS Inc., Chicago, IL). Bland–Altman plots were produced in GraphPad Prism® v. 6.07 (GraphPad Software, Inc., La Jolla, CA).

RESULTS

One patient in Cohort 2 had no DCE-MRI data and artefacted T2W data and was therefore excluded from subsequent analysis.

Tumour volume variability

Differences between observers for T2W, DW-MRI and DCE-MRI sequences

GTVs drawn on T2W images for both cohorts ranged from 0 to 7.0cm3 (mean 2.4±1.93cm3) for Observer 1 and from 0 to 7.2cm3 (mean 2.29±1.93cm3) for Observer 2. Corresponding data for DW- and DCE-MRI are given in Table 1. Differences between volumes derived from all three sequences were significant for both observers (ANOVA, p<10−8). Tumour volumes were smaller in Cohort 1 than in Cohort 2 (Cohort 1: 1.5±1.6cm3 for Observer 1 and 1.8±1.9cm3 for Observer 2; Cohort 2: 3.1±2.0cm3 for Observer 1 and 3.0±1.9cm3 for Observer 2).

Histological volumes ranged from 0.04 to 4.72cm3 (mean 1.46±1.50cm3). One patient’s tumour was not detected on any mpMRI sequence but had a volume of 0.04cm3 on histology. T2W MRI GTVs overestimated histological volumes by 33±76% (Observer 1) and 16±67% (Observer 2) but had the highest correlation coefficient (r=0.97 Observer 1, 0.93 Observer 2, p<0.0001). DW-MRI and DCE-MRI tended to underestimate histological volume (Table 2). Paired t-tests found that mean DCE-MRI GTVs were consistently and significantly different from histology (p=0.001 Observer 1 and 0.0003 Observer 2), whereas T2W GTVs differed from histology for Observer 1 only (p=0.005) and DW-MRI GTVs differed from histology for Observer 2 only (p=0.006). Bland–Altman plots for each sequence against histology, with limits of agreement, are shown for Observer 1 in Figure 3.

Average volumes from all three sequences in Cohort 1 were 1.33±1.46cm3 for Observer 1 and 1.05±1.23cm3 for Observer 2 (Table 2). A paired t-test showed no difference between this average volume and histology for Observer 1 (p=0.2), although differences for Observer 2 were significant (p=0.004).

As assessed by cognitive fusion, there was a >90% overlap between the ROIs from each sequence. DW-MRI- and DCE-MRI-derived volumes were consistently smaller than T2W volumes: DW-MRI to T2W MRI ratios were 73.9±18.1% for Observer 1 and 72.5±21.9% for Observer 2. DCE-MRI to T2W MRI volume ratios were even lower (42.6±24.6% for Observer 1, 34.3±24.9% for Observer 2). The proportions of the T2W volume represented by the DW-MRI volume and by the DCE-MRI volume were significantly different (ANOVA, p<10−8).
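One way of arriving at ratios of this kind is sketched below. It assumes that the ratio is computed per patient and then summarized as mean ± standard deviation, which is what the ± notation suggests but is not stated explicitly, and that lesions with zero T2W volume are excluded to avoid division by zero. The volumes are hypothetical.

```python
import statistics


def ratio_summary(functional_cm3, t2w_cm3):
    """Return (mean, SD) of per-patient functional-to-T2W volume ratios in %.

    Patients whose T2W volume is zero are skipped (an assumption made here
    so that the ratio is always defined).
    """
    ratios = [100.0 * f / t for f, t in zip(functional_cm3, t2w_cm3) if t > 0]
    return statistics.mean(ratios), statistics.stdev(ratios)


# Hypothetical T2W and DW-MRI volumes (cm3) for four patients.
t2w_volumes = [2.0, 4.0, 1.0, 5.0]
dw_volumes = [1.5, 3.0, 0.8, 3.5]

mean_ratio, sd_ratio = ratio_summary(dw_volumes, t2w_volumes)
print(f"DW-MRI to T2W ratio = {mean_ratio:.1f} ± {sd_ratio:.1f}%")
```

The mean of per-patient ratios is not the same quantity as the ratio of the mean volumes; the latter would weight large tumours more heavily.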

Gleason grade was determined at prostatectomy in Cohort 1 and pre-treatment in Cohort 2. Eight patients had Gleason score 3+3 tumours, 21 patients had Gleason score 3+4 and 11 patients had Gleason score 4+3 or greater. DW-MRI to T2W volume ratios in the three Gleason categories were 75.4±15.9%, 72.6±18.9% and 75.2±19.4%, respectively, for Observer 1 and 68.9±17.8%, 67.7±19.3% and 84.2±26.1% for Observer 2. DCE-MRI to T2W volume ratios in the three Gleason categories were 44.9±12.2%, 39.9±27.4% and 45.7±27.0%, respectively, for Observer 1 and 33.5±23.4%, 33.8±24.8% and 35.8±28.1% for Observer 2. There were no significant differences in DW-MRI and DCE-MRI to T2W MRI tumour volume ratios between the three Gleason grade categories for either observer (ANOVA, p>0.05), indicating that the ratio of functional to morphological volume did not differ with Gleason grade.

DISCUSSION

We have established that manual ROI delineation of DIPL is reproducible for the purposes of radiotherapy planning between two observers with approximately 1-cm3 limits of agreement on T2W MRI. Both observers interpreted the mpMRI in accordance with European Society of UroRadiology (ESUR) guidelines. In addition, they outlined in optimal ambient conditions for their individual preferences and had the ability to manipulate the window, brightness and magnification for decisions regarding tumour margins. The consistently lower measurements of Observer 2 are likely to reflect differences in observer perception of feature boundaries. Although differences in lesion conspicuity due to different sequences and scanners will also be a factor, this study aimed to establish variability between observers in the presence of these variations. Furthermore, as the DW image provided the most definitive contrast for lesion identification, its independence of field strength reinforces the validity of the findings across the two field strengths used in this study. The concordance of each observer's measurements with histology, however, remains the definitive test of the validity of the method. In practical terms, the measured tumour volume differed between observers by 1cm3 for T2W images in the largest tumours in our cohort, which should not cause differences in radiation therapy plans made on images outlined by different observers because substantial additional margins are added when delineating a clinical target volume around the GTV.

We have demonstrated a correlation between mpMRI-derived tumour volumes and histology that is similar to that reported by others.10,14,17–19 In addition, we have shown that DIPL tumour volumes defined on the T2W images were consistently larger than those on DW- or DCE-MRI. Although T2W volumes overestimate histological volumes, they are best suited to delineating the margins of the DIPL for radiation dose boosting, especially as a post-resection shrinkage factor of up to 1.15 in histological samples17 must be allowed for. Shrinkage is due to formalin fixation and was unavoidable in our study, as the tissue was preserved immediately post-resection for optimal diagnostic purposes. In an early work, Ponchetti et al19 showed that T2W images overestimated small tumours by as much as 58%; however, their MRI scans were performed post-biopsy, which may have confounded their MRI measurements. In comparison, MRI in a study by Cornud et al20 underestimated histological volume in nearly half the cases (49%) with a larger mean difference (−0.56cm3) than we demonstrated (−0.08 to 0.30cm3). A recent large study measuring all visible lesions in 202 patients also concluded that all sequences underestimated true volume and that the maximum volume from all sequences most closely matched histological volume. These results and those of others21,22 are likely to be influenced by the non-recognition on mpMRI of small, low Gleason score disease.

Estimation of tumour size has also been carried out on T2W imaging using a maximal dimension approach utilizing visual assessment of functional parameters to support the T2W measurements.23 Although these data correlate well with histological volumes, they also have been noted to underestimate them.24 Other studies have used the functional information to define the T2W ROIs, but have not interrogated the sequences individually.12 Where individual sequences have been investigated, e.g. DCE-MRI comparison with histology,25 the focus has been on technical developments and comparison with histology, rather than on investigating the relationship of volumetry derived from individual sequences. The only other study comparing intersequence differences reported data from a small data set of 5 patients and, contrary to our findings, demonstrated no significant differences in GTVs between sequences as measured by 6 observers.26

It is accepted that a combination of both T2W MRI and DW-MRI improves cancer detection and localization.21,27 Use of a second functional technique such as DCE-MRI has been shown to further improve sensitivity28 for tumour detection. In the assessment of volume, on the other hand, the addition of DW- and DCE-MRI sequences to T2W assessments has been reported to influence interobserver variability of tumour outlining.9 The mean differences between observers on each of the three sequences in this study were <28%, smaller than the mean intersequence differences of up to 70%. T2W sequences remain the preferred choice on which to delineate prostate tumour in current practice, as their higher spatial resolution and low geometric distortion enable registration with CT images used for radiation therapy planning. As DW techniques improve and thresholding of quantified ADC allows automated segmentation of tumour, this may change, giving preference to semi-automated segmentation on ADC maps.
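To illustrate what threshold-based segmentation of an ADC map might look like in its simplest form, the sketch below marks voxels whose ADC falls below a cut-off and converts the voxel count to a volume. It is not a method used in this study: the ADC values, the threshold, the voxel dimensions and the function names are all hypothetical, and a practical tool would also need to restrict the search to the prostate and remove isolated voxels.

```python
def segment_by_adc(adc_slice, threshold):
    """Return a binary mask that is True where ADC is below the threshold."""
    return [[value < threshold for value in row] for row in adc_slice]


def mask_volume_cm3(masks, voxel_area_mm2, slice_thickness_mm):
    """Count True voxels across all slice masks and convert to cm^3."""
    voxel_count = sum(cell for mask in masks for row in mask for cell in row)
    return voxel_count * voxel_area_mm2 * slice_thickness_mm / 1000.0


# Hypothetical single-slice ADC map in units of 10^-6 mm2/s.
adc_slice = [
    [1500, 1400, 1600],
    [1300, 800, 900],
    [1450, 850, 1550],
]
# Hypothetical cut-off; a clinically valid threshold would need to be established.
adc_threshold = 1000

mask = segment_by_adc(adc_slice, adc_threshold)
# Hypothetical 2 x 2 mm in-plane voxels and 3-mm slices.
segmented_volume = mask_volume_cm3([mask], voxel_area_mm2=4.0, slice_thickness_mm=3.0)
print(f"Segmented volume = {segmented_volume:.3f} cm3")
```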

We have additionally shown that differences between T2W MRI- and DW-MRI- or DCE-MRI-derived GTVs scale consistently with tumour volume. These results suggest that the volume of neoangiogenesis is smaller than the volume of abnormal cellular morphology demonstrated on T2W MRI or DW-MRI. The significant differences between GTVs derived from DCE-MRI and those from both other sequences and histology may also be due in part to the lower spatial resolution of this sequence.

Although whole-mount histopathology is regarded as a gold standard for correlating image-derived tumour volume measurements, it should be noted that this technique also has innate margins of error and is subject to operator-dependent variation depending on experience and the equipment available. There is also documented variability in the assignment of Gleason grades,29 and substantial variability has been reported in the current clinical volume estimation methods.30 In our study, we used a specially devised slicer to minimize slice width variations. All samples were processed in the same manner and tumours were demarcated by one histopathologist to avoid interoperator variability.

All imaging in our study was performed with an endorectal coil, which causes posterior deformation of the gland. Although this has potential for error when performing two-dimensional measurements, we would not expect an influence on volume measurements where tumour ROIs are defined on all slices with visible tumour. Histological assessments of tumour were limited by manual assessments of photographs with an overlain grid, but this has provided good correlation of imaging and pathological volumes in other tumour types.31,32 Digital analysis of histopathological volumes (planimetry)33 is more robust where available.

A limitation of our data is the lack of information on spatial conformity of ROIs between sequences, which was assessed only by visual cognitive fusion to confirm concordance. In previous work aimed at identifying the index lesion, this proved time consuming with marginal improvements over cognitive fusion by an experienced observer.34 In addition, field inhomogeneity at air–tissue interfaces can cause distortions and lead to errors in echoplanar imaging-based diffusion-weighted imaging, particularly at higher field strengths. However, the rectal balloon was filled with perfluorocarbon for our 3.0-T data acquisition, minimizing any such distortions. At 1.5T, the volume measurements on diffusion-weighted imaging corresponded most closely with histology, indicating that distortion is not the key factor in measurement error of the DIPL. Another limitation of our study was the use of the peak enhancement DCE-MRI sequences for tumour delineation rather than the quantitative DCE pharmacokinetic parameter maps of Ktrans (volume transfer coefficient reflecting vascular permeability), Kep (flux rate constant) and Ve (extracellular extravascular volume fraction).

In summary, we have established that mpMRI-derived GTV measurements of DIPLs derived from T2W, DW-MRI and DCE sequences are reproducible between observers. GTV is largest on T2W images and smallest on DCE-MRI images, and T2W GTVs best approximate in vivo tumour volume. Therefore, GTV should be delineated on T2W images when defining the DIPL for radiation dose boosting. Differences in volumes derived from T2W MRI, DW- and DCE-MRI images are highly significant, reflecting differences in cellular and vascular proportions; the proportions of the T2W volume represented by the DW- and DCE-MRI volumes in this sample were independent of Gleason grade.