This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Marc E Miquel, William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom

Author contributions: All authors equally contributed to this paper with conception and design of the study, literature review and analysis, drafting and critical revision and editing, and final approval of the final version.

Conflict-of-interest statement: Authors declare no conflicts of interest for this paper.

There is considerable disparity in the published apparent diffusion coefficient (ADC) values across different anatomies. Institutions are increasingly assessing repeatability and reproducibility of the derived ADC to determine its variation, which could potentially be used as an indicator in determining tumour aggressiveness or assessing tumour response. In this manuscript, a review of selected articles published to date in healthy extra-cranial body diffusion-weighted magnetic resonance imaging is presented, detailing reported ADC values and discussing their variation across different studies. In total 115 studies were selected including 28 for liver parenchyma, 15 for kidney (renal parenchyma), 14 for spleen, 13 for pancreatic body, 6 for gallbladder, 13 for prostate, 13 for uterus (endometrium, myometrium, cervix) and 13 for fibroglandular breast tissue. Median ADC values in selected studies were found to be 1.28 × 10-3 mm2/s in liver, 1.94 × 10-3 mm2/s in kidney, 1.60 × 10-3 mm2/s in pancreatic body, 0.85 × 10-3 mm2/s in spleen, 2.73 × 10-3 mm2/s in gallbladder, 1.64 × 10-3 mm2/s and 1.31 × 10-3 mm2/s in prostate peripheral zone and central gland respectively (combined median value of 1.54 × 10-3 mm2/s), 1.44 × 10-3 mm2/s in endometrium, 1.53 × 10-3 mm2/s in myometrium, 1.71 × 10-3 mm2/s in cervix and 1.92 × 10-3 mm2/s in breast. In addition, six phantom studies and thirteen in vivo studies were summarized to compare repeatability and reproducibility of the measured ADC. All selected phantom studies demonstrated lower intra-scanner and inter-scanner variation compared to in vivo studies. Based on the findings of this manuscript, it is recommended that protocols be optimised for the body part studied and that system-induced variability be established using a standardized phantom in any clinical study. Reproducibility of the measured ADC must also be assessed in a volunteer population, as variations are far more significant in vivo compared with phantom studies.

Core tip: Diffusion-weighted magnetic resonance imaging was highlighted as a potential cancer imaging biomarker by a team of experts in a report published in 2009. We review the variability of published diffusion values in the major extra-cranial organs and focus on the validation literature, both in vivo and in vitro. A total of 115 studies were selected, covering liver parenchyma, kidney, pancreatic body, spleen, gallbladder, prostate, uterus (endometrium, myometrium, cervix) and breast. We also look in detail at the published repeatability and reproducibility studies, both in vivo and in phantoms. A series of recommendations based on our findings is given at the end of this review.

Diffusion-weighted magnetic resonance imaging (DW-MRI) was first implemented clinically in 1986[1] to study neurologic disorders. It has since developed into a mature technique for many brain applications[2]. In cancer imaging, DW-MRI has attracted great interest in both clinical and pre-clinical research during the past 20 years (with more than 106000 entries in Google Scholar for diffusion + mri + cancer). The concept of using DW imaging for the detection of malignant lesions originated in the early 1980s[3] but was not fully utilized until the late 1990s, when a series of innovations in echo-planar imaging, high gradient amplitudes, multi-channel coils and parallel imaging made it possible to translate it to clinical settings[4]. DW-MRI was highlighted as a potential cancer imaging biomarker by a team of experts and stakeholders in a meeting report published in 2009[5]. This report also concluded that baseline patient reproducibility studies should be part of study designs. After an introduction to the physics of diffusion-weighted imaging, this article looks in more detail at what is actually measured in vivo, and in particular at the effect of perfusion, as it has clear implications for the values measured using MRI. The article then reviews the variability of published diffusion values in the major extra-cranial organs and focuses on the validation literature, both in vivo and in vitro. Finally, DW-MRI reproducibility studies are summarized, both using phantoms (6 studies) and in vivo (13 studies).

BASIC PRINCIPLES OF PULSED FIELD GRADIENT DW IMAGING

Diffusion is a Brownian motion of molecules in a medium[6]. At room temperature (298 K), a sample containing a small molecule, such as water, has a self-diffusion coefficient of about 2.3 × 10-3 mm2/s[7]. In biologic tissues, diffusion coefficients are lower due to viscosity and restricted diffusion effects, which enables one to differentiate between different tissue structures[8]. Cellular tissues such as tumours often return lower diffusion values compared to healthy tissues, which facilitates their detection. In the presence of a magnetic field gradient, diffusion of water molecules causes a phase dispersion of the transverse magnetization, which results in the attenuation of the MRI signal[8]. In DW-MRI, image contrast is derived based on differences in the mobility of protons between tissues, which is reflected by the attenuation of the MRI signal. To increase the sensitivity to diffusion, all diffusion imaging pulse sequences contain a diffusion-weighting gradient.

Diffusion measurements are usually performed using a pulsed field gradient (PFG) pulse sequence. A spin-echo sequence is preferred as the 180° radio-frequency pulse refocuses chemical shifts and the frequency dispersion due to the residual B0 inhomogeneity and susceptibility effects whilst a gradient echo only refocuses phase dispersion resulting from the gradient pulses[9]. Stejskal[10] and Tanner[11] introduced a PFG diffusion measurement method that uses two large gradient pulses with a short duration δ and separated by a variable time interval Δ as shown in Figure 1. In the presence of diffusion and gradient pulses, the attenuation due to relaxation and the attenuation due to diffusion and the applied gradient pulses are independent. This is expressed in equation (1.1).

Figure 1 Schematic representation of the pulsed field gradient pulse sequence.
In this description we assume that we start the sequence with a sample containing only four in-phase spins labelled with 1, 2, 3 and 4. In the absence of diffusion, the first gradient pulse causes dephasing of the spins. The 180° radio-frequency pulse reverses the sign of the phase angle and thus after the second gradient pulse all spins are in phase which gives a maximum echo signal. In the presence of diffusion, spins go through a random walk process resulting in a distribution of phases. This in turn results in poorer refocusing of the spins and thus, a smaller echo signal.

$S_b/S_0 = e^{-bD}$ (1.1)

where $S_b$ and $S_0$ are the voxel signal intensities with and without diffusion weighting respectively, and the b-value, which controls the degree of diffusion weighting in an image, is given by $b = \gamma^2 G^2 \delta^2 (\Delta - \delta/3)$, where $\gamma$ is the gyromagnetic ratio and $G$ is the amplitude of the diffusion gradients in mT/m.
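As a rough illustration of the b-value expression and of equation (1.1), the sketch below evaluates both for one set of gradient settings. It assumes ideal rectangular gradient pulses and a proton gyromagnetic ratio of about 2.675 × 10^8 rad/(s·T); the gradient amplitude, pulse duration and pulse separation are hypothetical values chosen only for the example, and the function names are not taken from any scanner software.

```python
import math

# Approximate proton gyromagnetic ratio in rad/(s*T); an assumed constant.
GAMMA_PROTON = 2.675e8


def b_value(g_mt_per_m, delta_ms, big_delta_ms, gamma=GAMMA_PROTON):
    """b = gamma^2 G^2 delta^2 (Delta - delta/3), returned in s/mm^2."""
    g = g_mt_per_m * 1e-3            # mT/m -> T/m
    delta = delta_ms * 1e-3          # ms -> s
    big_delta = big_delta_ms * 1e-3  # ms -> s
    b_si = (gamma * g * delta) ** 2 * (big_delta - delta / 3.0)  # s/m^2
    return b_si * 1e-6               # s/m^2 -> s/mm^2


def signal_ratio(b, d):
    """Equation (1.1): S_b / S_0 for b in s/mm^2 and D in mm^2/s."""
    return math.exp(-b * d)


# Hypothetical settings: G = 40 mT/m, delta = 20 ms, Delta = 40 ms.
b = b_value(g_mt_per_m=40.0, delta_ms=20.0, big_delta_ms=40.0)
print(f"b = {b:.0f} s/mm^2, S_b/S_0 = {signal_ratio(b, 1.0e-3):.3f}")
```

Because b scales with the square of both G and δ, doubling the gradient amplitude quadruples the diffusion weighting without lengthening the pulses.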

Other pulse sequences have been suggested to achieve diffusion weighting, for example stimulated echo-based sequences[11,12] and steady-state free precession sequences[13-15].

Quantitative DW imaging is based on at least two DW images, each acquired at the same location but with a different b-value. The natural logarithm of the tissue signal intensity is fitted linearly against the b-value on a pixel-by-pixel basis (a mono-exponential fit), and the negative of the slope of the linear regression yields the apparent diffusion coefficient (ADC), which is displayed in a quantitative map. The calculated diffusion coefficients can be influenced by tissue perfusion and other experimental errors and are therefore referred to as apparent diffusion coefficients. In practice, measurements in three orthogonal gradient directions are often obtained and the signals averaged, producing the corresponding b-value trace images[16]. The trace image $S_D$ can be computed using a geometric average[17] of the DW images acquired in three orthogonal directions, as expressed in equation (1.2), in order to average out the effects of anisotropy. The trace image is rotationally invariant, which implies that image intensity is independent of patient orientation.

$\langle S_D \rangle = (S_x S_y S_z)^{1/3}$ (1.2)

where $\langle S_D \rangle$ denotes the averaging process and $S_x$, $S_y$ and $S_z$ are the diffusion-sensitized signals acquired in three orthogonal directions. ADC maps are then computed from the isotropic diffusion image $S_D$ and the baseline image $S_0$ (obtained without diffusion gradients) on a pixel-by-pixel basis using the relationship $D = -\ln(S_D/S_0)/b$. This results in an improved signal-to-noise ratio of the calculated ADC maps. The magnitude of the slope of the line that describes this relationship in each voxel represents the ADC. Despite using different scanner-specific techniques and image scaling methods to compute ADC maps, it was demonstrated that ADC measurements provided by different vendors were within 3% of the true value[18] using the diffusion coefficient of water at 0 °C as a reference.
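The trace averaging and the two ways of obtaining an ADC described above can be sketched for a single voxel as follows. This is a minimal illustration on synthetic, noise-free signals: the directional diffusivities, the baseline signal and the b-values are hypothetical, and the helper names are introduced here only for the example rather than reflecting any vendor's implementation.

```python
import math


def trace_signal(sx, sy, sz):
    """Equation (1.2): geometric mean of three orthogonal DW signals."""
    return (sx * sy * sz) ** (1.0 / 3.0)


def two_point_adc(s_d, s_0, b):
    """D = -ln(S_D / S_0) / b from one baseline and one DW image."""
    return -math.log(s_d / s_0) / b


def fit_adc(b_values, signals):
    """ADC as minus the least-squares slope of ln(S) against b."""
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    return -sxy / sxx


# Hypothetical voxel: baseline signal and direction-dependent D (mm^2/s).
S0 = 1000.0
D_XYZ = (1.0e-3, 1.5e-3, 2.0e-3)
B_VALUES = [0.0, 100.0, 500.0, 800.0]  # s/mm^2

# Trace signal at each b-value from the three directional signals.
trace = [
    trace_signal(*(S0 * math.exp(-b * d) for d in D_XYZ))
    for b in B_VALUES
]

adc_two_point = two_point_adc(trace[-1], trace[0], B_VALUES[-1])
adc_fit = fit_adc(B_VALUES, trace)
print(f"two-point ADC = {adc_two_point:.3e}, fitted ADC = {adc_fit:.3e} mm^2/s")
```

With mono-exponential decay in each direction, the geometric mean of the three signals decays with the arithmetic mean of the three diffusivities, which is why the trace-based ADC does not depend on how the gradient axes are oriented relative to the tissue.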

DW IMAGING IN BIOLOGICAL TISSUE

In biological tissue, the DW signal is derived from the molecular diffusion of water and the microcirculation of blood in the capillary network. In 1986, Le Bihan et al[1] proposed the principles of intravoxel incoherent motion (IVIM) to describe the microscopic translational motions that occur in each image voxel in DW imaging. Water flowing in the capillaries of a given voxel accounts for only a fraction of the total water content of that voxel[1,8]. This fractional volume is often referred to as the perfusion factor f. In all cases, biological tissue comprises a volume fraction f of perfusing water and a volume fraction 1 - f of diffusing water.
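In the IVIM literature, this two-compartment picture is commonly written as a bi-exponential signal model. The form below is a standard textbook expression given here as an assumption, not necessarily the exact equation or numbering used in this article; in many papers the exponent of the second term is further approximated by $-bD^{*}$ because $D^{*} \gg D$.

```latex
\frac{S_b}{S_0} = (1 - f)\, e^{-bD} + f\, e^{-b\,(D + D^{*})}
```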

where D is the diffusion coefficient of water molecules in the tissue and D* is the fast pseudo-diffusion coefficient due to the incoherent flow of blood-water in the randomly oriented micro-vascular network. Microcirculatory perfusion of blood within capillaries depends on the velocity of the flowing blood and the vascular structure. Signal attenuation resulting from IVIM is typically an order of magnitude greater than that from tissue diffusion because of the larger distances of proton displacement during the application of the PFG pulses[19]. Therefore, at higher b-values, IVIM accounts for only a small proportion of the measured signal in each imaging voxel. Experimental and clinical data show a bi-exponential behaviour of signal attenuation in body tissues using DW-MRI, which indicates that the signal attenuation observed at low b-values (< 100 s/mm2) is related to tissue perfusion[1,5]. Other mathematical models have been suggested to describe quantitative DW-MRI, namely stretched exponential[20], Gaussian[21] and kurtosis[22] models.
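The statement that the perfusion compartment contributes little signal at higher b-values can be checked numerically. The sketch below assumes the common bi-exponential IVIM form, in which the perfusing fraction f decays with D + D* and the remaining fraction 1 - f decays with D; the parameter values are hypothetical and chosen only to be of a plausible order of magnitude.

```python
import math


def ivim_signal(b, f, d, d_star):
    """Assumed bi-exponential IVIM model, normalised to S(b=0) = 1."""
    return (1.0 - f) * math.exp(-b * d) + f * math.exp(-b * (d + d_star))


def perfusion_share(b, f, d, d_star):
    """Fraction of the remaining signal that comes from the perfusing water."""
    perfusion_part = f * math.exp(-b * (d + d_star))
    return perfusion_part / ivim_signal(b, f, d, d_star)


# Hypothetical parameters: f = 0.2, D = 1.0e-3 mm^2/s, D* = 20e-3 mm^2/s.
F, D, D_STAR = 0.2, 1.0e-3, 20.0e-3

for b in (0, 50, 100, 500, 1000):
    share = perfusion_share(b, F, D, D_STAR)
    print(f"b = {b:4d} s/mm^2: perfusion share of signal = {share:.5f}")
```

At b = 0 the share equals f by construction, and it falls monotonically as b increases because the perfusion term decays far faster than the tissue term.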

A typical DW-MRI study in a patient whereby different images with multiple b-values are produced is shown in Figure 2. The range of b-values depends on investigator preferences and varies according to the anatomical region in the prospective study. The concept of a bi-exponential fit is also demonstrated in this figure.

Figure 2 Diffusion-weighted magnetic resonance images of the abdomen of a healthy 25-year-old male volunteer at different b-values of 0, 10, 30, 50, 100, 300, 500, 1000 s/mm2.
An ROI placed over a homogeneous region in the liver is shown on the b = 0 s/mm2 image. A bi-exponential fit to the ROI drawn on the diffusion-weighted magnetic resonance data acquired with b-values of 0, 10, 20, 30, 40, 50, 75, 100, 150, 200, 300, 400, 500, 750, 1000 and 1300 s/mm2 is also shown, where the slopes of the two exponential components represent the fast diffusion component (which includes perfusion) and the slower diffusion component. Quantitative apparent diffusion coefficient maps are also shown, where ADC low was computed with b-values ≤ 100 s/mm2 and ADC high was computed with b-values ≥ 150 s/mm2. ROI: Region-of-interest; ADC: Apparent diffusion coefficient.

The signal intensity from protons with larger diffusion distances per unit time, such as those in flowing blood, is attenuated with small b-values (< 100 s/mm2)[23]. This is in contrast to cellular tumours containing protons with shorter diffusion distances, where there is usually less signal attenuation and hence higher b-values are required (> 500 s/mm2)[23]. It has been shown that signal attenuation in liver DW-MRI is non-linear with increasing b-value due to microcapillary perfusion[24,25]. This can be seen clearly in Figure 2, where a bi-exponential fit (using the Levenberg-Marquardt algorithm) to the region-of-interest (ROI) drawn on the DW-MR data acquired with multiple b-values is shown. Whilst this is true for liver, in prostate DW-MRI a mono-exponential fit is sufficient to discriminate prostate cancer from normal tissue using b-values ranging from 0-800 s/mm2[26], and it has been argued that the perfusion component must be excluded in diagnosis, prognosis and treatment response assessment[27]. Understanding the IVIM[1,8] phenomenon is important because the choice of b-values determines the extent to which the computed ADC may be influenced by tissue perfusion at low b-values. This explains why ADCs reported in abdominal studies using low b-values (< 100 s/mm2)[25,28-31] are higher than those obtained by using higher or a wide range of b-values[29,31-34].
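The dependence of the computed ADC on the chosen b-value range can be reproduced with a small simulation. The sketch below generates noise-free signals from an assumed bi-exponential IVIM model with hypothetical parameters, then applies a mono-exponential (log-linear) fit separately to the low and high b-value ranges used for the ADC low and ADC high maps in Figure 2. It is illustrative only and does not reproduce the Levenberg-Marquardt bi-exponential fit mentioned above.

```python
import math


def ivim_signal(b, f, d, d_star):
    """Assumed bi-exponential IVIM model, normalised to S(b=0) = 1."""
    return (1.0 - f) * math.exp(-b * d) + f * math.exp(-b * (d + d_star))


def fit_adc(b_values, signals):
    """Mono-exponential ADC: minus the least-squares slope of ln(S) against b."""
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    return -sxy / sxx


# Hypothetical tissue parameters; b-values follow the Figure 2 caption.
F, D, D_STAR = 0.2, 1.0e-3, 20.0e-3
B_VALUES = [0, 10, 20, 30, 40, 50, 75, 100,
            150, 200, 300, 400, 500, 750, 1000, 1300]

signals = [ivim_signal(b, F, D, D_STAR) for b in B_VALUES]
low = [(b, s) for b, s in zip(B_VALUES, signals) if b <= 100]
high = [(b, s) for b, s in zip(B_VALUES, signals) if b >= 150]

adc_low = fit_adc(*zip(*low))
adc_high = fit_adc(*zip(*high))
print(f"ADC low  (b <= 100): {adc_low:.2e} mm^2/s")
print(f"ADC high (b >= 150): {adc_high:.2e} mm^2/s")
```

Under these assumptions the low-range fit is dominated by the fast perfusion-related decay, while the high-range fit stays close to the tissue diffusion coefficient D.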

“The signal intensity observed on the diffusion image is dependent on both water proton diffusivity and the tissue T2-relaxation time”[23]. This means that a lesion may appear to show restricted diffusion on high b-value images due to a long T2-relaxation time rather than the limited mobility of water protons, and such lesions are therefore difficult to characterize with visual assessment of DW-MR images[35]. This phenomenon is called the T2 shine-through effect and was first observed in brain diffusion imaging[36]. “The presence of T2 shine-through is recognized by correlating high b-value images with the ADC map” whereby areas demonstrating T2 shine-through rather than restricted diffusion will show “high diffusivity on the ADC map and high ADC values”[23].
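T2 shine-through can be illustrated with a simple signal model. The sketch below assumes a spin-echo DW signal that is the product of mono-exponential T2 decay and mono-exponential diffusion attenuation; the echo time, b-value and the three tissue parameter sets are hypothetical and serve only to show how a long-T2 region can look bright at high b while still returning a high ADC.

```python
import math


def dw_signal(s0, te_ms, t2_ms, b, adc):
    """Assumed model: S = S0 * exp(-TE/T2) * exp(-b * ADC)."""
    return s0 * math.exp(-te_ms / t2_ms) * math.exp(-b * adc)


def adc_from_pair(s_b, s_0, b):
    """Two-point ADC; the T2 term cancels because TE is the same in both images."""
    return -math.log(s_b / s_0) / b


TE_MS, B = 70.0, 800.0  # hypothetical echo time (ms) and b-value (s/mm^2)
TISSUES = {
    "background": {"t2_ms": 50.0, "adc": 1.3e-3},
    "restricted": {"t2_ms": 50.0, "adc": 0.8e-3},
    "shine_through": {"t2_ms": 150.0, "adc": 1.8e-3},
}

results = {}
for name, p in TISSUES.items():
    s_0 = dw_signal(1.0, TE_MS, p["t2_ms"], 0.0, p["adc"])
    s_b = dw_signal(1.0, TE_MS, p["t2_ms"], B, p["adc"])
    results[name] = (s_b, adc_from_pair(s_b, s_0, B))
    print(f"{name:14s} S(b=800) = {s_b:.4f}  ADC = {results[name][1]:.2e}")
```

Both the restricted and the long-T2 regions are brighter than the background on the high b-value image, but only the ADC values separate them, which is the correlation step described in the quoted text.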

“Water motion can occur preferentially in some directions in anisotropic tissues due to presence of obstacles that limit molecular movement in some directions”[23]. This anisotropic behaviour “can be detected by observing differences in diffusivity by using diffusion gradients in at least six directions”. Diffusion tensor imaging has been used predominantly for brain imaging[37,38] with limited data for body imaging of liver[39,40], kidneys[41-43], breast[44] and prostate[45].

VARIABILITY OF PUBLISHED ADC VALUES IN NORMAL TISSUE

Most DW-MRI studies have been conducted using 1.5T MR systems although 3.0T systems are increasingly being used due to increased availability and potential for improved image quality[33,46-58]. The following sections are by no means a comprehensive review of all the literature published to date; rather, they are intended to give an overview of the variation in the published ADC values in clinical extra-cranial studies (in vivo) and to provide readers with median values for the different organs. A total of 115 studies were used in this review, covering liver parenchyma, kidney (renal parenchyma), pancreatic body, spleen, gallbladder, prostate, uterus (endometrium, myometrium, cervix) and breast. These studies were selected using Google as the search engine, with selection based on highly cited, detailed articles in the relevant anatomy. Reports on healthy tissue in anatomies such as uterus and gallbladder are noticeably fewer than those for liver, and therefore studies with a low number of citations were also included. Recent reports (those published in 2015) were selected using Google Scholar by applying the date filter. The selected studies span different magnet field strengths from different vendors, a wide range of b-values, a number of different diffusion sequences and different human populations from different regions and continents. This was intended to reduce vendor-specific, sequence-specific and population-specific bias. Box and whisker plots for the different organs are shown in Figure 3. Details of the studies are provided in Tables 1-8.

Liver

In the DW-MR literature, no organ in the abdomen has received more attention than the liver[46]. Several investigators have reported the usefulness of DW-MRI for detection of malignant liver lesions[24,28,47]. Ichikawa et al[28] found that DW-MR differentiated between hemangiomas, hepatocellular carcinomas (HCCs) and metastases and that their respective mean ADC values were significantly greater than the mean ADC values of the normal liver. Liver DW-MRI is routinely performed by using tri-directional diffusion gradients along each of the x, y and z directions[23]. Reported ADC values in healthy liver parenchyma (Table 1) vary from 0.81 ± 0.09 × 10-3 mm2/s[48] to 2.4 ± 0.5 × 10-3 mm2/s[49], with a median value of 1.28 × 10-3 mm2/s. Values are reportedly higher in studies where only b-values of less than 100 s/mm2 were used in the computation of the ADC[28,29], due to perfusion effects. Insignificant differences in ADC values between the three diffusion gradient directions were observed[47], consistent with an isotropic structure of liver parenchyma. Because of the relatively short T2-relaxation time of the normal liver parenchyma, 46 ± 6 ms at 1.5T and 34 ± 4 ms at 3.0T[50], the b-values used in clinical imaging are typically no higher than 1000 s/mm2[23], although some studies did use b-values of up to 1300 s/mm2[32]. Generating higher b-values requires longer PFG pulses and hence longer echo times, which leads to loss of signal from T2 decay. The ideal TE in DW imaging of extra-cranial organs should be approximately the T2-relaxation time of the organ under study. Some studies looking at DW imaging of the liver used TE values significantly higher than the T2-relaxation time[24,47]. Only a few liver DW-MRI studies are known to have used a TE of less than 50 ms[33,51]. Taouli and Koh[23] suggested a minimum echo time of 71 ms to reduce shine-through effect, which should be kept fixed for all b-values used in the study. They also recommended b-values of less than 500 s/mm2 for breath-hold acquisitions and less than 1000 s/mm2 for free-breathing or respiratory-triggered acquisitions. DW-MR combined with T1-weighted and T2-weighted imaging was shown to perform as well as Gadolinium-MR in the diagnosis of liver metastases[52]. Guo et al[53] found that a correlation exists between ADC values and the histological grade of HCCs, although some HCCs were poorly discriminated due to overlap of their ADC values with those of normal liver. These findings were contrary to the report by Nasu et al[54], in which no correlation was found between ADC values and the histological grade of HCCs. This discrepancy in findings could be attributed to the placement of ROIs: the investigators in[54] defined ROIs encompassing HCCs in their entirety, whereas necrotic and hemorrhagic areas were deliberately avoided by the investigators in[53].
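The trade-off between echo time and T2 decay discussed above for the liver can be put into numbers with a short sketch. It assumes simple mono-exponential T2 decay, exp(-TE/T2), uses the mean liver T2 values quoted in the text (46 ms at 1.5T and 34 ms at 3.0T) and compares an echo time equal to T2 with the 71 ms echo time mentioned above; diffusion attenuation is deliberately ignored here.

```python
import math


def t2_weighting(te_ms, t2_ms):
    """Fraction of transverse signal remaining at echo time TE, assuming exp(-TE/T2)."""
    return math.exp(-te_ms / t2_ms)


# Mean liver T2 values quoted in the text (ms).
LIVER_T2_MS = {"1.5T": 46.0, "3.0T": 34.0}

for field, t2 in LIVER_T2_MS.items():
    for te in (t2, 71.0):
        remaining = t2_weighting(te, t2)
        print(f"{field}: TE = {te:4.0f} ms -> {100 * remaining:4.1f}% of signal remains")
```

Under this assumption roughly 37% of the signal remains when TE equals T2, and the remaining fraction drops further as TE is lengthened to accommodate higher b-values.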

Kidney

The major role of the kidneys is water reabsorption and concentration-dilution functions[41] and, therefore, DW-MRI may provide useful insights into the mechanisms of various renal diseases. The majority of published values in renal parenchyma DW-MRI (Table 2) report an ADC estimate for two tissue types, renal cortex (outer portion of the kidney) and renal medulla (innermost part of the kidney)[30,31,55,56]. Some studies reported ADC values for the left and right kidneys, with a statistically insignificant difference between the two values[57,58]. A few studies reported anisotropic diffusion in the kidney, particularly in the renal medulla, due to the radial orientation of the renal vessels and the collecting system[41,59]. The T2-relaxation times in renal cortex and renal medulla are 87 ± 4 ms and 85 ± 11 ms at 1.5T and 76 ± 7 ms and 81 ± 8 ms at 3.0T respectively[50]. The highest ADC estimate across the entire kidney (3.54 ± 0.47 × 10-3 mm2/s) was reported in[60], where the authors used an echo time of 18 ms, achieved with a stimulated-echo DW-MR sequence, and b-values of less than 400 s/mm2. Other authors[29,31] did report ADC estimates higher than 3.70 × 10-3 mm2/s, but the b-values used in the computation were less than 150 s/mm2 and, therefore, perfusion effects led to an increase in the computed ADC value. ADC values in selected studies varied between 1.50 × 10-3 mm2/s[31] and 3.54 × 10-3 mm2/s[60], with a median value of 1.94 × 10-3 mm2/s.

Spleen

Normal spleen, as well as accessory spleens, has the greatest degree of non-pathological restricted diffusion of all solid abdominal organs[61]. In DW-MRI of healthy spleen tissue (Table 3), the highest computed ADC in the selected studies was 1.28 ± 0.39 × 10-3 mm2/s[49] using b-values of ≤ 400 s/mm2 while the lowest was 0.59 ± 0.04 × 10-3 mm2/s[32] using b-values of ≤ 1300 s/mm2, with a median value of 0.85 × 10-3 mm2/s. Several authors have proposed using the spleen as a reference organ for ADC measurements of liver parenchyma[29,62] in order to decrease variability of liver ADC measurements, despite the fact that patients with cirrhosis and portal hypertension frequently suffer from splenomegaly (enlargement of the spleen)[63]. Klasen et al[63] demonstrated that patients with liver cirrhosis and portal hypertension had significantly higher spleen ADCs. Spleen T2-relaxation times are 79 ± 15 ms and 61 ± 9 ms at 1.5T and 3.0T respectively[50], and some studies[24,64] did use echo times significantly higher than the T2-relaxation time.

Pancreas

Evaluation of solid lesions in the pancreas lies mainly in the discrimination between benign mass-forming focal pancreatitis and pancreatic carcinoma[65]. Unfortunately, differentiating between benign mass-forming focal pancreatitis and pancreatic ductal adenocarcinoma is extremely difficult as they both show similar histologic and radiologic patterns[66-68]. Multimodality approaches such as ultrasound, computed tomography and different MR techniques have been suggested to differentiate between benign mass-forming focal pancreatitis and pancreatic carcinoma[66]. Healthy pancreatic tissue ADC estimates (Table 4) range from 1.02 ± 0.28 × 10-3 mm2/s[24] to 2.63 ± 0.72 × 10-3 mm2/s[49], with a median value of 1.60 × 10-3 mm2/s. Differences in estimated ADC values between normal pancreatic tissue, benign mass-forming focal pancreatitis and pancreatic carcinoma were shown to be statistically insignificant[65]. Ichikawa et al[69] demonstrated that qualitative high b-value DW-MRI is valuable in detecting pancreatic carcinomas and may prove more useful than a quantitative measure such as the computed ADC. It has been suggested that DW-MRI of the pancreas should act as a supplement to other imaging modalities to differentiate between benign mass-forming focal pancreatitis and pancreatic carcinoma[65,70]. Pancreatic tissue T2-relaxation times are 46 ± 6 ms and 43 ± 7 ms at 1.5T and 3.0T respectively[50]. Barral et al[70] recommended the use of DW-MRI where there is a clinical suspicion and imaging findings suggestive of endocrine pancreatic tumour, as well as for the detection of liver metastasis in patients with exocrine pancreatic tumours. Multi-centre studies of DW-MRI of the pancreas in terms of image quality and reproducibility of the diffusion parameters are needed to assess the suitability of ADC in the evaluation of pancreatic disease[70]. Nevertheless, DW-MRI of the pancreas will have an expanded role in the evaluation of patients with pancreatic disease as technological advancements continue to improve the quality of clinical DW-MRI[70].

Gallbladder

Ultrasonography is usually the modality of choice in evaluating gallbladder diseases[71] because of its relatively low cost and widespread availability. DW-MRI is more widely used for further characterisation of potentially malignant gallbladder lesions. Limited DW-MRI data exist for gallbladder as T2-weighted MRI in the biliary tracts is generally used to assess the extent of disease and cancer staging[72].

Healthy gallbladder ADC estimates reported in selected studies are shown in Table 5. Reported ADC estimates of the gallbladder in the selected studies range between 2.506 ± 0.223 × 10-3 mm2/s[56] and 3.50 ± 0.51 × 10-3 mm2/s[57], with a median value of 2.73 × 10-3 mm2/s. As a reference point, the diffusion coefficient of water at 35 °C is 2.92 × 10-3 mm2/s[7], and correcting for ca. 2.4% variation per degree Celsius change in temperature[73] yields 3.06 × 10-3 mm2/s at 37 °C body temperature. All ROIs in the selected studies were placed over the biliary liquid encompassed by the gallbladder. Placement of ROIs differed between investigators and none of them discussed partial volume effects, which could potentially alter the ADC estimate if the ROI overlaps the gallbladder wall. The size of ROIs also differed among investigators, ranging from an oval 100 mm2[57], to a 32 pixel ROI[56] placed at the centre of the gallbladder (average pixel size 3.125 mm), to an ROI encompassing the entire gallbladder[34] and, finally, to an ROI standardized to 2 cm2[58]. The effect of T2 shine-through can be seen clearly in the normal gallbladder and most studies used b-values of up to 1100 s/mm2. Yamada et al[24] found that the perfusion fraction in gallbladder is zero and that diffusion is the only type of motion in gallbladder, as in ascites.
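The temperature correction quoted above (2.92 × 10-3 mm2/s at 35 °C, about 2.4% per degree Celsius) can be worked through explicitly. The sketch below applies the percentage either linearly or compounded per degree, since the text does not say which was used; both are assumptions and both give the same value to two decimal places.

```python
def scale_diffusion(d_ref, t_ref_c, t_target_c, pct_per_degree=2.4, compound=False):
    """Scale a diffusion coefficient by a fixed percentage per degree Celsius."""
    delta_t = t_target_c - t_ref_c
    rate = pct_per_degree / 100.0
    if compound:
        return d_ref * (1.0 + rate) ** delta_t   # percentage applied degree by degree
    return d_ref * (1.0 + rate * delta_t)        # percentage applied once, linearly


D_WATER_35C = 2.92e-3  # mm^2/s, value quoted in the text

d_linear = scale_diffusion(D_WATER_35C, 35.0, 37.0)
d_compound = scale_diffusion(D_WATER_35C, 35.0, 37.0, compound=True)
print(f"linear:   {d_linear * 1e3:.2f} x 10^-3 mm^2/s")
print(f"compound: {d_compound * 1e3:.2f} x 10^-3 mm^2/s")
```

Either way the estimate for free water at 37 °C lies above the median gallbladder ADC of 2.73 × 10-3 mm2/s reported in the selected studies.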

Prostate

Healthy prostate tissue ADC estimates of the central gland and peripheral zone from selected studies are presented in Table 6. Reported ADC values are higher in the peripheral zone (PZ) compared to the central gland (CG). The CG consists of more compact smooth muscle cells and sparser glandular elements than the PZ, leading to a lower extracellular-to-intracellular fluid ratio and to lower ADC values[74,75]. The highest reported ADC value in the PZ was 1.99 ± 0.208 × 10-3 mm2/s[76] and the lowest was 1.25 ± 0.23 × 10-3 mm2/s[77], with a median value of 1.64 × 10-3 mm2/s. In the CG, the highest reported ADC value was 1.72 ± 0.35 × 10-3 mm2/s[27] and the lowest was 0.9 ± 0.1 × 10-3 mm2/s[78], with a median value of 1.31 × 10-3 mm2/s. The majority of studies employed b-values of less than 800 s/mm2. Average T2-relaxation times of prostate are 88 ± 0 ms and 74 ± 9 ms at 1.5T and 3.0T respectively.

Potential usefulness of DW-MRI in localizing prostate cancer has been shown by a number of investigators[27,79,80]. DW-MRI is one of the criteria used in the scoring of the likelihood of prostate cancer in the prostate imaging reporting and data system[81]. The majority of prostate cancers arise in the peripheral zone (68%)[82]. Although studies have demonstrated improved sensitivity and specificity in prostate cancer detection using DW-MRI, tumours smaller than 5 mm are difficult to detect[83]. The prostate transition zone is the site of benign prostatic hyperplastic nodules, which can have low ADC values and hence mimic tumour[83]. Post-biopsy haemorrhage in the prostate gland may cause susceptibility artefact[84] and add further uncertainty to the computed ADC map, as it presents as a region of low signal intensity and hence mimics tumour[83]. Nevertheless, DW-MRI in combination with T2-weighted imaging has been shown to be significantly better than T2-weighted imaging alone in the detection of significant cancer within the peripheral zone of the prostate[80].

Gynaecologic sites

Gynaecologic DW-MRI comprises five main categories: ovaries or fallopian tubes, endometrium, myometrium, cervix and vulva[85,86]. Healthy gynaecologic tissue ADC estimates are not often reported and only thirteen studies were included (Table 7) for normal endometrium (5), myometrium (3) and cervix (7). The ranges of reported ADCs for endometrium, myometrium and cervix in the selected publications are (1.27 ± 0.22[87] to 1.53 ± 0.10[88]), (1.50 ± 0.20[87] to 1.62 ± 0.11[89]) and (1.41 ± 0.10[90] to 2.09 ± 0.46[91]) × 10-3 mm2/s respectively, with median values of 1.44, 1.53 and 1.71 × 10-3 mm2/s. Of the three anatomies, cervix has the greatest variation in ADC values, which could be attributed to its anatomical location. The air-tissue interface causes greater susceptibility-induced artefact in the acquired DW-MR images and the ADC estimate may vary considerably across studies due to the placement of the ROI. Average T2-relaxation times for endometrium, myometrium and cervix are significantly different: 101 ± 21, 117 ± 14 and 58 ± 20 ms at 1.5T and 59 ± 1, 79 ± 10 and 83 ± 7 ms at 3.0T respectively[50]. DW-MRI can provide useful information in differentiating uterine endometrial cancer from benign lesions[88,92]. Tamai et al[88] demonstrated that there was no overlap between ADC values in normal endometrium and endometrial cancers. Nougaret et al[93] found a significant difference in the ADC values of grade 3 endometrial tumours compared to those of grade 1 and 2. However, in adjacent myometrium, differentiating between benign and malignant disease based on ADC values alone is difficult[94]. In the ovaries, the majority of prior studies reported ADC values of benign and malignant lesions. Katayama et al[95] concluded that ADCs might not provide additional information in differentiating benign from malignant ovarian lesions, as there was a significant overlap[95-97] between ADCs in benign and malignant solid tumours. In the cervix, ADC values could play a role in diagnosis[98] and as a surrogate biomarker of treatment response[99]. Luomaranta et al[100] concluded that DW-MRI is more reliable in the radiological staging of endometrial carcinoma compared with contrast-enhanced MRI.

Breast

Mammography is the modality of choice in breast screening but is less effective in women with very dense breasts (higher content of fibroglandular tissue compared to fatty adipose tissue) and in those with a BRCA1 genetic predisposition, for whom MRI screening may offer added benefit[101-103]. The added value of DW-MRI to normal MRI screening, particularly in dense breasts, is increasingly being examined[104]. Normal fibroglandular breast tissue ADC estimates from selected studies are summarized in Table 8. The highest ADC estimate for normal fibroglandular breast tissue in selected studies was 2.37 ± 0.27 × 10-3 mm2/s[105] and the lowest reported was 1.51 ± 0.29 × 10-3 mm2/s[182], with a median value of 1.92 × 10-3 mm2/s. The majority of DW-MR studies in the breast investigated lesion detection and characterisation[105-107]. Some focused on the measured ADC values during different weeks of the menstrual cycle[17,108] while others focused on the significance of pre- and post-menopausal ADC values[109]. Average T2-relaxation times of breast fibroglandular tissue and of fatty adipose tissue at 1.5T field strength were reported as 40 ± 10 ms and 130 ± 10 ms/380 ± 30 ms (two values corresponding to the dominant lipid peaks) respectively[110]. This relatively short T2-relaxation time of normal fibroglandular breast tissue must be considered when optimizing b-values for DW-MRI studies.

Suppression of lipid signal in DW-MRI of the breast is essential to reduce image artefacts and to increase lesion detection[111]. Different fat suppression techniques were compared in a few studies[112-114]. Significant differences in the computed ADC values were observed between spectral fat suppression (SPAIR) and short-tau inversion recovery (STIR) techniques[112], and a larger overlap in ADC values between tumour and benign tissue was detected using STIR[112]. However, the authors in[113] found that the computed ADC values using SPAIR and STIR fat suppression techniques were very similar. In another study[114], four techniques (STIR, SPAIR, spectrally adiabatic inversion recovery and water excitation) were compared, of which water excitation yielded the highest signal-to-noise ratio. Regardless of the choice of fat suppression technique, multi-centre studies are required to standardise DW-MRI parameters and to establish the clinical utility of DW-MRI and ADC values of malignant and benign disease[111].

Cancer vs normal tissue

DW-MRI is already being incorporated into general oncologic imaging practice. One of its main advantages is that it does not require intravenous contrast media, enabling its use in patients with reduced renal function[5]. Increases in tumour cellularity and architectural distortion contribute to decreased ADC values. In tissues that are highly cellular, tortuosity of the extracellular space and the higher density of hydrophobic cellular membranes restrict the apparent diffusion of water protons[23,115,116]. Therefore it is expected that ADC values would correlate with tumour cellularity and grade, as has been shown in[117]. Table 9 shows reported ADC values of malignant vs normal tissue from selected studies in different anatomical regions. In the majority of oncologic studies, a significant difference in ADC has been detected between malignant disease, benign disease and normal tissue. Radiologists use increased tumour cellularity as a biomarker of malignancy, using DW-MRI to differentiate between benign and malignant disease[28,47,118]. However, tumour necrosis and nuclear atypia can account for imperfect correlations between ADC values and cellularity: necrosis, an intrinsic component of poorly differentiated tumours, increases ADC values[16]. Other clinical oncologic uses include monitoring treatment response after chemotherapy or radiation, differentiating post-therapeutic changes from residual active tumour and detecting recurrent cancer[5]. Potential future applications include predicting treatment outcomes before and after therapy, tumour staging and detecting lymph node involvement by cancer[5]. There is much contention about these potential applications of DW-MRI and its potential role in differentiating between tumour grades. Unsubstantiated claims have been made in the literature about tumour staging.
The authors in[119] staged breast tumour grades based on statistically non-significant changes in the median ADC (grade 1 ADC 1.11 × 10-3 mm2/s, grade 2 ADC 1.10 × 10-3 mm2/s and grade 3 ADC 1.06 × 10-3 mm2/s). This is in contrast to the study conducted in[117], where there was a statistically significant change in the mean ADC between a high-grade glioma (ADC 1.2 × 10-3 mm2/s) and a low-grade glioma (ADC 2.7 × 10-3 mm2/s). The authors in[120] also differentiated between endometrial tumour grades based on statistically non-significant changes in the mean ADCs; however, they were confidently able to differentiate between benign and malignant disease. The utility of ADC was also investigated in ultrasound-guided biopsies for the detection and localization of prostate cancer. In a large-scale cohort study of 1448 patients[121] who underwent systematic biopsies (890 patients with low-ADC lesions underwent additional targeted biopsies), the authors demonstrated that a targeted biopsy strategy based on ADC maps can be useful in patient selection for subsequent prostate biopsies and in the detection and localization of prostate cancer with high accuracy.

In this section a literature survey of the repeatability and reproducibility of ADC values both in phantoms and in vivo is provided. Repeatability refers to test conditions that are as constant as possible, where the same operator using the same equipment within a “short time interval” obtains independent test results with the same method on identical items in the same laboratory[122]. On the other hand, “reproducibility refers to test conditions under which results are obtained with the same method on identical test items but in different laboratories with different operators using equipment”[122]. Therefore repeatability informs on equipment variation while reproducibility informs on observer/experimental variation[5].

Bland-Altman plots[123] are frequently used to show any trends in the variability of ADC measurements over the measuring interval. Bland-Altman plots help to illustrate the bias-variance relationship and limits of agreement[124]. The basis for estimates of repeatability is the within-subject variance assuming that all other factors have been controlled through experimental design[124]. Within-subject variance may include biological or physiological variability as well as patient repositioning and scanner calibrations[124]. Repeated-measures analysis of variance (rm-ANOVA) is used to assess differences in ADC values measured at each b-value between magnetic field strengths[125]. Inter-reader agreement regarding ADC measurements is frequently assessed by computing the intra-class correlation coefficient (ICC)[5,56]. ICC is a measure of repeated measures consistency relative to the total variability in the population[124]. The within-subject coefficient of variation is often reported for repeatability studies to assess repeatability in test-retest designs[124]. One-way analysis of variance (one-way ANOVA) is usually used to test discrepancy between the highest and lowest values and difference in these results among MR scanners[126]. Bonferroni correction is typically used to counteract the problem of multiple testing[56,125]. Statistical significance is usually assessed at P < 0.05[125,126].
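The two test-retest statistics mentioned above can be sketched in a few lines. The example below computes the Bland-Altman bias with 95% limits of agreement and a root-mean-square within-subject coefficient of variation for paired measurements. The ADC values are hypothetical and the function names are chosen for illustration; this is one common formulation, not necessarily the one used in any of the cited studies.

```python
import math
import statistics

def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def within_subject_cv(first, second):
    """Root-mean-square within-subject CV for two repeats per subject.

    For a pair (a, b) the within-subject variance is (a - b)^2 / 2
    and the subject mean is (a + b) / 2.
    """
    ratios = [((a - b) ** 2 / 2) / (((a + b) / 2) ** 2)
              for a, b in zip(first, second)]
    return math.sqrt(statistics.mean(ratios))

# Hypothetical liver ADCs (x 10^-3 mm^2/s) for five volunteers at two visits.
visit1 = [1.20, 1.31, 1.25, 1.40, 1.18]
visit2 = [1.26, 1.27, 1.30, 1.33, 1.22]

bias, lower, upper = bland_altman(visit1, visit2)
wcv = within_subject_cv(visit1, visit2)
print(f"bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")
print(f"within-subject CV = {100 * wcv:.1f}%")
```

In a Bland-Altman plot the differences would be drawn against the pair means, with the bias and the two limits as horizontal lines; a trend in that scatter indicates that variability depends on the magnitude of the ADC.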

“A good qualified biomarker should have three properties: Biological relevance to the disease process under study, sensitivity to the disease process and good reproducibility”[127]. In clinical trials, questions revolve around whether changes in individual patients can be measured reliably and reproducibly and whether they predict important clinical outcomes in terms of monitoring response to therapy[5,128]. Reproducibility measurements of DW-MRI data are necessary to understand the magnitude of variation that can be detected confidently. Both the size and the position of lesions are known to influence reproducibility, with measurements in larger lesions being more reproducible[129]. At the time of authoring this review, 1860 Google Scholar entries were found for (ADC + MRI + repeatability) and 8200 for (ADC + MRI + reproducibility). However, the mere use of the words repeatability and reproducibility in the entries does not indicate an elaborate study into repeatability and reproducibility of ADC values. In a serial single-centre study, to establish treatment effect, each subject will normally be scanned at the same centre at each time point and it is the within-subject variance measured at a given centre, over the duration of the study, which is important. If the study is to be multi-centre “then between-centre variance should also be controlled”[127]. The within-centre variance for a subject, or repeatability, is important and it is measured using the Bland-Altman analysis method[123]. In single-centre studies, “repeated measurements are usually made in pairs over a set of subjects (typically 5-20) to establish the difference between repeats and whether this depends on the mean value of the parameter being estimated”[127]. In multi-centre studies, protocol matching is the simplest method of reducing measurement differences[127] although “differences in imaging hardware produced by different vendors may prevent identical protocols being used at every site”.

ADC maps are quantitative imaging maps, which in principle are “independent of the particular imaging protocols used”[127] although in reality significant variations in ADC values of different anatomical regions have been reported both in single-centre and in multi-centre studies[126].

Reproducibility of ADC values in vitro

Phantoms have three advantages over human control subjects. First, phantoms can be scanned repeatedly “without any ethical constraints”, second, they have “known physical properties” and third, they are “relatively easy to transport between centres”[127]. Potential disadvantages include “a lack of realism compared to in vivo measurements”, “MR properties of the material progressively vary with time” and “the time and expertise required to build phantoms are prohibitive at some centres”[127]. Some phantoms have been developed to measure some tissue properties that exist in tumours[130]. Phantom measurements have been made with alkanes[131] or other organic liquids[132], which have ADC values in the range of brain tissue. Other materials include sucrose solutions[133,134], iced water[125] and gels[135,136].

Chenevert et al[125] (2011) proposed a novel ice-water phantom for DW-MRI multi-centre trials and investigated ADC variability across 20 MR scanners from 3 vendors (GE, Philips, Siemens) at 7 institutions at both 1.5T and 3.0T field strengths. To assess single-system repeatability, the phantom was also imaged on 16 different days over a period of 25 d. Site-specific DW-MRI protocols were performed as well as a standard DW-MRI protocol with b-values of 0, 500, 800, 1000 and 2000 s/mm2. Vendor-independent software was used to compute the ADC maps. Magnetic field strength was not found to have an impact on ADC measurements; however, significant differences in ADC measurements were observed between vendors. The authors reported a ± 5% variation in ADC across all systems and single-system repeatability was also ± 5%. Malyarenko et al[137] (2013) reported a multi-centre study using a variation of the ice-water phantom developed by Chenevert et al[125]. The authors devised a DW-MRI protocol compatible across 35 clinical MRI platforms (GE, Philips, Siemens) at 18 institutions at two field strengths of 1.5T and 3.0T. Vendor-independent software was used to compute the ADC maps. Standard deviation of ADCs measured at the magnet’s isocentre was less than 2% for all 35 platforms. Inter-site reproducibility of ADC at magnet isocentre was within 3%. ADC variability increased for off-centre measurements, consistent with diffusion gradient non-linearity. Overall the authors concluded that standardization of the DW-MRI protocol improved reproducibility of ADC measurements and allowed identification of non-linearity in the diffusion gradients as a source of error in the measured ADC in clinical multi-centre trials. Kıvrak et al[138] (2013) used an in-house phantom consisting of four containers filled with distilled water, 0.9% NaCl, 25% NaCl and shampoo placed into a plastic container containing tap water. 
DW-MRI of the phantom was performed using six different scanners from four vendors (Toshiba, GE, Philips and Siemens) utilizing a multichannel head coil and b-values of 0 and 1000 s/mm2 at a room temperature of 21 °C. ADC maps were computed on seven vendor-specific workstations. Statistically significant variations in ADC values for each fluid of the phantom were recorded between some scanners. Intra-vendor variability in ADC values was statistically significant for some scanners but not others. Overall the authors concluded that there were significant intra-vendor and inter-vendor variations in the computed ADC values. Giannelli et al[139] (2014) used an in-house isotropic water phantom (per 1000 g distilled water: 1.25 g NiSO4·6H2O + 5 g NaCl) made of two cylindrical bottles to resemble the female breast. DW-MRI of the phantom was performed using three scanners from three vendors (Philips, Siemens, GE) at 1.5T field strength. Two b-values of 0 and 850 s/mm2 were used to sensitize diffusion and a total of 5 acquisitions were repeated for each scanner. Vendor-independent software was used to compute the ADC maps. ADC values were found to be significantly different between scanners. Coefficient of variation for repeated measurements was less than 1% while it had a mean value of 6.8% across scanners. Overall the authors concluded that a specific quality control protocol must be devised for DW-MRI of the human breast as system-induced variations were found to be substantial. Belli et al[140] (2015) reported extensive assessment of ADC variability on 35 MR scanners (1.0T: 2.7%, 1.5T: 65.7% and 3.0T: 31.6%) from 26 participating centres. Standard doped water phantoms were developed at the coordinating centre using cylindrical bottles filled with an aqueous solution of 2 mmol/L of hexahydrate NiCl2 and 0.5 g/L NaN3. 
Two DW-MRI sequences were used in this study: First sequence with b-values ranging from 0 to 1000 s/mm2 in steps of 100 s/mm2 and second sequence with b-values ranging from 0 to 3000 s/mm2 in steps of 500 s/mm2. No parallel imaging technique was employed and vendor-independent software was used to generate the ADC maps. ADCs were normalized to 20 °C to assess inter-scanner variability. No statistical significance was detected for the ADCs estimated from the first DW sequence between 1.5T and 3.0T scanners while ADC estimates of the second DW sequence were significantly different between the two field strengths. Overall ADC measurements were within 5% from the nominal value and the highest deviation and overall standard deviation were 9.3% and 3.5% respectively. The authors carried out a second set of measurements on 26 scanners whereby short-term repeatability was assessed by repeating the first DW sequence five times at 1-min intervals. Short-term repeatability of ADC measurement was found to be less than 2.5% for 26 MR scanners. Doblas et al[141] (2015) reported a 7 centre multi-vendor study in which the reproducibility of ADC values was assessed on preclinical systems at field strengths of 4.7T, 7.0T and 9.4T. A miniaturized ice-water phantom was designed which was adapted from a previously reported clinical design[125]. Site-specific post-processing software packages were used to compute the ADC maps in which b-values less than 100 s/mm2 were excluded from the computation. Inter-site ADC reproducibility was 6.3% and no site was identified as presenting different measurements than others. Mean day-to-day repeatability of ADC measurements was 2.3%. Between-slice ADC variability was insignificant and mean within ROI ADC variability was 5.5%. Overall the authors concluded that with the use of standardized protocols, ADC values are comparable between sites and vendors.
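Several of the phantom studies above compute ADC from a set of b-values, in some cases after excluding low b-values. A minimal sketch of such a computation is a least-squares straight-line fit of ln(S) against b, whose negative slope is the ADC under a mono-exponential model. The b-values below are those of the standard protocol described for the ice-water phantom; the signal values are synthetic and noise-free, the reference ADC of 1.1 × 10-3 mm2/s is an assumed illustrative figure for water near 0 °C, and the function name fit_adc and its b_min argument are hypothetical.

```python
import math

def fit_adc(b_values, signals, b_min=0.0):
    """Estimate ADC (mm^2/s) as minus the least-squares slope of ln(S) versus b.

    Only points with b >= b_min (s/mm^2) enter the fit, which allows
    low b-values to be excluded.
    """
    points = [(b, math.log(s)) for b, s in zip(b_values, signals) if b >= b_min]
    n = len(points)
    mean_b = sum(b for b, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in points)
    sxx = sum((b - mean_b) ** 2 for b, _ in points)
    return -sxy / sxx

TRUE_ADC = 1.1e-3                      # assumed illustrative value, mm^2/s
b_values = [0, 500, 800, 1000, 2000]   # s/mm^2
signals = [1000.0 * math.exp(-b * TRUE_ADC) for b in b_values]  # synthetic data

adc_all = fit_adc(b_values, signals)
adc_high = fit_adc(b_values, signals, b_min=100)  # drops the b = 0 point
deviation_percent = 100 * (adc_all - TRUE_ADC) / TRUE_ADC
print(f"ADC (all b) = {adc_all:.3e} mm2/s, deviation = {deviation_percent:.2f}%")
print(f"ADC (b >= 100) = {adc_high:.3e} mm2/s")
```

For a homogeneous liquid the two estimates agree, because the signal is genuinely mono-exponential; with real data the deviation from the nominal value then reflects system-induced error rather than the fitting choice.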

Reproducibility of ADC values in vivo

In MRI studies, human control subjects have three advantages over phantoms. First they can be an almost complete simulation of the clinical measurement process in a multi-centre study, second demands for temperature stabilization are bypassed as homeostasis provides inbuilt temperature control and third human controls are often more readily available than phantoms[127]. Disadvantages include a lack of measurement stability over time for tumour-related parameters in patients, imaging humans is more demanding of resources compared to imaging phantoms and ethical constraints may limit the availability of human subjects[127]. Despite these limitations, a number of DW-MRI studies have reported ADC measurement repeatability and reproducibility using human subjects. Sasaki et al[126] (2008) studied variability of ADC values of grey and white matter in 12 healthy volunteers, within a time frame of 2 wk, using 10 systems from four different vendors (Philips, Siemens, GE, Toshiba) at 1.5T and 3.0T field strengths and b-values of 0 and 1000 s/mm2. Three different coils (multichannel coil with sensitivity correction, multichannel coil without sensitivity correction and a quadrature detection coil) were used to acquire the images and vendor-independent software was used to compute the ADC maps using a mono-exponential fit. The ADC values for gray and/or white matter of the same volunteers varied significantly between systems of all the vendors with an inter-vendor variability as high as 7%. There was also significant intra-system variability of up to 8% depending on the coil configuration in certain systems. Overall the authors concluded that there was significant variability in the ADC values. 
Braithwaite et al[49] (2009) tested the hypothesis that “there is no significant variability in ADCs in the assessment of short- and midterm reproducibility of ADC measurements in a healthy population”, in five abdominal locations on a population of 20 healthy male volunteers at 3.0T using b-values of 0 and 400 s/mm2. All 20 volunteers were scanned once on the same day using 5 repeated DW-MRI acquisitions in the abdomen and 16 of the volunteers underwent a second scan within a time frame of 147 ± 20 d using another set of 5 repeated DW-MRI acquisitions. Vendor-specific software was used to generate the ADC maps and 3 ROIs were drawn for each anatomical location. Highly significant differences in the mean ADCs between the five anatomical locations were observed. No significant differences in the ADCs among the various sequence repetitions were observed. Between the two imaging sessions, no significant differences in mean ADC values were observed. Overall, the mean CV for the reproducibility of ADCs over short- and midterm was 14% and based on their results the authors suggested that ADCs are robust and can serve as a reliable quantitative tool over time. However, they also concluded that treatment effects of less than approximately 27% would not be clinically detectable with confidence with one acquisition in a single individual. Colagrande et al[142] (2010) compared ADC measurement repeatability and reproducibility of a phantom to that of abdominal DW-MRI on 30 healthy volunteers at 1.5T field strength. For the phantom study two DW-MRI sequences were employed: b-values ranging from 0 to 200 s/mm2 (steps of 20 s/mm2) and b-values ranging from 0 to 1000 s/mm2 (steps of 100 s/mm2). For the volunteer study, b-values of 0 and 1000 s/mm2 were used. Vendor-independent software was used to compute the ADC maps. Overall the authors concluded that the ADC values were repeatable with an ICC of 0.80 but not reproducible (ICCs ≤ 0.45) for all methods. 
Larger ROIs improved reproducibility and the authors advised that for larger studies standardized ADC measurements using more than two observers are needed. Miquel et al[34] (2012) compared repeatability of the ADC measurements of a phantom containing copper sulphate (CuSO4 3 mmol/L) and salt (NaCl 34 mmol/L) solution with that of the abdomen on 10 healthy volunteers at 1.5T field strength using six b-values ranging from 0 to 1000 s/mm2 (with the exception of zero, all b-values were greater than 100 s/mm2). The phantom was imaged 10 times on two different occasions and also at regular intervals over a period of three months at a room temperature of 17 °C ± 0.5 °C. A circular ROI covering 90% of the cross-section of the phantom bottle was selected on each slice. On the first day the CV of the ADC was 0.5% for 10 measurements and on day 100 the CV was 1.0% for 10 measurements. The mean intra-slice CV was 3.2% ± 1.4% and the mean sample CV was 2.9% ± 1.0%. Repeatability of the volunteer population was assessed on two occasions, 5.8 ± 1.9 d apart. The authors carried out two sets of analyses: One on volumes-of-interest (VOIs) and one on multiple smaller ROIs. Both intra-observer and inter-observer variability were small. Collectively there was no statistical difference in the group mean ADC value of any organ between the two visits. The authors concluded that larger three-dimensional VOIs result in lower variability compared to multiple two-dimensional ROIs: depending on the organ, changes of over 7%-10% were significant for VOIs, increasing to 20%-28% for ROIs. Bilgili[143] (2012) studied repeatability of the ADC measurements of the abdomen on 11 healthy volunteers during two repeat sessions at 1.5T field strength and using b-values of 0 and 500 s/mm2. No significant differences in the ADCs for any organ between imaging sessions were found. The CV values ranged between 7.3% for the liver and 10.4% for the kidney at a b-value of 500 s/mm2. 
Barral et al[68] (2013) evaluated variations in ADC measurements in normal pancreatic parenchyma at 1.5T and 3.0T field strengths using Siemens scanners. Two populations of twenty patients, who were matched for gender and age, were examined using a range of b-values from 0 to 800 s/mm2 (6 b-values were less than 100 s/mm2) with the first population examined at 1.5T and the second at 3.0T. Vendor software was used to compute the ADC maps using 3 b-values of 0, 400 and 800 s/mm2. Four pancreatic segments, namely head, neck, body and tail, were evaluated in this study by two independent observers. ADCs were measured three times by each observer. No significant differences in ADCs were found between repeated measurements or between ADCs obtained at the two field strengths. The 95% limits of agreement between ADC values ranged from 1%-24.2% for intra-observer and from 4.2%-25% for inter-observer variability and did not vary substantially at either field strength. No significant differences in ADCs of the four segments were found at either field strength. Donati et al[56] (2014) performed DW-MRI on 10 healthy men to determine the variability of ADC values in various anatomical regions in the upper abdomen using six systems from three different vendors (Philips, Siemens, GE) at 1.5T and 3.0T field strengths. In this study, 10 b-values ranging from 0 to 1000 s/mm2 (five b-values were less than 100 s/mm2) were used and vendor-independent software on an independent workstation was used to compute the ADC maps. Two readers examined the images and they found that the inter-reader agreement was excellent with an intra-class correlation coefficient of 0.876. Overall, the highest coefficients of variation (CVs) were observed in the liver for both field strengths and the lowest CVs were observed in the kidney. CVs ranged from 7.0% for renal medulla to 27.1% for left liver lobe. 
No significant differences in mean ADC values measured at 1.5T or 3.0T were found in any of the evaluated anatomical regions. However, they concluded that the particular vendor of an MR system influences the ADC values to a lesser extent at 1.5T than at 3.0T. Chen et al[144] (2014) compared ADC variability in normal liver parenchyma obtained with multiple breath-hold, free-breathing, respiratory-triggered and navigator-triggered DW-MRI techniques at 1.5T field strength using b-values of 0, 100 and 500 s/mm2. The authors placed ROIs on 9 anatomical liver locations and did not observe any significant difference between ADCs obtained using different techniques. However, they concluded that both anatomical location and DW-MRI technique influence the reproducibility of liver ADC measurements. Jajamovich et al[145] (2014) investigated short-term reproducibility of the measured ADC in fasting conditions and after a liquid meal. Thirty individuals (11 healthy volunteers and 19 liver disease patients) were scanned twice after 6 h of fasting (5 min interval between scans) and then a third time 20 min after a liquid meal using a GE scanner at 3.0T field strength. Sixteen b-values were used in this study with 7 b-values < 100 s/mm2 and 9 b-values ≤ 800 s/mm2. Vendor-independent software was used to compute the ADC maps using both a mono-exponential model (b-values of 0 and 800 s/mm2) and a bi-exponential model (all b-values). Coefficient of variation in the fasting condition was found to be 8.2% and 15.2% for the mono-exponential model and the bi-exponential model respectively. No effect was observed in the measured ADC following caloric intake; however, a substantial effect was observed in the hepatic portal vein flow. Pazahr et al[146] (2014) assessed changes in the measured ADC of the liver before and after carbohydrate and protein-rich food intake in correlation to hepatic portal vein flow quantified using phase contrast imaging. 
Ten healthy volunteers underwent 4 DW-MRI scans using a 1.5T field strength GE scanner on two days. Scans 1 and 2 took place on the same day, the first after at least 8 h of fasting and the second 30 min after intake of a protein-rich drink. On the second day volunteers were first scanned after fasting for 8 h and then after intake of a carbohydrate-rich meal. Diffusion b-values of 0, 50, 150, 250, 500, 750 and 1000 s/mm2 were used in this study. Vendor-independent software was used to compute the ADC maps using a tri-exponential diffusion model with a linear fit to logarithmic signal intensities at b-values of 0 and 50 s/mm2, 50 to 250 s/mm2 and 500 to 1000 s/mm2. A phantom filled with an aqueous solution of 770 mg/L of CuSO4·5H2O was used to assess the DW-MRI sequence and the post-processing software. ROIs were drawn on the right hepatic lobe. No statistically significant differences were found between measured ADC values after fasting and after a protein-rich meal or carbohydrate-rich meal for the three sets of low, intermediate and high b-values. Overall mean CVs for each participant at each session were 13.9%, 7.2% and 7.5% for low, intermediate and high b-values respectively. The authors concluded that carbohydrate and protein-rich intake both resulted in a significant increase in the portal vein flow and that there was no correlation between the increase in the portal vein flow and the measured ADC values. They also recommended that liver molecular water diffusion should be quantified using b-values greater than 500 s/mm2 only. Kolff-Gart et al[147] (2015) investigated variability of ADC values in the head and neck tissues on 7 healthy volunteers in 2 institutions using 5 MRI systems from three vendors (Philips, Siemens, GE) at 3 time points. They used two DW-MRI sequences: An EPI and a TSE using 2 b-values of 0 and 1000 s/mm2 and an additional 6 b-value (two of the b-values were less than 100 s/mm2) acquisition for the EPI sequence. 
Vendor-specific software was used to compute the ADC maps. Inter-system difference for mean ADC values and the influence of the MRI system on ADC values among the subjects were statistically significant. Mean difference between examinations was not significant. They concluded that the DW EPI with 6 b-values was the most reproducible and that ADC values varied significantly between MRI systems and sequences. Grech-Sollars et al[148] (2015) assessed reproducibility of ADC measurements of brain tissue on eight scanners (4 Siemens 1.5T, 4 Philips 3.0T) using an ice-water phantom and 9 healthy volunteers across five institutions. Site-specific clinical protocols were used in this study using a range of b-values (0 to 1000 s/mm2) with additional b-values acquired at all centres. All scans were acquired over a period of 18 mo and a total of 65 imaging sessions took place across all centres. Vendor-independent software was used to compute the ADC maps. In the phantom, ADC measurements were reproducible with a CV of less than 1.5%. In the volunteer population, ADC measurements of white and grey matter were reproducible with an inter-scanner CV of 3% and 2.4% and an intra-scanner CV of 1.0% and 2.9% respectively. Overall the authors concluded that using standardized clinical sequences in large multi-centre studies is not essential to achieve good reproducibility of ADC measurements. Winfield et al[149] (2015) assessed the effects of eating and fasting on the measured ADCs in livers of 20 healthy volunteers. Four clinical scanners at 3 participating sites from three vendors (1 Philips, 2 Siemens, 1 GE) at 1.5T field strength were used to acquire volunteer data (5 volunteers per scanner). Diffusion weightings of 100, 500 and 900 s/mm2 were used in this study. Each volunteer was scanned four times, with scans 1 and 2 occurring on the same day, one after at least four hours of fasting and the other after a meal. These scans were repeated for each volunteer 1-7 d after. 
Vendor-independent software was used to compute the ADC values at a single site. An ice-water phantom[125] was also used to assess accuracy and repeatability of ADC estimates. Three volunteers were excluded from the final analysis. Coefficient of variation was found to be 5.1% when fasted and 4.6% non-fasted. Between-site CV was found to be 3% using the ice-water phantom. The authors concluded that there was no significant difference in ADC estimates between fasted and non-fasted measurements.

Need for validation

All of the selected studies evaluating ADC repeatability and reproducibility acknowledge that lack of standardization in data analysis, ADC quantification and interpretation is the greatest challenge in the adoption of DW-MRI for tumour assessment[4,5,56]. More studies are emerging focusing on repeatability and reproducibility of ADC measurements across institutions and using MR systems from different vendors. Malyarenko et al[150] demonstrated that the measured systematic ADC errors scaled quadratically with offset from a magnet’s isocentre. Nonlinearity in the applied diffusion gradients was shown to be a major source of spatial DW bias and variability in off-centre ADC measurements. This bias was found to be dependent on system design and diffusion gradient direction. In the same study, the authors concluded that shim, imaging gradients and eddy currents made minor contributions to the spatial DW bias.

Present post-processing software packages for quantitative DW-MRI available on scanner consoles are mostly basic, allowing only a mono-exponential fit and some elementary image analysis. Although the choice of the mathematical model depends on the anatomical region under study, it is imperative to have the flexibility of using different models as this could influence repeatability. In a recent study on primary and secondary ovarian cancer, a stretched exponential model showed better repeatability than mono-exponential and bi-exponential models[151].
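To make the three model families concrete, the sketch below writes each signal equation in its commonly used textbook form and shows why the choice of b-values interacts with the model. The parameter values (fast-component fraction 0.2, fast coefficient 0.05 mm2/s, slow coefficient 1.0 × 10-3 mm2/s) are hypothetical, the function names are illustrative, and the forms given are standard conventions rather than the specific implementations of the cited studies.

```python
import math

def mono_exponential(b, s0, adc):
    """S(b) = S0 * exp(-b * ADC)."""
    return s0 * math.exp(-b * adc)

def bi_exponential(b, s0, f, d_fast, d_slow):
    """S(b) = S0 * [f * exp(-b * D_fast) + (1 - f) * exp(-b * D_slow)]."""
    return s0 * (f * math.exp(-b * d_fast) + (1 - f) * math.exp(-b * d_slow))

def stretched_exponential(b, s0, ddc, alpha):
    """S(b) = S0 * exp(-(b * DDC)^alpha), with 0 < alpha <= 1."""
    return s0 * math.exp(-((b * ddc) ** alpha))

def two_point_adc(b_low, b_high, s_low, s_high):
    """Mono-exponential ADC from two b-values."""
    return math.log(s_low / s_high) / (b_high - b_low)

# Hypothetical tissue with a small fast-decaying component.
F, D_FAST, D_SLOW = 0.2, 0.05, 1.0e-3  # assumed values; coefficients in mm^2/s

def tissue(b):
    return bi_exponential(b, 1.0, F, D_FAST, D_SLOW)

adc_0_800 = two_point_adc(0, 800, tissue(0), tissue(800))
adc_100_800 = two_point_adc(100, 800, tissue(100), tissue(800))
print(f"ADC from b = 0 and 800:   {adc_0_800:.3e} mm2/s")
print(f"ADC from b = 100 and 800: {adc_100_800:.3e} mm2/s")
```

In this synthetic example the estimate that includes b = 0 is noticeably higher than the slow coefficient, whereas the estimate from b = 100 and 800 s/mm2 is close to it. This is one reason why studies that differ in their lowest b-value can report different mono-exponential ADCs for the same tissue.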

Finally, in any DW-MRI study, system-induced variability must be established using a standardized phantom as was recommended in the 2009 meeting report[5].

DISCUSSION AND CONCLUSION

In the present manuscript ADC values for healthy extra-cranial organs were summarized. In total 28 studies were selected for liver parenchyma, 15 studies for kidney (renal parenchyma), 14 studies for spleen, 13 for pancreatic body, 6 for gallbladder, 13 for prostate, 13 for uterus (endometrium, myometrium, cervix) and 13 for fibroglandular breast tissue. Median ADC values in selected studies were found to be 1.28 × 10-3 mm2/s in liver, 1.94 × 10-3 mm2/s in kidney, 1.60 × 10-3 mm2/s in pancreatic body, 0.85 × 10-3 mm2/s in spleen, 2.73 × 10-3 mm2/s in gallbladder, 1.64 × 10-3 mm2/s and 1.31 × 10-3 mm2/s in prostate peripheral zone and central gland respectively (combined median value of 1.54 × 10-3 mm2/s), 1.44 × 10-3 mm2/s in endometrium, 1.53 × 10-3 mm2/s in myometrium, 1.71 × 10-3 mm2/s in cervix and 1.92 × 10-3 mm2/s in breast. Few studies have assessed the ADC of normal uterine tissue, particularly myometrium (only 3 data points), which consequently influenced the median ADC value. Differences in reported ADC values are largely attributed to differences in acquisition sequence, particularly the choice of b-values and the sequence echo-time. More studies are emerging in DW-MRI with recommendations on the specific b-values and protocols that should be followed to interrogate a particular anatomical region. Examples include studies for liver[23,149], prostate[152] and pancreas[70]. With these organ-specific recommendations, acquisition parameters are becoming more comparable across different studies. Some historical reports of ADC values, such as some references from the 1990s and early 2000s, must be forgone in favour of more recent reports. Reference ADC values should be derived from a recent study with a recommended set of organ-specific b-values or by taking a median of values from multiple studies. 
Although changes in ADC values have proven to be a diagnostic/prognostic biomarker for differentiating malignant from non-malignant lesions, their value for monitoring response to drug treatment is less established[128]. Braithwaite et al[49] demonstrated that treatment effects of less than approximately 27% would not be detectable with confidence from one acquisition in a single individual. Therefore, considerable care must be taken in reporting treatment effects based on a single acquisition in a single individual.
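A detection threshold of this kind is commonly expressed as a repeatability coefficient (RC) derived from test-retest data: RC = 1.96 × √2 × wSD, where wSD is the within-subject standard deviation. The following is a minimal sketch of that calculation; it is not necessarily the method used in[49], and the paired ADC values are hypothetical numbers included only to make the example runnable.

```python
import math

def repeatability(pairs):
    """Summarise test-retest ADC pairs, one (scan1, scan2) tuple per subject.

    Returns (within-subject SD, repeatability coefficient, RC as % of the mean).
    """
    n = len(pairs)
    # For duplicate measurements, within-subject variance = sum(d^2) / (2n)
    w_var = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    w_sd = math.sqrt(w_var)
    rc = 1.96 * math.sqrt(2) * w_sd
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    return w_sd, rc, 100.0 * rc / grand_mean

def change_is_detectable(baseline, follow_up, rc):
    """True if the change in one individual exceeds the repeatability coefficient."""
    return abs(follow_up - baseline) > rc

# Hypothetical test-retest ADC values (x 10^-3 mm^2/s) for five volunteers
pairs = [(1.20, 1.30), (1.35, 1.25), (1.10, 1.22), (1.28, 1.30), (1.40, 1.26)]
w_sd, rc, rc_percent = repeatability(pairs)
print(f"wSD = {w_sd:.3f}, RC = {rc:.3f} ({rc_percent:.1f}% of mean)")
print("Change from 1.25 to 1.40 detectable:", change_is_detectable(1.25, 1.40, rc))
```

A change in a single individual that is smaller than the RC cannot be separated from measurement variability with 95% confidence, which is the sense in which small treatment effects remain undetectable from one acquisition.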

Six phantom studies and thirteen in vivo studies were summarized in sections “Reproducibility of ADC values in vitro” and “Reproducibility of ADC values in vivo” to compare repeatability and reproducibility of the measured ADC. All selected phantom studies demonstrated lower intra-scanner and inter-scanner variation compared to in vivo studies. To date, very few studies have assessed reproducibility of the measured ADC in extra-cranial body organs. Hence, studies assessing reproducibility in head and neck tissue[126,147,148] were also included in this review. Some studies used vendor-independent post-processing software packages to compute the ADC maps[56,125,126,137,140,148], while others used site-specific software packages, either vendor-specific or locally developed[68,138,141,147]. Although some investigators demonstrated high variability in the measured ADC (27.1% for left liver lobe) with vendor-independent software packages[56], others found less variability in the measured ADC using vendor-specific software[148]. The majority of investigators found that standardized acquisition protocols improve reproducibility. ADC measurement variability was shown to be higher in vivo than in phantom studies[34,148]. Reproducibility of the measured ADC was also shown to be dependent on the specific anatomy being interrogated[34,56,143]. Whilst significant variation in the measured ADC of the left hepatic lobe was observed in one study[56], insignificant variation was observed in the right liver lobe[146].

Larger ROIs[142] and volumetric ROIs[34] demonstrated better reproducibility. Smaller ROIs are also known to suffer from higher inter- and intra-observer variability[153].

ADC cut-off values are increasingly used in studies to differentiate between normal and cancerous tissues or even between tumour grades. For the latter, the differences between ADC values are often small and, although valid in the populations studied, they should not be taken as absolute numbers and used for diagnosis on a different scanner or with a different imaging protocol. Although variations in ADC values are far greater following treatment, one should still be careful when using cut-off values to assess treatment response. Nevertheless, assessing treatment response using ADC measures is promising; for example, Koh et al[154] demonstrated that ADC measurements were highly reproducible, with a coefficient of repeatability of 0.17, in a two-centre phase 1 clinical trial setting.

Recommendations

The protocol needs to be optimised for the body part studied.

System-induced variability must be established using a standardized phantom in any clinical study.

Reproducibility of the measured ADC must be assessed in a volunteer population, as variations are far more significant in vivo compared with phantom studies.

Studies need to be assessed properly; acquisition parameters across participating sites/scanners must be matched as closely as possible, in particular b-values, TE and bandwidth.

Evaluation must be organ-specific and ROI size must be taken into consideration.

Recommended statistical tests to assess repeatability and reproducibility must be utilized for a credible investigator report.
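As an illustration of the kind of analysis the last recommendation refers to, the sketch below computes two summaries that are commonly used in agreement studies: Bland-Altman bias with 95% limits of agreement for paired measurements from two scanners, and the coefficient of variation of repeated phantom measurements. These may or may not coincide with the specific tests a given study protocol prescribes, and all numerical values are hypothetical.

```python
import statistics

def bland_altman(first, second):
    """Bias and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def coefficient_of_variation(values):
    """Coefficient of variation in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical ADC values (x 10^-3 mm^2/s) for the same volunteers on two scanners
scanner_a = [1.22, 1.31, 1.18, 1.40, 1.27]
scanner_b = [1.28, 1.25, 1.24, 1.35, 1.33]
bias, lower, upper = bland_altman(scanner_a, scanner_b)
print(f"Bias = {bias:.3f}, limits of agreement = [{lower:.3f}, {upper:.3f}]")

# Hypothetical repeated phantom measurements on one scanner
phantom = [1.10, 1.11, 1.09, 1.10, 1.12]
print(f"Phantom CV = {coefficient_of_variation(phantom):.2f}%")
```

Reporting the in vivo limits of agreement alongside the phantom coefficient of variation helps separate system-induced variability from physiological and observer-related variability.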