Introduction

Lung cancer causes more deaths than any other neoplasia in both the United States (1) and the United Kingdom (2); late detection is a major contributor to this high mortality rate (3). Bronchoscopic examination following suspicious imaging results can reveal the presence of a bronchial lesion, which is normally confirmed by histology of biopsy specimens and/or cytology of bronchial washings (also referred to as bronchial lavage or bronchoalveolar lavage). However, a significant number of cases remain clinically occult after bronchoscopy, as cytologic examination tends to miss almost half of the cases (4, 5).

The implementation of molecular biomarkers in the early diagnosis of lung cancer has been a long-standing goal, with particular focus on identifying such biomarkers in bronchial washings from individuals at high risk of developing lung cancer. Previous attempts to detect known molecular abnormalities of lung cancer in bronchial washings have included genomic instability (6, 7), DNA mutations (8, 9), and, more recently, DNA methylation (10, 11). The latter has certain advantages as a biomarker: it is a covalent DNA modification that is resistant to postsampling processing, and it spans a substantial nucleotide length, allowing flexible assay design (12).

The feasibility of detecting DNA methylation in the bronchial washings of patients with lung cancer has been shown in a number of studies (13–15; reviewed in refs. 12, 16). However, very few of the proposed biomarkers have been validated in large case–control datasets. One such validated biomarker is mSHOX2 (17), which has recently received Conformité Européenne In Vitro Diagnostic (CE IVD) certification under the commercial name Epi proLung BL Reflex Assay (Epigenomics AG).

In the current study, we describe the validation of a panel of DNA methylation biomarkers in a large retrospective case–control bronchial washings set (655 individuals) from the Liverpool Lung Project (LLP; 18) and show a substantial gain in sensitivity of detection over standalone cytology.

Materials and Methods

Study design

A brief outline of the study development is shown in Fig. 1. The study extends over biomarker development phases 1 and 2, based on the Early Detection Research Network (EDRN) guidelines (19). The promoter targets (p16, RASSF1, TMEFF2, TERT, CYGB, RARβ, DAPK1, p73, WT1, and CDH13) were identified from previous work by our group (18, 20–23) and others (24–29) and validated by pyrosequencing in an independent set of 48 primary non–small cell lung cancer (NSCLC) surgical tissues (Supplementary Table S1). Quantitative methylation-specific PCR (qMSP) assays were developed for these 10 markers to screen the bronchial washings specimens. For this phase, 2 nested case–control bronchial washings sets were selected from the LLP retrospective cohort. The inclusion criterion was the availability of at least 2 years of postsampling follow-up information, obtained through hospital records, the Merseyside & Cheshire Cancer Registry (MCCR), and the Office for National Statistics. Specimens were excluded if extracted DNA failed quality control (see Quantitative methylation-specific PCR). The case–control distributions of epidemiologic and clinical characteristics for subjects in the training and test datasets are shown in Table 1; the 2 classes show broadly similar patterns, with the exception of smoking. Samples were randomized in 96-well plates and tested in a blinded fashion.

Figure 1. Outline of the study progress phases. The distribution of candidate biomarker (BM) targets was validated by PMA in an independent set of lung cancer tissues. qMSP assays were developed, evaluated for their robustness in clinical samples, and used to screen the training bronchial washings set from patients with lung cancer and age/sex-matched controls. Samples were excluded if extracted DNA failed quality control. Statistical modeling identified 6 markers with higher discriminating efficiency, and these were used to screen the validation bronchial washings set. Statistical modeling was then applied to test the derived algorithms in the validation set. The qualifying 4-marker panel was combined with cytologic data to construct the final algorithm. BSR, best subset regression; UAT, univariate association test.

Study size and power calculations

Power calculations were based on the target methylation frequencies found in the validation lung cancer tissue set (Supplementary Table S2). Assuming a minimum of 87% positives for at least 2 markers [null hypothesis, true positive rate TPR0 = 0.87] and an anticipated sensitivity of 95% for the marker combination (alternative hypothesis, TPR1 = 0.95), we estimated the power associated with different sample sizes, case–control ratios, and acceptable false-positive rates in a simulation study (30), as shown in Supplementary Table S2. This indicated that at least 200 cases, with controls in a 1:1 ratio, are required to achieve 86% power for a 5% false-positive rate at the 95% confidence level.
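The null and alternative sensitivities above can be illustrated with a deliberately simplified calculation. The sketch below, assuming a one-sided exact binomial test on the cases only, computes the probability of rejecting H0: TPR = 0.87 when the true sensitivity is 0.95. It ignores the false-positive rate and case–control ratio that the simulation (30) also varied, so its numbers are not expected to reproduce Supplementary Table S2; the function names are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the upper tail."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, tpr0, alpha):
    """Smallest number of detected cases k with P(X >= k | TPR = tpr0) <= alpha."""
    for k in range(n + 1):
        if binom_sf(k, n, tpr0) <= alpha:
            return k
    return n + 1  # H0 cannot be rejected at this alpha with n cases


def sensitivity_power(n_cases, tpr0=0.87, tpr1=0.95, alpha=0.05):
    """Exact one-sided power to reject H0: TPR = tpr0 when the true TPR is tpr1.

    Simplification: only cases are modeled; controls and false positives are ignored.
    """
    k = critical_count(n_cases, tpr0, alpha)
    return binom_sf(k, n_cases, tpr1)


for n in (50, 100, 150, 200):
    print(n, round(sensitivity_power(n), 3))
```

Because the sketch tests sensitivity alone, it will typically report higher power than a design that must also hold the false-positive rate at 5%; it is meant only to show how TPR0, TPR1, and the number of cases interact.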

Patients, samples, and DNA

The 2 study sets comprised a total of 655 individuals (333 lung cancer cases and 322 age/sex-matched controls; Fig. 1). Patients had been retrospectively recruited through the Liverpool Heart & Chest Hospital under the LLP umbrella. All patients were referred to the bronchoscopy clinics with a clinical suspicion of lung cancer. At the end of the clinical work-up, the most common diagnoses among patients without lung cancer were bronchitis, chronic obstructive pulmonary disease (COPD), bronchiectasis, and chest infections; heart conditions, sarcoidosis, and asbestosis were diagnosed less frequently. Notably, 36 individuals in the control group(s) had been diagnosed with other (nonlung) cancers, such as colon, breast, prostate, skin, esophageal, and oral cancers, as well as 4 mesotheliomas. The LLP has received ethical approval, and all recruited patients provided informed consent.

DNA from frozen lung tumor and paired normal tissue was extracted as previously described (22). Bronchial washings were stored in Saccomanno's fixative in an air-conditioned room (18°C), and the cytologic adequacy of each specimen was judged by the presence of alveolar macrophages. DNA was extracted using the Blood and Tissue Kit (Qiagen) and quantified using PicoGreen (Invitrogen), and up to 1 μg DNA was bisulfite converted using the EZ-96 DNA Methylation-Gold Kit (Zymo Research).

Pyrosequencing methylation analysis

Pyrosequencing methylation analysis (PMA) assays were designed for early validation of targets in solid lung tumor tissue, using standard protocols described previously (22, 23). The primers for the pyrosequencing analysis are provided in Supplementary Table S3.

Quantitative methylation-specific PCR

The sensitivity/specificity of the qMSP assays was tested on serial dilutions of artificially (SssI) methylated DNA in lymphocyte DNA. In addition, whole genome amplified (WGA) DNA was generated using the REPLI-g Screening Kit (Qiagen) as an absolute unmethylated DNA standard. Following multiple repetitions, the sensitivity threshold was set at 0.5% (1:200), as it provided complete reproducibility, whereas higher dilutions (0.1%) proved less reliable. A methylation-independent assay with non-CpG-bearing primers/probe was designed for the ACTB gene to normalize for input DNA and also to serve as an exclusion criterion. We experimentally established that a cycle threshold (Ct) of 29 in the ACTB assay corresponded to 6.9 ng DNA (1,000 diploid genomes). This cut-off was used to ensure 5× genome coverage at the 1:200 sensitivity threshold (i.e., 1,000 genomes × 1/200 = 5 methylated copies).
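The input-DNA arithmetic behind the ACTB cut-off can be made explicit. The sketch below uses only the figures given in the text (6.9 ng ≈ 1,000 diploid genomes at Ct = 29; 0.5% threshold). The Poisson sampling model, which treats each reaction as drawing templates at random, is an added assumption that offers one possible reading of why 0.5% was reproducible while 0.1% was not; the function names are illustrative.

```python
from math import exp

NG_PER_1000_GENOMES = 6.9     # from the text: ACTB Ct = 29 ~ 6.9 ng ~ 1,000 diploid genomes
ACTB_CT_CUTOFF = 29.0         # from the text: minimum-input cut-off
THRESHOLD_FRACTION = 1 / 200  # from the text: 0.5% methylated DNA


def genome_equivalents(ng_dna):
    """Convert DNA mass (ng) to diploid genome equivalents using the stated calibration."""
    return ng_dna / NG_PER_1000_GENOMES * 1000


def expected_methylated_copies(n_genomes, fraction=THRESHOLD_FRACTION):
    """Mean number of methylated templates at a given methylated fraction."""
    return n_genomes * fraction


def prob_at_least_one_copy(mean_copies):
    """Assumption: templates are Poisson-distributed, so P(>= 1 copy) = 1 - exp(-mean)."""
    return 1 - exp(-mean_copies)


def passes_input_qc(actb_ct, cutoff=ACTB_CT_CUTOFF):
    """A later Ct means less template, so samples with ACTB Ct above the cut-off are excluded."""
    return actb_ct <= cutoff


genomes = genome_equivalents(6.9)                      # 1,000 genome equivalents
at_05 = expected_methylated_copies(genomes)            # about 5 copies at 0.5%
at_01 = expected_methylated_copies(genomes, 1 / 1000)  # about 1 copy at 0.1%
print(genomes, at_05, prob_at_least_one_copy(at_05), prob_at_least_one_copy(at_01))
```

Under this assumption, a reaction at the cut-off would contain at least one methylated template about 99% of the time at 0.5% but only about 63% of the time at 0.1%, which is consistent with, though not proof of, the reported difference in reliability.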

The training set was screened with CYGB, p16, RASSF1, TERT, CDH13, TMEFF2, p73, DAPK1, RARβ, and WT1. Following statistical analysis, CYGB, p16, RASSF1, TERT, RARβ, and WT1, which showed the highest individual sensitivity/specificity or were selected by the various multivariate models, were evaluated in the independent validation set.

Statistical analysis

Exploratory univariate analysis.

The distribution of subjects' epidemiologic, clinical, and methylation characteristics was described separately for the training and testing datasets. Categorical characteristics were compared between cases and controls using the χ2 test, or the Fisher exact test when fewer than 5 individuals were observed in a cell. The Student t test was used to test for statistically significant case–control differences in quantitative characteristics, and the Mann–Whitney nonparametric test was used where the normality assumption was not met.
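The categorical comparison rule can be sketched for a 2 × 2 case–control table using only the Python standard library. The text says the Fisher exact test was used "when fewer than 5 individuals were observed"; the sketch reads this as any observed cell count below 5, whereas many analysts apply the rule to expected counts, so treat the switching rule as an assumption. Only the categorical comparisons are shown, and the function names are hypothetical.

```python
from math import comb, erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: cases, controls; columns: characteristic present, absent.
    Returns (statistic, p); p uses the 1-df survival function erfc(sqrt(x / 2)).
    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))


def fisher_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value: sum the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def p_table(x):  # probability that the top-left cell equals x
        return comb(r1, x) * comb(n - r1, c1 - x) / comb(n, c1)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def compare_cases_controls(a, b, c, d):
    """Assumed reading of the text: Fisher exact test if any observed cell is below 5."""
    if min(a, b, c, d) < 5:
        return "Fisher exact", fisher_2x2(a, b, c, d)
    return "chi-square", chi2_2x2(a, b, c, d)[1]
```

For example, `compare_cases_controls(6, 2, 1, 4)` falls back to the Fisher exact test because two cells are below 5.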

Identification of optimum markers.

Univariate exploratory analysis was used to provide insight into the marginal effect of each marker on subject status. Best subset generalized linear model (best GLM) selection was used to identify the additive logit combination of markers most predictive of subject status. Model selection was based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC, BICq), and cross-validation (CV). Multifactor dimensionality reduction (MDR) was used to investigate nonadditive combinations of the markers, providing an assessment of epistasis (nonlinear interactions) among them (31). The significance of the association between subjects' disease status and each marker interaction was tested using the model-based MDR permutation test (32).
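Best subset selection by an information criterion can be sketched without external libraries. The sketch below, assuming binary (methylated/unmethylated) marker calls, fits a logistic model for every non-empty marker subset by Newton–Raphson and keeps the subset with the lowest AIC or BIC. It is not the best GLM routine used in the study: BICq and cross-validation are not implemented, and a tiny ridge penalty (an added assumption) keeps coefficients finite if a marker perfectly separates cases from controls. The toy data are simulated, not study data, and all names are hypothetical.

```python
import random
from itertools import combinations
from math import exp, log, log1p


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + exp(z)) without overflow
    return max(z, 0.0) + log1p(exp(-abs(z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small A only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def loglik(X, y, beta):
    # Unpenalized Bernoulli log-likelihood: sum of y * eta - log(1 + exp(eta))
    return sum(yi * (e := sum(b * x for b, x in zip(beta, xi))) - softplus(e)
               for xi, yi in zip(X, y))


def fit_logit(X, y, ridge=1e-4, iters=100):
    """Newton-Raphson for a lightly ridge-penalized logistic regression."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [-ridge * b for b in beta]
        hess = [[ridge if j == k else 0.0 for k in range(p)] for j in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta


def best_subset(rows, y, markers, criterion="BIC"):
    """Return (score, subset) with the lowest AIC or BIC over all non-empty subsets."""
    n, best = len(y), None
    for size in range(1, len(markers) + 1):
        for subset in combinations(markers, size):
            X = [[1.0] + [float(r[m]) for m in subset] for r in rows]
            beta = fit_logit(X, y)
            k = len(beta)
            penalty = 2 * k if criterion == "AIC" else k * log(n)
            score = penalty - 2 * loglik(X, y, beta)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Simulated toy data: status depends on markers A and B; C is pure noise.
rng = random.Random(0)
rows = [{m: rng.randint(0, 1) for m in "ABC"} for _ in range(300)]
y = [int(rng.random() < sigmoid(-3 + 3 * r["A"] + 3 * r["B"])) for r in rows]
toy_best = best_subset(rows, y, ["A", "B", "C"])
print(toy_best)
```

Exhaustive enumeration is practical here because 6 candidate markers give only 63 subsets; for much larger panels, a stepwise or penalized approach would usually replace it.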

Model-based logit algorithms were derived in the training dataset for discrimination and prediction of subject status and validated in the testing dataset. This was done separately for (i) the top 6 markers from the univariate analysis, (ii) markers selected from the overall best subset GLM, and (iii) markers from the overall best MDR combination. Cytology was added as an additional factor to the best discriminatory/predictive model.

The predictive performance of each algorithm was evaluated in the test data. Subjects with a disease probability (ranging from 0 to 1) of P ≥ 0.5 were classified (training subjects) or predicted (test subjects) as cases, and as controls otherwise. Classification and predictive accuracies were assessed using diagnostic measures including accuracy, sensitivity, and specificity. The area under the ROC curve (AUC) was used to summarize performance over the range of predicted probabilities. The overall performance of the best discriminatory model, and of its extended version incorporating cytology, was evaluated in the combined training and testing data, stratified by epidemiologic and clinical factors such as age, gender, smoking status, lung cancer histologic subtype, and time from specimen collection to diagnosis. The independent ROC–AUCs from the stratified analyses were compared using the DeLong test (33) extended for unpaired ROC curves.
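The classification rule and summary measures described above can be sketched in a few functions. The sketch below, assuming per-subject predicted probabilities are already available, applies the P ≥ 0.5 rule, computes sensitivity, specificity, and accuracy, estimates the AUC in its Mann–Whitney form with the DeLong variance, and compares two AUCs from independent (unpaired) subgroups with a z-test that drops the covariance term. This is one common way to perform the unpaired comparison, not necessarily the exact implementation behind ref. 33; all names are hypothetical.

```python
from math import erfc, sqrt


def classify(probs, threshold=0.5):
    """Call a subject a case (1) if P >= threshold, otherwise a control (0)."""
    return [1 if p >= threshold else 0 for p in probs]


def diagnostic_measures(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from 0/1 labels and 0/1 calls."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return {"sensitivity": tp / n_pos,
            "specificity": tn / n_neg,
            "accuracy": (tp + tn) / len(y_true)}


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the case scores higher, 0.5 for ties, 0 otherwise
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def auc_delong(case_scores, control_scores):
    """AUC (Mann-Whitney form) and its DeLong variance from structural components.
    Needs at least 2 cases and 2 controls."""
    m, n = len(case_scores), len(control_scores)
    v10 = [sum(_psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(_psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, s10 / m + s01 / n


def compare_unpaired_aucs(auc1, var1, auc2, var2):
    """Two-sided z-test for AUCs from independent subgroups (covariance term is zero)."""
    z = (auc1 - auc2) / sqrt(var1 + var2)
    return z, erfc(abs(z) / sqrt(2))
```

In a stratified analysis, `auc_delong` would be run once per subgroup (e.g., each smoking category) and the resulting pairs passed to `compare_unpaired_aucs`.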

Results

Diagnostic efficiency of the DNA methylation panel

Pyrosequencing methylation analysis (PMA) of the set of 48 surgical NSCLC specimens identified 10 promoters (CYGB, p16, RASSF1, TERT, CDH13, TMEFF2, p73, DAPK1, RARβ, and WT1) showing a high frequency of methylation in tumor tissue and absence of methylation in the adjacent normal tissue (Supplementary Table S1). The training bronchial washings case–control set was subsequently screened with the developed qMSP assays. Three statistical models [univariate association test, marker combination by best subset regression, and marker combination by MDR] were tested to identify the optimal marker panel(s) and algorithm(s) for improved diagnostic efficiency. Univariate analysis of the 10 examined markers is presented in Table 2. All 3 models pointed to 6 markers (CYGB, p16, RASSF1, TERT, RARβ, and WT1), which were subsequently used to screen the validation set (Supplementary Table S5).

Table 2. Univariate association tests for the examined biomarkers in the training and validation bronchial washings sets

The performance of the different discriminatory algorithms in the training and validation data is shown in Table 3. All the logit discriminatory algorithms performed well in the training set. The top 6 univariate markers and the best subset with BICq or CV performed similarly in the test data, although the best subset algorithm was more sensitive but less specific in the training data. The MDR algorithm was slightly more specific but less sensitive than the best subset model with BICq or CV criteria in the training data; in the test data, its specificity (Sp = 0.98) was similar to that of the best subset, but its sensitivity (Se = 0.77) was lower. Adding the top MB-MDR 2- and 3-way interactions to any of the best logit models did not alter their performance (data not shown). Overall, the best subset logit model with BICq or CV criteria, including TERT, WT1, p16, and RASSF1, is the most parsimonious and best-performing algorithm, and it was then applied in the validation set.

Table 3. Evaluation of classification and predictive accuracies of discriminatory algorithms in the training and testing datasets

The diagnostic efficiency of this algorithm in the validation set is shown in Table 4. The sensitivity of the panel was higher in cytology-positive cases (90%) than in cytology-negative ones (75%). Overall, sensitivity was 82% and specificity was high (91%); in total, the panel correctly classified 213 of the 248 individuals in the validation set (diagnostic accuracy = 85.9%). When the cytology result was included in the model, sensitivity (82%), specificity (92%), and diagnostic accuracy (86.3%) remained similar. However, the diagnostic efficiency of the methylation panel was substantially higher than that of cytologic evaluation alone, which showed 45% sensitivity and 99% specificity.

Table 4. Validation of the best subset logit model in the bronchial washings validation set. Comparative efficiency of the models including DNA methylation (p16, RASSF1, WT1, TERT) only and DNA methylation with incorporated cytology versus cytology only

Overall performance of the panel in clinical subgroups

Following validation of the 4-gene panel signature in the test set, an overall performance analysis of this panel was undertaken including both sets, with the aim of identifying possible biases in diverse epidemiologic and clinical subgroups. Table 5 shows the details of this analysis. The model performed equally across age and gender groups. In addition, no differences in the sensitivity and specificity of detection were observed in relation to the storage time of the specimen. Most importantly, no significant sensitivity/specificity differences were observed among smoking groups. Interestingly, the specificity of the panel was similar in the control subgroup with no malignant tumors at all (82.1%) and in the control subgroup with tumors in organs other than the lung (83.3%). As expected, the sensitivity of the methylation panel was higher in cytology-positive bronchial washings: methylation positives were found in 140/189 (74.1%) cytology-negative samples, 27/31 (87.1%) samples with suspicious cytology, and 103/113 (91.2%) samples with a cytologic diagnosis of lung cancer (χ2 test, P = 0.001). Stage T1 tumors were also less detectable (63%) than T2, T3, and T4 tumors (more than 80%), whereas no such difference was seen for nodal metastasis. Comparing the sensitivities of cytology and DNA methylation across pT groups (Fig. 2) highlights 2 points: (i) DNA methylation sensitivity is consistently higher than that of cytology in all groups, and (ii) cytology shows higher sensitivity in T4 than in T1, T2, and T3 tumors (χ2 test, P = 0.002), whereas DNA methylation has similarly high sensitivity in T2, T3, and T4 tumors, all higher than in T1 (χ2 test, P = 0.004). Regarding histology, the panel detected small cell (100%) and squamous cell (83%) lung tumors more efficiently than adenocarcinomas (75%).
Our cohort also included a few inoperable cases with unconfirmed pathology, so this analysis was not applicable to these specimens.
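The cytology-stratified comparison above can be checked directly from the counts given in the text. The sketch below, assuming a Pearson χ2 test of independence without continuity correction (the variant is not specified), builds the 3 × 2 table of methylation-positive and methylation-negative cases by cytology group. With 2 degrees of freedom the χ2 survival function has the closed form exp(−x/2), so no statistics library is needed; the function name is hypothetical.

```python
from math import exp


def chi_square_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: cytology negative, suspicious, positive; columns: methylation positive, negative
counts = [[140, 189 - 140], [27, 31 - 27], [103, 113 - 103]]
stat, df = chi_square_rxc(counts)
p_value = exp(-stat / 2)  # closed-form chi-square survival function, valid only when df == 2
print(round(stat, 2), df, p_value)
```

Run as written, this gives χ2 of about 14.25 on 2 degrees of freedom and P of about 0.0008, in line with the reported P = 0.001. The 3 rows sum to 333, the total number of cases across both sets.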

Table 5. Overall performance of the best discriminatory algorithm by epidemiologic and clinical characteristics in both training and validation sets

Two hundred and thirty-six samples included in this study (116 lung cancer cases and 120 controls) were also included in our previous collaborative studies validating mSHOX2 methylation (17, 34). mSHOX2 positivity in the LLP methylation panel subgroups is provided in Supplementary Table S7. In this set of overlapping samples, mSHOX2 showed 65.5% sensitivity and 96.5% specificity, whereas the LLP panel including cytology showed 90% sensitivity and 86.2% specificity (Supplementary Table S8). Combining the 2 tests (scoring a sample as positive if at least 1 of the 2 was positive) did not improve the detection parameters.
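The "at least 1 of 2" rule used to combine the LLP panel and mSHOX2 is a believe-the-positive (OR) combination. The minimal sketch below, with hypothetical function names and made-up calls (not study data), shows why such a combination can only keep or raise sensitivity and keep or lower specificity relative to each test alone, so any benefit depends on the two tests detecting different cases.

```python
def either_positive(calls_a, calls_b):
    """'At least 1 of 2' rule: a sample is positive if either test is positive."""
    return [int(a == 1 or b == 1) for a, b in zip(calls_a, calls_b)]


def sensitivity_specificity(y_true, calls):
    """Fraction of cases called positive and fraction of controls called negative."""
    case_calls = [c for t, c in zip(y_true, calls) if t == 1]
    control_calls = [c for t, c in zip(y_true, calls) if t == 0]
    return (sum(case_calls) / len(case_calls),
            1 - sum(control_calls) / len(control_calls))


# Made-up calls for 4 cases followed by 4 controls (illustration only, not study data)
status = [1, 1, 1, 1, 0, 0, 0, 0]
panel = [1, 1, 1, 0, 0, 0, 0, 1]
second = [1, 1, 0, 0, 0, 1, 0, 0]
for name, calls in (("panel", panel), ("second", second),
                    ("either", either_positive(panel, second))):
    print(name, sensitivity_specificity(status, calls))
```

In this toy example the second test detects no case that the panel misses, so the combination leaves sensitivity unchanged while its extra false positive lowers specificity.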

Discussion

The late diagnosis of lung cancer remains the major reason for the large number of deaths due to this disease. Earlier diagnosis with successful surgical intervention is currently the best way forward. The advent of early detection through computed tomography (CT) screening holds future promise but has yet to be implemented (35, 36). Cytologic diagnosis remains one of the major investigative tools, but unfortunately it can miss up to half of lung cancer cases. Thus, improving diagnostic efficiency in cytologically occult bronchoscopic material is essential. Despite the number of articles proposing potential biomarkers, very few have progressed to the next level toward clinical evaluation. The main reasons include small study sizes and hence low statistical power, extensive diversity of methods, and lack of assay optimization to reach clinical standards (12).

In this study, we used a retrospective case–control design to evaluate DNA methylation biomarkers in a training and a validation sample set (overall 655 individuals) from the LLP. The study was designed to maximize compliance with the EDRN guidelines (19, 37), the Cancer Research UK Diagnostic Biomarker Roadmap (38), and, for reporting in this article, the STAndards for the Reporting of Diagnostic accuracy studies (STARD) recommendations (39). We developed robust qMSP assays and established their sensitivity and specificity through thousands of repetitions. qMSP is currently considered the gold standard method for reliably detecting DNA methylation at high dilution (40, 41). It must be noted that white blood cells, which are frequently present in bronchial washings, are not de facto methylation-free for all genes. We therefore determined a positive control-based cut-off (0.5% methylation dilution), which was always at least 4-fold higher (>2 ΔCt) than the lymphocyte methylation signal. We also used a methylation-independent assay for the ACTB gene to quantify DNA input, which (i) served as an exclusion criterion indicating an inadequate amount of DNA and (ii) provided normalization for the target gene signal.
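The positive control-based calling described above can be sketched with ΔCt values. The sketch below assumes ACTB-normalized ΔCt (target Ct minus ACTB Ct) and roughly 100% PCR efficiency, so that a 2-cycle difference corresponds to 2² = 4-fold more template. The exact comparison used to call a sample positive is not given in the text, so the `call_methylated` rule and the Ct values here are hypothetical illustrations.

```python
def delta_ct(ct_target, ct_actb):
    """ACTB-normalized signal; a smaller value means relatively more methylated template."""
    return ct_target - ct_actb


def fold_difference(dct_stronger, dct_weaker):
    """Template fold difference implied by a cycle gap, assuming doubling per cycle."""
    return 2 ** (dct_weaker - dct_stronger)


def cutoff_separates_background(dct_control_05, dct_lymphocyte, min_cycles=2.0):
    """The 0.5% control must be > 2 cycles (> 4-fold) stronger than the lymphocyte signal.
    Pass float('inf') for a lymphocyte reaction that never amplified."""
    return dct_lymphocyte - dct_control_05 > min_cycles


def call_methylated(dct_sample, dct_control_05):
    """Hypothetical rule: positive if the sample signal is at least as strong as the 0.5% control."""
    return dct_sample <= dct_control_05


# Made-up Ct values (illustration only)
control = delta_ct(33.0, 27.0)     # 0.5% positive control
lymphocyte = delta_ct(36.5, 27.0)  # lymphocyte DNA background
sample = delta_ct(32.0, 26.5)      # bronchial washing specimen
print(fold_difference(control, lymphocyte),
      cutoff_separates_background(control, lymphocyte),
      call_methylated(sample, control))
```

Normalizing to ACTB before comparing with the control keeps a sample with more input DNA from being called positive simply because its target amplifies earlier.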

Our biomarker qualification process through training and validation sets showed that a panel of TERT, WT1, p16, and RASSF1 methylation markers provides a parsimonious and efficient algorithm that correctly predicted lung cancer status in 85.9% of tested bronchial washings specimens. We used 3 different models to identify a useful marker panel and to develop the discriminatory/predictive algorithm. The consistency of these analyses supports the usefulness of the markers and lends further support to previous suggestions that marker panels, rather than single markers, improve sensitivity and specificity (42, 43).

RASSF1 methylation in bronchial washings has recently been shown to increase diagnostic sensitivity (41). Our study also agrees with a previous report on cancer-specific methylation of p16, RASSF1, and RARβ (44), although RARβ was not eventually included in our final panel. However, CDH13 appeared to be a cancer-specific marker in that report, whereas in our study it clearly had no discriminatory efficiency. The methodologic approach (endpoint MSP vs. qMSP) may account for this difference.

It is apparent that the DNA methylation panel reported in this article has superior sensitivity (82%) compared with cytology alone (45%), whereas its specificity is marginally lower (92% compared with 99% for cytology alone). Cytology is currently the clinical gold standard for bronchial washings evaluation, but it is known to have low sensitivity (4, 5). Therefore, DNA methylation biomarkers could be used in a clinical setting to improve the diagnostic efficiency for lung cancer. The incorporation of cytology into the model did not alter the diagnostic efficiency in our validation set. A larger cohort of specimens is currently being recruited in the LLP to confirm whether this DNA methylation marker panel can substitute for or complement the cytologic report in bronchial washings for lung cancer diagnosis. In any case, the diagnostic benefit of this panel in cytologically occult specimens is substantial. However, the cost/gain ratio of the clinical benefit has yet to be evaluated prospectively. It must be determined whether the significant increase in detection sensitivity, and thus the expected improvement in patient survival, can outweigh the drop in specificity, as the latter may have significant ethical and health economics implications. The major clinical need at present in lung cancer CT screening programs is the management of indeterminate nodules, which require repeat CT scans (i.e., deciding which nodules require surgical resection). The implementation of CT screening in the USA and Europe will be funded by differing health care systems; however, initial data indicate that the cost could be on the order of $18,000 per life-year saved (in 2012 US dollars; ref. 45).
The real advance in planning future CT screening programs at the international level will be a reduction in the number of repeat CT scans for such indeterminate CT-identified nodules, which was one of the major recommendations of the International Association for the Study of Lung Cancer (IASLC) CT Screening Workshop Report 2011 (36). Thus, a validated biomarker assay is urgently needed in this clinical setting. The methylation study reported in this article has validated a set of methylation biomarkers with a good AUC value; however, the individuals in this study did not have CT scan data, and bronchial washings are not routinely used in the lung cancer clinical work-up across all clinical settings internationally. The validation of methylation biomarkers in sputum or plasma samples from a CT-screened population is urgently required.

A number of studies have shown the feasibility of detecting DNA methylation in the plasma and serum of patients with lung cancer (reviewed in ref. 16). These involve a variety of genes and report diverse sensitivities and specificities. Two of the genes in our 4-gene panel (p16 and RASSF1) were also examined in some of these earlier studies, whereas WT1 and TERT were not. The diagnostic efficiency of the LLP methylation panel in this set of bronchial washings is generally superior to that reported in plasma and serum studies. The main reason for this is most likely the sample origin, as bronchial washings have the advantage of directly representing the lung epithelium. Plasma and serum contain tumor-derived DNA at high dilution and also represent the whole body. Thus, neoplastic or preneoplastic foci in organs other than the lung may lower the specificity of lung cancer detection. The diagnostic efficiency of our panel in plasma/serum is yet to be investigated in our prospective trials involving individuals with CT information from the UK Lung Cancer Screening (UKLS) study (46).

Notably, this panel showed no biases related to age or gender. Most importantly, its diagnostic efficiency is independent of smoking status, suggesting that it detects cancer-specific alterations rather than tobacco-related field cancerization. It is also of note that correct classification was not influenced by the presence of other (nonlung) cancers in the control population. As RASSF1, TERT, WT1, and p16 are common epigenetic players in cancer development, this reflects the origin of the specimen (i.e., the lung) rather than a specificity of these 4 markers for lung cancer.

The better performance in central tumors (small cell and squamous carcinomas) than in peripheral adenocarcinomas was not surprising, as bronchoscopy is expected to sample the latter less efficiently. It is of note that, although our initial marker selection was based on adenocarcinoma and squamous carcinoma tissue only, the markers performed equally efficiently in bronchial washings from patients with other histologic subtypes (e.g., small cell, carcinoids, etc.). It is also unsurprising that lower sensitivity was achieved in smaller (T1) tumors, as these presumably shed fewer cells into the lung cavity. It is nevertheless important that DNA methylation detected more than half of T1 tumors, a group in which cytology has particularly low sensitivity.

Combining the mSHOX2 and LLP panel results, scoring a sample as positive if either the LLP panel or mSHOX2 was positive, did not improve detection parameters in the common set of 236 samples. However, the limited number of overlapping samples did not allow appropriate data modeling and recalibration of the LLP algorithm. Such modeling requires synchronous screening of a larger number of individuals, as planned for the prospective trial that is currently being set up.

Although sensitivity could be improved further by expanding the existing panel, it is already almost double that of the current gold standard, cytology. Thus, clinical implementation could proceed, provided that the diagnostic efficiency reported here is further validated in an independent cohort, preferably in a multisite case–control study. One of the main problems may be a shortage of DNA from bronchial washings if more markers need to be included. This could be overcome by using microfluidic PCR arrays, which substantially reduce reaction volumes and thus the required input DNA.

In this study, we used a training and a validation cohort to identify a panel of DNA methylation-based biomarkers with potential diagnostic use for lung cancer detection in bronchial washings specimens. This 4-marker panel significantly improves the diagnosis rate compared with cytologic evaluation alone, clearly showing that DNA methylation biomarkers can become a useful clinical tool for the diagnosis of lung cancer, especially in cytologically occult bronchoscopic material. However, the timely delivery of such molecular diagnostic tools can only be accomplished through consortia that share samples and information and use common methodologies throughout the diagnostic process, from sampling to reporting.

Disclosure of Potential Conflicts of Interest

J.R. Gosney has honoraria from Speakers Bureau and is a consultant/advisory board member of Eli Lilly, AstraZeneca, and Pfizer. No potential conflicts of interest were disclosed by the other authors.

Grant Support

This research was supported by Cancer Research UK (Grant ref: C1340/A12091) and the Roy Castle Lung Cancer Foundation, UK.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.