
Vol. 10, , February 1, 2004 Clinical Cancer Research 849

Featured Article

Gene Expression Profiles Predict Survival and Progression of Pleural Mesothelioma

Harvey I. Pass,1 Zhandong Liu,2 Anil Wali,1 Raphael Bueno,6 Susan Land,3 Daniel Lott,3 Fauzia Siddiq,1 Fulvio Lonardo,4 Michele Carbone,5 and Sorin Draghici2

1 Thoracic Oncology, Karmanos Cancer Institute, 2 Department of Computer Science, and 3 Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan; 4 Department of Pathology, Harper University Hospital, Detroit, Michigan; 5 Department of Pathology, Cardinal Bernardin Cancer Center, Loyola University, Maywood, Illinois; and 6 Division of Thoracic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts

Abstract

Purpose: Clinical outcomes for malignant pleural mesothelioma (MPM) patients having surgery are imprecisely predicted by histopathology and intraoperative staging. We hypothesized that gene expression profiles could predict time to progression and survival in surgically cytoreduced pleural mesothelioma of all stages.

Experimental Design: Gene expression analyses from 21 MPM patients having cytoreductions and identical postoperative adjuvant therapy were performed using the U95 Affymetrix gene chip. Using both dChip and SAM, neural networks constructed a common 27-gene classifier, which was associated with either the high-risk or the low-risk group of patients. Data were validated using real-time PCR and immunohistochemical staining. The 27-gene classifier was also used for validation in a separate set of 17 MPM patients from another institution.

Results: The groups predicted by the gene classifier recapitulated the actual time to progression and survival of the test set with 95.2% accuracy using 10-fold cross-validation. Clinical outcomes were independent of histology, and heterogeneity of progression and survival in early stage patients was defined by the classifier. The gene classifier had a 76% accuracy in the separate validation set of MPMs.

Conclusions: These data suggest that pretherapy gene expression analysis of mesothelioma biopsies may predict which patients may benefit from a surgical approach.

Received 4/21/03; revised 10/12/03; accepted 10/14/03.

Grant support: Early Detection Research Network, National Cancer Institute/NIH U01 CA , by Merit Review Funding from the Department of Veterans Affairs, and by a grant from the Mesothelioma Applied Research Foundation (R. B.).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Requests for reprints: Harvey I. Pass, Harper University Hospital, 3990 John R, Suite 2102, Detroit, MI Phone: (313) ; Fax: (313) ;

Introduction

Malignant pleural mesothelioma (MPM) is an orphan disease whose management has defied curative options (1). The incidence of MPM worldwide is increasing, and the estimated number of deaths over the next 30 years from MPM in Western Europe alone could approach 250,000 (2). Our understanding of the pathophysiology of the disease, however, has improved with the advent of molecular investigations of the downstream effects of asbestos (3), as well as the realization that SV40 is linked with the disease in 60% of the cases (4). Studies of the molecular genetics of the disease have revealed the importance of signal transduction pathways including the epidermal growth factor receptor (5) and activator protein (6), as well as asbestos-SV40 interactions, which could explain such phenomena as increased angiogenesis (7), telomerase (8, 9), MET overexpression (10, 11), abrogation of p53-induced apoptosis (12), and abnormalities of immune surveillance (13), among other findings.
Few global investigations of the multiple genetic events in MPM have used gene array technologies, owing to the lack of uniformly treated patients with carefully documented survival and recurrence data and the absence of large archives of banked frozen tissue with adequately preserved RNA. In addition to the gene-based classification (14), the ability to prognosticate on the basis of gene expression data from biopsy harvests could have important therapeutic implications for this group of unfortunate patients. Our hypothesis is that gene arrays performed on operative specimens from uniformly treated mesothelioma patients could predict survival and time to progression, and could cluster groups of patients as good or poor risk. Our test set population included stages I-III MPM patients resected by one surgeon. We analyzed our gene expression data using two programs, Significance Analysis of Microarrays (SAM) and dChip, to see whether concordant genes from both analyses could correctly cluster the patients into good or poor risk cohorts.

Materials and Methods

Patient Population. Tumor and normal tissues from 21 patients with MPM treated at the National Cancer Institute and the Karmanos Cancer Institute between August 1993 and May 2001 were used as the test set for the studies described below. Consent was received and the project was approved by the local Institutional Review Boards. All of the patients had cytoreductive surgery and mediastinal lymph node dissection with staging by the International Mesothelioma Interest Group staging system (15), followed by postoperative cisplatinum-based chemotherapy. All of the patients were followed from the surgery until death or through February 2003 with computerized tomography of the chest every 3 months. Interval change that suggested progression of the MPM was documented whenever possible by histological confirmation of disease. Tumors were classified as having epithelial, sarcomatoid, or mixed histology MPM by two

Fig. 1 A, gene expression patterns determined using hierarchical clustering of the 21 mesothelioma patients against the top 95 genes identified by dChip. Substantially elevated (red) or decreased (blue) gene expression of the genes is observed in individual tumors. B, gene expression patterns determined using hierarchical clustering of the 21 mesothelioma patients against the 27 common genes between dChip and SAM. Two groups are defined, which have contrasting gene expression patterns with the exception of 3 genes, which are overexpressed in the majority of cases (see Discussion).

Data Analysis and Neural Network Methods. We used t tests to identify differences in mean gene expression levels between comparison groups. We constructed the classifiers using a multilayer feedforward backpropagation neural network. A three-layer feedforward artificial neural network was trained on the expression values of differentially regulated genes, and the accuracy of the neural network was confirmed with 10-fold cross-validation. The 10-fold cross-validation involves constructing 10 different classifiers in 10 training runs. Each training run divides the available data into a training set, including 90% of the data, and a validation set, including 10% of the data. The classifier is constructed based on the training set and tested on the validation set. Each training run leaves out different patterns such that, at the end of the entire process, each available patient will have been left out at least once. The performance reported represents an average of these 10 training runs. The process ensures that the resulting classifiers are always tested on data that have never been used during the training, thus testing their generalization abilities.

We calculated the intersection of the genes selected by dChip and SAM, and constructed a similar neural network classifier based on only those genes that were selected by both methods. To inspect visually the expression profiles of the genes, a hierarchical clustering was constructed for each group of genes using average linkage and standardized Euclidean distance.

The performance of each classifier was assessed in two ways. First, each classifier was tested and assessed using 10-fold cross-validation, as described above. Second, we used the trained classifier on a previously unseen data set of 17 mesothelioma patients including 8 long-term survivors and 9 short-term survivors. These data were obtained from the Division of Thoracic Surgery of the Brigham and Women's Hospital (16). Kaplan-Meier survival plots and log-rank tests were used to assess differences between poor and good risk groups using MedCalc Software (Mariakerke, Belgium).

Table 2 Confusion matrix results for the test set and validation set

                          Classified by the 27 gene expression array as
True class                Good risk    Poor risk
Wayne/KCI test set
  Long                    9            1
  Short                   0            11
B&W validation set
  Long                    5            3
  Short                   1            8

Onto-Express (OE) Analysis. Using OE (17), we performed a functional analysis to identify the main biological processes and pathways involving these prognostic genes. OE is a tool designed to mine the available functional annotation data and find relevant biological processes. The input to OE is the list of GenBank accession numbers, Affymetrix probe IDs, or UniGene cluster IDs.
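The 10-fold cross-validation procedure described under Data Analysis and Neural Network Methods can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions and is not the implementation used in the study: the function names (`k_fold_indices`, `cross_validate`, `fit_threshold`, `predict_threshold`) are hypothetical, the expression values are invented, and a simple one-gene threshold rule stands in for the neural network. It shows how each patient is left out exactly once and how the reported performance is the average over the 10 training runs.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(fit, predict, features, labels, k=10, seed=0):
    """Return the mean validation accuracy over k training runs."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        # Build the classifier on the training portion only.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        # Test it on the patients that were left out of this run.
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return mean(accuracies)


# Hypothetical stand-in classifier (NOT the neural network of the study):
# a cut-off halfway between the two class means of a single expression value.
def fit_threshold(values, labels):
    long_vals = [v for v, y in zip(values, labels) if y == "long"]
    short_vals = [v for v, y in zip(values, labels) if y == "short"]
    cut = (mean(long_vals) + mean(short_vals)) / 2
    return cut, mean(long_vals) > mean(short_vals)


def predict_threshold(model, value):
    cut, long_is_high = model
    return "long" if (value > cut) == long_is_high else "short"


# Invented data: 10 long-term and 11 short-term survivors, one gene each.
expression = [5.0 + 0.1 * i for i in range(10)] + [1.0 + 0.1 * i for i in range(11)]
survival = ["long"] * 10 + ["short"] * 11

folds = k_fold_indices(len(survival), k=10)
cv_accuracy = cross_validate(fit_threshold, predict_threshold, expression, survival)
```

With 21 patients and 10 folds, nine validation sets hold two patients and one holds three, so the 90%/10% split is only approximate. Any classifier that exposes a fit step and a predict step could be substituted for the threshold rule.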
OE constructs a functional profile for each of the Gene Ontology categories (18), including cellular component, biological process, and molecular function, as well as biochemical function and cellular role, as defined by Proteome. As biological processes can be regulated within a local chromosomal region (e.g., imprinting), an additional profile is constructed for the chromosome location. Statistical significance values are calculated for each category using either a hypergeometric or binomial distribution depending on the number of genes on the array used.

Results

Hierarchical Profile Clustering Yields Two Mesothelioma Subsets. Using the U95 Affymetrix oligonucleotide arrays, we generated profiles for 21 MPMs, which had been defined as short-term and long-term groups based on their actual survival from operation as either or 12 months (Table 1). Median survival time for the short-term group was 3.6 months (range, 2-10 months) compared with 58 months (range, months) for the long-term group (P ). The time from surgery to radiographic progression of disease was a median of 1.8 months (range, 1-8 months) and 14.5 months (range, 6-95 months) for the short- and long-term groups, respectively (P ).

Selection of Differentially Regulated Genes: dChip. The dChip software package (19) normalized the 21 Affymetrix chips to the chip with the median expression value. The unsupervised clustering on all 12,625 of the genes produced a dendrogram in which the gene expression profiles of the long- and short-term survivors were not consistently different. A PM/MM model-based normalization was performed to select the genes that were differentially regulated between the long- and short-term survivor groups with the following criteria: (a) E-B 100; (b) E/B 1.2 or B/E 1.2; and (c) two group t test has a P . E and B stand for the average expression value in the experiment (long-term survivors, good risk) and control (short-term survivors, poor risk) groups, respectively. Ninety-five genes satisfied the criteria above (Fig. 1A). These 95 genes were used to construct an artificial neural network with a architecture. This network achieved an accuracy of 90.5% measured by 10-fold cross-validation. The sensitivity and specificity of this artificial neural network classifier were 90% and 90.9%, respectively.

Fig. 2 Validation of gene chip data with real-time PCR for IGFB5, plasmolipin, and calumenin. The same sample RNA used for the array analyses was used for the real-time PCR. (See text for details).

Fig. 3 Immunohistochemical analysis of integrin -6, FGF07, and NM23A comparing 1 patient from each of the two clusters. Tumor staining was more prominent in patient 4 compared with patient 39 for integrin and FGF-7, with comparable expression of NM23 in both patients. Magnification, ×40.

Selection of Differentially Regulated Genes: SAM. We analyzed the same data using the Significance Analysis of Microarrays (20) using a fold change of 1.2 and a delta value of Because of the sensitivity of setting parameters in SAM, we could only select 100 genes (as opposed to 95 in dChip) with a false discovery rate of 26.6%. A neural network classifier constructed from these SAM-selected genes had an accuracy of 100% (achieved on 10-fold cross-validation). The sensitivity and specificity of this artificial neural network classifier were both 100%.

Common Genes between dChip and SAM. To define which important genes were common to dChip and SAM, an unpaired two group comparison was performed on the dChip normalized data. Interestingly, the intersection of the 95 genes selected by dChip with the 100 genes selected by SAM yielded only 27 common genes.
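The intersection step is a plain set operation, and a small sketch makes the size of the overlap concrete. The Python below is illustrative only: the probe identifiers are invented placeholders arranged so that the two lists share 27 entries, and `common_genes` is a hypothetical helper, not a function of dChip or SAM.

```python
def common_genes(dchip_genes, sam_genes):
    """Return the genes selected by both methods, keeping the dChip order."""
    sam_set = set(sam_genes)
    return [gene for gene in dchip_genes if gene in sam_set]


# Invented probe IDs: 95 "dChip" genes and 100 "SAM" genes sharing 27 entries.
dchip_selected = [f"probe_{i}" for i in range(95)]
sam_selected = [f"probe_{i}" for i in range(68, 168)]

shared = common_genes(dchip_selected, sam_selected)

# Fraction of each list that the other method also selected.
share_of_dchip = len(shared) / len(dchip_selected)
share_of_sam = len(shared) / len(sam_selected)
```

With list sizes of 95 and 100 and 27 shared genes, roughly 28% of the dChip list and 27% of the SAM list were selected by the other method.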
This suggested that: (a) the data were rather noisy and the subset of good predictors might be fewer than 95 genes, and (b) at least one but possibly both methods picked up a number of genes that were less than ideal predictors. To investigate these hypotheses, we built a classifier based on the genes common between the two sets (i.e., 95 dChip genes and

Fig. 4 Survival of the short-term and long-term survival groups as defined in Table 1 was significantly different (P ). A and B, survival in the 21 test mesotheliomas based on the 27-gene risk classifier. The poor-risk and good-risk groups differ significantly (P ), but there is no difference between the gene-predicted good-risk group and the actual long-term survivors (P ), or between the gene-predicted poor-risk group and the actual short-term survivors (P ); C, time to progression of the short-term and long-term survival groups as defined in Table 1 was significantly different (P ). D, time to progression in the 21 test mesotheliomas based on the 27-gene risk classifier. The poor-risk and good-risk groups differ significantly (P ), but there is no difference in progression times between the gene-predicted good-risk group and the actual long-term survivors (P ), or between the gene-predicted poor-risk group and the actual short-term survivor progression times (P ).

), inflammatory responses (P 0.022), apoptosis induction by extracellular signals (P 0.001), DNA damage response, activation of p53 (P 0.001), and RNA processing (P 0.006). The P was calculated using a binomial distribution (21).

Survival and Recurrence Prognostication: Test Set. The dendrogram defined by the 27 common genes identified a poor-risk and a good-risk group of MPMs within this test set (P ). The median survival of the actual short-term survival group was 3.6 months and that of the predicted poor-risk group was 4.3 months (Fig. 4, A and B). No propensity for segregation of epithelial mesotheliomas into the good-risk group or of other histologies into the poor-risk group was observed. Time-to-progression curves for the gene set also mirrored the actual clinical data of the test set (Fig. 4, C and D).
There were statistical differences in survival (P ) between actual stage I (median survival not reached), stage II (median survival 10 months), and stage III (median survival 5.2 months) patients (Fig. 5, A and B). Prognostic heterogeneity, however, was detected within stages when survival and progression were examined based on the gene classifier. When surgically staged I and II patients were grouped together, their survival was significantly different depending on which cluster the 27-gene classifier assigned them to. As seen in Fig. 5C, the patients in either stage I or II who were assigned to the poor-risk cluster had significantly shorter survival than those assigned to the other cluster (P ), and the time to progression of the stage I and II patients (Fig. 5D) was 1.8 months for those assigned to the poor-risk cluster and 15 months for those in the good-risk cluster (P ). These differences were only suggestive for the stage III mesothelioma patients with regard to survival (Fig. 5, E and F). There was, however, a statistically significantly longer time to progression for those stage III mesothelioma patients who were clustered as good-risk (17 month median) compared with the poor-risk cluster (2.6 month median; P ).

Validation with an Independent Set of Pleural Mesotheliomas. We assessed the performance of the 27-gene neural network classifier using oligonucleotide gene expression data obtained from a completely independent [Brigham and Women's (B&W) Hospital] sample of 17 resected MPMs, all stages I or II. First, we trained our neural network with the genes selected by dChip from our dataset, and then the neural network was used to classify patients from the B&W dataset. The classification accuracy was only 52.94%. The neural network built with genes selected by SAM from our dataset also did not yield a good result, with a classification accuracy of only 47.05%.
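The accuracy figures quoted for the 27-gene classifier can be recomputed from the counts in Table 2. The sketch below is a worked check rather than part of the original analysis: `summarize` is a hypothetical helper, and treating the long-term survivors as the positive class when reporting sensitivity and specificity is an assumption made here for illustration.

```python
def summarize(long_as_good, long_as_poor, short_as_good, short_as_poor):
    """Accuracy, sensitivity and specificity from a 2 x 2 confusion matrix.

    Assumption: long-term survivors classified as good risk are the
    "true positives"; the source does not state which class it treats as positive.
    """
    total = long_as_good + long_as_poor + short_as_good + short_as_poor
    return {
        "accuracy": (long_as_good + short_as_poor) / total,
        "sensitivity": long_as_good / (long_as_good + long_as_poor),
        "specificity": short_as_poor / (short_as_good + short_as_poor),
    }


# Counts taken from Table 2 (27-gene classifier).
test_set = summarize(long_as_good=9, long_as_poor=1, short_as_good=0, short_as_poor=11)
validation_set = summarize(long_as_good=5, long_as_poor=3, short_as_good=1, short_as_poor=8)

test_accuracy_pct = round(100 * test_set["accuracy"], 1)
validation_accuracy_pct = round(100 * validation_set["accuracy"], 2)
```

The test set gives 20 of 21 correct (95.2%) and the validation set 13 of 17 correct (76.47%), matching the figures reported for the common-gene classifier. By the same arithmetic, 52.94% corresponds to 9 of 17 patients correctly classified.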
The classifier built with the genes common to the dChip and SAM selections yielded a classification accuracy of 76.47% (Table 2), and based on our 27 important genes, we applied hierarchical clustering on the patients from the B&W dataset. The resulting dendrogram revealed two groups (Fig. 6A) from the B&W data, and the predicted survival of the B&W patients (Fig. 6C) was similar to their actual (Fig. 6B) survival (median survival, 5.5 months versus 5 months) for the poor-risk groups. Moreover, the survival differences between the 5 gene-predicted good-risk patients in the B&W set and the 12 poor-risk patients approached significance (P ).

Fig. 5 Gene expression in mesothelioma detects heterogeneity of natural history between patients with similar surgical stages. A, actual survival times of the mesothelioma test set by stage at surgery. There were significant differences between the stages. B, actual survival times depicted as early disease, i.e., stages I and II (n = 8), compared with late disease, i.e., stage III (n = 13). Survival differences were significant (P ). C, staging heterogeneity defined by gene classifier. When the 27-gene classifier was examined in stage I and II patients, those patients assigned to the good-risk group (n = 5) had a significantly longer survival than those assigned to the poor-risk group (n = 3; P ), and the time to progression (D) was also significantly longer among stage I and II patients who were classified as good risk (P ); E and F, among stage III patients, significant survival differences were not seen between good-risk patients (n = 3) and poor-risk patients (n = 10; P ), but significant heterogeneity for time to progression was observed when genetic classification differed (P ).

Discussion

Of the 27 genes found to be important in this investigation, 18 have been thoroughly classified in the literature, and few have been associated with MPM. Clusterin SP-40, acidic protein rich in leucine, and cysteine-rich protein were overexpressed in the vast majority of the MPMs, and, hypothetically, could be part of some pathway common to mesothelial carcinogenesis.
Selenium binding protein was consistently overexpressed in good-risk patients, and was the only gene common to both the test set and validation set. SIVA, or CD27-binding protein, is part of an apoptotic pathway induced by CD27 antigen, a member of the tumor necrosis factor receptor superfamily. CD27 regulates the death and differentiation of T and B cells, and provides signals needed for the correct activation of specific T cells. Whether SIVA is important in MPMs for immune surveillance, and as such could contribute to a longer survival, is unknown. Short chain dehydrogenase/reductase (retSDR1) reduces all-trans-retinal during bleached visual pigment regeneration (22). In the absence of retSDR1, and under low concentrations of circulating retinol, the lack of production of vitamin A active metabolites could contribute to cancer development and progression. Retinoic acid decreases the synthesis of fibronectin and laminin, and the migration of MPM, implying that retinoids and their binding may decrease MPM local invasion and tumor progression (23).

Fig. 6 Validation of gene classifier in other surgically resected mesotheliomas. A, gene expression patterns determined using hierarchical clustering of the 17 mesothelioma patients against the 27 common genes between dChip and SAM. Two groups were defined, which have contrasting gene expression patterns similar to that seen in the test set (Fig. 1B). B, the difference in actual survival time between the short- and long-term surviving patients in the B&W group was significant (P ). C, the 27-gene classifier defined two groups of mesotheliomas of which the difference in survival approached significance (P ).

BTG 2 is a member of a group of structurally related antiproliferative proteins, which mediate a common signal transduction growth arrest and differentiation pathway. This pathway interacts with N-methyl transferase, the chief enzyme for post-translational modifications of proteins by protein methylation (24). BTG influences substrates of this methylation pathway including heterogeneous nuclear ribonucleoproteins (hnRNP), which were generally lower in expression in the good-risk group. hnRNP H1 influences pre-mRNA processing, and its role in cancer is unclear; however, hnRNP H1 has been identified as a protein that binds to the negative regulator splicing element of the Rous Sarcoma Virus (25, 26), and also binds a G-rich element downstream of the core SV40 late polyadenylation signal and stimulates 3′ processing (27). These data suggest that hnRNP H1 could modulate late viral element activity in two known carcinogenic viruses, of which one (SV40) has now been implicated in the pathogenesis of MPM (4).
Calumenin, a calcium binding protein localized in the endoplasmic reticulum and involved in protein folding and sorting, was uniformly underexpressed in the good-risk group. Calumenin has been found to bind serum amyloid P (28) and is hypothesized to participate in the immunological defense system. Calumenin has not specifically been associated with MPM before, but serum amyloid A and P have been noted to be connected with MPM (29-31). Nm23a, variably expressed in the good-risk group, was uniformly overexpressed in the poor-risk group. This transcription factor associated with highly metastatic cells (32) plays a role in myc expression. It has been shown that myc can up-regulate nm23-H1 and nm23-H2 expression (33), and data are variable regarding the prognostic significance of nm23 in human neoplasms (34-39).

Our data analysis is novel compared with other reports in the literature. Differences in array platforms, data analysis packages, or modifications in the filtering of the genes may result in differences in classifier genes among reports examining the same disease. We have demonstrated this phenomenon in our own work by comparing the top significant genes of interest from dChip and SAM: there was not uniform agreement between the two programs on these genes, and 10-fold cross-validation testing yielded differing results for the two programs. We feel that by selecting only those genes that were common to the two analyses, the resulting classifiers are more robust. Moreover, neither the top 27 genes from dChip nor those from SAM could segregate survival differences among the B&W MPMs; however, the common set was 76% accurate in this independent set of specimens. It is likely that the patient population from the National Cancer Institute/Karmanos experience is different from that of the Boston experience. Our group
