Molecular profiling is becoming increasingly important as a tool for the discovery and implementation of novel biomarkers for use in drug development. Modern profiling methods permit comprehensive analysis of biological systems, allowing, for the first time, non-hypothesis-driven approaches to biological discovery.

Previous methods involving the study of one or a few genes or proteins at a time only permitted discrete hypothesis-driven research to confirm the role of each gene or protein in a selected biological system. In contrast, molecular profiling methods provide parallel analyses of thousands of genes or proteins for non-hypothesis-driven scanning of biological systems and derivation of novel associations and conclusions which can then be validated in successive independent analyses.

Molecular profiling includes a variety of technologies that provide comprehensive screening of all levels of the biological paradigm from DNA to RNA to protein. While molecular profiling methods are becoming more comprehensive and less expensive, their coverage, ease of use and overall costs still differ dramatically. RNA profiling using GeneChips or microarrays currently provides complete coverage of the transcriptome in a single experiment. However, as is the case for all analytical techniques, RNA profiling is limited by its dynamic range. Thus, while all known genes can be represented on a single array, relevant expression changes which occur below a given technology’s limits of detection will be missed.

In contrast to the transcriptome, the proteome has been a relatively neglected source of novel biomarkers. Mass spectrometry-based methods for screening peptides or proteins are theoretically capable of screening the entire proteome in a single experiment. However, these methods are limited by technology (sensitivity and robustness of available platforms) and biology (dynamic range of protein expression, mass differences between proteins and over-representation of specific proteins). Consequently, protein profiling typically requires affinity or size-based separation prior to individual analyses of discrete fractions. Even with these difficulties, modern mass spectrometric methods are improving in sensitivity and provide very high mass accuracy. This permits analysis of several thousand proteins or peptides in discrete fractions which cumulatively account for much of the proteome of any given system. Proteomics offers several distinct advantages for biomarker development when compared to the more mature mRNA profiling technologies. These include the ability to measure post-translational modifications (invisible to transcription profiling), the ability to screen non-nucleated cells (eg platelets) and biological fluids (eg plasma), and the greater ease of developing ELISA, IHC and other protein-based assays using well-established and clinically accepted methods and platforms.

Protein and mRNA profiling are, in essence, pattern recognition processes. Data mining, using both supervised and unsupervised clustering methods, is used to identify candidate biomarkers associated with the desired biological or clinical endpoints. In the case of supervised clustering, the molecular profile is searched for specific patterns that correlate with the occurrence of the desired endpoint. In unsupervised clustering, there is no pre-existing hypothesis and clusters are developed by looking for underlying patterns existing in the data, and then associating these patterns with known endpoints. In either case, a biological hypothesis is developed that requires prospective testing in an independent set of samples.

Over the last few years molecular profiling studies have been utilised in many therapeutic areas. Transcriptional profiling, in particular, has been widely applied in oncology, and expression profiles correlated with disease outcome in colorectal cancer1,2, prostate cancer3-6, chronic myelogenous leukemia7,8 and several other tumours have become common. Other reports have identified specific expression profiles predictive of response to drugs such as Taxol9,10 and other chemotherapeutic agents. In contrast, there have been fewer proteomic analyses of tumours with the goal of developing predictive markers, and these efforts have mostly focused on early detection of disease by analysis of plasma samples. However, a recent PubMed literature search for the phrase ‘clinical proteomics’ yields around 450 research and review articles. Of these, about 170 research articles and reviews were published in 2004 alone. These publications indicate that protein profiling and clinical proteomics are increasingly important complements to the more mature genomic profiling platforms.
Recent protein profiling studies have reported cancer-specific profiles in plasma samples from patients with ovarian11 and prostate12,13 cancer, but as yet there are few replicated proteomic profiles associated with specific drug responses. In this review, we discuss the opportunities provided by proteomic analyses and describe how protein profiling approaches contribute to the development of biomarkers for prediction of drug efficacy or toxicity.

Protein profiling strategies

Protein profiling strategies can be classified into two basic categories: ‘top-down’ and ‘bottom-up’ approaches (see Figure 1). The ‘top-down’ approach involves generating profiles of intact proteins isolated from various biological samples. In contrast, the ‘bottom-up’ approach requires initial fractionation of a proteome by enzymatic digestion and chromatography, followed by mass spectrometric analyses of the digested protein fragments (peptides).
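To make the enzymatic digestion step of a ‘bottom-up’ workflow concrete, the following minimal sketch performs an in-silico digest and computes the monoisotopic mass of each resulting peptide. It assumes trypsin as the enzyme (the text above does not name one), using the common rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P). The sequence `toy_protein` and the function names are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (N- and C-termini)


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


toy_protein = "MKWVTFISLLRPSAKGE"  # hypothetical sequence, not a real protein
peptides = tryptic_digest(toy_protein)
for peptide in peptides:
    print(peptide, round(monoisotopic_mass(peptide), 4))
```

Note that the R followed by P is not cleaved, so the middle peptide keeps an internal arginine. A real workflow would also have to account for missed cleavages, modifications and charge states, which this sketch ignores.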

Protein antibody arrays are being recognised as a viable ‘top-down’ protein profiling platform and several protein antibody array systems are commercially available16-18. An issue for current protein arrays is the lack of content. Unlike gene array platforms which can, in principle, capture all known gene sequences, many of the protein array platforms rely on the availability of robust antibodies for detection of various proteins of interest. Antibody reagents are difficult to prepare and expensive. As such, robust detection reagents are not yet available for many proteins of interest. Indeed, many current arrays tend to be focused toward circulating factors which are of primary interest to the immunology research community. While the content of antibody arrays will no doubt improve with time, at present their utility is limited to specific disease areas of interest.

In contrast to ‘top-down’ strategies, the ‘bottom-up’ protein profiling approaches are based on peptide profiling methods which theoretically offer an ability to achieve comprehensive profiles of a given proteome. Currently, the ‘bottom-up’ approaches are based on liquid chromatography fractionation coupled to mass spectrometry (LC/MS). The MUlti-Dimensional Protein Identification Technique (MUDPIT), developed by Yates and co-workers16-18, is an LC/LC-MS/MS-based technique that has been widely utilised for profiling complex protein mixtures. While MUDPIT is a very useful qualitative profiling tool, this approach is not a robust method for quantitation of specific proteins. In order to exploit the quantitative aspects of ‘bottom-up’ profiling, techniques such as ICAT (isotope coded affinity tags), cleavable ICAT, and GIST (global internal standard technology) were developed. These methods have been successfully implemented to perform relative protein quantifications in in vitro and pre-clinical settings21-24. In general, however, the isotope tagging methodologies have significant drawbacks to implementation, including difficulties in sample processing, sample loss and high background due to non-specific capture. As such, these technologies have not been widely implemented for profiling biological samples obtained in clinical settings.

A novel method called ‘Peptide Ion Mapping’, a ‘bottom-up’ method based on reversed-phase liquid chromatography and mass spectrometry (RPLC/MS), has been developed both in our protein profiling laboratory and by others interested in proteome profiling25,26. Peptide ion mapping provides rigorous relative quantitative protein expression profiling for biological samples such as serum, plasma, CSF, urine and tissues obtained in both pre-clinical and clinical settings (Figure 2).

Despite the multitude of protein profiling advances, at present no single methodology allows for simultaneous qualitative and quantitative comprehensive protein expression profiling. Given this state of the art, protein profiling laboratories must rely on a proteomics ‘tool chest’ comprising methodologies, technologies and techniques derived from both the ‘top-down’ and ‘bottom-up’ approaches.

Applications in drug discovery and development

Protein profiling approaches are rapidly gaining acceptance in clinical research as well as in the drug discovery and development community. As with any new technology, the best application of protein profiling is widely debated. The debate is primarily centred on the robustness of the various proteomics technologies and the impact that these technologies will have on both long- and short-term decision making in drug development. The majority of the protein profiling studies performed in recent years have centred on generating plasma and tissue profiles to differentiate between pathophysiologic states. These patterns clearly have diagnostic utility. For example, Yanagisawa and colleagues recently reported on the use of MALDI profiling of tumour tissue samples to predict survival in patients with non-small-cell lung cancer (NSCLC)27. While the oncology community was the first to embrace various protein profiling platforms, other therapeutic areas are not far behind in applying these technologies. Indeed, recent exciting work has been published on protein profiling of CSF to study the pathogenesis of Alzheimer’s disease (AD)28, profiling of cardiac myocytes to identify candidate markers to monitor risk of heart disease29, profiling of human carotid atherosclerotic plaques30, and profiling of plasma for candidate markers associated with rheumatoid arthritis31.

To illustrate the potential impact of proteomics in drug development, we share here some of our recent experience with the peptide ion mapping approach. These experiments describe one implementation of the bottom-up approach seeking to identify candidate markers for prediction of a therapeutic response. This approach is particularly useful in the oncology area where therapeutic response rates are typically 20% and predicting drug response is critical for successful therapy. An experiment was designed to determine putative biomarkers indicative of drug sensitivity using a pre-clinical xenograft model with two resistant and two sensitive cell lines: one each from lung and colon tumours. Plasma and tumour tissue were analysed by peptide ion mapping and a suite of proprietary algorithms was used for data analysis. The output of a typical ion mapping experiment is a list of peptide ion clusters along with their associated intensity information for all samples in the study. This list forms the basis of all subsequent statistical analyses.
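As a rough illustration of what such an output list might look like in code, the sketch below stores each peptide ion cluster with its per-sample intensities and computes a log2 fold change between the sensitive and resistant groups. The field names (`mz`, `retention_min`, `charge`), the sample labels and all numeric values are assumptions made for illustration; the authors' proprietary format is not described in the text.

```python
from dataclasses import dataclass
from math import log2
from statistics import mean


@dataclass
class PeptideIonCluster:
    mz: float             # mass-to-charge ratio (assumed field)
    retention_min: float  # LC retention time in minutes (assumed field)
    charge: int           # ion charge state (assumed field)
    intensities: dict     # sample name -> measured intensity


def log2_fold_change(cluster, group_a, group_b):
    """log2 of mean intensity in group_a over mean intensity in group_b."""
    mean_a = mean(cluster.intensities[s] for s in group_a)
    mean_b = mean(cluster.intensities[s] for s in group_b)
    return log2(mean_a / mean_b)


# Hypothetical sample labels mirroring the two-by-two xenograft design.
sensitive = ["lung_sensitive", "colon_sensitive"]
resistant = ["lung_resistant", "colon_resistant"]

# Hypothetical ion map: one entry per peptide ion cluster.
ion_map = [
    PeptideIonCluster(
        mz=652.34, retention_min=41.2, charge=2,
        intensities={"lung_sensitive": 4000.0, "colon_sensitive": 4000.0,
                     "lung_resistant": 1000.0, "colon_resistant": 1000.0},
    ),
    PeptideIonCluster(
        mz=811.91, retention_min=57.8, charge=3,
        intensities={"lung_sensitive": 2100.0, "colon_sensitive": 1900.0,
                     "lung_resistant": 2050.0, "colon_resistant": 1950.0},
    ),
]

fold_changes = [log2_fold_change(c, sensitive, resistant) for c in ion_map]
for cluster, fc in zip(ion_map, fold_changes):
    print(f"m/z {cluster.mz}: log2 fold change = {fc:.2f}")
```

Keeping intensities keyed by sample name makes it straightforward to regroup the same table by tissue of origin or by drug response in later analyses.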

In our implementation of peptide ion mapping, a probability is computed for each peptide ion cluster to test the null hypothesis that there is no difference in peptide ion intensity (expression) between experimental samples. Ions demonstrating a statistically significant difference in expression are then subjected to unsupervised hierarchical clustering. The results of this analysis can be displayed as a heat map, similar to the common method of representing gene array profiling data. Figure 3 shows the results of the example xenograft experiment. Figure 3a shows peptide ions derived from xenograft plasma that are correlated with sensitivity to a specific oncology therapeutic agent. Similarly, Figure 3b shows peptide ions derived from xenograft tumour tissue that are correlated with sensitivity to a specific oncology therapeutic agent. As shown in Figures 3a and 3b, there is a clear differentiation between resistance and sensitivity even though the tumours originate from different cell lines derived from lung and colon samples. These unique peptides are subsequently sequenced using tandem mass spectrometry to identify the proteins from which the peptides originated. These proteins can then be further evaluated as candidate markers for prediction of therapeutic response using routine diagnostic techniques such as ELISA, IHC assays and western blotting.
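The two-stage analysis described above (a per-ion significance test followed by unsupervised hierarchical clustering) can be sketched as follows. This is not the authors' proprietary algorithm: an exact permutation test on the difference in group means stands in for whatever test they used, and Euclidean distance with average linkage is one assumed choice among many. The ion names, sample labels (four hypothetical samples per group), intensities and the 0.05 threshold are all invented for illustration, and no multiple-testing correction is applied.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    total = extreme = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(euclidean(profiles[a], profiles[b])
                     for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


sensitive = ["S1", "S2", "S3", "S4"]  # hypothetical sample labels
resistant = ["R1", "R2", "R3", "R4"]

# Hypothetical log-scale intensities per peptide ion cluster.
ion_intensities = {
    "ion_A": {"S1": 8.1, "S2": 7.9, "S3": 8.3, "S4": 8.0,
              "R1": 5.2, "R2": 5.0, "R3": 5.4, "R4": 5.1},
    "ion_B": {"S1": 6.0, "S2": 6.4, "S3": 5.8, "S4": 6.2,
              "R1": 6.1, "R2": 5.9, "R3": 6.3, "R4": 6.0},
    "ion_C": {"S1": 4.0, "S2": 4.2, "S3": 3.9, "S4": 4.1,
              "R1": 6.8, "R2": 7.0, "R3": 6.9, "R4": 7.2},
}

# Stage 1: test each ion against the null hypothesis of no group difference.
p_values = {
    ion: permutation_p_value([vals[s] for s in sensitive],
                             [vals[s] for s in resistant])
    for ion, vals in ion_intensities.items()
}
significant = [ion for ion, p in p_values.items() if p < 0.05]

# Stage 2: cluster samples using only the significant ions, ignoring labels.
profiles = {s: [ion_intensities[ion][s] for ion in significant]
            for s in sensitive + resistant}
clusters = average_linkage(profiles, n_clusters=2)
print(significant, clusters)
```

The clustering step never sees the sensitive or resistant labels; whether the recovered groups line up with drug response is what a heat map of such data is meant to reveal. With thousands of ions, a correction for multiple testing would be needed before trusting the selected set.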

This study demonstrates how protein profiling can be used to generate novel biomarker candidates for further evaluation. For illustration purposes, we have shown one implementation of a ‘bottom-up’ LC/MS-based protein profiling platform. As described above, a comprehensive protein profiling approach to biomarkers requires additional technologies to drive predictive marker discovery studies in both pre-clinical models and in human samples of clinical interest.

Summary and future directions

The integration of protein profiling tools into the overall process of biomarker discovery will facilitate drug development and is of critical importance for organisations seeking a comprehensive approach to biomarker development. The ongoing evolution of proteomic technologies mirrors the recent development of tools for investigation of nucleic acids. While nucleic acid-based technologies utilised for genotyping, DNA sequencing and RNA expression analysis continue to be improved, the basic technologies are now relatively mature and simple to implement into discovery efforts. By contrast, proteomics is at an earlier stage of development. As such, implementation of protein profiling today requires highly trained investigators and we are currently witnessing significant improvements in instrumentation and in methods for sample collection, processing and analysis.

The potential impact of proteomics to improve decision making in drug discovery is immense despite the difficulties of applying rapidly evolving technologies. In this short review we have provided one example of a proteomics-based biomarker discovery programme. Our experience is closely mirrored by that of colleagues performing similar experiments in a variety of biological systems. The rapid improvement of profiling methods and underlying technologies, along with recent profiling successes, clearly indicates the excitement and potential of these approaches to address clinically important issues. As indicated, protein profiling is one component of a comprehensive approach to understanding complex biological problems. In the near term we will see the full integration and synergistic application of DNA, RNA and protein analytical techniques to these problems. Ultimately these technologies will lead to improved biomarker sets with improved predictive potential.

Dr Ashok Dongre is Associate Director in the Clinical Discovery Technologies group at Bristol-Myers Squibb and directs the Protein Profiling group. He earned his BSc and MSc degrees in Chemistry/Physical Chemistry from the University of Bombay (India) and earned his PhD in Analytical Chemistry (Biological Mass Spectrometry) at Virginia Commonwealth University. He completed post-doctoral research as a Howard Hughes Fellow in the Department of Immunology and the Department of Molecular Biotechnology at University of Washington. Dr Dongre is an expert in the area of proteomics/protein profiling and has authored >25 scientific publications.

Dr Douglas Robinson is a Research Investigator in the Clinical Discovery Technologies group at Bristol-Myers Squibb. He earned his BS and MS in mathematics from the University of Massachusetts and the University of Vermont, respectively, and his PhD in Bioinformatics at North Carolina State University. Subsequently, he came to Bristol-Myers Squibb where he directs statistical analyses for the protein profiling group. Dr Robinson is an expert in statistics with a focus on analysis of large data sets generated by diverse genomic and proteomic technologies.

Dr Mark Curran is a Director of Clinical Discovery Technologies at Bristol-Myers Squibb with responsibility for biomarker development in the cardiovascular, immunology and neurology therapeutic areas and he also directs the protein profiling core facility. He earned BS and MS degrees in Biology/Biotechnology from Worcester Polytechnic Institute and performed PhD and postdoctoral studies at the University of Utah in the departments of Human Genetics and Cardiology. Dr Curran is an expert on the heritable basis of cardiac arrhythmia and has authored >40 scientific publications.

Dr Nicholas Dracopoli is Vice-President of Clinical Discovery Technologies at Bristol-Myers Squibb. He obtained his BSc and PhD degrees from the University of London and completed post-doctoral fellowships at the Memorial Sloan-Kettering Cancer Center and the Massachusetts Institute of Technology. Subsequently, he worked at the Whitehead/MIT Genome Center and the National Center for Human Genome Research at the NIH before moving to the biotechnology industry. Dr Dracopoli has extensive experience in the fields of genomics, molecular biology and cancer research and has authored >70 scientific publications.