Abstract

The focus of this article is to review the recent advances in proteome analysis of human body fluids, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, and amniotic fluid, as well as its applications to human disease biomarker discovery. We aim to summarize the proteomics technologies currently used for global identification and quantification of body fluid proteins, and elaborate on the putative biomarkers discovered for a variety of human diseases through human body fluid proteome (HBFP) analysis. Some critical concerns and perspectives in this emerging field are also discussed. With continued advances in proteomics technologies, HBFP analysis is expected to have a growing impact on the search for clinically relevant disease biomarkers.

1 Introduction

Proteomics is widely envisioned as a powerful means for biomedical research. With the significant advances in MS and proteomics technologies [1–3], protein biomarker discovery has become one of the central applications of proteomics. One might think tissue biopsy is the ideal specimen for disease biomarker study. However, in terms of disease diagnosis and prognosis, a human body fluid (e.g., blood, urine, or saliva) appears to be more attractive because body fluid testing provides several key advantages including low invasiveness, minimum cost, and easy sample collection and processing [4]. Analysis of human body fluid proteome (HBFP) has become one of the most promising approaches to discovery of biomarkers for human diseases.

HBFP analysis is inherently challenging because body fluids contain a large number of proteins that could be modified in a variety of forms. Due to the complexity of HBFP, a number of variables need to be considered, including sample preparation and handling, protein pre-fractionation, affinity depletion of highly abundant proteins, isolation of subproteomes (e.g., glycoproteome and phosphoproteome), multidimensional chromatographic separation, quantification of proteins, data analysis, database search criteria, etc. Understanding the nature and inherent limitation of these necessary steps will illustrate the best approach and tools for analysis of a specific body fluid proteome and discovery of corresponding disease biomarkers.

In this article, we will review the progress of HBFP analysis and its application to body fluid biomarker discovery for human diseases. Special features, challenges, and perspectives regarding body fluid proteomics will also be discussed. Knowing the current status and critical concerns of HBFP analysis will help advance future investigations in this emerging field.

2 Plasma/serum

Human plasma proteins originate from a variety of tissues and blood cells as a result of secretion or leakage. Numerous biomedical studies have demonstrated that plasma protein levels reflect human physiological or pathological states and can be used for disease diagnosis and prognosis [5, 6]. Sample preparation and handling is critical for plasma/serum proteome analysis. A plasma sample is obtained if the blood is withdrawn in the presence of an anticoagulant (EDTA, sodium citrate, or heparin) and centrifuged to remove blood cells. In the absence of an anticoagulant, however, a serum sample is obtained after the blood clots and the clot and cellular elements are removed by centrifugation. Serum protein composition differs substantially from that of plasma [7]. How to standardize sample handling and preparation, and whether to use plasma or serum for proteome analysis, are questions that remain to be answered. A second critical issue is the complexity of the proteome. Plasma/serum contains a huge number of proteins whose concentrations span an extraordinary dynamic range of at least 9–10 orders of magnitude [8]. Many of these proteins are glycosylated or bound to other carrier proteins. How to globally quantify the proteins in free, bound, or modified forms remains a critical challenge.
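To make the scale of this dynamic range concrete, the following minimal sketch computes the number of orders of magnitude separating two concentrations. The two concentrations used (tens of mg/mL for an abundant protein, a few pg/mL for a trace protein) are round-number assumptions chosen for illustration only and are not measurements taken from this review; the function name is hypothetical.

```python
import math


def orders_of_magnitude(high_conc, low_conc):
    """Return log10 of the ratio between two concentrations in the same units."""
    if high_conc <= 0 or low_conc <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(high_conc / low_conc)


# Illustrative, assumed values (both in g/mL), not data from the review.
abundant_g_per_ml = 40e-3   # an abundant protein at about 40 mg/mL
trace_g_per_ml = 4e-12      # a trace protein at about 4 pg/mL

orders = orders_of_magnitude(abundant_g_per_ml, trace_g_per_ml)
print(f"{orders:.1f} orders of magnitude")
```

Under these assumed values the ratio is about ten orders of magnitude, which is why a single analytical method cannot readily cover both ends of the concentration range.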

2.1 Sample preparation and handling

Little effort has been invested in standardizing sample preparation and handling procedures in plasma/serum proteome analysis. This is presumably because there is a wide range of preanalytical variables and many technology platforms are currently used in plasma/serum proteome analysis. Thus, it is not practical to define a single list of standard sample preparation and handling procedures for all the proteomics platforms [9]. In reality, however, standardization is extremely important during the biomarker development process, considering that a candidate biomarker needs to be repeatedly validated on large and independent sample cohorts, usually in the format of multicenter studies. Several recent studies clearly indicated that sample handling variables such as clotting conditions and time, storage temperature, storage time, storage tube, freeze/thaw cycles, and protease inhibitors have significant effects on the results of plasma/serum proteome analysis. In general, the samples should be aliquoted and stored frozen, preferably in liquid nitrogen, with freeze/thaw cycles kept to a minimum. Protease inhibitors would be desirable, but inhibitor cocktails should be used with caution. Peptide inhibitors, e.g., high-concentration aprotinin, may interfere with MS analysis, and several small-molecule inhibitors such as PMSF and AEBSF form covalent bonds with proteins [9–13]. In the process of whole blood coagulation, the cellular elements, especially platelets, can secrete a variety of components. Therefore, the blood cells should be removed immediately to provide optimal analyte stability. For investigators concerned about platelet contamination, options include filtration of the plasma through a 0.2-μm membrane filter, double centrifugation of the specimen, and use of additives that minimize platelet activation, such as a mixture of citrate, theophylline, adenosine, and dipyridamole [12].
Overall, sample collection, preparation, and handling procedures should be consistent across the whole study. It is strongly recommended that one should carefully document the conditions of sample preparation and handling and diligently track all preanalytical variables. Certified reference materials can be developed for sample quality control and quality assurance [9].

The HUPO/Plasma Proteome Project (PPP) Specimens Committee concluded that plasma was preferable to serum, due to less degradation ex vivo [12–14]. Among the four HUPO specimens examined (EDTA plasma, citrate plasma, heparin plasma, and serum), platelet-depleted or citrate plasma was superior to serum for peptidome analysis. Protease inhibitors, especially aprotinin, were not recommended because they interfere with the analysis of peptides [13]. However, there is still controversy over whether serum or plasma should be used, because archived specimens are so frequently sera.

2.2 Depletion of highly abundant proteins

Plasma/serum proteome analysis has been hampered by the predominance of several highly abundant proteins, including albumins, immunoglobulins (Igs), alpha-1-antitrypsin (A1AT), fibrinogen, and haptoglobin (HG), and their isoforms and fragments. Depletion of these highly abundant proteins is often desired prior to proteome analysis. The Cibacron blue dye method is a traditional way to deplete albumin. However, the binding of albumin to Cibacron blue dyes is nonspecific, and the method is not as sensitive or specific as mAb-based immunoaffinity resins or columns [15–18]. Removal of IgG can be achieved with Protein G resins or columns [15–20]. Compared with the Cibacron blue dye and Protein G methods, immunoaffinity depletion using multiple affinity removal columns (MARC) is more effective because it can simultaneously remove multiple abundant proteins, with minimal carryover, high longevity, and minimal nonspecific binding [16, 17, 21, 22]. Immunodepletion can also be effectively achieved with IgY antibody microbeads or a peptide-based affinity medium [23, 24]. Protein precipitation with TCA/acetone or NaCl/ethanol also appears to be useful for depletion of albumins [25, 26].

There has been concern regarding whether less abundant serum proteins are removed along with albumin, HG, and other commonly depleted proteins. Immunodepletion methods remove a good portion, but not all, of the highly abundant proteins; more problematic is that these columns may also remove other proteins (e.g., cytokines) by “nonspecific” binding [27, 28]. A possible approach to address this problem is to disrupt the binding of low molecular weight (LMW) proteins to the carrier proteins albumin/IgG. For instance, partly denaturing conditions created by adding 5 or 20% ACN may disrupt the binding between LMW proteins and albumin/IgG, resulting in an increased number of proteins detected under denaturing conditions compared to native conditions. The presence of 5% ACN in serum provided better enrichment of LMW proteins than the 20% ACN condition [26]. In addition, rather than depleting highly abundant proteins, it would be promising to develop methods such as functionalized nanoparticles [29] to enrich the low abundant proteins/peptides or reduce the protein concentration range [30].

A variety of proteomics platforms have been applied to global identification of plasma/serum proteins. Table 1 lists some representative proteome analyses of plasma/serum and other body fluids. The listed data should be interpreted with caution, because different criteria and algorithms might have been used for database searching and assigning protein IDs. A simple shotgun experiment based on 1-D or 2-D LC-MS/MS typically yields a list of a few hundred distinct proteins [31–35]. Multiplexed proteomics platforms, which often consist of prefractionation, multidimensional separation, and MS/MS techniques, are required in order to achieve a more comprehensive analysis.

Table 1. Analysis and identification of proteins in HBFPs using MS-based proteomics

Prefractionation of plasma/serum proteins can be achieved by using a number of separation techniques, including RP-LC, strong cation exchange (SCX) chromatography, SEC, anion-exchange chromatography (AEC), anion displacement liquid chromatofocusing chromatography, SDS-PAGE (slices of gel bands), membrane-based electrophoresis (Gradiflow BF400 fractionation), free flow electrophoresis (FFE), liquid-phase IEF, and PF2D (chromatofocusing/RP-LC) [36–43]. Multiplexing of these techniques, e.g., combining SEC with AEC or solution IEF with SDS-PAGE, can further enhance the resolving power [37–39]. To determine which separation technique is more efficient for prefractionation of plasma/serum proteins, a HUPO/PPP study was carried out to compare SCX, slicing of SDS-PAGE gel bands, and liquid-phase IEF. Prefractionation based on SCX resulted in the largest number of identified proteins, followed by gel slices and then liquid-phase IEF. An important observation was that each of the methods revealed a set of unique proteins. Therefore, for comprehensive identification of plasma/serum proteins, several different prefractionation methods should be used in parallel [44]. Liquid-phase IEF (ampholyte-free) appeared to be very effective for prefractionation of peptides. By using liquid IEF prior to LC-MS/MS analysis, 844 unique peptides were identified, corresponding to 437 serum proteins [45]. A total of 1444 proteins were identified when the liquid IEF fractions were further fractionated by SCX and then analyzed by LC-MS/MS [46].
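The complementarity of prefractionation methods described above can be examined with simple set operations on the protein identification lists. The sketch below is a minimal illustration under stated assumptions: the accession labels (P1 to P6) and the function name are hypothetical placeholders, not identifications reported in the cited studies.

```python
def unique_and_shared(ids_by_method):
    """Summarize overlap between protein ID lists from several methods.

    Returns (union of all IDs, IDs found by every method,
    dict of IDs found by exactly one method, keyed by that method).
    """
    sets = {method: set(ids) for method, ids in ids_by_method.items()}
    all_sets = list(sets.values())
    union = set().union(*all_sets)
    shared = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for method, ids in sets.items():
        # Everything identified by any of the other methods.
        others = set().union(*(s for m, s in sets.items() if m != method))
        unique[method] = ids - others
    return union, shared, unique


# Hypothetical accession lists, for illustration only.
ids_by_method = {
    "SCX": ["P1", "P2", "P3", "P4"],
    "gel_slices": ["P2", "P3", "P5"],
    "liquid_IEF": ["P3", "P6"],
}

union, shared, unique = unique_and_shared(ids_by_method)
for method in ids_by_method:
    print(method, "unique:", sorted(unique[method]))
print("combined:", len(union), "shared by all:", sorted(shared))
```

In this toy case each method contributes at least one protein that the others miss, so the combined list is larger than any single list, which is the pattern that motivates running several prefractionation methods in parallel.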

The HUPO/PPP project has led a number of studies on standardization of sample preparation and handling, evaluation of proteomics technologies, analysis of sub-proteomes (glycoproteome and peptidome), data analysis algorithms and management [47, 48], and database development [12, 49]. The proteomics platforms tested included LC-FT-ICR MS [50], antibody array [51], solution IEF/1-DE/RP-LC-MS/MS [38], 2-DE/MALDI-TOF-MS/MS, 2-D LC/RP-LC/micro-ESI-MS/MS, 2-D LC (SCX-RP-LC)/micro-ESI-MS/MS, 2-D LC (SCX-RP-LC)/nano-ESI-MS/MS [52], 1-DE/2-D LC/ESI-MS/MS [53], FFE/1-DE/2-D-nano-LC-MS/MS [42], and SELDI-TOF-MS [54] (Table 1). An important observation was that different platforms revealed distinct subsets of proteins and were largely complementary for the identification of serum proteins [52]. A re-evaluation of three published plasma proteomic studies also indicated that there was minimal overlap among the proteins identified by different proteomics platforms. The proteins found through the literature search are strongly biased toward signal sequence-containing extracellular proteins, while the three proteomics methods showed a much higher representation of cellular proteins, including nuclear, cytoplasmic, and kinesin complex proteins [55].

As a result, the HUPO/PPP pilot phase study yielded 9504 International Protein Index (IPI) proteins identified with one or more peptides and 3020 proteins identified with two or more peptides [12]. Of the 3020 proteins identified, 345 were found to have cardiovascular-related functions through manual literature searches [56]. Using a rigorous statistical approach that takes into account the length of coding regions in genes, together with multiple hypothesis-testing techniques, a reduced set of 889 proteins was reported with a confidence level of at least 95% [57].
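The difference between the one-peptide and two-peptide protein counts reflects a simple filtering rule: a protein is retained only if it is supported by at least a given number of distinct peptides. A minimal sketch of that rule is shown below, assuming the search results are available as (protein, peptide) pairs; the identifiers and the function name are hypothetical and do not reproduce the HUPO/PPP data processing.

```python
from collections import defaultdict


def proteins_with_min_peptides(peptide_hits, min_peptides):
    """Return proteins supported by at least `min_peptides` distinct peptides.

    `peptide_hits` is an iterable of (protein_id, peptide_sequence) pairs;
    repeated observations of the same peptide are counted once.
    """
    peptides_per_protein = defaultdict(set)
    for protein_id, peptide in peptide_hits:
        peptides_per_protein[protein_id].add(peptide)
    return {
        protein_id
        for protein_id, peptides in peptides_per_protein.items()
        if len(peptides) >= min_peptides
    }


# Hypothetical search results, for illustration only.
hits = [
    ("IPI_A", "PEPTIDEA"),
    ("IPI_A", "PEPTIDEB"),
    ("IPI_A", "PEPTIDEA"),  # repeat observation, counted once
    ("IPI_B", "PEPTIDEC"),
    ("IPI_C", "PEPTIDED"),
    ("IPI_C", "PEPTIDEE"),
]

one_or_more = proteins_with_min_peptides(hits, 1)
two_or_more = proteins_with_min_peptides(hits, 2)
print(len(one_or_more), "proteins with >= 1 peptide")
print(len(two_or_more), "proteins with >= 2 peptides")
```

Raising the threshold trades coverage for confidence, since single-peptide identifications are the ones most likely to be false positives.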

2.4 LMW proteome

Plasma/serum peptidome refers to the LMW fraction of plasma/serum proteome after high molecular weight proteins are removed. This LMW fraction is made up of several classes of physiologically important proteins such as cytokines, chemokines, peptide hormones, as well as proteolytic fragments of larger proteins. Peptidomics aims at the analysis of the peptidome and discovery of LMW biomarkers for human diseases [58].

A suitable specimen and sample handling procedure is crucial for peptidome analysis [11, 13, 59]. The plasma/serum peptidome is often prepared from the whole proteome using physical methods such as centrifugal ultrafiltration, gel chromatography, or precipitation; during sample preparation, artifactual processes (e.g., cell lysis, proteolysis) should be minimized. Analysis and profiling of the plasma/serum peptidome has been performed with 2-D LC-MS/MS, FT-ICR-MS, MALDI-TOF-MS, SELDI-TOF-MS, or LC-MS [13, 58–64]. Magnetic beads/particles are very useful for enrichment of serum peptides and can greatly enhance the reproducibility of MALDI-TOF-MS analysis of the peptides [59, 60].

Many LMW components in serum/plasma peptidome are bound to carrier proteins. Examination of the LMW species bound to a specific carrier protein may disclose important diagnostic information [65]. For instance, cancer-related protein BRCA2, a 390-kDa low-abundance nuclear protein linked to cancer susceptibility, was represented in sera as a series of specific fragments bound to albumin. This demonstrated that carrier-protein harvesting provides a rich source of candidate peptides and proteins with potential diverse tissue and cellular origins that may reflect important disease-related information [66].

2.5 Glycoproteins and lipoproteins

Many proteins in human plasma are glycosylated, and changes in the extent of glycosylation and in the carbohydrate structure of proteins have been linked to cancer and other disease states, highlighting the clinical importance of this modification as an indicator of pathological mechanisms. Lectin affinity chromatography (LAC) is often used for enrichment of either glycoproteins or glycopeptides prior to MS analysis [67–70]. The use of a multilectin affinity column maximized the capture of glycoproteins and therefore yielded a substantial list of 150 glycoproteins in serum/plasma. A close correlation in glycoprotein profiles was observed between serum and plasma, except that fibrinogen, which is present in plasma, was absent from serum as a result of the clotting process. The glycoprotein profiles of specimens from three ethnic groups, Caucasian American, Asian American, and African American, were found to be very similar, except for a higher angiotensinogen level and a lower histidine-rich glycoprotein level in Caucasian American samples, and a lower vitronectin level in African American blood samples [67]. The use of LAC for isolating both intact glycoproteins and glycopeptides (after proteolytic digestion) can further enhance the identification of glycoproteins. Using these affinity capturing steps and LC-MS/MS, a total of 86 N-glycosylation sites in 77 proteins in human serum were identified [70]. SEC was also found useful for significant enrichment of N-linked glycopeptides relative to nonglycosylated peptides because the N-linked glycans expressed on tryptic glycopeptides contribute substantially to their mass [71]. Recently, a new approach based on hydrazide chemistry for covalent capturing of glycoproteins was reported for broad analysis of human plasma N-glycoproteins [72–74].
Following the enzymatic digestion of the N-linked glycopeptides with N-glycosidase F, a 2-D LC-MS/MS analysis of the deglycosylated peptides resulted in the identification of a total of 2053 N-glycopeptides, representing 303 nonredundant N-glycoproteins [74]. This glycoprotein pull-down method can be incorporated with stable isotope labeling for quantitative analysis of glycoproteins in serum using MS/MS methods, such as LC-MS/MS or MALDI-TOF/TOF-MS [72, 73]. Serial LAC coupled with stable isotope labeling or global internal standard quantification strategy was reported for comparative analysis of sialylated glycoforms of proteins containing differentially branched complex-type glycans. The relative degree of sialylation among human serum glycoproteins or separated glycosylation sites within the same proteins were characterized and compared [75, 76].

Low-density lipoproteins (LDLs) in plasma were isolated by two-step discontinuous density-gradient ultracentrifugation and then identified using 2-DE, MALDI-TOF-MS, and LC-MS/MS. In addition to the dominating apo B-100, LDL was found to contain a number of other apolipoproteins such as apo C-II, apo C-III, apo E, apo A-I, apo A-IV, apo J, and apo M, serum amyloid A-IV (SAA-4), calgranulin A, lysozyme C, and their isoforms [77]. Analysis of the protein composition of LDL may contribute to revealing its role in atherogenesis and the mechanisms that lead to coronary disease in humans. High-density lipoprotein (HDL) is the most abundant lipoprotein particle in the plasma and a negative risk factor of atherosclerosis. Using a similar approach for LDL proteins, HDL was found to contain the following proteins: apo A-I, apo A-II, apo A-IV, apo C-I, apo C-II, apo C-III, apo E, apo M, SAA and SAA-4, A1AT, salivary alpha-amylase, and apo L. Similarly, many of these apoproteins exhibited different isoforms, and multiple glycosylated forms of apo A-I and apo A-II were observed [78].

2.6 Quantitative analysis and profiling

Quantitative proteome profiling is a key step to reveal differentially expressed proteins/peptides associated with disease states. Initially, MALDI-TOF-MS and SELDI-TOF-MS were considered promising clinical tools because they can provide fast and high-throughput analysis compatible with the clinical setting. However, there have been concerns about the reproducibility and coverage of these techniques. MALDI-MS profiling of serum/plasma proteins or peptides usually requires SPE pretreatment or pre-enrichment with functionalized beads or particles [59, 60, 79–82]. Numerous studies have reported the use of SELDI-TOF-MS with different functional ProteinChips for disease detection mainly based on distinct proteomic patterns [83–87]. Questions have been raised regarding the reproducibility, accuracy, mass range, and dynamic range of SELDI-MS, and method and multilaboratory studies are being conducted to address these issues [88].

LC-MS profiling represents one of the label-free methods for relative quantification of proteins [89–95]. The method requires comparison of identical proteolytic peptides in each of the two experiments to accurately determine relative ratios of the proteins. This relies on the accuracy of mass measurement and chromatographic reproducibility; therefore, informatics tools are required for alignment of the LC-MS data. A number of factors need to be considered, such as denoising, mass and charge state estimation, chromatographic reproducibility, and peptide quantification via integration of extracted ion chromatograms. An open-source software suite, SpecArray, was recently introduced to generate and visualize peptide arrays (signals) from LC-MS datasets, providing a useful software platform for quantitative LC-MS profiling [92]. Although relative quantification monitors changes in protein abundance between two conditions, it does not determine the absolute quantity of these proteins. LC-MS was also tested for absolute quantification of proteins without requiring the use of external reference peptides. A single point calibration was obtained for the mass spectrometer that was applicable to the subsequent absolute quantification of all other proteins within a complex mixture [94]. Multidimensional LC-MS/MS was demonstrated for the identification of 32 proteins significantly increased in concentration following in vivo lipopolysaccharide (LPS) administration, including inflammatory and acute-phase proteins [95].
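The core of label-free relative quantification can be sketched in two steps: deciding whether features from two runs correspond to the same peptide, and comparing the integrated areas of their extracted ion chromatograms. The example below is a minimal sketch under stated assumptions, not the algorithm used by SpecArray or any cited tool. The function names, the 10 ppm mass tolerance, the 0.5 min retention time tolerance, and the intensity values are all hypothetical.

```python
def integrate_xic(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def same_feature(mz_a, rt_a, mz_b, rt_b, ppm_tol=10.0, rt_tol=0.5):
    """Match two features by mass (ppm) and retention time (minutes).

    The tolerances are assumed values; real settings depend on the
    instrument's mass accuracy and the chromatographic reproducibility.
    """
    ppm_error = abs(mz_a - mz_b) / mz_a * 1e6
    return ppm_error <= ppm_tol and abs(rt_a - rt_b) <= rt_tol


# Hypothetical chromatograms of one peptide in two runs (time in minutes).
times = [10.0, 10.1, 10.2, 10.3, 10.4]
run_a = [0.0, 100.0, 200.0, 100.0, 0.0]
run_b = [0.0, 200.0, 400.0, 200.0, 0.0]

if same_feature(500.2500, 10.2, 500.2520, 10.3):
    area_a = integrate_xic(times, run_a)
    area_b = integrate_xic(times, run_b)
    print(f"relative abundance (B/A): {area_b / area_a:.2f}")
```

The ratio obtained this way is only meaningful when the same peptide is matched in both runs, which is why mass accuracy and retention time reproducibility are prerequisites for the method.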

CE-ESI-MS is a promising tool for profiling of plasma/serum peptides [96–98]. The high resolving power of CE allows more than 1000 polypeptides to be displayed within 45–60 min in a single analysis run. Automated deconvolution and normalization of the spectra are important for extracting quantitative information from CE-MS data. Peptide profiling can also be performed using a differential peptide display (DPD) method, which is an offline combination of RP-LC and MALDI-MS [13]. CE-MS and DPD are applicable to clinical peptidome analysis, considering that both methods are fast and reproducible.

DIGE requires only a single gel to reproducibly detect differences between two protein samples. This is achieved by tagging the two samples with two different fluorescent dyes, running them on the same 2-D gel, acquiring two postrun fluorescence images of the gel, and superimposing the images [99]. 2-D DIGE was used to resolve approximately 850 protein spots in crude serum, whereas over 1500 protein spots were visualized following removal of six highly abundant proteins [100]. A new method termed intact-protein analysis system (IPAS), coupled with Cy dye labeling, was reported for quantitative profiling of the human plasma proteome. Following immunodepletion, paired plasma samples were labeled with different Cy dyes and subsequently separated in multiple dimensions according to their charge, hydrophobicity, and molecular mass. Differences in the abundance of resolved proteins were determined based on Cy dye ratios. IPAS provides a highly sensitive and quantitative approach for the analysis of serum and plasma proteins [14, 101].

Quantitative proteomics using stable isotope tagging and automated MS/MS is an emerging technique with potential for clinical applications. MALDI-TOF/TOF-MS using isotope-labeled reference peptides has proven to be a high-throughput method for serum biomarker screening [102]. By using postdigestion trypsin-catalyzed 16O/18O peptide labeling, 2-D LC-FTICR MS, and the accurate mass and time tag strategy, a total of 429 plasma proteins were quantified following LPS administration, including 25 proteins that changed significantly between samples taken prior to and 9 h after LPS treatment [103]. In a new method termed in-gel stable-isotope labeling (ISIL), proteins were initially resolved by gel electrophoresis (GE). The gel slices of interest were reacted separately with light versus heavy isotope-labeled reagents, mixed, and then digested with proteases. The resulting peptides were finally analyzed by LC-MS/MS for protein identification and quantification [104]. An advantage of ISIL is that visualization of gel differences can be used as a first quantification step, followed by accurate and sensitive protein-level stable-isotope labeling and MS-based relative quantification.

Immunoassays are very commonly used for validation of proteomic biomarkers. However, development of these assays is hampered by the need to have a specific antibody or antigen to the protein of interest. Furthermore, calibration curves of immunoassays are inherently nonlinear, and crossvalidation is almost unavoidable. Quantitative MS analysis using multiple reaction monitoring (MRM) has been demonstrated for accurate analysis of a target protein and may serve as a powerful tool for validation purposes [105–108]. Although MRM has been demonstrated for quantification of low abundant serum proteins such as prostate-specific antigen (PSA) [106], there is still concern regarding the sensitivity of this assay compared to existing immunoassays. In order to quantify low abundant proteins in serum, a method termed stable isotope standards and capture by anti-peptide antibodies (SISCAPA) has recently been introduced, in which antipeptide antibodies immobilized on nanoaffinity columns are used to enrich specific peptides along with spiked stable-isotope-labeled internal standards. The method appears suitable for quantification of low abundant proteins and should find application in the validation of diagnostic protein panels in large sample sets [107].
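Quantification with a spiked stable-isotope-labeled internal standard reduces to a ratio calculation: the amount of endogenous (light) peptide is estimated from the light-to-heavy peak area ratio multiplied by the known amount of heavy standard added. The sketch below illustrates this arithmetic under the assumption that the light and heavy peptides have identical recovery and instrument response. The function name, peak areas, and spiked amount are hypothetical, and taking the median over several MRM transitions is one possible design choice, not a procedure stated in the cited work.

```python
import statistics


def endogenous_amount_fmol(transition_areas, spiked_heavy_fmol):
    """Estimate the endogenous peptide amount from light/heavy peak areas.

    `transition_areas` is a list of (light_area, heavy_area) pairs, one per
    monitored transition. The median ratio is used so that one transition
    affected by interference has limited influence on the estimate.
    """
    ratios = []
    for light_area, heavy_area in transition_areas:
        if heavy_area <= 0:
            raise ValueError("heavy standard peak area must be positive")
        ratios.append(light_area / heavy_area)
    return statistics.median(ratios) * spiked_heavy_fmol


# Hypothetical peak areas for three transitions; 100 fmol of standard spiked.
areas = [(1.5e5, 3.0e5), (0.8e5, 1.6e5), (2.1e5, 4.0e5)]
amount = endogenous_amount_fmol(areas, spiked_heavy_fmol=100.0)
print(f"estimated endogenous peptide: {amount:.1f} fmol")
```

Because the standard is added before enrichment, losses during antibody capture affect the light and heavy forms equally and cancel in the ratio.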

2.7 Protein arrays

Protein arrays are solid-phase ligand-binding assay systems using immobilized proteins on surfaces such as glass, cellulose membranes, mass spectrometer plates, microbeads, or micro/nanoparticles. The assays are highly parallel and often miniaturized. The main advantages of protein arrays include high throughput, exquisite sensitivity, and the minute amount of sample required for analysis. However, the expression and purification of capture proteins, especially antibodies, is cumbersome. The design of capture arrays, particularly when screening against complex samples, also needs to take into consideration the problem of crossreactivity.

A very common format of protein array is the antibody array, in which antibodies are physically or covalently immobilized on surfaces to detect target molecules in complex samples such as serum. The experimental formats for antibody array can be broadly categorized into two classes: (i) direct labeling experiments and (ii) dual antibody sandwich assays [109, 110]. In the direct labeling method, the covalent labeling of all proteins in a complex mixture provides a means for detecting bound proteins after incubation on an antibody microarray. If proteins are labeled with a tag, such as biotin, the signal from bound proteins can be amplified. A two-color rolling-circle amplification (RCA) method on antibody microarrays has been developed for measuring the relative levels of proteins captured on the arrays from two serum samples, labeled with biotin and digoxigenin, respectively. Two-color RCA can produce fluorescence up to 30-fold higher than direct-labeling and indirect-detection methods on antibody microarrays, providing sensitive analysis of low abundant serum proteins [111]. In the sandwich assays such as cytokine and ELISA arrays, proteins captured on an antibody microarray are detected by a cocktail of detection antibodies, each antibody matched to one of the spotted antibodies. These sandwich-based arrays are extremely sensitive for low abundant cytokines or growth factors [112–115]. The correct use and interpretation of antibody microarray data requires proper normalization of the data. A recent study compared seven different normalization methods for antibody microarray analysis and suggested that normalization with ELISA-determined concentrations of IgM resulted in the most accurate, reproducible, and reliable data [116].

Autoantigen arrays appear to be very useful for analysis and characterization of serum autoantibodies in autoimmune diseases. Arrays containing the major autoantigens in eight distinct human autoimmune diseases were constructed, providing a powerful tool to study the specificity and pathogenesis of autoantibody responses, and to identify and define relevant autoantigens in autoimmune diseases [121–123]. A glomerular proteome array of ~30 antigens known to be expressed in the glomerular milieu was constructed for studying serum autoantibodies in lupus. Human lupus sera displayed two distinct IgM autoantibody clusters, one reactive to DNA and the other polyreactive. This antigen array is promising for uncovering novel autoantibody disease associations and distinguishing patients at high risk [124]. Recently, a novel phage protein array using a phage-display library derived from prostate-cancer tissue has been developed for detection of serum autoantibody signatures in prostate cancer. To construct a phage-display library of prostate-cancer peptides, mRNAs were isolated from prostate-cancer tissues of six patients with clinically localized disease. After the insertion of the cDNA fragments into the T7 phage system, peptides encoded by the prostate cancer cDNA were expressed and displayed on the surface of the phage, fused to the C-terminus of the capsid 10B protein of the phage. This surface complex functioned as a bait to capture autoantibodies in serum. A 22-phage-peptide prediction model yielded 88.2% specificity and 81.6% sensitivity in discriminating between prostate cancer and the control [125].
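Sensitivity and specificity figures such as those quoted above are derived from a comparison of predicted and true disease status. A minimal sketch of the calculation follows; the labels are hypothetical and are not data from the cited study, and the function name is an assumption made for illustration.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels (1 = disease)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1          # diseased, correctly flagged
        elif truth == 1:
            fn += 1          # diseased, missed
        elif predicted == 0:
            tn += 1          # control, correctly cleared
        else:
            fp += 1          # control, wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one control sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes: four patients followed by five controls.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sensitivity, specificity = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Sensitivity is the fraction of diseased samples correctly flagged and specificity is the fraction of controls correctly cleared, so both must be reported to judge a prediction model.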

Recombinant/purified allergen microarrays are very promising for profiling disease-eliciting allergens, e.g., quantitative measurement of serum allergen-specific IgE levels in Type I allergy. An allergen microarray containing 94 purified allergen molecules that represent the most common allergen sources was developed for monitoring allergic patients’ IgE reactivity profiles to large numbers of disease-causing allergens using only minute amounts of serum [126]. This microarray test provides performance equivalent to ELISA tests and offers a significant advantage in convenience and cost when compared to traditional allergy test formats [126–132]. Protein arrays using hepatitis B or C virus (HBV and HCV) antigens were also described for detection of HBV or HCV antibodies in patients’ sera [133, 134]. Such array-based assays offer the opportunity of serodiagnosis of infectious diseases by allowing simultaneous measurements of specific subclasses of antibodies directed against pathogenic antigens.

Other array formats used for serum protein analysis include self-assembled monolayer array, filtration-based array, multiplexed photoaptamer-based arrays, phage-based array, and RP array [135–139]. The filtration-based protein array can improve the overall reaction kinetic rate by ten-fold and enhance the sensitivity and specificity of the assay. In the RP array, serum samples are spotted on a microarray and screened for their content of a specific serum protein in a single experiment using target-recognizing antibodies and fluorescently labeled secondary antibodies [137]. The photoaptamer-based arrays utilize photoaptamers that covalently bind to their target analytes before fluorescent signal detection. The arrays can be vigorously washed to remove background proteins, providing superior sensitivity (LOD < 10 fM) for low abundant proteins such as interleukin-16, VEGF, and endostatin [138].

A bottleneck in fabricating protein arrays, especially those for global measurements, is the production of the huge diversity of purified proteins, e.g., antibodies. A protein array containing 2413 nonredundant purified human fusion proteins on a polymer surface was developed for profiling the antibody repertoire in serum from autoimmune disease patients [140]. This study points to a new source of producing thousands of purified proteins for protein array fabrication. Peptide microarrays displaying biologically active small peptides in a high-density format provide an attractive technique to probe complex samples such as serum [141–143]. A technical difficulty in fabrication of peptide arrays is that peptides, usually with small molecular mass, are not easily accessible when adsorbed onto solid supports. Peptides also lack a well-defined 3-D structure, and therefore a correct orientation is essential to promote the interaction between peptides and their targets. Incorporation of elongated spacer molecules or modification of the solid substrate can significantly improve the accessibility of a target molecule [142]. In addition, protein arrays can also be prepared with the cellular proteins expressed in a cell line or the proteins expressed in a cell-free in vitro transcription/translation system [144, 145].

2.8 Application to human disease detection

One of the major applications of serum proteome analysis is to discover biomarkers for human cancer detection. Although tumor markers are routinely measured in clinical oncology, their value in cancer detection has been controversial, largely because no single tumor marker is sensitive and specific enough to meet stringent diagnostic criteria. One strategy to overcome the shortcomings of single tumor biomarkers is to measure a combination of proteomic biomarkers to enhance the sensitivity and specificity.
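The trade-off involved in combining markers can be illustrated with a small calculation. The sketch below is not drawn from any of the cited studies. It assumes two hypothetical markers whose results are conditionally independent given disease status, and the function names and input values are illustrative assumptions only. Under these assumptions, calling a panel positive when either marker is positive raises sensitivity at the cost of specificity, whereas requiring both markers to be positive does the opposite.

```python
def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Panel is positive if EITHER marker is positive.

    Assumes the two markers are conditionally independent given
    disease status (an illustrative simplification).
    """
    sens = 1 - (1 - sens_a) * (1 - sens_b)  # missed only if both markers miss
    spec = spec_a * spec_b                  # negative only if both are negative
    return sens, spec


def combine_and(sens_a, spec_a, sens_b, spec_b):
    """Panel is positive only if BOTH markers are positive (same assumption)."""
    sens = sens_a * sens_b
    spec = 1 - (1 - spec_a) * (1 - spec_b)
    return sens, spec


# Hypothetical single-marker performance, for illustration only.
marker_a = (0.60, 0.90)  # (sensitivity, specificity)
marker_b = (0.70, 0.80)

or_sens, or_spec = combine_or(*marker_a, *marker_b)
and_sens, and_spec = combine_and(*marker_a, *marker_b)
print(f"OR rule:  sensitivity={or_sens:.2f}, specificity={or_spec:.2f}")
print(f"AND rule: sensitivity={and_sens:.2f}, specificity={and_spec:.2f}")
```

Real marker panels are rarely independent, and published classifiers typically use weighted or nonlinear combinations rather than simple logical rules, so this calculation only indicates the direction of the trade-off.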

Analysis of serum proteomes from a variety of cancer patients has led to the discovery of a number of putative markers (Table 2). Future validation of these candidates on large and independent patient cohorts needs to be accomplished using quantitative assays such as ELISA or LC-MRM MS. Most of the studies listed in Table 2 were performed using 2-DE/MS or SELDI-MS. With the new advances in quantitative proteomics, we may expect more applications of isotope tagging/MS/MS, label-free LC-MS, or CE-MS methods in serum biomarker discovery. In addition, aberrant glycosylation of proteins is known to be closely associated with cancer progression. Considering that many clinical biomarkers and therapeutic targets are glycoproteins (e.g., Her2/Neu, CEA, PSA, CA19-9, CA15-3, and CA125), quantitative serum glycoproteome analysis using affinity capturing agents and isotope labeling could be a promising approach in searching for cancer biomarkers.

Some candidate markers discovered in serum are acute phase proteins and commonly shared among different cancers. These proteins are relatively abundant and also over-expressed in some other human diseases, e.g., autoimmune diseases. This implies that the current serum proteome analysis is far from being comprehensive. With new proteomics technologies to enhance resolving power and unmask low abundant proteins, we may foresee more sensitive and specific serum protein markers for human cancers.

Analysis of blood samples from patients suffering from autoimmune diseases remains a mainstay in the clinic for initial diagnosis, prognostication, and clinical decision-making. Serum proteome analysis has revealed peptides and proteins (e.g., CRP, calgranulin A, B, and C) with diagnostic value for rheumatoid arthritis (RA) and multiple sclerosis [234–237]. Meanwhile, testing for the presence of serum autoantibodies has proven to be a confirmatory assay for autoimmune diseases [121, 238–242]. For instance, the serum autoantibodies against alpha-enolase were found in early RA patients, while those against actin, alpha-enolase, and ATP synthase were found in celiac disease patients. The serum autoantibody to triosephosphate isomerase (TPI) was predominantly detected in osteoarthritis (OA), compared to RA and systemic lupus erythematosus (SLE).

The most altered proteins in HBV infected sera were found to be HG beta and alpha-2 chain, apo A-I and A-IV, A1AT, transthyretin (TTR), and DNA topoisomerase II beta. The alteration of these proteins was displayed not only in quantity but also in patterns [243]. SELDI-MS analysis of sera from chronic hepatitis B, liver fibrosis, liver cirrhosis, and nonalcoholic fatty liver disease patients [244–246] suggested that distinct proteomic patterns were capable of differentiating among different stages of fibrosis and predicting fibrosis and cirrhosis. Deglycosylation of serum proteins may result in a simplification of serum proteome profiles and enhanced resolution for comparative proteome analysis. Serum amyloid P component and ceruloplasmin, apparent only after de-N-glycosylation, were shown to correlate with HBV disease [247]. Golgi Protein 73 (GP73), a glycoprotein, was found elevated and hyperfucosylated in sera of patients with hepatocellular carcinoma, which may assist in the early detection of HBV-induced liver cancer [155].

Heart disease is the leading cause of mortality and morbidity in humans. Biomarkers are needed for the diagnosis, prognosis, therapeutic monitoring, and risk stratification of acute injury (acute myocardial infarction (AMI)) and chronic heart disease [248]. Serum apo A-I, apo A-II, and their glycosylated products were found differentially expressed between atherosclerosis patients and normal controls by using SELDI-MS [249]. MALDI-MS analysis of peptides from the sera of normal and AMI subjects produced characteristic patterns that could provide an accurate diagnosis of AMI [250]. Serum alpha-B-crystallin and tropomyosin were discovered and validated as valuable biomarkers for cardiac allograft rejection [251].

Diagnosis of stroke mainly relies on neurological assessment of the patient using neuro-imaging techniques including computed tomography and/or magnetic resonance imaging scan. An early diagnostic marker of stroke, ideally capable of discriminating ischemic from hemorrhagic stroke, would considerably improve acute patient management. Serum heart-fatty acid-binding protein was discovered as a valid marker for the early diagnosis of stroke, with considerably better diagnostic value than NSE and S100B [252]. A SELDI-MS analysis with ELISA validation indicated that apo C-I and apo C-III levels in plasma could distinguish between hemorrhagic and ischemic stroke [253].

Two adjacent studies indicated that SELDI-MS might be a useful technique for detection of emerging infectious diseases such as Severe Acute Respiratory Syndrome (SARS). In the first study, nine proteins were found significantly increased and three proteins were found significantly decreased in SARS patients, including an 11 695-Da protein further identified as SAA [254]. In the second study, a discriminatory classifier with a panel of four biomarkers was reported to have 97.3% sensitivity and 99.4% specificity for SARS detection. This classifier could also distinguish acute SARS from fever and influenza with 100% specificity [255].

The CNS is shielded from systemic influences by two separate barriers, the blood–brain barrier (BBB) and the blood-to-cerebrospinal barrier (BCB). Failure of either barrier bears profound significance in the etiology and diagnosis of several neurological diseases. Comparative proteome analysis of sera from patients undergoing BBB disruption (BBBD) with pre-BBB opening serum indicated that alpha-2-macroglobulin (A2MG) and TTR were correlated with BBBD and may be a peripheral tracer of BCB [256, 257]. In Alzheimer’s disease (AD) patients, serum proteins such as HG, hemoglobin, vitronectin, alpha-1-acid glycoprotein (A1AG), apo B100, fragment of factor H, and histidine-rich glycoprotein were found at altered levels [258]. Aldolase was suggested as a common autoantigen in AD, and might serve as a new target for potential immune modulation [259]. Discriminant pattern analysis of carrier-protein-bound serum peptides using MALDI orthogonal TOF-MS successfully classified AD patients and control subjects with high sensitivity and specificity [260].

Serum proteins, including HG, hemoglobin, fibrinogen, A2MG, TTR, pro-platelet basic protein, and complement C3, C4, and C1 inhibitors, were found at altered levels in the patients suffering from insulin resistance/type-2 diabetes [258]. The small, dense LDLs of diabetes patients were found enriched in apo C-III and depleted of apo C-I, apo A-I, and apo E compared with matched healthy controls [261]. Serum proteomic pattern analysis could be used for evaluation of periodic hemodialysis treatment [262] and for assessment of the remission from active Wegener’s granulomatosis [263]. Based on a 2-DE/Western blotting approach, ~85 immunoreactive protein species were detected with systemic candidiasis patients’ serum specimens. Such an approach may be useful for diagnosis and clinical follow-up of fungal infections [264].

3 Urine

3.1 Analysis and characterization of urine proteome

Urine, with less complexity than serum and relatively high thermodynamic stability, is a promising study medium for discovery of novel biomarkers in many human diseases such as bladder cancer and renal diseases. Optimization of sample preparation is a necessary first step for urinary proteome analysis, and sample desalting using SPE, precipitation, or dialysis is often required [265–269]. Acetone appears to precipitate more acidic and hydrophilic proteins, whereas ultracentrifugation fractionates more basic, hydrophobic, and membrane proteins [266]. The efficiency for precipitation of urinary proteins varies among different organic solvents. The highest recovery rate for urinary proteins was obtained from 90% ethanol and the lowest one was from 10% acetic acid. The ACN-precipitated urine sample produced the greatest number of spots on a 2-D gel, whereas the acetic acid-precipitated sample yielded the smallest number of spots [268].

A standard shotgun analysis, using a combination of 1-DE/1-D LC/MS/MS, direct 1-D LC/MS/MS, and 2-D LC/MS/MS, has resulted in the identification of a total of 226 proteins in the urinary proteome [270]. Similar to serum proteome analysis, prefractionation and immunodepletion help unmask low abundant proteins in urine. A simple prefractionation based on molecular weight cutoff followed by immunodepletion of albumin and IgG has yielded 1400 distinct gel spots and 150 unique proteins identified, including ~50 plasma proteins [271]. Microbeads coated with hexameric peptide ligand libraries can greatly enhance identification of urine proteins by drastically reducing the level of the most abundant species, while strongly concentrating the more dilute and rare ones [272].

A recent study has revealed that both membrane proteins and cytosolic proteins from renal epithelia are highly enriched in low-density urinary structures identified as exosomes. Exosomes are membrane vesicles that originate as internal vesicles of multivesicular bodies. A nano-LC-MS/MS analysis has been performed to identify 295 proteins in urinary exosomes, including multiple proteins known to be responsible for renal and systemic diseases. Therefore, urinary exosomes may be a rich source for disease biomarker discovery in urine [273, 274].

Human urine proteome contains a large array of peptides, which can be profiled with CE-MS, LC-MS, or DPD (LC/MALDI-TOF-MS) [275–278]. SELDI-TOF-MS offers a simple platform for high-throughput urine profiling; however, standardization of analysis conditions is critical, and both extrinsic and intrinsic factors must be taken into account for accurate data interpretation [279]. Biomolecular interaction analysis MS (BIA/MS) is a 2-D chip-based analytical technique based on quantitative measurement of the interactions between surface-immobilized ligands and solute analytes through surface plasmon resonance (SPR) sensing as well as MALDI-TOF-MS analysis of the analytes affinity-captured on the sensor surface. This technique may be useful in rapid screening of urinary proteins indicated as putative biomarkers for renal dysfunction [280, 281].

3.2 Application to human disease detection

Disease diagnostics using urine proteomic biomarkers is attractive because urine testing is simple and noninvasive [282]. Urine proteome analysis toward the discovery of bladder cancer biomarkers has been clearly demonstrated by using 2-DE/MS or SELDI-TOF-MS approaches [283–291]. Some putative markers discovered for bladder cancer included urinary psoriasin, calreticulin (CRT), gamma-synuclein (SNCG), catechol-o-methyltransferase (COMT), orosomucoid, zinc-alpha-2-glycoprotein, matrix metalloproteinase-2 (MMP-2), MMP-9, fibronectin, and their fragments [284–288]. As a combination, CRT, SNCG, and COMT exhibited an overall sensitivity of 76.8% and specificity of 77.4%. A point-of-care proteomic assay was reported for measurement of the nuclear matrix protein NMP22 in urine, as it could enhance detection of malignancy in patients with risk factors or symptoms of bladder cancer. The sensitivity and specificity of the NMP22 assay were determined to be 55.7 and 85.7%, respectively [292, 293].

Urine proteome analysis may potentially unravel markers for other cancers including those of prostate, renal, lung, and ovarian origin [278, 294–298]. Urine markers in prostate cancer may overcome the limitations of serum PSA by improving the overall specificity in diagnosis. There is no routinely used circulating marker for renal cancer, which is often detected incidentally and is frequently advanced at the time of presentation, with over half of the patients having local or distant tumor spread. SELDI-TOF-MS profiling with neural-network analysis was able to detect renal cancer with sensitivities and specificities of 81.8–83.3%. However, when the assay was tested on a new independent patient cohort 10 months later, sensitivities and specificities declined remarkably, ranging from 41.0 to 76.6%. Eosinophil-derived neurotoxin (EDN), especially a glycosylated form of EDN, and osteopontin fragments were found elevated in the urine of ovarian cancer patients. A combination of both modified EDN and osteopontin fragments showed a specificity of 93% and sensitivity of 72% for early stage ovarian cancer.
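Sensitivity and specificity figures such as those quoted above are computed separately for each cohort from the counts of correctly and incorrectly classified subjects. The following minimal sketch shows the calculation and how a classifier can look strong on a discovery cohort yet weaker on an independent one. The function name and all counts are hypothetical and chosen only for illustration; they are not data from the cited studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: diseased subjects classified as positive/negative.
    tn/fp: non-diseased subjects classified as negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each group must contain at least one subject")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
discovery = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
validation = sensitivity_specificity(tp=5, fn=5, tn=7, fp=3)

for name, (sens, spec) in [("discovery", discovery), ("validation", validation)]:
    print(f"{name}: sensitivity={sens:.1%}, specificity={spec:.1%}")
```

Reporting both sets of figures side by side, as in the renal cancer study above, makes it easier to judge whether a proteomic pattern generalizes beyond the samples used to train it.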

There has been an increasing interest in developing urine biomarkers for detection of renal allograft rejection as an alternative to percutaneous needle biopsy, which is costly and associated with significant patient morbidity and mortality [299]. SELDI-MS profiling indicated that there were predominantly small urine peptides or proteins (2.0–25.7 kDa) that could distinguish between renal transplant patients with no rejection and those with acute rejection [300–304]. Patients with acute tubulointerstitial rejection also expressed higher levels of cleaved beta-2-microglobulin (B2MG) in urine [305].

Distinct urinary proteomic patterns, as obtained by CE-MS, SELDI-MS, or label-free LC-MS profiling, have been demonstrated for early detection of renal diseases/complications [306–309]. In a CE-MS study, 16 differentially excreted polypeptides formed a pattern as early indicators of graft-versus-host disease (GVHD), allowing discrimination of GVHD from patients without complications with 82% specificity and 100% sensitivity (cross-validation) [307]. A similar CE-MS method was also applied to the assessment of renal damage in patients with Type I or II diabetes as well as in patients with IgA nephropathy [310–313]. In a large-scale proteomic study, both 2-DE/MS and quantitative proteomics (isotope-coded affinity tags (ICAT)/2-D LC-MS/MS) approaches were used to compare the urinary proteomes between patients with Dent’s disease and normal subjects. Several vitamin and prosthetic group carrier proteins, apolipoproteins, and complement components were found at higher levels in Dent’s urine compared with normal urine, suggesting that such proteins are reabsorbed more efficiently than other classes of proteins. Conversely, proteins of renal origin were found in proportionately higher amounts in normal urine [308]. The urinary proteomic changes associated with interstitial and bacterial cystitis [314], urolithiasis [315], and pancreatitis [316] were also investigated. Differential expression of pancreatic secretory trypsin inhibitor and its proteolytic fragment was observed between patients with pancreatitis and normal control subjects, suggesting a possible role of the protein as a cause of hereditary pancreatitis.

4 Cerebrospinal fluid (CSF)

4.1 Analysis and characterization of CSF proteome

CSF is secreted from several different CNS structures, and any changes in the protein composition of CSF may be indicative of altered brain protein expression in neurodegenerative or other CNS disorders [317, 318]. Analysis of brain-specific proteins in CSF is complicated by the fact that most of the CSF proteins are derived from the plasma, and highly abundant proteins tend to obscure less abundant proteins. In addition, the total protein concentration in CSF is very low at 0.3–0.7 mg/mL. Depletion of highly abundant proteins and sample pre-enrichment and desalting (e.g., precipitation or ultrafiltration) are often required prior to CSF proteome analysis [319–324]. Two immunodepletion methods, IgY antibody microbeads and MARC, were recently evaluated. IgY antibody microbeads removed six CSF proteins: HSA, transferrin, IgG, IgA, IgM, and fibrinogen, whereas MARC removed HSA, transferrin, IgG, IgA, A1AT, and HG. 2-DE comparison indicated that MARC removed the major proteins more effectively, and approximately 50% more spots were visualized when compared to the 2-D gel of CSF without protein depletion [325].

Identification of the CSF proteins was performed using 2-DE/MS or gel-free approaches such as CE-FT-ICR MS, LC-FT-ICR MS, and 2-D LC-MS/MS [326–330]. A total of 249 proteins were identified from the CSFs of ten subjects by using immunodepletion, C18-SPE pretreatment of peptides, and 2-D LC-MS/MS. Of these proteins, 38% were unique to individual subjects, whereas only 6% were common among all ten subjects, suggesting considerable subject-to-subject variability in the CSF proteomes [330]. Quantitative proteomics using ICAT was applied to the comparative analysis of CSF proteins from younger adults with those of older adults. More than 300 CSF proteins were identified, and 30 proteins exhibited >20% change in concentrations between older and younger individuals. These data supplied the necessary information to appropriately interpret protein biomarkers of age-related neurodegenerative diseases [331]. Using ultra-filtration and SPE, LMW CSF fraction (Mw < 5 kDa) was prepared and then analyzed with capillary LC-Quadrupole TOF (QTOF) MS, leading to the identification of 20 peptides derived from 12 proteins [332].
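Subject-to-subject variability of the kind reported above can be summarized by counting, for every identified protein, the number of subjects in which it was detected. The sketch below is one simple way to do this; the function name, subject labels, and protein lists are hypothetical placeholders rather than data from the cited study.

```python
from collections import Counter


def overlap_summary(proteins_by_subject):
    """Summarize how identifications are shared across subjects.

    proteins_by_subject maps a subject label to an iterable of protein IDs.
    Returns the total number of distinct proteins, the fraction seen in
    exactly one subject, and the fraction seen in every subject.
    """
    n_subjects = len(proteins_by_subject)
    # Count each protein once per subject, even if listed more than once.
    counts = Counter(
        protein
        for proteins in proteins_by_subject.values()
        for protein in set(proteins)
    )
    total = len(counts)
    if total == 0:
        raise ValueError("no proteins were supplied")
    unique = sum(1 for c in counts.values() if c == 1)
    common = sum(1 for c in counts.values() if c == n_subjects)
    return {
        "total": total,
        "unique_fraction": unique / total,
        "common_fraction": common / total,
    }


# Hypothetical identifications, for illustration only.
example = {
    "subject_1": ["ALB", "TTR", "CST3"],
    "subject_2": ["ALB", "TTR", "APOE"],
    "subject_3": ["ALB", "GSN"],
}
summary = overlap_summary(example)
print(summary)
```

A low common fraction may reflect genuine biological variation, but it can also arise from undersampling in shotgun MS/MS, so replicate analyses of the same sample are a useful control when interpreting such numbers.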

Characterization of PTMs of CSF proteins contributes to understanding the molecular events that occur in the homeostatic and pathological processes of the CNS. Four CSF proteins, kallikrein-6, complement C4 gamma-chain, gelsolin, and ceruloplasmin, were identified as phosphotyrosyl proteins using a combination of GE, Western blotting, and immunoprecipitation [333]. Following the immunodepletion of most abundant CSF proteins (HSA, transferrin, IgG, IgA, IgM, fibrinogen, A1AT, and HG), 2-DE with specific fluorescent staining was used to map out the phosphoproteins and glycoproteins. Subsequent in-gel digestion and LC-MS/MS analysis resulted in the identification of 10 phosphoproteins and 14 glycoproteins in CSF [325].

4.2 Application to neurological disease detection

Identification of CSF biomarkers for neurological disorders would be of great value to clinicians because of the difficulties in early and differential diagnoses of these diseases in clinical practice [334]. Early diagnosis of degenerative diseases is important in initiating symptomatic treatment, and will be of even greater significance if drugs with a potential to slow down the degenerative process prove to have a clinical effect with the use of specific CSF markers. Discrimination of AD from controls and from other neurological diseases has been improved by simultaneous analysis of beta-amyloid (1–42), total-tau, and phosphorylated tau, where a combination of low levels of CSF-beta-amyloid 1–42 and high levels of CSF-tau and CSF-phospho-tau is associated with an AD diagnosis [335–337]. This may be further strengthened by the recent discovery of a number of putative CSF markers including apo A1, apo E, apo J, retinol-binding protein (RBP), kininogen, A1AT, cell cycle progression eight protein, alpha-2-HS glycoprotein (A2HSG), alpha-1 beta-glycoprotein, cathepsin B, amyloid precursor, complement proteins, and glutathione independent prostaglandin D synthase [338–343]. Developing these proteins as new biomarkers may lead to the early diagnosis of AD and provide useful information in drug trials. Another observation from these proteomic studies is that when investigating a CSF protein as a possible biomarker, it may be useful to compare individual protein isoform expression levels in addition to the more commonly measured total protein expression level [343].

Frontotemporal dementia (FTD) refers to a group of rare brain disorders that affect the frontal and temporal lobes of the brain, which control speech and personality. A 2-DE/MS study revealed that six CSF proteins, granin-like neuroendocrine precursor, pigment-epithelium-derived factor, RBP, apoE, HG, and albumin, were significantly altered in FTD compared to controls. Several proteins involved in FTD pathology were not influenced in the CSF of AD patients, and vice versa, establishing differences in the pathophysiological mechanisms between FTD and AD. Since FTD is commonly misdiagnosed as AD, these differentially expressed proteins may be used as indicators to distinguish FTD and AD, two of the most common neurodegenerative disorders [344]. Multiple sclerosis is an autoimmune inflammatory demyelinating disease of the CNS. Disease mechanisms of multiple sclerosis at the molecular level remain poorly understood, and its diagnosis is challenging because of the lack of a specific diagnostic test [345]. CSF proteins, such as cartilage acidic protein, tetranectin, SPARC-like protein, autotaxin t, and a cleavage product of cystatin C, could be potential markers for multiple sclerosis diagnosis because they were predominantly identified in multiple sclerosis patients but not in the patients with other inflammatory CNS disorders [346, 347].

Creutzfeldt–Jakob disease (CJD) is a rare, degenerative, invariably fatal brain disorder belonging to a family of human and animal diseases known as the transmissible spongiform encephalopathies (TSEs) or prion diseases. The diagnosis of CJD is based on clinical and electroencephalographic criteria, and there is no practical and reliable premortem test for CJD and related TSEs. Early studies by immunoassay and 2-DE suggested that the 14-3-3 protein levels in CJD patients’ CSF correlated with NSE levels, and therefore might serve as a marker for CJD [348, 349]. A follow-up study by 2-DE/Western blotting revealed that multiple isoforms of 14-3-3 protein (14-3-3 beta, gamma, epsilon, and eta) were all present in the CSF of patients with CJD, and specific 14-3-3 isoform patterns could be used to differentiate CJD from other neurodegenerative diseases [350]. A panel of seven CSF proteins, including apo E, was found capable of distinguishing ante mortem variant CJD from ante mortem sporadic CJD [351]. The cellular prion protein (PrPc) is a glycosylphosphatidylinositol-anchored glycoprotein, which represents the substrate for the generation of a conformational pathogenic isoform (PrPsc) in prion diseases. Extensive microheterogeneity of PrPc in human CSF (60 2-D Western spots) was observed, presumably resulting from the PrPc glycosylated isoforms as well as N-terminally truncated fragments [352, 353].

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterized by degeneration of motor neurons. Two CSF proteins, TTR and cystatin C, were found down-regulated, whereas the carboxy-terminal fragment of neuroendocrine protein 7B2 was found up-regulated in the CSF of ALS patients compared to controls [354]. Two isoforms of A2HSG were expressed at higher levels in patients with low-grade gliomas compared with the control group consisting of patients with mixed neurological diagnoses. In one patient, the level of A2HSG was significantly reduced after gross total resection of the tumor [355]. Two tumor-related proteins, N-myc oncoprotein and low-molecular weight caldesmon, were identified in CSF samples of patients with primary brain tumors [356]. The CSFs of patients suffering from HIV, Chronic Fatigue Syndrome (CFS), preeclampsia, and cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy were also investigated [357–360]. Assessment of CSF samples from CFS, Persian Gulf War Illness (PGI), and fibromyalgia patients by LC-QTOF-MS suggested that the CSF proteomes and presumably pathological mechanisms might be shared among these disorders although syndrome names and definitions are different.

5 Saliva

5.1 Analysis and characterization of human saliva proteome

Human saliva is secreted from multiple salivary glands including parotid, submandibular (SM), sublingual (SL), and other minor glands lying beneath the oral mucosa. Saliva contains a large array of proteins, many of which can be very informative for human disease detection. Our group is one of the three groups recently funded by the National Institute of Dental and Craniofacial Research to conduct the human salivary proteome project (www.hspp.ucla.edu) [361, 362]. It is envisioned that with the catalog of human salivary proteome available, one can begin to examine and compare salivary proteomes of high impact diseases such as cancers, diabetes, and autoimmune diseases.

Numerous studies have been performed to identify saliva proteins using 2-DE/MS [362–370]. A significant number of spots on a typical saliva 2-D gel actually correspond to amylases, cystatins, and Igs [369, 370]. Therefore, depletion of amylase and Igs prior to 2-DE analysis should improve mapping and identification of saliva proteins. Compared with 2-DE/MS, shotgun proteome analysis based on advanced MS/MS technologies, such as QTOF, linear IT (LIT), and LIT-Orbitrap, provides significantly enhanced power for identification of saliva proteins [361, 362, 371–374]. Both FFE and CIEF were successfully coupled with LIT-MS analysis, leading to the identification of 437 and 1381 proteins, respectively, in whole saliva [373, 374]. In addition, specific subgroups of saliva proteins, such as proline-rich proteins (PRPs), histatins, and cystatins, were independently identified and characterized [375–378], and relative quantitation of saliva LMW peptides was also demonstrated using isobaric tags for relative and absolute quantitation (iTRAQ) and MALDI-TOF/TOF [379].

Similar to plasma/serum, many proteins (e.g., mucin MUC5B and MUC7) in human saliva are glycosylated. Lectin-binding assay on PAGE gels appears to be useful for characterization of saliva glycoproteins associated with oral diseases such as Sjogren’s syndrome (SS) [380]. GE/MS has been well demonstrated for characterizing salivary glycoprotein glycoforms and their glycan structures (e.g., O-glycans released by reductive beta-elimination) [381–383]. We have recently profiled saliva glycoproteins using a glycoprotein pull-down strategy based on hydrazide chemistry [72]. In total, we have identified 76 distinct glycopeptides representing 46 distinct glycoproteins in human whole saliva [384].

5.2 Application to human disease detection

Saliva is an attractive medium for disease diagnosis because saliva testing has several key advantages including minimum cost, noninvasiveness, and easy sample collection and processing. Saliva proteins may have diagnostic value for human oral and systemic diseases. For instance, the soluble fragments of the c-erbB-2 oncogene and CA 15-3 were found at significantly higher levels in the saliva samples of breast cancer patients than in those of healthy controls and patients with benign tumors [385, 386]. Elevated levels of salivary CEA, defensin-1, TNF-alpha, IL-1, IL-6, IL-8, and CD44 were detected in patients with oral cancer [387–393]. Most of these studies were based on immunoassays of individual gene products. Proteomic biomarkers, when combined, are expected to enhance the sensitivity and specificity of human cancer detection.

Salivary protein markers have been well demonstrated for diagnosis of SS, which is an autoimmune disease characterized by xerostomia (dry mouth) and xerophthalmia (dry eyes). Saliva autoantibodies, e.g., anti-Ro/SS-A, anti-La/SS-B, and anti-alpha-fodrin, have been applied to SS detection in clinical setting [394, 395]. Recently, parotid salivary markers were discovered for SS using SELDI-TOF-MS and 2-D-DIGE. Compared to non-SS subjects, parotid B2MG, lactoferrin, IgG kappa light chain, polymeric Ig receptor, lysozyme C, and cystatin C were found significantly increased whereas two proline-rich proteins, amylase, and carbonic anhydrase VI were found significantly decreased in the patient group [396]. These candidates remain to be validated, and ideal salivary markers would be capable of differentiating SS from other autoimmune diseases.

6 Bronchoalveolar lavage fluid (BALF)

6.1 Analysis and characterization of BALF proteome

BALF, obtained during fiber-optic bronchoscopy, is a biofluid mirroring the expression of normally secreted pulmonary proteins and the products of activated cells and destructive processes. BALF is presently the most common way of sampling the components of the epithelial lining fluid and the most faithful reflection of the protein composition of the pulmonary airways. The characterization of the proteome within this compartment provides an opportunity to establish temporal and prognostic indicators of airway diseases [397, 398].

2-DE has been commonly used for BALF proteome analysis, presumably because many BALF proteins are structurally modified and/or truncated due to proteolysis. There have been continuous efforts on the development of reference 2-D maps for human BALF proteins [399–402]. The current master gel of BALF proteins comprises more than 1200 spots visualized by silver staining [402]. Identification of proteins on 2-D gels was initially performed using immunoblotting, Edman sequencing, or matching to a reference gel [403, 404], and this was significantly improved through the use of advanced MS identification techniques [405–409]. A large-scale proteomic analysis using immunoaffinity depletion, SDS-PAGE fractionation, protein in-gel digestion, and subsequent nano-LC-MS/MS was reported for identification and semiquantitation of more than 1500 distinct proteins in BALF. Around 10% of the proteins displayed significant up-regulation specific to the asthmatic patients after segmental allergen challenge. The differentially expressed proteins represent a wide spectrum of functional classes and the majority of the expression changes are closely associated with many aspects of the pathophysiology of asthma. This large portion of newly identified proteins in BALF has provided new insights for finding novel pathological mediators and biomarkers of asthma [408].

6.2 Application to human disease detection

Analysis of BALF proteome and discovery of disease-associated proteins will contribute both to a better knowledge of the lung structure at the molecular level and to the study of lung disorders at the clinical level. Although BALF proteins reflect great diversity of cellular origins and functions, a comparative analysis of serum and BALF proteomes revealed that some proteins were more abundant in BALF than in plasma, suggesting that they are specifically produced in the airways. These proteins are, therefore, good candidates for becoming lung-specific disease biomarkers [397, 410].

A majority of BALF proteomic studies have been dedicated to fibrosing interstitial lung diseases. Patients with sarcoidosis, an inflammatory lung disease, exhibited an altered BALF proteome profile, and many proteins at altered levels were identified as nonplasma proteins involved in the inflammatory and oxidant-antioxidant processes [411]. Differential BALF proteomic profiles were observed between patients suffering from sarcoidosis and idiopathic pulmonary fibrosis (IPF), two forms of interstitial lung disease characterized by different pathogenesis and clinical evolution. Sarcoidosis patients exhibited higher amounts of plasma proteins, while IPF patients showed significantly higher levels of LMW proteins, either involved in inflammatory processes (e.g., calgranulin) or antioxidant response (e.g., antioxidant peroxisomal enzyme and thioredoxin peroxidase) [412, 413]. A comparative analysis of BALFs from patients with sarcoidosis, IPF, and systemic sclerosis revealed that carbonylated proteins were up-regulated in BALFs of all disease patients and a greater number of protein targets of oxidation were present in BALF of IPF patients. This suggests that oxygen-derived free radicals produced by phagocytes may contribute to lung tissue damage occurring during diffuse lung diseases [414].

Monitoring the proteolytic changes and structurally modified pattern of BALF proteins in lung diseases may provide further insight into the disease mechanism or diagnostic potential [397, 406, 415, 416]. Most lung disorders are known to be associated with considerable modifications of surfactant composition. Proteolytic derivatives of surfactant protein A (SP-A), an important innate host defense component of the lungs, were found in BALF of cystic fibrosis (CF) patients but not in controls [415]. Using a high-resolution FT-ICR MS analysis, surfactant proteins SP-A and SP-D, including structurally modified and truncated forms, were identified in BALF from patients with CF, chronic bronchitis, and pulmonary alveolar proteinosis [406].

The changes in BALF proteome profile may reflect the associated state of asthma. Altered levels of lipocalin-1, cystatin S, IgBF, and TTR were observed in individuals who suffered from upper airway irritation or asthma [404]. Gelsolin secretion was found to increase three-fold in the airway surface liquid of epithelia treated with IL-4, suggesting that gelsolin might improve the fluidity of airway surface liquid in asthma by breaking down filamentous actin that may be released in large amounts by dying cells during inflammation [404, 417]. In addition, a number of 2-DE studies were conducted to monitor the proteomic changes in BALF of patients suffering from lupus erythematosus, Wegener’s granulomatosis, lipoid pneumonia, chronic eosinophilic pneumonia, asbestosis, bacterial pneumonia, hypersensitivity pneumonitis, malignancies, immunosuppression, and smoking [399–402, 407, 418–421].

7 Synovial fluid (SF)

SF is a thin, stringy fluid present in the cavities of synovial joints. It reduces friction between the articular cartilage and other tissues in joints, lubricating and cushioning them during movement. SF may be classified as normal, noninflammatory, inflammatory, septic, or hemorrhagic, with each group associated with certain diagnoses. Regular SF analysis is commonly performed to determine the cause of acute arthritis.

SF contains a large number of proteins originating from synovial tissue, cartilage, and serum. The protein composition in SF may reflect the pathophysiological conditions affecting the synovial tissue and articular cartilage. Therefore, analysis of SF proteome may introduce specific protein markers for early and differential diagnosis of joint diseases such as RA and OA. 2-DE has been commonly used for differential mapping of SF proteomes from patients with different forms of arthritis [422–425]. Distinct changes in protein patterns were observed between RA and OA patients, and altered levels of acute-phase proteins in SF were observed for the RA patients treated with antibody to CD4 [426]. Unlike other human body fluids, removal of abundant albumin and γ-globulins from SF samples did not significantly improve 2-DE separation of SF proteins [423]. LC-MS screening of peptides in human SF indicated the presence of a peptide AGLPEKY (SAA (98–104)) derived from SAA. The peptide and several of its analogs were found capable of binding isolated human CD4+ T-lymphocytes and stimulating them to produce interferon-gamma. Given the high acute-phase serum level of SAA and its massive proteolysis by inflammation-related enzymes, SAA-derived peptides may be involved in host defense mechanisms [427].

An important application of SF proteome analysis is to identify effective markers that can differentiate two joint diseases, RA and OA. Putative SF markers discovered for RA include calgranulin B, calgranulin C, SAA, and myeloid-related protein 8 [428–430]. To confirm the finding about calgranulin B and to examine its applicability as a diagnostic marker, levels of the calgranulin A/B heterocomplex in plasma and in SF were further validated by immunoassays on patients with RA, OA, and other inflammatory joint diseases. It was found that plasma levels of the calgranulin A/B heterocomplex correlated well with levels in SF, and hence, determination of plasma levels could be used to distinguish RA patients from controls and patients with other inflammatory joint diseases. The plasma level of calgranulin A/B heterocomplex might also be useful for monitoring anti-TNFalpha therapy [429]. Two SF proteins, trappin-2 and anti-TPI autoantibody, could be potential markers for OA diagnosis as they were detected predominantly in the OA patients [431, 432].

A two-step proteomic approach was used to identify and validate prognostic biomarkers that can predict whether the RA patients will develop an erosive, disabling disease. In the first step, 2-D LC-MS/MS was used to generate protein profiles of SF from patients with either erosive RA or nonerosive RA. In the second step, the selected candidates were verified using quantitative LC-MRM MS in sera of patients with erosive RA or nonerosive RA and of healthy controls. The 2-D LC-MS/MS profiling yielded a total of 418 distinct proteins, including CRP and six members of the S100 protein family that were elevated in the SF of patients with erosive RA. Quantitative LC-MRM analysis verified that the levels of CRP, calgranulin A, B, and C were significantly elevated in the serum of patients with erosive RA compared to patients with nonerosive RA or healthy individuals [234].

8 Nipple aspirate fluid (NAF)

Nipple aspiration is a noninvasive technique for obtaining breast fluids from the duct openings of the nipple for the evaluation of abnormalities associated with breast cancer. NAF contains cells and many proteins (e.g., PSA). Therefore, the fluid can be tested for malignant cells and for diagnostic biomarkers of breast cancer. NAF can be obtained from more than 80% of women at high risk for breast cancer. The average total protein concentration in NAF is 71 μg/mL [433]. NAF proteome analysis holds promise as a noninvasive method to identify biomarkers for breast cancer diagnosis and prognosis.

SELDI-TOF-MS has been applied to rapid profiling of NAF samples and to searching for signature proteins associated with different stages of breast cancer [434–438]. However, the identity of these proteins is not known. There are concerns about the reproducibility and reliability of peak quantifications using SELDI-MS. Therefore, new algorithms have been introduced to address quality control, peak detection, and quantification in SELDI. It was found that data denoising with the undecimated discrete wavelet transform improved the reproducibility of quantifications and detected more peaks than the method implemented in Ciphergen software [436, 438]. A 2-DE/MALDI-TOF-MS approach was used to characterize the NAF proteome and discover potential NAF markers for breast cancer. A total of 41 distinct proteins were identified, including 25 known to be secreted [439]. Three proteins, gross cystic disease fluid protein (GCDFP)-15, apo D, and A1AG, were proposed as putative biomarkers, and further validation confirmed the significantly altered levels of GCDFP-15 (down-regulated) and A1AG (up-regulated) in breast cancer. Recently, a quantitative proteomic approach, based on ICAT, SDS-PAGE, and LC-MS/MS, was employed to identify differentially expressed proteins in NAF from the tumor-bearing and contralateral disease-free breasts of patients with unilateral early-stage breast cancer. A2HSG was found to be under-expressed whereas lipophilin B, beta-globin, hemopexin, and vitamin D-binding protein were overexpressed in NAF from tumor-bearing breasts [440].
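To make the idea of undecimated wavelet denoising concrete, the following is a minimal sketch only. It is not the algorithm of [436, 438] and not the Ciphergen method. It assumes a simple à trous (undecimated) decomposition with a two-tap averaging filter, clamped boundaries, and soft thresholding of the detail coefficients. The function names, the number of levels, and the threshold value are all hypothetical choices made for illustration.

```python
def atrous_decompose(signal, levels):
    """Undecimated (a trous) decomposition with a two-tap averaging filter.

    Returns the final smooth approximation and one detail band per level.
    Every band has the same length as the input (no downsampling).
    """
    approx = list(signal)
    n = len(approx)
    details = []
    for j in range(levels):
        step = 2 ** j  # filter taps are spread apart by 2**j at level j
        # Clamp the index at the left boundary (an assumed boundary rule).
        smooth = [(approx[k] + approx[max(k - step, 0)]) / 2.0 for k in range(n)]
        details.append([a - s for a, s in zip(approx, smooth)])
        approx = smooth
    return approx, details


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def denoise(signal, levels=3, threshold=1.0):
    """Threshold the detail bands, then reconstruct by summing all bands."""
    approx, details = atrous_decompose(signal, levels)
    out = list(approx)
    for band in details:
        for k, coefficient in enumerate(band):
            out[k] += soft_threshold(coefficient, threshold)
    return out
```

Because no band is downsampled, the reconstruction is a plain sum of the approximation and the detail bands, so a threshold of zero returns the input spectrum unchanged. In practice the threshold would be chosen from an estimate of the noise level rather than fixed by hand.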

9 Tear fluid (TF)

TF analysis is a noninvasive approach in early diagnosis and study of pathogenesis of eye-related diseases. It may also assist follow-up assessment of therapeutic treatment. For some eye diseases such as dry eye syndrome, the development of new potential treatments is hampered by the fact that there are no objective criteria available to precisely assess the treatment. A standard tear proteomic pattern from healthy individuals may serve as a reference to measure the success of treatment. Tear proteome profiling can also generate useful information for the understanding of the interaction between an eye and its contacting objects, such as a contact lens or a lens implant. This is important for designing improved eye-care devices and maintaining the health of an eye. TF analysis may be challenging because only a small volume of TF (<5 μL) can be collected in a clinical laboratory under normal operational conditions [441]. For some disease patients with minimum tear production, a saline “flushing” method can be used for collection of “eye flush fluid.” The content of TF can then be concentrated by ultracentrifugation for subsequent proteome analysis [442].

The most comprehensive cataloging of TF proteins was realized by using LC-ESI-MS/MS and LC-MALDI-MS/MS [441]. A total of 54 proteins were identified in less than 5 μL of TF, with 44 of them identified by LC-MALDI-MS/MS alone with a consumption of 2 μL TF. A truncated form of TF lacrimal PRP and its C-terminus peptide fragments were also characterized using SEC and MALDI-TOF-MS [443].

Two SELDI studies have demonstrated the potential of using tear proteomic patterns for the noninvasive diagnostic detection of eye diseases such as dry eye syndrome and SS [444, 445]. Multimarker models were established for the disease detection with superior specificity and sensitivity. The identification of biomarkers revealed an increase of inflammatory markers in patients with dry eye and a decrease of proteins that may have protective functions. Using 2-DE and LC-QTOF-MS, the differentially expressed proteins in TF samples between chronic blepharitis and control subjects were identified and validated. Nine proteins, serum albumin, A1AT, lacritin, lysozyme, Ig-kappa chain VIII, prolactin-inducible protein, cystatin-SA III, pyruvate kinase, and an unnamed protein, were found to be down-regulated in the blepharitis patients, providing further insights into the pathogenesis of the disease [446].

10 Amniotic fluid (AF)

The presence and integrity of AF is fundamental for the normal development of the human fetus during pregnancy. Its production rate changes throughout pregnancy and is mainly related to the functions of the different fetal, placental, and amniotic compartments. Monitoring the proteomic content in AF can be used to evaluate premature rupture of fetal membranes (PROM) and to develop biomarkers for early detection of amniotic infection or inflammation. Analysis and identification of AF proteins have been performed using Off-Gel IEF/LC-MS/MS [447] and LC-FT-ICR-MS [448]. Off-Gel IEF allows fractionation and recovery of proteins without the need of carrier ampholytes or buffers, representing an advantage as compared to classical liquid-IEF fractionation. The coupling of Off-Gel IEF with nano-LC-MS/MS resulted in the identification of a total of 69 proteins in AF.

PROM occurs in about 5% of deliveries, with complications such as infection and preterm birth. Early diagnosis is mandatory in order to decrease such complications. To identify potential AF markers (present only in AF but absent in maternal blood) for early detection of PROM, a 2-DE/MS approach was used to compare plasma and AF samples collected from women at term. Two basement membrane-specific heparan sulfate proteoglycan proteins, perlecan and agrin, were found in AF while they were absent from plasma samples of pregnant women [449, 450]. The presence of intra-amniotic infection (IAI) is often associated with preterm birth and adverse neonatal sequelae. Early diagnosis of IAI, however, has been hindered by insensitive or nonspecific tests. By using SELDI-MS for AF profiling, a few putative markers, including human neutrophil protein (HNP) 1-3, calgranulin A, B, C, and insulin-like growth factor-binding protein 1, were found potentially useful for diagnosis of IAI [451–454]. MALDI-TOF-MS with functionalized magnetic beads was also reported for proteomic analysis of AF supernatant as a rapid means of detecting fetal aneuploidies immediately after amniocentesis [455].

11 Comparison of HBFPs

There are common features shared among different body fluid proteomes. Most of the body fluids contain a large number of proteins within an extremely wide dynamic range in concentration. Therefore, depletion of high-abundance proteins is often desired for these body fluids in order to achieve a comprehensive proteome analysis. Structural modification (e.g., glycosylation or truncation) of proteins is common for all human body fluids. These PTMs may be associated with disease states and therefore need to be characterized and quantified using appropriate proteomics platforms.

There is minimal overlap among the proteins identified from different body fluids, suggesting that the proteomic content in each body fluid is distinct. This also implies that the analytical methods need to be optimized for a specific body fluid proteome, and that there is no universal proteomics platform for all the body fluids. The total protein concentration of body fluids varies widely, with serum the highest and NAF the lowest. The most abundant proteins also differ from one fluid to another. In saliva, amylase, PRPs, and Igs should be targeted for immunodepletion, whereas, in serum, albumins, Igs, A1AT, and HG should be depleted.
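One simple way to put a number on the overlap between two protein lists is the Jaccard index (shared entries divided by all distinct entries). The sketch below applies it to the depletion targets named above for saliva and serum, purely as an illustration. These are short lists of abundant proteins, not full proteomes, and the function name is hypothetical.

```python
def jaccard_overlap(proteins_a, proteins_b):
    """Fraction of distinct proteins that appear in both lists."""
    set_a, set_b = set(proteins_a), set(proteins_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Depletion targets named in the text; labels are kept as written there.
saliva_targets = {"amylase", "PRPs", "Igs"}
serum_targets = {"albumin", "Igs", "A1AT", "HG"}

shared = saliva_targets & serum_targets
overlap = jaccard_overlap(saliva_targets, serum_targets)
```

Real comparisons would first map identifications from each study to a common accession scheme, since the same protein reported under different names would otherwise be counted as distinct.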

Human plasma/serum represents the most extensively studied body fluid in HBFP analysis. The insight gained from plasma/serum proteomic studies can help advance the analysis of other fluid proteomes. Compared to blood, other body fluids may express more sensitive and specific markers for certain local diseases. For example, saliva contains many locally produced proteins distinct from serum. These proteins may be better indicators of oral cavity diseases. Many studies also indicate that other body fluids (e.g., saliva, urine, and SF) contain a number of serum proteins, which could be an indication of blood contamination if those serum proteins are highly abundant. Transferrin may serve as a surrogate marker for quantification of blood contamination in these body fluids [456].

12 Concluding remarks

HBFP analysis is undoubtedly a long-lasting approach in the search for clinically relevant disease biomarkers. Tremendous efforts have been devoted to large-scale identification and quantification of proteins/peptides in human body fluids, especially in plasma/serum. A variety of technology platforms, including MS, electrophoresis, chromatography, and protein arrays, have been evaluated for HBFP analysis. Multiplexed proteomics platforms are often required in order to achieve comprehensive analysis and identification of proteins present in a human body fluid.

In shotgun proteomic studies, database searching may produce a significant number of incorrect peptide assignments. The process of validating peptide assignments often relies on time-consuming manual verification and this is simply not practical for analysis of large LC-MS/MS datasets. Several statistical models, including probability-based ones, have been introduced for automated validation of peptide assignments to MS/MS spectra made by database search tools [457–460]. The use of probabilities as confidence measures to accompany peptide identifications allows high power to discriminate between correct and incorrect protein identifications and estimate false-positive error rates resulting from data filtering. Therefore, it provides a consistent and objective standard for analyzing and publishing large-scale protein identification datasets [459].
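As a minimal sketch of how probabilities can yield an error-rate estimate after filtering, assume each peptide assignment carries a probability of being correct. The expected number of incorrect assignments among those passing a cutoff is then the sum of (1 - p) over the accepted assignments. The function and parameter names below are hypothetical, and the calculation assumes the probabilities are well calibrated.

```python
def expected_false_positive_rate(probabilities, min_probability):
    """Estimate the error rate among assignments passing a probability cutoff.

    probabilities   : probability of being correct for each peptide assignment
    min_probability : filtering cutoff (an assumed, user-chosen value)

    Returns (estimated error rate, number of accepted assignments).
    """
    accepted = [p for p in probabilities if p >= min_probability]
    if not accepted:
        return 0.0, 0
    # Each accepted assignment contributes (1 - p) expected errors.
    expected_incorrect = sum(1.0 - p for p in accepted)
    return expected_incorrect / len(accepted), len(accepted)
```

Raising the cutoff lowers the estimated error rate at the cost of fewer accepted identifications, which is the trade-off a reported filtering threshold is meant to make explicit.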

Development of standardized methods for sample collection, preparation, and handling is important for HBFP analysis. This requires collaborative efforts from the community working on a specific body fluid. Certified reference materials may be developed for sample quality control and quality assurance. Immunoaffinity depletion appears to be an effective means to unmask low-abundance proteins in body fluids. However, there has been concern about the use of immunoaffinity depletion because it may remove associated proteins (e.g., cytokines) as well. Whether immunodepletion is appropriate or not can be tested by measuring the changes in levels of a model biomarker protein in disease and normal subjects. A quantitative MS analysis based on MRM can be designed to verify whether the biomarker protein shows consistent alterations between disease and normal subjects before and after depletion.
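The depletion check described above can be sketched as a comparison of the disease-to-normal fold change measured before and after depletion. This is an illustrative sketch only: the function names and the 20% tolerance are assumptions, and a real study would apply a proper statistical test rather than a fixed tolerance.

```python
from statistics import mean


def fold_change(disease_levels, normal_levels):
    """Ratio of mean marker level in disease to mean level in normal subjects."""
    return mean(disease_levels) / mean(normal_levels)


def depletion_preserves_change(before, after, tolerance=0.2):
    """Check that depletion leaves the disease/normal alteration intact.

    before, after : (disease_levels, normal_levels) tuples of MRM measurements
    tolerance     : allowed relative shift in fold change (assumed value)
    """
    fc_before = fold_change(*before)
    fc_after = fold_change(*after)
    # The alteration must keep its direction (up or down) after depletion.
    same_direction = (fc_before > 1.0) == (fc_after > 1.0)
    relative_shift = abs(fc_after - fc_before) / fc_before
    return same_direction and relative_shift <= tolerance
```

A result of False would indicate that the depletion step co-removes the model biomarker, or alters its apparent change, enough to distort the comparison.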

Analysis and characterization of a body fluid proteome is a crucial step geared toward the subsequent biomarker discovery process. Knowing the protein concentration, PTM state and dynamic range will illustrate how best to design a quantitative MS approach for proteome profiling. Studying the variables affecting the production of protein content in a body fluid will introduce well-administrated proteomic samples for analysis. It is important to point out that the robustness and the stability of a “normal” body fluid proteome are rarely assessed. We do not know if the complex protein patterns are constantly changing as a result of minor physiologic events in the person, or if they are extremely stable even in the face of major biologic events [461]. Therefore, a well-characterized body fluid proteome will likely enhance the disease biomarker discovery process.

Although a large number of candidates from body fluids have been found for literally all major human diseases, some of them are common among different types of diseases. Questions have been raised regarding sample preparation and handling, reproducibility and accuracy of proteomic technologies, biomarker validation and clinical utility testing, specificity and abundance of biomarker proteins, etc. These factors need to be carefully considered prior to planning a proteomic biomarker study. In order to develop robust protein markers in body fluids, some general steps and procedures should be followed: (i) analysis and characterization of the body fluid proteome, (ii) differential proteomic profiling, data analysis (normalization and statistical analysis), and prioritization of identified markers, (iii) validation on new independent sample cohorts, (iv) establishment of prediction models, and (v) clinical utility testing. In these steps, a reasonable sample size (patient group and matched controls) should be ensured, and a comprehensive and quantitative profiling of the body fluid is the key. Statistics is indispensable for data analysis and model building. Reliability of the data also depends heavily on proper normalization of the data and aggressive validation on large sample cohorts.

Technology improvement will have a significant impact on biomarker discovery studies using proteomics. Unlike genomic profiling with microarrays, current proteome analysis is far from comprehensive, as evidenced by the fact that the deepest portion of the HBFP has not been probed by MS-based proteomics. Meanwhile, a majority of current proteomic technologies are not as rapid and high-throughput as microarray profiling. In order to meet the requirement for profiling a large number of clinical specimens, it is critical to advance the proteomics technologies toward a significantly higher level of comprehensiveness, throughput, speed, and accuracy. Quantitative proteome profiling is the key to unveiling differentially expressed proteins associated with diseases. Isotope tagging/MS/MS and LC-MS in label-free format represent two promising methods for quantitative MS profiling of body fluids. Immunoassays are still extensively used for marker validation following proteomic discovery. However, these assays require a specific antibody or antigen to the protein of interest and therefore are not always available. Quantitative LC-MRM MS using a synthetic isotope-labeled peptide as an internal standard has been well demonstrated for protein quantification. This method may prove to be a powerful approach for the validation of biomarkers.
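The arithmetic behind quantification with an isotope-labeled internal standard can be sketched as follows. It assumes that the endogenous (light) and labeled (heavy) peptides respond identically in the mass spectrometer, so that the ratio of their peak areas equals the ratio of their amounts. The function and parameter names are hypothetical.

```python
def quantify_by_isotope_dilution(light_area, heavy_area, heavy_spiked_amount):
    """Estimate the endogenous peptide amount from an MRM peak-area ratio.

    light_area          : peak area of the endogenous (unlabeled) peptide
    heavy_area          : peak area of the isotope-labeled internal standard
    heavy_spiked_amount : known amount of labeled peptide added to the sample

    The result is in the same units as heavy_spiked_amount.
    """
    if heavy_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return (light_area / heavy_area) * heavy_spiked_amount
```

For example, a light-to-heavy area ratio of 2 with 50 fmol of spiked standard corresponds to an estimate of 100 fmol of endogenous peptide. Converting this to a protein concentration additionally assumes complete digestion and a known sample volume.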

There has been concern that the putative biomarkers identified by proteomics approaches are not low-abundance proteins. However, it is hard to define whether a disease marker should be of low or high abundance. Perhaps a high priority should be placed on searching for cancer biomarkers with high specificity. The sharing of proteomic biomarkers among multiple types of cancers is not surprising, considering that most of the published studies compared late-stage cancer with healthy controls. Human cancers of epithelial origin may share similar molecular features at late stages. However, precancers or early-stage cancers are often localized and distinct. Therefore, we may expect more specific cancer biomarkers when the proteomics efforts are shifted to studying precancers or early-stage cancers. The discovered biomarkers will be truly meaningful for early detection of human cancers.

Biomarker discovery studies also require a robust statistical strategy for sample size calculation, data analysis, and prediction model building. Sample size in some published clinical proteomics studies tends to be small, presumably because the proteomics platform used is not capable of handling a large number of samples. However, this does not mean the sample size can be compromised. Determining sample size is an important issue because sample sizes that are too small may lead to inaccurate results. In many cases, statistical tools can be used to estimate the minimum sample size needed, based on the random variations of proteins among proteomic samples and desired true-positive rate and false-negative rate [462]. A major difficulty confronting the building of statistical prediction models is the multifactorial etiology and heterogeneous pathogenic pathways of many human diseases such as cancers and systemic diseases. Multimarker prediction models are often built in order to enhance the prediction power [463, 464].
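As a hedged illustration of a sample size estimate, the sketch below uses the standard normal-approximation formula for comparing the mean level of one protein between two equally sized groups: n per group = 2 * ((z_alpha + z_power) * sigma / delta)^2. This is a textbook formula, not necessarily the method of [462]. It assumes a two-sided test, equal variance in both groups, and a single protein (no correction for testing many proteins at once). The function and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Minimum subjects per group for a two-group comparison of one protein.

    sigma : standard deviation of the protein level among samples
    delta : smallest difference in mean level worth detecting
    alpha : tolerated false-positive rate (two-sided)
    power : desired true-positive rate (1 - false-negative rate)
    """
    if sigma <= 0 or delta == 0:
        raise ValueError("sigma must be positive and delta nonzero")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * sigma / abs(delta)) ** 2)
```

The formula makes the dependence explicit: halving the detectable difference, or doubling the sample-to-sample variation, roughly quadruples the number of subjects needed in each group.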

Despite these challenges, body fluid proteomics is still one of the most promising approaches to disease biomarker study. The clinical proteomic community is expected to accept the guidelines and protocols set up by the disease diagnosis (especially cancer diagnosis) community. These guidelines should be used to rationalize and standardize the process of proteomic biomarker development. We are enthusiastic about the new breakthroughs in the near future with the fast development of novel MS and proteomics technologies. The new insight unveiled by HBFP analysis will improve patient care and public health through better assessment of disease susceptibility, prevention of disease, and monitoring of treatment response [465].