
Abstract

Background: Autosomal dominant polycystic kidney disease (ADPKD) is responsible for 10% of cases of end-stage renal disease. Early diagnosis, especially of potential fast progressors, would benefit the efficient planning of therapy. The urinary proteome has become a promising field in the search for marker patterns of renal diseases, including ADPKD. Until now, however, only the low molecular weight fraction of the ADPKD proteomic fingerprint has been studied. The aim of our study was to characterize the higher molecular weight fraction of the urinary proteome of an ADPKD population in comparison to healthy controls, as part of a general effort aimed at exhaustive characterization of the human urine proteome in health and disease, which must precede the establishment of a clinically useful disease marker panel.

Results: We analyzed the protein composition of urine retentate (>10 kDa cutoff) from 30 ADPKD patients and an appropriate healthy control group by means of gel-free relative quantitation of a set of more than 1400 proteins. We identified an ADPKD-characteristic footprint of 155 proteins significantly over- or under-represented in the urine of ADPKD patients. We found changes in proteins of the complement system, apolipoproteins, serpins and several growth factors, in addition to known collagens and extracellular matrix components. For a subset of these proteins we confirmed the results using an alternative analytical technique.

Conclusions: The results provide a basis for further characterization of the pathomechanism underlying the observed differences and for establishing a proteomic prognostic marker panel.

Keywords: ADPKD, Urine proteome, Differential proteomics, Mass spectrometry, iTRAQ, MRM

Background

Autosomal dominant polycystic kidney disease (ADPKD) is an inherited disorder affecting 1 in 1000 people and responsible for 10% of cases of end-stage renal disease (ESRD). Apart from renal manifestations, changes in other organs may be present, including liver cysts and intracranial aneurysms, among others. The disease is divided into two types based on the mutated gene (PKD1 in type 1, 85% of cases, and PKD2 in type 2). The type of mutation has prognostic significance, as the average age at ESRD depends on the type of the disease and amounts to 53 years in type 1 and 69 years in type 2 [1]. As potential therapeutic methods for ADPKD are being extensively tested in clinical trials [2-5], there is a need for tools that enable early diagnosis and monitoring of therapy, especially non-invasive tests that could substitute for kidney biopsy. Evaluation of changes in the peptidome and/or proteome may provide the required information of pathophysiologic and clinical significance and may allow future diagnostic or prognostic tools to be established [6]. Urine, as an easily accessible compartment, seems to be an ideal material in the search for noninvasive prognostic and therapy-monitoring tests for renal diseases. However, before urine proteome or peptidome markers become clinically useful, the urine proteome itself must be thoroughly characterized in a process of intense multi-stage research comparing different sample processing and analysis settings.

The aim of our research was to apply an in-depth bottom-up proteomic methodology to characterize the urinary proteome of an ADPKD population in comparison to healthy controls. Literature data concerning descriptive proteomics in ADPKD patients are limited. Mason et al. reported the proteomic analysis of four samples of cyst fluid obtained postoperatively from excised kidneys in patients with ESRD due to ADPKD [7]. Kistler et al. [8] were the first to attempt to identify the urinary biomarker profile of ADPKD, focusing on the low molecular (10 kDa cutoff filters, was used for the analysis. This made it possible to normalise the sample set with respect to the different levels of dilution of the proteome in each sample and to compare the proteome composition. After tryptic digestion, peptides were subjected to iTRAQ labeling and IEF separation yielding 26 fractions, each analysed in a separate LC-MS-MS/MS run. IEF separation substantially increases the final protein coverage. However, the separate analysis of 60 samples including the IEF step would require more than 1500 LC-MS-MS/MS runs, which is not practical. To overcome this difficulty and retain an in-depth insight into the urine proteome, we used a partial pooling strategy. A set of 30 ADPKD samples was divided into 3 subsets, containing 10 samples each, which were pooled into three Disease Pooled


Samples (DPS I, II and III). Similarly, the control set was divided into three Control Pooled Samples (CPS I, II and III), retaining age and sex matching within the subsets. In addition, two technical replicates of each DPS or CPS were prepared, further denoted A or B, to assess the intragroup technical variability. As a result, 4-plex iTRAQ-labeled peptides from three replicates of pooled control and ADPKD samples, each represented by two technical replicates, were analyzed by IEF-LC-MS-MS/MS on three IEF strips, as described in the Methods section and Figure 1. For each IEF strip, two control pooled samples (for instance CPS IA and CPS IIA) and two disease pooled samples (for instance DPS IA and DPS IIA) were mixed and subjected to IEF separation. IEF strips were cut into ca. 26 sections. Labeled peptides were eluted from each of the IEF strip sections and subjected to separate LC-MS-MS/MS runs. Qualitative analysis (peptide and protein identification) identified 1327, 1353 and 1582 proteins in the three IEF-LC-MS-MS/MS experiments, respectively, each represented by at least two peptides, as shown in Table 1 and Figure 2. One-peptide hits were not taken into account in further quantitative analysis. Qualitative results (protein lists) from the three IEF-LC-MS-MS/MS experiments were combined, resulting in a dataset of 1700 proteins identified by at least two peptides. Within this dataset, protein identifications based on identical peptide sets were again grouped and each group was treated as a single protein cluster in further processing. Quantitative analysis was performed, as described in the Methods section, with proteins represented by two or more peptides for which it was possible to calculate a protein ratio in at least one of the IEF-LC-MS-MS/MS experiments. The final combined protein list accepted for quantitation contained 1413 proteins. 1090 out of

Table 1 Number of identified peptides and proteins in three replicates of iTRAQ experiment on pooled samples

Replicate (IEF gel strip)   Number of peptides (accepted PSMs)   Number of proteins (proteins ≥ 2 peptides)
1                           9530                                 2430 (1327)
2                           9814                                 2637 (1353)
3                           11329                                2810 (1582)

these proteins are common to all replicates of the experiment. The statistical analysis of the quantitative results of the three IEF-LC-MS-MS/MS experiments revealed 155 proteins that were differentially populated (with q < 0.05) in the urine of ADPKD patients as compared to healthy controls. Of these, 148 were identified in each of the IEF-LC-MS-MS/MS experiments and 7 in two replicates. The Differential Protein List (DPL) is presented in Table 2. The differences in protein levels (protein ratios) can be substantial, exceeding 5-fold in some cases. Among the differential proteins, 103 were downregulated and 52 were upregulated in ADPKD. Principal Component Analysis of the results of this experiment (Figure 3) shows a very good separation of the two study groups along the first component axis. The DPL was obtained from a pooling experiment, and this approach allowed an in-depth (>1000 proteins) quantitative analysis of the urine proteome. However, upon

Figure 2 Results of qualitative analysis – a Venn diagram representing the number of proteins identified by two or more peptides in three biological replicates of iTRAQ experiment – three IEF gel strips. 1090 proteins are common in all three experiments.
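The grouping rules applied to the qualitative results (at least two peptides per protein, proteins with identical peptide sets merged into one cluster) and the replicate overlap summarised in the Venn diagram can be illustrated with a minimal Python sketch. The function names and the toy protein and peptide labels below are hypothetical placeholders, not data or software from this study.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides, min_peptides=2):
    """Cluster proteins that share exactly the same peptide set.

    Proteins with fewer than `min_peptides` distinct peptides are dropped.
    Each cluster is keyed by a tuple of its sorted member names.
    """
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        peptide_set = frozenset(peptides)
        if len(peptide_set) >= min_peptides:
            by_peptide_set[peptide_set].append(protein)
    return {tuple(sorted(members)): peptide_set
            for peptide_set, members in by_peptide_set.items()}


def common_clusters(*replicates):
    """Return the cluster keys that occur in every replicate."""
    return set.intersection(*(set(rep) for rep in replicates))


# Toy identifications for two IEF strips (illustrative labels only).
strip1 = group_proteins({
    "PROT_A": ["pep1", "pep2"],
    "PROT_B": ["pep2", "pep1"],          # same peptide set as PROT_A
    "PROT_C": ["pep3"],                  # one-peptide hit, dropped
    "PROT_D": ["pep4", "pep5", "pep6"],
})
strip2 = group_proteins({
    "PROT_D": ["pep4", "pep5"],
    "PROT_E": ["pep7", "pep8"],
})
shared = common_clusters(strip1, strip2)
```

In this toy case PROT_A and PROT_B collapse into one cluster, PROT_C is discarded, and only the PROT_D cluster is shared by both strips.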


pooling the levels of proteins are averaged and the information on the variability of the amount of each protein among individual samples is lost. Therefore, to test the results of the pooling experiment using an alternative analytical approach (Multiple Reaction Monitoring, MRM), we analysed individual ADPKD and control samples for a subset of proteins from the DPL. For this purpose a new set of samples (27 ADPKD vs. 25 healthy controls) was collected. Initially, a subset of 17 proteins from the DPL, represented by the largest number of peptides, was selected for MRM analysis. The number of proteins in an MRM experiment is limited by the number of peptides that can be analysed in parallel in a single experiment. For these proteins, natural-abundance peptides were searched for in control urine samples. Satisfactory results were obtained for 9 of the 17 proteins (represented by 14 peptides); sensitivity was insufficient for the remaining 8. Next, 14 stable-isotope-labeled peptides (SIS internal standards) were synthesized. Using the SIS peptides, the MS parameters for the MRM experiment were optimised for each peptide. A comparison of the MRM quantitation with the results of the iTRAQ pooling experiment for these 9 proteins is shown in Table 3. For 8 proteins their upregulation in ADPKD was in agreement with the results of the pooling experiment; however, for one of them (Cystatin-M) the q-value (0.13) exceeded the threshold of 0.05, making this result insignificant. For the remaining protein (Proactivator polypeptide), the MRM results for the representing peptide EIVDSYLPVILDIIK indicate a lower level in ADPKD, whereas in the pooling experiment the level averaged over 19 peptides was higher in ADPKD. This result is difficult to explain, since the same peptide EIVDSYLPVILDIIK shows an increased level in ADPKD in the iTRAQ pooling experiment; for this protein MRM therefore does not confirm the results of the pooling analysis.
However, for 8 out of 9 proteins the results of both approaches are in full qualitative agreement. On the quantitative level the agreement between the two methods is good for the majority of proteins; only for 2 proteins are the ratio differences larger (for Retinol-binding protein (RBP), the ratio is 4.6 for MRM and 2.65 for iTRAQ). It has to be taken into account that the two methods calculate the ratios from different sets of peptides, usually a much larger set for iTRAQ. These peptides may represent different regions of the protein sequence, and some of them may originate from proteolytic protein fragments, which are quite probable in the urine proteome, rather than from intact proteins; this may account for the observed quantitative differences. An alternative explanation in the case of RBP comes from the higher variability of this protein within the ADPKD group, as illustrated in Figure 4. It shows that the upregulation of the average RBP level in ADPKD originates from a subset (6 samples out of 27) of

ADPKD samples in which the level of the protein is much larger (even by a factor of 25) than in remaining ADPKD samples, for which the levels are similar to control. Thus the average value in pooling experiment might easily be shifted by a single sample of exceptionally large content of RBP. Interestingly, the RBP levels correlate strongly with the progressor status of the patient, as illustrated by asterisks in Figure 4. This effect however requires further studies.

Table 2 Differential Protein List. Proteins of different level in the urine of ADPKD patients compared to healthy controls. Ratio is given as ADPKD/Control (Continued)

No.   Accession   q-value   Ratio   Peptides   Protein name
141   Q08380      0.03793   0.73    41         Galectin-3-binding protein
142   O75487      0.0383    0.72    9          Glypican-4
143   O75339      0.04059   0.49    5          Cartilage intermediate layer protein 1
144   P02649      0.04059   0.69    18         Apolipoprotein E
145   P34059      0.04062   0.73    17         N-acetylgalactosamine-6-sulfatase
146   P35318      0.04219   0.49    4          ADM
147   Q12794      0.04254   0.58    7          Hyaluronidase-1
148   A9Z1Y9      0.04254   2.01    3          Thymosin beta-4-like protein 6
149   O14773      0.04437   0.69    22         Tripeptidyl-peptidase 1
150   O75015      0.04588   0.6     9          Low affinity immunoglobulin gamma Fc region receptor III-B
151   P30711      0.04618   0.4     2          Glutathione S-transferase theta-1
152   Q14894      0.04647   0.62    6          Mu-crystallin homolog
153   O43895      0.04733   0.68    18         Xaa-Pro aminopeptidase 2
154   Q9NQS3      0.04733   0.59    10         Poliovirus receptor-related protein 3
155   P04792      0.04848   0.59    7          Heat shock protein beta-1

Discussion

The urine proteome is thought to contain renal disease fingerprints, but pathology-related urine proteomics is still in its infancy. For ADPKD, one study [8] has been published in which the low molecular weight proteome fraction was studied and a set of potential disease markers was proposed. However, the most successful approach to global proteomic analysis of the total proteome, combining multiple steps of separation preceding quantitative

Figure 3 Principal Component Analysis of the three pooled biological replicates of control (CPS I, II, and III) and disease (DPS I, II and III) samples based on 155 proteins (grey dots) indicated in the analysis as differentially populated between control and ADPKD samples. The analysis shows a good separation of control samples from ADPKD samples along the first component axis. Note the high similarity of the control samples. Normalized heights of iTRAQ peptide signals (averaged over the corresponding protein cluster) were used as features in the PCA. The value for each biological replicate is an average over the two technical replicates.

Table 3 Comparison of the results of the MRM quantitation with the results of iTRAQ pooling experiment for 9 proteins. Protein ratios along with q-values are given

                                     MRM                 iTRAQ
     PROTEIN                         ratio    p-value    ratio    q-value    peptides
1.   Antithrombin-III                3.56     0.0001     2.01     0.00003    28
2.   Apolipoprotein A-IV             3.65     0.001      4.19     0.00003    39
3.   Complement C3                   2.54     0.001      2.45     0.00003    94
4.   Histidine-rich glycoprotein     1.79     0.002      1.81     0.00008    14
5.   Proactivator polypeptide        0.55     0.002      2.74     0.00003    23
6.   Myocilin                        1.99     0.007      2.39     0.00134    8
7.   Retinol-binding protein 4       4.6      0.015      2.65     0.00003    11
8.   Transthyretin                   2.87     0.025      1.88     0.00008    11
9.   Cystatin-M                      1.64     0.129      1.85     0.00399    10
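The qualitative comparison between the two methods can be reproduced from the values in Table 3 with a short Python sketch. It checks, for each protein, whether the MRM and iTRAQ ratios point in the same direction and whether the MRM result passes the 0.05 threshold used in the text. The data structure and function name are illustrative assumptions; only the numbers are taken from Table 3.

```python
import math

# (protein, MRM ratio, MRM p-value, iTRAQ ratio, iTRAQ q-value), from Table 3
TABLE3 = [
    ("Antithrombin-III", 3.56, 0.0001, 2.01, 0.00003),
    ("Apolipoprotein A-IV", 3.65, 0.001, 4.19, 0.00003),
    ("Complement C3", 2.54, 0.001, 2.45, 0.00003),
    ("Histidine-rich glycoprotein", 1.79, 0.002, 1.81, 0.00008),
    ("Proactivator polypeptide", 0.55, 0.002, 2.74, 0.00003),
    ("Myocilin", 1.99, 0.007, 2.39, 0.00134),
    ("Retinol-binding protein 4", 4.6, 0.015, 2.65, 0.00003),
    ("Transthyretin", 2.87, 0.025, 1.88, 0.00008),
    ("Cystatin-M", 1.64, 0.129, 1.85, 0.00399),
]


def compare_methods(rows, alpha=0.05):
    """Summarise direction agreement and MRM significance per protein."""
    summary = {}
    for name, mrm_ratio, mrm_p, itraq_ratio, _itraq_q in rows:
        summary[name] = {
            # Same sign of log2 ratio means both methods agree on up/down.
            "same_direction": (math.log2(mrm_ratio) > 0) == (math.log2(itraq_ratio) > 0),
            "mrm_significant": mrm_p < alpha,
            # Absolute log2 fold difference between the two ratio estimates.
            "abs_log2_difference": abs(math.log2(mrm_ratio / itraq_ratio)),
        }
    return summary


summary = compare_methods(TABLE3)
discordant = [n for n, s in summary.items() if not s["same_direction"]]
not_significant = [n for n, s in summary.items() if not s["mrm_significant"]]
```

With these inputs the only direction mismatch is Proactivator polypeptide and the only non-significant MRM result is Cystatin-M, matching the description in the Results.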

mass spectrometry, has not yet been carried out for ADPKD urine samples. To fill this gap, we combined iTRAQ-based quantitation with peptide isoelectrofocusing and reversed-phase separation coupled with MS to obtain in-depth urine proteome coverage in the quantitative analysis of the ADPKD vs. control sample set. In the qualitative analysis, peptide identifications combined from the three IEF-LC-MS-MS/MS experiments yielded a list of 14429 peptides assigned to proteins, corresponding to 1700 proteins, each identified by at least two peptides (Additional file 1). The median number of peptides per protein was 9.34. This list compares well with other attempts at qualitative characterization of the human urine proteome, in which the overall number of proteins depends strongly on the number of peptide/protein prefractionation steps used. 808 proteins were detected when

the only separation step was LC preceding MS [9]. Adding a 1D SDS-PAGE separation step increased this number to 1102 [10] or 1543 [11] proteins represented by at least two peptides. Application of a multidimensional separation strategy was shown to yield 2362 proteins [12], but another group reports only 991 proteins [13]. Pairwise comparison with the proteins detected in our work yields 972 common proteins with Adachi [11], and 975 with the 1823 proteins (including one-peptide hits) found by Li [13]. The proteins detected in three publications [10,11,13] were compared in Figure 2 of Marimuthu's paper [10], yielding 658 common proteins, of which 582 were detected in our work. This number correlates well with the 587 proteins named "core urinary proteins" commonly detected in a large set of urine samples [9]. In conclusion our dataset represents very

Figure 4 Retinol-binding protein (RBP) levels as measured by MRM technique in a set of 27 ADPKD/25 healthy control samples. Note large differences in protein levels within ADPKD group and correlation of its high levels with progressor status of the patient (denoted by asterisks).

well core urinary proteins; however, the number of unique proteins found in this work is also high, indicating that the complexity of the urine proteome is far from being explored in depth. In the quantitative analysis, a list of proteins (DPL) differentiating ADPKD from healthy control samples was established. The partial pooling experiment indicated a list of 155 proteins present at different levels in the urine of ADPKD patients compared to healthy subjects. We found alterations in the complement system, apolipoproteins, the group of serine protease inhibitors, several growth factors, collagen chains, extracellular matrix components, transmembrane proteins, and many others. Many of them have never been linked to ADPKD in previous studies. Additionally, our results confirm the alterations observed in animal models concerning, for example, apolipoproteins [14]. Some proteins included in the DPL have previously been linked to the progression of cystic kidney disease, for example the CD14 molecule [15]. In our study, the pre-separation of peptides by IEF and the analysis of 26 fractions of each gel greatly increased the number of proteins that could be subjected to quantitation. However, each IEF-LC-MS-MS/MS experiment required 26 LC-MS-MS/MS runs corresponding to 78 hours of spectrometer time, so it could not be carried out separately for 60 samples because of the exceedingly long analysis time required (about 4680 hours, nearly 200 days of spectrometer time). This justified the pooling approach, which combined the information contained in all samples and allowed its in-depth analysis in a reasonable time. However, when protein ratios are compared after pooling, the information on the scatter of protein ratios among the individual pooled samples is lost, and the statistical validity of the obtained differences cannot be properly assessed.
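The instrument-time argument for pooling is simple arithmetic, sketched here in Python. The figures of 26 runs and 78 hours per IEF-LC-MS-MS/MS experiment come from the text; the function name is a hypothetical helper, and the calculation assumes, as the text does, that each separately analysed sample would need its own IEF strip.

```python
def instrument_time(n_experiments, runs_per_experiment=26, hours_per_experiment=78):
    """Return (LC-MS runs, spectrometer hours, spectrometer days)."""
    runs = n_experiments * runs_per_experiment
    hours = n_experiments * hours_per_experiment
    return runs, hours, hours / 24


# One IEF experiment per individual sample (30 ADPKD + 30 controls).
individual = instrument_time(60)

# Partial pooling: three 4-plex IEF experiments cover all 12 pooled samples.
pooled = instrument_time(3)
```

Under these assumptions the individual design needs 1560 runs and 195 days of spectrometer time, consistent with the "more than 1500 runs" and "nearly 200 days" quoted in the text, whereas the pooled design needs 78 runs and under 10 days.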
For that reason we used the MRM technique for a subset of nine DPL proteins, which confirmed the results of the pooling experiment; only for one protein was the confirmatory analysis not successful. In general, the differential list obtained from a pooling experiment is thus a candidate list: each protein of interest from the list has to be measured in individual samples in a separate experiment by an independent method. Only a few proteomic analyses of ADPKD tissue samples can be found in the literature. Mason et al. reported the proteomic analysis of four samples of cyst fluid obtained postoperatively from excised kidneys in patients with ESRD due to ADPKD [7]. The authors identified 44 proteins that were found in at least two cysts and might be of mechanistic or diagnostic interest in ADPKD. Similarly to our results, the list of these proteins included complement factors, apolipoprotein A-I,


pigment epithelium-derived factor (PEDF) and others. However, the potential diagnostic utility of cyst fluid proteomics is highly limited, and in our opinion it is urine that may become the diagnostic material in clinical practice. Kistler et al. were the first to attempt to identify the urinary biomarker profile of ADPKD [8]. Because CE-MS technology was applied, the range of molecular masses under study was limited to less than 15 kDa, whereas in our work proteins of masses larger than 10 kDa were studied. This explains the differences in the lists of differentiating proteins, which in the case of Kistler et al. were limited mainly to collagen fragments and uromodulin peptides. Therefore, our DPL may be regarded as a complete list of ADPKD-specific urinary proteins, independent of kidney function. Our results provide the first step of the analysis; specific DPL proteins of interest should now be verified by a targeted analysis of non-pooled samples in much wider sample sets. Moreover, the specificity of these results should be determined in studies including patients with chronic kidney disease of distinct origin. Additionally, it should be determined whether the type of mutation (PKD1 or PKD2) affects the proteome. Finally, methods of sample collection and preparation, laboratory procedures, and data analysis must be optimized. After verification, our results may in future serve as a basis for mechanistic studies and may therefore ultimately lead to the discovery of new therapeutic targets in ADPKD. Additionally, the set of urinary biomarkers may be used in the future for early diagnosis of ADPKD.

Conclusions

The urine proteome of ADPKD patients differs significantly from the urine proteome of healthy subjects and may become a clinical tool for early diagnosis of ADPKD. The pathophysiological information obtained in the present study may become a basis for the development of new therapies.

Methods

Urine samples

Thirty ADPKD patients diagnosed by abdominal ultrasound [16] were enrolled in the study group. The control group consisted of 30 healthy volunteers matched for sex and age. The demographic data of both groups are summarized in Table 4. The inclusion criteria for the study group were the diagnosis of ADPKD and age ≥18 years. The inclusion criteria for the control group were: absence of ADPKD, age ≥18 years, and body mass index (BMI) between 21 and 26. The exclusion criteria for both groups included in particular: current infection of the urinary tract, macroscopic hematuria,

Table 4 Demographic characteristics and renal function of study and control group

                                            Study group            Control group
n                                           30                     30
male/female (%)                             12 (40%)/18 (60%)      12 (40%)/18 (60%)
mean age in years (range)                   44.4 (20–72)           44.6 (20–76)
mean body mass in kg (range)                70.7 (50–100)          71.1 (50–91)
mean serum creatinine in μmol/l (range)     120.7 (38.1–388.9)     68.4 (45.8–114.4)
GFR (CKD-EPI formula) in ml/min (range)     66.8 (11–140)          102.2 (54–136)

diabetes mellitus, malignancy of urinary tract or generalized malignancy of other system, and status post organ transplantation. The study protocol was approved by the local ethics committee. Informed consent was obtained from all participants. The study was performed in accordance with the Declaration of Helsinki Principles.

Urine collection

Samples were collected from 30 patients and 30 healthy donors using a uniform protocol. The second or third morning mid-stream urine was collected from all participants between 1 and 3 hours after the previous micturition. Sterile urine containers were used for the collection of samples. The pH of the samples was stabilized at 7.2 by addition of 1/10th vol. of 1 M HEPES pH 7.2 immediately after collection. Further sample preparation steps were carried out within 1 hour of collection, during which the sample was kept at room temperature. Samples were vortexed for 2 minutes, centrifuged (3000xg, room temp.) for 10 minutes to clear the debris, filtered through a 0.4 μm filter (Rotilabo-Spritzenfilter, P819.1, Roth) and portioned into 1 ml aliquots to avoid freeze/thaw cycles in repeated experiments on the same sample. Sample aliquots were stored at −80°C for further use. The protocol follows the urine proteomic sample collection recommendations [17].

Sample filtration

10 kDa cutoff membrane filters (Amicon Ultra-0.5, UFC501096, Millipore) were washed twice with MilliQ water prior to use. Urine was centrifuged through the membrane at 14000xg for 15 minutes. Next, 500 μl of MilliQ water was added to the retentate and the centrifugation step was repeated. To recover the concentrated and desalted sample, the filter was placed upside down in a clean microcentrifuge tube and centrifuged for 2 minutes at 1000xg. The protein concentration was measured by the Bradford method. Aliquots of samples were stored at −80°C.


Pooling samples and iTRAQ-labelled samples study design

When indicated, aliquots (corresponding to 10 μg of protein) of 10 urine samples were pooled. Only samples from a single study group (disease or control) were pooled. The 30 control (healthy) samples were divided into three control pooled samples (CPS I, II and III) and, similarly, the 30 ADPKD samples were divided into three disease pooled samples (DPS I, II and III). Age and sex matching was preserved within the three pairs of pooled sample groups. Each of the three CPSs and three DPSs was obtained in two technical replicates (marked A and B), making a set of 12 pooled samples to be compared after iTRAQ labeling. As 4-plex iTRAQ was used, 2 CPS and 2 DPS samples were compared in one LC-MS/MS experiment. To analyze the 12 samples we carried out a set of 3 independent LC-MS/MS experiments. The study design is illustrated in Figure 1.

iTRAQ labeling

Before labeling, protein aliquots were evaporated to dryness in a speedvac, dissolved in 20 μl Dissolution Buffer with 0.1% SDS, reduced with TCEP, cysteine-blocked with MMTS (reagents were provided with the iTRAQ kit from Applied Biosystems), and digested overnight with trypsin (Promega). The CPS and DPS samples were differentially labeled with one of the four iTRAQ tags (114, 115 for CPS samples and 116, 117 for DPS samples) for 1 h according to the iTRAQ manufacturer's protocol. Next, the reaction was quenched by adding 100 μl H2O. For each of the three LC-MS/MS experiments, 2 CPS and 2 DPS iTRAQ-labeled samples were combined and 340 μl of buffer was added [8 M urea, 0.2% IPG buffer pH 3–11 NL (GE Healthcare), 0.002% bromophenol blue in 50 mM Tris–HCl, pH 8.0]. The solution was applied to an 18 cm IPG strip with a 3–11 NL pH gradient (GE Healthcare) for isoelectrofocusing (IEF): 340 μl of sample per strip, corresponding to 400 μg of protein. The IPG strip was rehydrated overnight in an IPG box (GE Healthcare). The next day, the strips were isoelectrofocused using an Ettan IPGphor 3 electrophoresis system (GE Healthcare) in two steps. The first step consisted of a 5 h pre-run at 500 V. During this step, the conductivity decreases, and salts and other highly conductive compounds move towards the electrode (anode). In the second step, a long gradient focusing program was used: 1 h at 500 V, 9 h at 1000 V and 30 h at 8000 V (the final current was 5 μA). After focusing, the strip was removed from the tray and the overlay oil was blotted with a paper tissue. The strip was wrapped in Parafilm and stored at −80°C. The strip was placed on a tray cooled with dry ice and cut into sections of ca. 7 mm. The sections were transferred into individual 1.5-ml siliconized Eppendorf tubes. In all,

the 18-cm long gel strips were sliced into 26 sections. Peptides were extracted from gel sections by two cycles of adding 60 μl 0.1% TFA, 2% acetonitrile and vortexing the tubes for 40 minutes at room temperature. Aliquots with extracted peptides were stored at −80°C for LC-MS/MS analysis.

Mass spectrometry - LC-MS/MS settings

The peptide mixture (20 μl) was applied to the nanoACQUITY UPLC Trapping Column (Waters) using water containing 0.1% formic acid as the mobile phase and then transferred to the nanoACQUITY UPLC BEH C18 Column (Waters, 75 μm inner diameter; 250-mm long) using an acetonitrile gradient (3–33% acetonitrile over 150 minutes) in the presence of 0.1% formic acid with a flow rate of 250 nl/min. The column outlet was directly coupled to the electrospray ion source of the LTQ-Orbitrap Velos mass spectrometer (Thermo Scientific) working in the regime of data-dependent MS to MS/MS switch. HCD fragmentation was used. Other Orbitrap parameters were as follows: one MS scan was followed by max. 5 MS/MS scans, the capillary voltage was 1.5 kV, and data were acquired in positive polarity mode.

Mass spectrometry - Qualitative MS/MS data processing

The acquired MS/MS data were pre-processed with Mascot Distiller (version 2.3.2.0, Matrix Science, London, UK). The database search of the data using MASCOT search engine was carried out in a three-step procedure (described elsewhere [18], and in short in Additional file 2) to calculate MS and MS/MS measurements errors and to recalibrate the data for the repeated MASCOT search. The initial search parameters were set as follows: enzyme, semi-trypsin; fixed modification, cysteine modification by MMTS as well as iTRAQ labeling of the N-terminus of peptides and of lysine side chains; variable modifications - oxidation (M); max missed cleavages – 1, Swiss-Prot database with the taxonomy restricted to Homo sapiens (20273 sequences). For the repeated search the recalibrated data from all gel sections were merged into one input file and searched using MASCOT against a Swiss-Prot database supplemented with the decoy database to obtain the statistical assessment of the identification of each peptide by a joined target/decoy database search strategy [19]. This procedure provided q-value estimates for each peptide spectrum match (PSM) in the dataset. All PSMs with q-values > 0.01 were removed from further analysis. A protein was regarded as confidently identified if at least two peptides of this protein were found. Proteins identified by a subset of peptides from another protein were excluded from analysis. Proteins that exactly matched the same set of peptides were clustered into one group/cluster. MS/MS spectra of peptides meeting the above acceptance


criteria were subjected to quantitative analysis step to obtain a list (Differential Protein List) of proteins differentially populated between a set of three CPS’s and three DPS’s.

iTRAQ quantitative analysis

For protein quantitation only unique peptides (i.e. peptides belonging to only one protein/cluster) were included. In the first step, iTRAQ reporter ion peaks were detected in the pre-processed MS/MS spectra using the Mascot Distiller program; next, their intensities were corrected for isotope impurity using the information provided by the reagent manufacturer. For each spectrum, a geometric mean of the two reporter ion intensities belonging to one study group (CPS or DPS) was calculated separately. A ratio of these mean values (CPS mean divided by DPS mean) was reported as the peptide ratio. If more than one spectrum was obtained for a peptide in a single LC-MS/MS experiment, the median peptide ratio from all spectra was used. Prior to the protein ratio calculations, peptide ratios were median-normalized to remove systematic bias. Protein ratios were calculated as the median of the ratios of their peptides. The statistical significance of a single protein ratio was assessed by the in-house program Diffprot [20]. In this program, the statistical validity of the regulation/expression status of the protein represented by its calculated protein ratio is based solely on the statistical analysis of the set of all MS/MS datasets from a given experiment, without assumptions on the character of the distribution of peptide ratios in a dataset (e.g. its normality). In brief, the probability of obtaining a given protein ratio by a random selection from the dataset is tested by multiple rounds of protein ratio calculation for a large number of permuted decoy datasets in which the peptide-protein assignment has been scrambled. Calculated p-values were adjusted for multiple testing using an FDR-controlling procedure, yielding the protein ratio q-values reported in Table 2.
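The ratio calculation described above (geometric mean of the two reporter ions per group, per-spectrum ratio, median over spectra, median normalization, median over peptides) can be sketched in Python as follows. This is an illustrative sketch, not the Mascot Distiller or Diffprot implementation: the function names and toy intensities are hypothetical, dividing by the overall median peptide ratio is an assumed form of median normalization, and the ratio is oriented as ADPKD/control to match the Table 2 caption (invert it if the opposite convention is wanted). The permutation test and FDR adjustment are not covered.

```python
import math
from collections import defaultdict
from statistics import median

CONTROL_TAGS = (114, 115)   # iTRAQ tags used for CPS samples
DISEASE_TAGS = (116, 117)   # iTRAQ tags used for DPS samples


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spectrum_ratio(reporters):
    """Ratio of group geometric means for one MS/MS spectrum (ADPKD/control)."""
    control = geometric_mean([reporters[t] for t in CONTROL_TAGS])
    disease = geometric_mean([reporters[t] for t in DISEASE_TAGS])
    return disease / control


def protein_ratios(spectra):
    """spectra: iterable of (protein, peptide, {tag: corrected intensity})."""
    per_peptide = defaultdict(list)
    for protein, peptide, reporters in spectra:
        per_peptide[(protein, peptide)].append(spectrum_ratio(reporters))
    # Median over all spectra of the same peptide.
    peptide_ratio = {key: median(vals) for key, vals in per_peptide.items()}
    # Assumed median normalization: centre all peptide ratios on 1.
    centre = median(peptide_ratio.values())
    per_protein = defaultdict(list)
    for (protein, _peptide), ratio in peptide_ratio.items():
        per_protein[protein].append(ratio / centre)
    # Protein ratio = median of its normalized peptide ratios.
    return {protein: median(vals) for protein, vals in per_protein.items()}


# Toy input (made-up intensities, not study data).
toy_spectra = [
    ("P1", "pepA", {114: 100, 115: 100, 116: 200, 117: 200}),
    ("P1", "pepA", {114: 50, 115: 200, 116: 200, 117: 200}),
    ("P1", "pepB", {114: 100, 115: 100, 116: 400, 117: 400}),
    ("P2", "pepC", {114: 100, 115: 100, 116: 100, 117: 100}),
    ("P2", "pepD", {114: 100, 115: 100, 116: 100, 117: 100}),
    ("P3", "pepE", {114: 100, 115: 100, 116: 50, 117: 50}),
    ("P3", "pepF", {114: 100, 115: 100, 116: 50, 117: 50}),
]
toy_ratios = protein_ratios(toy_spectra)
```

In the toy input the six peptide ratios are 2, 4, 1, 1, 0.5 and 0.5, so the normalization centre is 1 and the protein ratios come out as roughly 3 for P1, 1 for P2 and 0.5 for P3.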

We have selected a subset of proteins from the Differential Protein List shown in Table 2 for further analysis of non-pooled, individual samples using the multiple reaction monitoring (MRM) technique, used in conjunction with stable-isotope-labeled peptide standards (SIS). The presence of natural MRM transitions for peptides from 17 proteins was first checked in samples of urine collected separately from healthy volunteers. Only for nine proteins the natural transitions corresponding to selected peptides yielded satisfactory results and SIS peptides were generated for these. The transitions for peptides corresponding to the remaining eight

peptide by injecting 1 pmol (in 0.1% formic acid) on-column and ramping the cone voltage from 20 to 70 V in 5 V steps while gating all the possible parent ion charge states (2+, 3+, 4+) using the selected ion recording (SIR) function controlled by the Waters MassLynx V4.1 software. The daughter ions generating the highest possible signal and their individual, optimal collision energy (CE) voltages were determined empirically by injecting 1 pmol (in 0.1% formic acid) of SIS peptide on-column and ramping the CE voltage up and down in five 2 V steps from the value suggested by the Skyline Ver. 1.3 software (University of Washington, MacCoss Lab, Department of Genome Sciences, UW) for the Waters Xevo instrument. All possible b- and y-series fragment ions for both 2+ and 3+ precursor ion charge states spanning an m/z range from 300 to 1500 were tested. MRM scans for optimization of MRM Q1/Q3 ion pairs were conducted with the optimized cone voltages, with the Span setting set to 0 and with dwell times of 10 milliseconds for each transition. From these data, using the Skyline Ver. 1.3 software, the 5 transitions that produced the strongest signals were selected on a per-peptide basis, with a preference toward higher-mass y-series ions if the abundances were similar. These top 5 transitions were then checked for signal interferences when present in a sample-digest background. The SIS peptide mix was analyzed by LC-MRM/MS using transitions for heavy (SIS) and natural (endogenous) peptides, both in buffer and in a sample digest. Identical MRM acquisition parameters were used for the heavy and natural forms of each peptide, while taking into account the Q1/Q3 mass differences due to the stable-isotope label. The transitions that maintained the same relative intensities in both the buffer and the sample were considered interference-free.
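The transition selection and interference check described above can be sketched as follows. This is an illustrative approximation, not the procedure implemented in Skyline: the 10% window used to decide that abundances are "similar", the 20% tolerance on relative intensity, and the dictionary layout are hypothetical choices.

```python
from statistics import median


def select_top_transitions(transitions, n=5, similarity=0.10):
    """Pick the n strongest transitions, preferring higher-mass y ions on near-ties.

    transitions: list of dicts with keys 'ion' (e.g. 'y7'), 'mz', 'intensity'.
    similarity: assumed relative window within which abundances count as similar.
    """
    remaining = sorted(transitions, key=lambda t: t["intensity"], reverse=True)
    chosen = []
    while remaining and len(chosen) < n:
        strongest = remaining[0]["intensity"]
        similar = [t for t in remaining
                   if t["intensity"] >= strongest * (1 - similarity)]
        # Among similar abundances, prefer y-series ions, then higher m/z.
        best = max(similar, key=lambda t: (t["ion"].startswith("y"), t["mz"]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def interference_free(buffer_areas, sample_areas, tolerance=0.20):
    """Return ions whose relative intensity is preserved in the sample digest.

    Each transition's sample/buffer area ratio is compared with the median
    ratio over all transitions of the peptide; a transition that deviates by
    more than `tolerance` (assumed value) is treated as interfered.
    """
    ratios = {ion: sample_areas[ion] / buffer_areas[ion] for ion in buffer_areas}
    centre = median(ratios.values())
    return [ion for ion, r in ratios.items() if abs(r / centre - 1) <= tolerance]
```

Comparing each transition against the median sample/buffer ratio, rather than against its fraction of the summed signal, keeps a single interfered transition from shifting the apparent relative intensities of the clean ones.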
The analysis of the SIS peptide mix in buffer and in a sample digest was also used to determine the retention time and to confirm the identity of the ion signals observed for natural and heavy peptides, thus verifying the identity of the natural peptides, which co-elute with the corresponding SIS peptides.

MRM analysis was performed on a new set of 52 samples (27 ADPKD vs. 25 healthy controls), with an injection volume of 4 μl resulting in 2 μg of protein digest on-column. Samples were prepared essentially as for the iTRAQ pooling analysis (see the Sample filtration section). First, from each sample, an aliquot of the protein fraction containing 10 μg of total protein was transferred to a silanized vial. Then the volume of each sample was brought to 30 μl using a 100 mM solution of NH4HCO3. 100 mM DTT (Sigma Cat no D8161-5G) was added to the samples to a final concentration of 10 mM, and the samples were incubated at 56°C for 40 min. To block reduced cysteines, 0.5 M iodoacetamide (Sigma Cat no I1149-5G) was added to a final concentration of 50 mM, and the samples were incubated at room temperature for 30 minutes in darkness. Trypsin (Promega cat. no V511A) was added to the samples at a 1:20 vol./vol. ratio, and the samples were incubated at 37°C overnight. Finally, trifluoroacetic acid was added to the digested protein samples to reduce the pH to 2 and inactivate trypsin. Peptide standards were added to the samples post digestion as a SIS mixture in which individual SIS peptides were balanced to obtain at least a ratio of 1:10 between the endogenous natural peptide and the corresponding SIS peptide in a positive sample. All MRM data were processed using the Skyline Ver. 1.3 software with default values for peak integration and Savitzky-Golay peak smoothing. All integrated peaks were manually inspected to ensure correct peak detection and accurate integration. All peptides were targeted using 5 MRM ion pairs per peptide unless an interference was found in a transition, in which case the number was reduced to four transitions per peptide. The integrated peak areas for the individual transitions detecting the 4–5 ion fragments per peptide were summed. The relative protein amounts in the samples are reported as Peak Area Ratios To Heavy, which refers to the ratio of the integrated area of the endogenous (natural) peak to the integrated area of the corresponding standard (SIS) peptide.
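As a minimal illustration of the reported quantity, the Peak Area Ratio To Heavy for one peptide can be computed from per-transition integrated areas as sketched below. The function name and the dictionary layout are hypothetical, and restricting the sums to transitions present for both forms is an assumption that mirrors the use of identical transitions for the natural and SIS peptides.

```python
def peak_area_ratio_to_heavy(natural_areas, sis_areas):
    """Summed natural peak area divided by summed SIS (heavy) peak area.

    natural_areas, sis_areas: dicts mapping a transition label (e.g. 'y7')
    to its integrated peak area for the natural and heavy peptide forms.
    Only transitions present in both dicts are summed (assumption).
    """
    shared = natural_areas.keys() & sis_areas.keys()
    if not shared:
        raise ValueError("no transitions in common between natural and SIS forms")
    natural_total = sum(natural_areas[t] for t in shared)
    sis_total = sum(sis_areas[t] for t in shared)
    if sis_total <= 0:
        raise ValueError("summed SIS peak area must be positive")
    return natural_total / sis_total
```

Because the same amount of SIS mixture is added to every digest, this ratio can be compared across samples for a given peptide without converting it to an absolute concentration.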