Abstract

Urine has long been a “favored” biofluid among metabolomics researchers. It is sterile, easy-to-obtain in large volumes, largely free from interfering proteins or lipids and chemically complex. However, this chemical complexity has also made urine a particularly difficult substrate to fully understand. As a biological waste material, urine typically contains metabolic breakdown products from a wide range of foods, drinks, drugs, environmental contaminants, endogenous waste metabolites and bacterial by-products. Many of these compounds are poorly characterized and poorly understood. In an effort to improve our understanding of this biofluid we have undertaken a comprehensive, quantitative, metabolome-wide characterization of human urine. This involved both computer-aided literature mining and comprehensive, quantitative experimental assessment/validation. The experimental portion employed NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) experiments performed on multiple human urine samples. This multi-platform metabolomic analysis allowed us to identify 445 and quantify 378 unique urine metabolites or metabolite species. The different analytical platforms were able to identify (quantify) a total of: 209 (209) by NMR, 179 (85) by GC-MS, 127 (127) by DFI/LC-MS/MS, 40 (40) by ICP-MS and 10 (10) by HPLC. Our use of multiple metabolomics platforms and technologies allowed us to identify several previously unknown urine metabolites and to substantially enhance the level of metabolome coverage. It also allowed us to critically assess the relative strengths and weaknesses of different platforms or technologies. The literature review led to the identification and annotation of another 2206 urinary compounds and was used to help guide the subsequent experimental studies. 
An online database containing the complete set of 2651 confirmed human urine metabolite species, their structures (3079 in total), concentrations, related literature references and links to their known disease associations is freely available at http://www.urinemetabolome.ca.

Funding: Funding for this research has been provided by Genome Canada, Genome Alberta, The Canadian Institutes of Health Research, Alberta Innovates BioSolutions, Alberta Innovates Health Solutions, The National Research Council and The National Institute of Nanotechnology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: Ralf Bogumil and Cornelia Roehring are employed by BIOCRATES Life Sciences AG. This company produces kits for targeted metabolomic analyses. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

Introduction

Metabolomics is a relatively young branch of “omics” science concerned with the systematic study of the chemical products or metabolites that cells and organisms generate. Because metabolites are the downstream products of numerous genome-wide or proteome-wide interactions, the metabolome (the sum of all metabolites in an organism) can be a very sensitive measure of an organism’s phenotype. This fact has made metabolomics particularly useful in the study of environment-gene interactions [1], [2], [3], [4], the identification of disease biomarkers [5], [6], [7], [8], [9] and the discovery of drugs [10]. Unlike its older “omics” cousins, where complete or near-complete coverage of the genome or proteome is fairly routine, metabolomics still struggles to cover even a tiny fraction of the metabolome. Indeed, most human metabolomic studies published today, even those exploiting the latest and most sensitive LC-MS/MS technologies, typically succeed in identifying or characterizing fewer than 100 compounds [11], [12], [13], [14]. This corresponds to less than 1% of the known human metabolome [15], [16]. In an effort to help improve this situation, we (and others) have started to undertake the systematic characterization of various human biofluid metabolomes. This includes the human cerebrospinal fluid metabolome [17], [18], the human saliva metabolome [19], and the human serum metabolome [20]. We have now turned our attention to characterizing the human urine metabolome.

Urine, as produced by mammals, is a transparent, sterile, amber-colored fluid generated by the kidneys. The kidneys extract the soluble wastes from the bloodstream, as well as excess water, sugars, and a variety of other compounds. The resulting urine contains high concentrations of urea (from amino acid metabolism), inorganic salts (chloride, sodium, and potassium), creatinine, ammonia, organic acids, various water-soluble toxins and pigmented products of hemoglobin breakdown, including urobilin, which gives urine its characteristic color. Urination is the primary route by which the body eliminates water-soluble waste products. The average adult generates between 1.5 and 2.0 liters of urine per day, which over the course of a lifetime would be enough to fill a small backyard swimming pool (5 × 8 × 1.5 m).

While largely viewed as a waste product, urine has considerable value as a diagnostic biofluid. Indeed, the analysis of urine for medical purposes dates back to ancient Egypt [21], [22], [23], [24]. Hippocrates largely legitimized the medical practice of uroscopy (the study of urine for medical diagnostics) where examination of the color, cloudiness, smell and even the taste of urine was used to identify a variety of diseases. Throughout the Byzantine era and well into the Middle Ages, urine color wheels (diagrams that linked the color of urine to a particular disease) were commonly used by physicians to make diagnoses [21], [25]. A brownish color would indicate jaundice, a red hue (blood) might indicate urinary tract tumors, absence of color would be indicative of diabetes and foamy urine would indicate proteinuria. With the advent of modern clinical techniques in the middle of the 19th century, uroscopy largely disappeared. However, urine has continued to be an important cornerstone of modern medical practice. In fact, it was the first biofluid to be used to clinically diagnose a human genetic disease - alkaptonuria [26]. Even today urine analysis is routinely performed with dipstick tests that can readily measure urinary glucose, bilirubin, ketone bodies, nitrites, leukocyte esterase, specific gravity, hemoglobin, urobilinogen and protein. More detailed urinalysis can also be used to study a variety of renal conditions, such as bladder, ovarian and kidney diseases [27], [28], [29], [30].

A comprehensive list of methods used to analyze urine and the numbers of metabolites identified and/or quantified by these methods (along with references) is provided in Table 1. As seen from this table, it is possible to (tentatively) identify up to 294 different metabolites in human urine. However, quantification is somewhat more difficult, with the largest number of quantified metabolites ever reported in human urine being slightly less than 100. In addition to these global metabolomic studies, hundreds of other “targeted” or single-metabolite studies have been conducted on human urine that have led to the identification and quantification of hundreds of other urine metabolites. Unfortunately, this information is not located in any central repository. Instead it is highly dispersed across numerous textbooks and periodicals [16].

To facilitate future research into urine chemistry and urine metabolomics, we believe it is critical to establish a comprehensive, electronically accessible database of the detectable metabolites in human urine. This document describes just such a database, one that contains the metabolites that can, with today’s technology, be detected in human urine along with their respective concentrations and disease associations. This resource was assembled using a combination of our own experimental and literature-based research. Experimentally, we used high-resolution NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection tandem mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) with ultraviolet (UV) or fluorescence detection (FD) techniques performed on multiple human urine samples to identify 445 metabolites or metabolite species and quantify 378 of these compounds. To complement these “global” metabolic profiling efforts, our team also surveyed and extracted metabolite and disease-association data from more than 3000 books and journal articles that had been identified through computer-aided literature searches and in-house developed text-mining software. This “bibliomic” effort yielded data for another 2206 metabolites. The resulting Urine Metabolome Database (UMDB - http://www.urinemetabolome.ca) is a comprehensive, web-accessible resource containing a total of 2651 confirmed urine metabolites or metabolite species (corresponding to 3079 defined structures), their corresponding concentrations and their disease associations that were revealed or identified from these combined experimental and literature mining efforts.

Results and Discussion

The Content of the Human Urine Metabolome – The Urine Metabolome Database

The Urine Metabolome Database (UMDB: http://www.urinemetabolome.ca) contains a complete list of all (to the best of our knowledge) possible metabolites that have been detected in human urine using current technologies. The UMDB is a freely available, easily queried, web-enabled database which provides a list of the metabolite names, level of verification (confirmed or probable), normal and disease-associated concentration ranges, associated diseases and corresponding literature references for all human urine metabolites that have ever been detected and/or quantified in the literature. The UMDB also contains concentration data compiled from the experimental studies described here. Each urine metabolite entry in this database is linked to a MetaboCard button [15], [16] that, when clicked, brings up detailed information about that particular entry. This detailed information includes nomenclature, chemical, clinical and molecular/biochemical data. Each MetaboCard entry contains up to 120 data fields, many of which are hyperlinked to other databases (KEGG [45], PubChem [46], MetaCyc [47], ChEBI [48], PDB [49], UniProt [50], and GenBank [51] as well as to GeneCard IDs [52], GeneAtlas IDs [53] and HGNC IDs [54] for each of the corresponding enzymes or proteins known to act on that metabolite). Additionally, the UMDB through its MetaboCard/HMDB links includes nearly 450 hand-drawn, zoomable and fully hyperlinked human metabolic pathway maps (SMPDB: http://www.smpdb.ca/). These maps are intended to help users visualize the chemical structures on metabolic pathways and to get detailed information about metabolic processes [55]. These UMDB pathway maps are quite specific to human metabolism and explicitly show the subcellular compartments where specific reactions are known to take place.

The UMDB’s simple text query (TextQuery) supports general text queries including names, synonyms, conditions and disorders. Clicking on the Browse button (on the UMDB navigation panel) generates a tabular view that allows users to casually scroll through the database or re-sort its contents by compound name or by concentration. Users can choose either the “Metabolite View”, “Concentration View” or “Diseases View” to facilitate their browsing or searching. Clicking on a given MetaboCard button brings up the full data content (from the HMDB) for the corresponding metabolite. Users may also search the database using a variety of options listed under the “Search” menu. For instance, the ChemQuery button allows users to draw or write (using a SMILES string) a chemical compound to search the UMDB for chemicals similar or identical to the query compound. ChemQuery also supports chemical formula and molecular weight searches. The Sequence Search button allows users to conduct BLAST sequence searches of the 4075 protein sequences contained in the UMDB. Both single and multiple sequence BLAST queries are supported. “Advanced Search”, which is also located under the “Search” menu, is the most sophisticated search tool in the UMDB and opens an easy-to-use query search tool that allows users to select or search over various combinations of subfields. The UMDB’s “MS Search” allows users to submit mass spectral peak lists that will be searched against the Human Metabolome Database (HMDB)’s library of MS/MS spectra. This potentially allows facile identification of urine metabolites from mixtures via MS/MS spectroscopy. UMDB’s NMR Search allows users to submit peak lists from 1H or 13C NMR spectra (both pure and mixtures) and have these peak lists compared to the NMR libraries contained in the HMDB. This allows the identification of metabolites from mixtures via NMR spectral data. The Download button provides links to collected sequence, image and text files associated with the UMDB. In the About menu, the “Data Fields Explained” button lists source data used to assemble the UMDB.

Currently the UMDB contains information on 2651 detectable metabolites or metabolite species (which corresponds to 3079 metabolites with precisely defined structures) and 3832 concentration ranges or values associated with 220 different conditions, diseases and disorders. The number of metabolites in the UMDB is not a number that will remain unchanged. Rather it reflects the total number of metabolites – most of which are endogenous - that have ever been detected and/or quantified by ourselves and others. Certainly as technology improves, we anticipate this number will increase as other, lower abundance, metabolites are detected and added to future versions of the UMDB. Likewise, if the list was expanded to include intermittent, exogenous compounds such as all possible drugs or drug metabolites or rare food additives and food-derived phytochemicals, the database could be substantially larger.

Inspection of the on-line tables in UMDB generally shows that human urine contains a substantial number of hydrophilic molecules. This is further reiterated in Table 2, which provides a listing of the chemical “superclasses” (using the HMDB definitions) in human urine and the number of representative compounds that can be found in this biofluid. Excluding lipids (which are in very low concentration), human urine is dominated by amino acids and derivatives, carbohydrates and carbohydrate conjugates. This simply reinforces the fact that urine is a key carrier of hydrophilic waste products. Other small molecule components found in high abundance in urine include hydroxy acids and derivatives (such as citric acid), urea, ammonia, creatinine and hippuric acid. A more detailed description of both our literature and experimental findings is given in the following 7 sections covering: 1) Literature Review/Text Mining; 2) NMR; 3) DFI/LC-MS/MS; 4) GC-MS; 5) ICP-MS; 6) HPLC/UV and 7) HPLC/FD.

Metabolite Concentration in Urine – Literature Survey

In addition to the experimentally derived values obtained for this study, the urine metabolome database (UMDB) also presents literature-derived concentrations of urine metabolites with references to either PubMed IDs or clinical textbooks. In many cases, multiple concentration values are given for “normal” conditions. This is done to provide users/readers with a better estimate of the potential concentration variations that different technologies or laboratories may measure. As a general rule, there is good agreement between most laboratories and methods. However, the literature results presented in the UMDB do not reflect the true state of the raw literature. A number of literature-derived concentration values were eliminated through the curation process after being deemed mistaken, disproven (by subsequent published studies), mis-typed or physiologically impossible. Much of the curation process involved having multiple curators carefully reading and re-reading the primary literature to check for unit type, unit conversion and typographical inconsistencies.

One point that is particularly interesting is the fact that the concentration (scaled to creatinine) of the average metabolite in normal urine varies by about ± 50%, with some metabolites varying by as much as ± 350% (such as normetanephrine (0.00085 ± 0.00317 µM/mM creatinine), pipecolic acid (0.03 ± 0.07 µM/mM creatinine), enterodiol (0.032 ± 0.072 µM/mM creatinine), tungsten (0.010 ± 0.022 µM/mM creatinine) and chlorogenic acid (0.0014 ± 0.0029 µM/mM creatinine)). Therefore, drawing conclusions about potential disease biomarkers without properly taking into account this variation would be ill-advised. We believe that these relatively large metabolite concentration ranges are due to a number of factors, including age, gender, genetic background, diurnal variation, health status, activity level and diet [56], [57], [58], [59]. Indeed, some UMDB entries explicitly show such variations based on the populations (age, gender) from which these metabolite concentrations were derived. Clearly more study on the contributions to the observed variations in urine is warranted, although with thousands of metabolites to measure for dozens of conditions, these studies will obviously require significant technical and human resources.
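The size of these variations can be illustrated with a short calculation. The sketch below is only an illustration: it assumes that each "±" value is a standard deviation and expresses it as a percentage of the mean, using the five values quoted in the paragraph above. The function name `relative_sd_percent` is a hypothetical helper, not part of the UMDB.

```python
# Mean and spread (µM/mM creatinine) as quoted in the text above.
# Assumption: the value after "±" is a standard deviation.
reported = {
    "normetanephrine": (0.00085, 0.00317),
    "pipecolic acid": (0.03, 0.07),
    "enterodiol": (0.032, 0.072),
    "tungsten": (0.010, 0.022),
    "chlorogenic acid": (0.0014, 0.0029),
}


def relative_sd_percent(mean, sd):
    """Return the spread as a percentage of the mean (hypothetical helper)."""
    if mean <= 0:
        raise ValueError("mean concentration must be positive")
    return 100.0 * sd / mean


relative = {name: relative_sd_percent(m, s) for name, (m, s) in reported.items()}

# Print the metabolites from most to least variable.
for name, pct in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} ±{pct:.0f}%")
```

Under this reading, all five metabolites vary by more than ± 200% of their mean, with normetanephrine the most variable.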

Quantification and Identification of Urine Metabolites – NMR

A representative high-resolution NMR spectrum of urine from a healthy individual is shown in Figure 1. As can be readily seen from this figure, urine NMR spectra are very information-rich and surprisingly complex, with thousands of resolved peaks. From the 22 healthy control urine samples analyzed, we could identify a total of 209 unique compounds with an average of 167 ± 19 compounds being identified per sample. Every compound was unequivocally identified and quantified using spectral fitting (via Chenomx) and/or spike-in experiments with authentic standards. The concentration of each metabolite was normalized to each urine sample’s corresponding creatinine value to compensate for variations in urine volume (the concentration of metabolites is expressed as µM/mM creatinine).
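The creatinine normalization described above amounts to a simple ratio. The following minimal sketch assumes the metabolite concentration is measured in µM and the creatinine concentration in mM for the same sample. The function name and the example concentrations are hypothetical and are not measurements from this study.

```python
def normalize_to_creatinine(metabolite_uM, creatinine_mM):
    """Express a metabolite concentration as µM per mM creatinine.

    Hypothetical helper: both values must come from the same urine sample.
    """
    if creatinine_mM <= 0:
        raise ValueError("creatinine concentration must be positive")
    return metabolite_uM / creatinine_mM


# Illustrative values only: the same raw metabolite concentration in a
# dilute and in a concentrated urine sample.
dilute = normalize_to_creatinine(metabolite_uM=220.0, creatinine_mM=11.0)
concentrated = normalize_to_creatinine(metabolite_uM=220.0, creatinine_mM=20.0)

print(dilute, concentrated)
```

The same raw concentration gives different normalized values when the creatinine levels differ, which is why values are compared only after normalization.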

The 209 compounds identified and quantified from these NMR studies represent a “high-water” mark for NMR-based metabolomics. Previous studies have reported up to 70 compounds being identified and/or quantified in human urine [40]. Indeed, compared to other platforms previously used to analyze human urine (Table 1), it appears that NMR may currently be the most comprehensive and certainly the most quantitative approach to characterizing this biofluid. Based on the fitted area under each urinary NMR spectrum and the number of unidentified peaks we estimate that more than 96% of the spectral area and more than 92% of all NMR-detectable compounds in our human urine samples are listed in Table 3. In other words, for NMR-based metabolomics, human urine is essentially “solved”. The same “solved” status has already been achieved for human serum (with 49 definitive compounds, [20]) and for human cerebrospinal fluid (with 53 definitive compounds, [17]). Knowing the expected or detectable composition of these biofluids should open the door to automated NMR-based metabolomics [60].

However, not all of the NMR-derived urine concentrations agree with literature-derived values. A total of 34 compounds had average concentrations somewhat higher (by more than 1 SD) than previously reported values (for example, 1,3-dimethyluric acid, glucaric acid, L-aspartic acid, mannitol), while 21 compounds had average concentrations lower (by more than 1 SD) than previously reported (for example, 1-methylhistidine, phenol, dihydrothymine, thymidine). Another 7 metabolites also exhibited a somewhat greater range in concentrations than those previously reported in the literature. These included: acetic acid, butyric acid, isovaleric acid, lactose, phenylglyoxylic acid, proline-betaine and trans-ferulic acid. Some of these discrepancies are likely due to differences in diet, physiological status, pharmacological effects and the age of the different cohorts that were analyzed. Other differences may be due to storage effects, sample preparation methods and the analytical methods being used. As a rule, NMR concentration determinations are very accurate since they involve direct measurement of the compound, as opposed to an indirect measurement of a derivative compound. Therefore we are quite confident in the NMR concentration values reported in Table 3 and would tend to view these as more reliable than those measured via other technologies.

A number of the compounds exhibiting higher-than or lower-than-reported concentrations appear to be associated with dietary intake. For example, mannitol is a sugar alcohol that is poorly absorbed by humans but its presence in urine can be explained by its occurrence in commonly consumed foods such as apples, pineapples, asparagus and carrots. Likewise, the urinary excretion of trans-ferulic acid (a polyphenolic derivative) increases after the ingestion of breakfast cereals [61] and chocolate. The relatively low value we measured is likely due to the fact that the literature value of trans-ferulic acid reported in Table 3 was measured for people on a special diet [62]. Also, the low level of proline-betaine we measured in urine may be due to a lower frequency of exposure to dietary citrus fruits in our population sample [63]. Proline-betaine is an osmoprotectant found in citrus fruit and urinary excretion of this metabolite is increased after consumption of citrus products such as orange juice [64]. Similarly the higher levels of dimethyl sulfone we detected in urine could be attributed to dietary sources that contain DMSO [65]. For example, onions contain many sulfoxides including DMSO which can be oxidized in the liver and kidneys to produce dimethyl sulfone [66], [67]. The consumption of meat could also significantly increase the concentration of some metabolites in urine as reported in the case of 1-methylhistidine [68]. 1-methylhistidine is produced from the metabolism of anserine (a dipeptide) which is commonly found in meats [69], [70]. In addition to diet, metabolite levels in urine can also be affected by physiological status. For instance, the level of 3-hydroxybutyric acid in urine increases during fasting and can range from 0 to 200 µM/mM creatinine, with the maximum level reported in the literature (200 µM/mM creatinine) corresponding to a healthy male after 35 h of fasting [50].

Some of the metabolites we measured by NMR did not have any previously reported literature values. For example, glucuronic acid is usually reported as total glucuronic acid (the free acid plus glucuronide conjugates) after hydrolysis [71], [72]. Here we report the concentration of free glucuronic acid, as indicated in Table 3. Another example, 2-methylerythritol, was previously detected in human urine but no concentration was reported [73]. The urinary excretion of 2-methylerythritol is most likely a result of dietary consumption of fruits or vegetables containing 2-methylerythritol and/or 2-methylerythritol-4-phosphate. 2-methylerythritol-4-phosphate is an intermediate in isoprenoid biosynthesis [74] and has been found to be quite abundant in certain plants [75].

A number of compounds we measured by NMR appear to be normal constituents of human urine but seem not to have been previously reported as being detectable by NMR (a total of 42 compounds) or were reported as detected but not quantified by any other method (a total of 8 compounds). The identification of these “NMR-novel” compounds was aided by their prior identification by GC-MS and DFI-MS/MS (see following sections) and through a careful literature analysis of compounds that had previously been detected in human urine via other methods. The compounds that had been detected but not previously quantified are: 2-hydroxy-3-methylpentanoic acid, 2-methyl-3-ketovaleric acid, 2-methylerythritol, glucuronic acid, monomethyl glutaric acid, N-methylhydantoin, phosphorylcholine and scyllitol. All of these compounds were confirmed using authentic standards.

This NMR study also revealed a number of common identification errors made in previously published NMR-based human urine metabolomic studies. In particular, several earlier reports identified phenylacetylglycine [76], N-acetylglutamic acid [77], cresol [78], isonicotinic acid [78], yellow 7.1 [79], meta-hydroxyphenylpropionic acid [80], 2-oxoisocaproic acid [81], urocanic acid, glycylproline and ornithine [80] as being detectable by NMR in human urine. Using our NMR instrument and the samples available to us, we were unable to detect any of these compounds, even after performing multiple spike-in experiments using authentic compounds. While some of these metabolites have been previously reported to be in human urine, they were reported at concentrations far below the lower limit of detection of modern NMR instruments (which is ∼ 1 µM). Due to their chemical shift similarity, phenylacetylglycine (which is found only in rats and mice) and N-acetylglutamic acid appear to be commonly mistaken for phenylacetylglutamine. We also noticed that isonicotinic acid (a breakdown product of isoniazid and hydrazine derivatives, which is found only in individuals that have taken isoniazid or other hydrazine derivatives as a drug) appears to be mistaken for trigonelline. Likewise, cresol (water-insoluble) appears to be frequently mistaken for cresol-sulfate (water-soluble), while the compounds yellow 7.1, meta-hydroxyphenylpropionic acid and 3-(p-hydroxyphenyl)-propionic acid appear to be commonly mistaken for 3-(3-hydroxyphenyl)-3-hydroxypropanoic acid (HPHPA).

In addition to correcting these compound identification errors, we also observed some significant gender-related effects on creatinine levels in our urine samples. Since males generally have a greater mass of skeletal muscle than females, they tend to have higher urinary levels of creatinine than women. This was clearly evident in our samples as the average male creatinine level was 20 mM while the average female creatinine level was 11 mM. In addition, increased dietary intake of creatine or a protein-rich diet can increase daily creatinine excretion [82].

Quantification and Identification of Urine Metabolites – GC-MS

As seen in Table 1, GC-MS methods have long been used to comprehensively characterize the chemical content of human urine. For our studies a total of 4 different GC-MS analyses were performed. The first method employed polar solvent extraction and derivatization to achieve broad metabolite coverage of polar metabolites, the second was more selective and targeted organic acids, the third targeted volatiles, while the fourth targeted bile acids. Representative high-resolution GC-MS total ion chromatograms are shown in Figures 2–4 for each of these analyses (except for the bile acids). Combined, the 4 GC-MS methods allowed us to identify 179 and quantify a total of 85 compounds. Table 4 shows the identified polar, organic acid extracts and bile acids (127 in total), Table 5 shows the identified volatile metabolites (52 in total) while Table 6 shows the 85 fully quantified compounds from all 4 techniques. These numbers actually represent the highest number of urine metabolites both identified and quantified by GC-MS to date. As seen in Table 1, previous GC-MS studies have reported up to 258 unique compounds being identified (but none quantified) [83] and approximately 95 compounds quantified in human urine [84]. Relative to NMR (see previous section) and other methods used to analyze human urine (Table 1), it appears that a multi-pronged GC-MS analysis is an excellent approach to characterize this biofluid. However, unlike NMR where nearly all detectable peaks are identifiable, metabolite coverage by GC-MS tends to be relatively incomplete. As seen in Figure 2, only 60% of the peaks could be identified using as reference the 2008 NIST library and other home-made GC-MS metabolite libraries. Likewise, in Figure 3, we see that only 65% of the organic acid peaks could be identified while in Figure 4, just 60% of the volatile compound peaks could be identified.

Incomplete compound identification is a common problem with global or untargeted GC-MS metabolomics. This may be due to any number of factors including spectral overlap due to incomplete separation, poor signal to noise for low intensity peaks, the lack of reference GC-MS spectral data for certain metabolites (especially unusual dietary sources), or the presence of spectral artefacts such as derivatization by-products or degraded metabolites in the GC-MS spectrum. For our GC-MS studies we used the NIST library supplemented with a home-made GC-MS reference library of known urine compounds assembled from the Human Metabolome Library [16]. No doubt the use of other commercially available reference GC-MS libraries such as the Fiehn GC-MS library from Agilent or the GOLM metabolome database library [85] might have allowed us to further increase our coverage. Likewise the use of a faster scan rate and/or a more sensitive GC-TOF instrument (instead of a slower scanning quadrupole GC-MS) certainly would have increased overall coverage.

Nearly all of the non-volatile metabolites (87) identified by our GC-MS analyses were also identified by NMR. Some of the exceptions were oxalic acid, phosphate and uric acid, each of which was detected by GC-MS but not by NMR. These compounds do not have NMR-detectable protons at physiological pH, making them essentially “NMR invisible”. Other compounds seen by GC-MS but not by NMR included metabolites that were generally below the detection limit of NMR (∼2 µM/mM creatinine) such as indolelactic acid and 2,4-dihydroxybutanoic acid. For our non-targeted GC-MS analysis, the lower limit of detection was 1 µM/mM creatinine (for 2,4-dihydroxybutanoic acid), while for our targeted organic acid GC-MS analysis the lower limit of detection was 0.7 µM/mM creatinine (for m-chlorobenzoic acid). Overall, our data suggest that the sensitivity of a standard single quadrupole GC-MS instrument is perhaps 1.5–2X better than a 500 MHz NMR instrument for water-soluble metabolites. It is also important to note that the level of water-soluble, non-volatile metabolite coverage obtained by GC-MS is not as great as seen with NMR (127 compounds vs. 209 compounds). The limited coverage of GC-MS is partly due to the fact that not all compounds can be readily extracted, easily derivatized or routinely separated on a GC column. Furthermore, when analyzing urine by GC-MS there is a need to pretreat the sample with urease (to reduce urea levels), which can diminish the abundance of some metabolites [86]. While GC-MS may not be the best method for analyzing water-soluble metabolites, it certainly excels at the detection of volatile metabolites. Indeed, only one of the volatile metabolites identified by GC-MS is also identified by NMR (phenol). This certainly underlines a key strength of GC-MS relative to other metabolomics platforms. When comparing NMR to GC-MS we found that NMR is capable of detecting 121 compounds that the 4 combined GC-MS methods cannot detect while the combined GC-MS methods can detect 91 compounds that NMR cannot routinely detect. Overall, these data suggest that GC-MS and NMR are complementary methods for the identification and quantification of small molecules in urine.
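The platform counts quoted in this comparison can be cross-checked with simple set arithmetic. The sketch below uses only numbers stated in the text and assumes that the 209 NMR and 179 GC-MS totals are the two sets being compared. The variable names are illustrative.

```python
# Totals and platform-exclusive counts as quoted in the text.
nmr_total = 209    # compounds identified by NMR
gcms_total = 179   # compounds identified by the 4 combined GC-MS methods
nmr_only = 121     # detected by NMR but not by GC-MS
gcms_only = 91     # detected by GC-MS but not by NMR

# The shared set can be derived from either platform's totals.
shared_from_nmr = nmr_total - nmr_only
shared_from_gcms = gcms_total - gcms_only

# The text reports 87 shared non-volatile compounds plus 1 shared volatile (phenol).
shared_from_text = 87 + 1

# Size of the union of the two platforms under this assumption.
combined_coverage = nmr_only + gcms_only + shared_from_nmr

print(shared_from_nmr, shared_from_gcms, shared_from_text, combined_coverage)
```

Both subtractions give 88 shared compounds, which matches the 87 non-volatile compounds plus phenol, so under this assumption the two platforms together cover 300 distinct compounds.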

Among the 58 metabolites quantified by both GC-MS and NMR we found very good overall agreement, with the majority of measured concentration values falling within 20 ± 11% of each other. The concentration patterns and rankings of the most abundant to the least abundant compounds were also largely identical for the two platforms. A total of 12 metabolites exhibited somewhat larger concentration discrepancies between GC-MS and NMR (e.g., L-arabinose and L-serine (lower in GC-MS vs. NMR), and 4-hydroxybenzoic acid and tyrosine (higher in GC-MS vs. NMR)). Some of these concentration differences may be due to the extraction or derivatization process needed to conduct GC-MS analyses. This can lead to unspecified compound losses, unusual derivatives or unrecognized fragmentation patterns. Therefore we would have expected at least a few GC-MS concentration values to be slightly lower than those seen by NMR. Likewise, it is important to remember that there are inherent errors (5–10%) in measuring peak areas (i.e. compound concentrations) both in GC-MS and NMR due to peak overlap, uneven baselines and spectral noise.
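One way to express the agreement between two platforms is a percent difference for each metabolite. The text does not state which formula was used, so the sketch below assumes the absolute difference relative to the mean of the two measurements. The function name and the example concentrations are hypothetical.

```python
def percent_difference(conc_a, conc_b):
    """Absolute difference as a percentage of the mean of two measurements.

    Assumed definition: the source does not specify the formula it used.
    """
    mean = (conc_a + conc_b) / 2.0
    if mean <= 0:
        raise ValueError("mean concentration must be positive")
    return 100.0 * abs(conc_a - conc_b) / mean


# Illustrative values only (µM/mM creatinine), not measurements from this study.
nmr_value = 110.0
gcms_value = 90.0

print(f"{percent_difference(nmr_value, gcms_value):.1f}%")
```

A symmetric definition such as this one treats neither platform as the reference, which seems appropriate when both measurements carry their own 5–10% peak-area error.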

Nearly all of the compounds we detected or quantified in human urine by GC-MS have been previously described or mentioned in the GC-MS literature. One compound (scyllitol), however, appears not to have been previously detected by GC-MS. The identification of this compound by our GC-MS method was aided by its prior identification by NMR (see previous section). Additionally, a careful literature analysis indicated that scyllitol is a normal constituent of human urine and has previously been detected in human urine via other methods.

As we noted with our NMR studies earlier, there are a few previously reported GC-MS detectable metabolites in human urine that appear to be artefacts. These artefactual metabolites may arise from extractions with different solvents, pre-treatment with urease, and chemical derivatization. For example, Shoemaker et al. [84] reported the presence of bisethane in human urine. We also detected bisethane, but it appears to be an artefact of chemical derivatization and is not a urine metabolite.

Direct flow injection (DFI) MS/MS or DFI-MS/MS is another commonly used global metabolic profiling method [87]. When isotopic standards are used along with multiple reaction monitoring (MRM), it is also possible to perform targeted metabolomics with very accurate concentration measurements. For our urine studies, we employed a combined DFI/LC-MS/MS approach, based on the commercially available AbsoluteIDQ p180 Kit (BIOCRATES Life Sciences AG, Innsbruck). When applied to urine, this approach allowed us to identify and quantify a total of 127 metabolites or metabolite species, including 34 acylcarnitines, 21 amino acids, 15 biogenic amines, creatinine, hexose, 35 phosphatidylcholines, 15 sphingomyelins and 5 lysophosphatidylcholines. The amino acids and biogenic amines are analyzed by an LC-MS/MS method, whereas all other metabolites are analyzed by DFI-MS/MS as indicated in Table 7. DFI-MS/MS identifies lipid species (as opposed to specific lipids) using their total acyl/alkyl chain content (e.g. PC (38∶4)) rather than their unique structure. Therefore each lipid species identified by the BIOCRATES kit typically corresponds to 5–10 possible unique lipid structures. Consequently, the total number of phosphatidylcholine, sphingolipid and lysophosphatidylcholine structures identified by this method was 458, 19 and 6, respectively. Therefore, combining these probable lipid structures (483 in total, based on the known fatty acid and lipid composition in human serum) with the other 72 confirmed non-lipid metabolites, the DFI-MS/MS method yields 555 confirmed and probable metabolites or metabolite structures. All of these compounds, along with their corresponding estimated concentrations, have been entered into the UMDB.
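
The species and structure totals in the preceding paragraph can be tallied as follows. This is only a bookkeeping sketch using the counts quoted in the text; the dictionary layout and variable names are illustrative assumptions.

```python
# Metabolite species reported for the kit-based DFI/LC-MS/MS assay (from the text).
species = {
    "acylcarnitines": 34,
    "amino acids": 21,
    "biogenic amines": 15,
    "creatinine": 1,
    "hexose": 1,
    "phosphatidylcholines": 35,
    "sphingomyelins": 15,
    "lysophosphatidylcholines": 5,
}
lipid_classes = ("phosphatidylcholines", "sphingomyelins", "lysophosphatidylcholines")

total_species = sum(species.values())
lipid_species = sum(species[name] for name in lipid_classes)
non_lipid = total_species - lipid_species

# Probable lipid structures for the three lipid classes, as quoted in the text.
probable_lipid_structures = 458 + 19 + 6

total_structures = probable_lipid_structures + non_lipid
print(total_species, non_lipid, probable_lipid_structures, total_structures)
```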

Our results show very good agreement with the previous studies conducted by BIOCRATES on human urine samples (Biocrates Application Note 1005-1). We found that the lower limit of quantification by DFI MS/MS based on the AbsoluteIDQ kit was 0.1 nM/mM creatinine for certain phosphatidylcholine species (i.e. PC aa C40∶3) and 0.1 nM/mM creatinine for certain sphingomyelin species (i.e. SM 26∶1). Comparison of our lipid results with literature data was difficult as relatively few papers report urine lipid concentration data. Indeed, the presence of lipids in urine is a little unexpected, but not entirely unreasonable. It is likely that urea, a well known chaotrope, facilitates the dissolution of small amounts of fatty acids and other lipid species in urine.

Many of the compounds we measured with this kit assay appear to be normal constituents of human urine but have not been previously reported (quantified and/or detected) in the scientific literature (with the exception of the BIOCRATES Application Note). In total, 53 compounds are being reported here for the first time as being normal constituents of human urine, while 68 compounds are being robustly quantified in human urine for the first time. The vast majority of these compounds are lipids.

Based on our results, the combined DFI/LC-MS/MS method detected 98 compounds or compound species that GC-MS and NMR methods could not detect, while GC-MS detected 161 compounds and NMR detected 181 compounds that DFI/LC-MS/MS could not detect. The 3 methods were able to detect a common set of 17 compounds including creatinine, L-glutamine, L-tryptophan, L-tyrosine and L-valine. Interestingly, the concentrations measured by DFI/LC-MS/MS, NMR and GC-MS (across the shared set of 17 compounds) showed generally good agreement (within 19 ± 7% of each other). The relatively small overlap, in terms of compound coverage, between the 3 platforms is a bit of a surprise and certainly serves to emphasize the tremendous chemical diversity that must exist in urine. Overall, by combining these 3 platforms, we were able to identify 295 and quantify 231 unique or non-overlapping metabolites or metabolite species. These data suggest that DFI/LC-MS/MS, GC-MS and NMR are highly complementary techniques for the identification and quantification of metabolites in human urine.

Quantification and Identification of Urinary Trace Metals – ICP-MS

To determine the trace elemental composition of urine, we used inductively coupled plasma mass spectrometry (ICP-MS). ICP-MS is widely considered to be one of the best techniques for the characterization of the trace element composition of biological samples [88], [89]. Indeed, none of the other methods (1H-NMR, GC-MS and DFI/LC-MS/MS) are suited for measuring trace element composition or concentrations. Our multi-elemental analysis of urine using ICP-MS provided quantitative results for a total of 40 metals or trace minerals (Table 8). Based on their frequency of occurrence and overall abundance, all 40 trace elements appear to be normal constituents of human urine. Of these, 2 have previously not been quantified for healthy adults.

As far as we are aware, this is the first multi-elemental study of urine that has been performed by ICP-MS. As seen in Table 8, there is generally good agreement between the values measured by ICP-MS and those previously reported in the literature, with differences generally being less than 22 ± 10%. Larger differences are seen for gallium (Ga), lead (Pb), neodymium (Nd), titanium (Ti) and vanadium (V), but these may be due to the effects of age, diet, local environment (minerals in local water) or diurnal variation. Alternatively, they may reflect real differences in the sensitivity or accuracy of the instruments being used. As a general rule, ICP-MS is considered a gold standard for the identification and quantification of trace metals [18], so we would tend to place higher confidence in the values derived via ICP-MS over those measured by other technologies. By our measurements, the most abundant metals/salts are, as expected, sodium (Na) (12.5 ± 10.6 mM/mM creatinine) and potassium (K) (3.6 ± 2.5 mM/mM creatinine), while the least abundant was rhenium (Re), with a lower limit of quantification by ICP-MS of 96 pM/mM creatinine.

The inventory of metabolites we detected and quantified by 1H NMR, GC-MS, DFI/LC-MS/MS and ICP-MS covers a significant portion of all chemical classes. However, these methods sometimes lack the necessary sensitivity, the appropriate instrumental configuration or detection capabilities and therefore fail to detect/quantify a variety of important compound classes. This includes a number of molecules that are normal constituents of urine such as thiols and isoflavones. To identify and quantify these 2 classes of metabolites we decided to employ High Performance Liquid Chromatography (HPLC). HPLC assays are the method of choice for detecting isoflavones and thiols as they are sensitive, precise and can be easily coupled with sensitive detection methodologies such as fluorescence and ultraviolet detection. In our studies, fluorescence and ultraviolet detection were used for the identification and quantification of urinary thiols and isoflavones, respectively.

Biological thiols, or mercaptans, are very active metabolic products of sulfur and play a central role in redox metabolism, cellular homeostasis and a variety of physiological and pathological processes. In urine, the most important thiols are L-cysteine and L-cysteinylglycine [90], [91], [92], [93]. Isoflavones or phytoestrogens constitute another important class of urinary metabolites [90]. Humans are exposed to these biologically active phytochemicals mainly through food intake via vegetables, fruit and wheat/bread products [94]. For the detection of thiols we developed assays to measure L-cysteine, L-cysteinylglycine, L-glutathione and L-homocysteine, while for isoflavones we developed assays to measure biochanin A, coumestrol, daidzein, equol, formononetin and genistein. Using these HPLC assays, we measured a total of 4 thiols and 6 isoflavones in urine (Table 9).

As seen in Table 9, there is generally good agreement between the values measured by these targeted HPLC assays and those previously reported in literature, with differences generally being less than ∼30%. Only one of these metabolites (L-cysteine) was measured independently on one of our other platforms (NMR) and the NMR concentration was found to be 25% lower than the HPLC assay. A possible explanation for this discrepancy is that cysteine measured by HPLC-FD yields total L-cysteine including the free form and L-cystine reduced to L-cysteine during the reaction [95]. On the other hand, NMR can distinguish between L-cysteine and L-cystine.

The Composition of Human Urine – Comparison with Other Biofluids

By combining a systematic computer-aided literature survey with an extensive, quantitative multiplatform metabolomic analysis we have been able to comprehensively characterize the human urine metabolome. Our data suggests that there are at least 3079 detectable metabolites in human urine, of which 1350 have been quantified. At least 72 of these compounds are of microbial origin, 1453 are endogenous while 2282 are considered exogenous (note some compounds can be both exogenous and endogenous), coming from diet, drugs, cosmetics or environmental exposure. Using a chemical classification system developed for the HMDB [15] we found that human urinary metabolites fall into 230 different chemical classes (25 “super classes”, Table 2). Given that there are only 356 chemical classes in the entire human metabolome [16], this certainly demonstrates the enormous chemical diversity found in urine. As might be expected, most urinary metabolites are very hydrophilic, although there are clearly trace amounts of lipids and fatty acids that contribute a significant number of chemicals to the urinary metabolome (836 fatty acids and lipids). This is in rather striking contrast to the composition of serum [20] which is particularly rich in lipids (i.e. >17000 lipids and fatty acids). Relative to other biofluids such as CSF [18] or saliva [19], urine contains significantly more compounds (5–10X) and exhibits significantly more chemical diversity (2–3X). On the other hand, we know that every compound that is found in human urine is also found in human blood. In other words, the human urine metabolome is a subset of the human serum metabolome, both in terms of composition and chemical diversity [20]. However, more than 484 compounds we identified in urine (either experimentally or via literature review) were not previously reported to be in blood. 
The fact that so many compounds seem to be unique to urine likely has to do with the fact that the kidneys do an extraordinary job of concentrating certain metabolites from the blood. Consequently compounds that are far below the limit of detection in blood (using today’s instrumentation) are well above the detection limit in urine. In fact, concentration differences between the two biofluids sometimes exceed 1000X for certain compounds, such as histamine, androsterone, normetanephrine, testosterone, 13,14-dihydro-15-keto-PGE2, m-tyramine and aldosterone. So, while the number of water-soluble compounds in blood and urine may be almost identical, the concentrations of these compounds are often profoundly different. This difference, combined with the ability of the kidney to handle abnormally high or abnormally low concentrations of metabolites, makes urine a particularly useful biofluid for medical diagnostics. In fact, according to our data in the UMDB, urinary metabolites have been used to characterize nearly 220 diseases. Furthermore, the ability of the kidneys to filter toxins or xenobiotics makes urine a particularly useful biofluid for diet and drug monitoring and for assessing chemical or pollutant exposure [96].

Method Comparison

One of the central motivations behind this work was to ascertain the strengths and weaknesses of several common metabolomic platforms for characterizing human urine. We employed 6 different analytical platforms: NMR; GC-MS; DFI/LC-MS/MS; HPLC/UV; HPLC/FD and ICP-MS. Using our literature-derived knowledge about the composition of human urine, along with custom-derived spectral libraries and targeted assays we were able to “push the limits” in terms of number of compounds that could be identified and/or quantified via each platform. In total, we identified 445 and quantified 378 distinct metabolites using these 6 different systems. According to Table 1, this is the largest number of urine metabolites ever identified and/or quantified in a single study. NMR spectroscopy was able to identify and quantify 209 compounds; GC-MS was able to identify 179 and quantify 85 compounds; DFI/LC-MS/MS identified and quantified 127 compounds; ICP-MS identified and quantified 40 compounds; while customized HPLC assays (with UV or FD detection) identified and quantified 10 compounds. The number of urinary metabolites we identified/quantified for NMR, GC-MS, DFI/LC-MS/MS and ICP-MS all represent “records” for these platforms.

In terms of platform overlap and compound complementarity, NMR and GC-MS were able to identify a common set of 88 metabolites; NMR and DFI/LC-MS/MS were able to identify and quantify a common set of 28 metabolites, while NMR, GC-MS and DFI/LC-MS/MS were able to identify a common set of 17 metabolites (15 amino acids, creatinine and hexose/glucose). All of these results are summarized in a Venn diagram (Figure 5). As might be expected, metabolite coverage differs from one analytical technique to another. These differences are mostly due to the intrinsic nature of the devices or platforms used. In particular, significant differences exist between these platforms in terms of their sensitivity or separation and/or extraction efficiency. Likewise the use of targeted vs. non-targeted methods along with issues related to compound stability, solubility and volatility led to some significantly different platform-dependent results.

Given that the known, quantifiable urine metabolome consists of ∼2651 known metabolites and metabolite species (corresponding to 3079 distinct structures), we can calculate that NMR is able to measure ∼8% (209/2651) of the human urine metabolome; GC-MS is able to measure ∼7% (179/2651); DFI/LC-MS/MS is able to measure ∼5% (127/2651); ICP-MS is able to measure ∼1.5% (40/2651); HPLC/UV is able to measure ∼0.2% (6/2651); while HPLC/FD is able to measure ∼0.15% (4/2651) of the urine metabolome. When combined, the 6 analytical techniques are able to cover >16% of the known urinary metabolome (>445/2651). If we re-evaluate this fraction in terms of total metabolite structures (corresponding to known and highly probable metabolites), the urine metabolome consists of 3079 compounds. From this total we can calculate that NMR is able to measure ∼7% (209/3079) of the human urine metabolome; GC-MS is able to measure ∼6% (179/3079); DFI/LC-MS/MS is able to measure ∼18% (555/3079); ICP-MS is able to measure ∼1.3% (40/3079); HPLC/UV is able to measure ∼0.2% (6/3079); while HPLC/FD is able to measure ∼0.13% (4/3079) of the urine metabolome. When combined, the 6 analytical techniques are able to cover >28% of the known and probable or putative urinary metabolome (>873/3079). In terms of chemical class coverage, NMR detects compounds from 15 of the 25 major chemical superclasses in urine, GC-MS detects compounds from 14 of the 25, DFI/LC-MS/MS detects 6 of the 25, ICP-MS detects 1 of the 25 while the targeted methods for thiols and isoflavones detect just 1 of each.
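
The coverage percentages above are simple ratios of the per-platform counts to the size of the urine metabolome, expressed either as metabolite species (2651) or as distinct structures (3079). The sketch below reproduces them from the counts given in the text; the function and variable names are illustrative only.

```python
def coverage(n_measured, n_total):
    """Percentage of the metabolome covered by n_measured compounds."""
    return 100.0 * n_measured / n_total

KNOWN_SPECIES = 2651     # known metabolites and metabolite species
KNOWN_STRUCTURES = 3079  # distinct structures (known and probable)

by_species = {
    "NMR": 209, "GC-MS": 179, "DFI/LC-MS/MS": 127,
    "ICP-MS": 40, "HPLC/UV": 6, "HPLC/FD": 4,
}
# Counting structures, the DFI/LC-MS/MS lipid species expand to 555 entries.
by_structure = {**by_species, "DFI/LC-MS/MS": 555}

for platform in by_species:
    pct_species = coverage(by_species[platform], KNOWN_SPECIES)
    pct_structures = coverage(by_structure[platform], KNOWN_STRUCTURES)
    print(f"{platform}: {pct_species:.2f}% of species, {pct_structures:.2f}% of structures")

# Combined totals for all 6 platforms, as quoted in the text.
combined_species = coverage(445, KNOWN_SPECIES)
combined_structures = coverage(873, KNOWN_STRUCTURES)
print(f"combined: {combined_species:.1f}% of species, {combined_structures:.1f}% of structures")
```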

From these data we can conclude that NMR is currently the best method for identifying and quantifying urinary compounds. Not only does it permit measurement of the largest number of metabolites (209) but it also yields the greatest chemical diversity. Furthermore, NMR is non-destructive so that the same sample can be subsequently re-used for GC-MS, LC-MS or ICP-MS analyses. The minimal sample preparation and relatively rapid data collection for NMR also make it much more appealing for urine metabolomics, although the spectral analysis can be quite slow (∼1–2 hours per sample). While GC-MS is a close second in terms of overall coverage (179 metabolites, 14 chemical superclasses), these numbers represent the results of 4 different analyses performed on 2 different GC-MS instruments. Many labs would not have these multiple configurations available or the resources to routinely run these types of analyses. Likewise each sample required many hours of preparation, sample collection and data analysis. In this regard, multi-platform GC-MS is definitely not a high-throughput metabolomics technique. Relative to NMR and GC-MS, DFI/LC-MS/MS also performs well, with 127 compounds being quantified. However, DFI/LC-MS/MS provides very limited chemical diversity (only 6 chemical superclasses). On the other hand, DFI/LC-MS/MS requires very little sample volume (10 µL) and it is a very low-cost, largely automated, high-throughput route for measuring metabolites. The other techniques (HPLC, ICP-MS) we employed in this study, while useful, do not come close to matching the coverage or diversity of NMR, GC-MS or DFI/LC-MS/MS.

While we certainly went to considerable lengths to use current or cutting edge technologies to characterize the urine metabolome, it is also important to note that there is always potential for future improvement. Using higher field (900–950 MHz) NMR instruments, employing newer model GC-MS instruments or more sensitive GC-TOF instruments, using more than 3 derivatization or extraction steps for our GC-MS analyses, employing the latest version of the NIST database or a larger collection of GC-MS databases, implementing more sophisticated or targeted detection and separation techniques, using various commercial immunodetection kits or employing the latest LC-MS/MS techniques coupled to FT-MS or orbitraps – all of these could have added to the quantity and diversity of metabolites detected or quantified. However, like many laboratories, our resources are somewhat limited. Furthermore, in this study we wanted to address the question of how well a cross-section of commonly accessible metabolomic methods or platforms could perform in identifying and quantifying metabolites in urine.

While being able to quantitatively compare metabolite coverage and chemical diversity amongst the major metabolomics platforms (NMR, GC-MS and DFI/LC-MS/MS) is important, it is also useful to compare their consistency or reproducibility in terms of metabolite quantification. In particular we decided to assess the 3 major platforms in terms of their ability to identify and quantify a common group of compounds, namely the amino acids. Overall we found that the measured concentrations are in relatively good agreement (Table 10). However, a few exceptions are evident. For example, the NMR and DFI/LC-MS/MS concentrations of glycine and serine are higher than the GC-MS values (note that glycine exhibits the highest concentration among urinary amino acids). For serine, after the silylation reaction using MSTFA, we obtained serine-2TMS (13.9 min) and serine-3TMS derivatives (16.6 min). The chromatographic peak corresponding to serine-2TMS is weak and overlaps slightly with the urea peak. This overlap and the corresponding difficulty in peak integration may explain the quantitation differences compared to other analytical assays. Neither L-glutamine nor L-glutamic acid could be accurately quantified or identified by GC-MS. In our case, the glutamine peak co-elutes with glycerol-3-phosphate. Other investigators have noted that glutamine can also be hydrolyzed and converted to glutamic acid [97], or to pyroglutamic acid during derivatization [95]. As a result, only pyroglutamic acid could be identified in our GC-MS assay. The identification of glutamic acid and pyroglutamic acid can be complicated [97], which explains our failure to identify glutamic acid by GC-MS. L-arginine could not be detected by GC-MS, because it is converted to ornithine during derivatization [98], which probably explains the slightly higher concentration of ornithine measured by GC-MS, compared to the one determined by DFI/LC-MS/MS. 
Finally, L-cysteine and L-cystine can only be identified and quantified by NMR and targeted HPLC/FD because the identification and quantification of these metabolites is not possible with the Biocrates kit. With GC-MS it has often been noted that L-cystine can be converted to L-cysteine during derivatization, while L-cysteine might be oxidized to L-cystine during prolonged storage of the standard solution [95], confounding the identification of L-cystine and L-cysteine.

Conclusion

Using a combination of multiple experimental assays supplemented with an extensive computer–assisted literature survey we were able to identify a total of 2651 metabolites or metabolite species (corresponding to 3079 distinct structures) that can be or have been identified and/or quantified in human urine using today’s technology. This information, which includes both normal and abnormal (disease or exposure-associated) metabolites has been placed into a publicly accessible web-enabled database called the Urine Metabolome Database (UMDB). To assess the validity of the literature data and to further investigate the capabilities of existing metabolomics technologies we conducted a comprehensive, quantitative analysis of human urine from 22 healthy volunteers. A total of 6 different platforms: NMR, GC-MS, DFI/LC-MS/MS, ICP-MS, HPLC/UV and HPLC/FD were used in this analysis. From this experimental work we were able to identify a total of 445 and quantify 378 metabolites or metabolite species. This corresponds to 873 unique structures (identified) and 806 unique structures (quantified). A total of 53 compounds or compound species are being reported here for the first time as being normal constituents of human urine, while 77 compounds or compound species are being robustly quantified in human urine for the first time. All of the metabolites that we experimentally identified and/or quantified have been added to the UMDB. Based on the information in the UMDB, our experimentally acquired data corresponds to 16% of the human urine metabolome (or 28% if we include probable or putative metabolites).
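
The structure totals quoted above are not derived explicitly in the text. One reading that reproduces them, offered here as an assumption, is to replace the 127 DFI/LC-MS/MS metabolite species with the 555 confirmed and probable structures they correspond to. The variable names below are illustrative.

```python
# Counts quoted in the text.
identified_species = 445
quantified_species = 378
dfi_species = 127      # species reported by the DFI/LC-MS/MS kit
dfi_structures = 555   # confirmed and probable structures for those species

# Assumed reconciliation: swap the DFI/LC-MS/MS species for their structures.
identified_structures = identified_species - dfi_species + dfi_structures
quantified_structures = quantified_species - dfi_species + dfi_structures

print(identified_structures, quantified_structures)
```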

Considering the level of coverage, the diversity of chemical species and the ease with which analyses can be performed, we have determined that NMR spectroscopy appears to be the method of choice for global or untargeted metabolomic analysis of urine. On the other hand, the kit-based combined DFI/LC-MS/MS methods appear to be optimal for a targeted metabolomic approach. Using a multi-pronged GC-MS approach for urine metabolomics appears to be very promising in terms of coverage, but is not ideal for high-throughput analyses. All methods used in this study appear to be quite complementary with relatively little compound overlap. This strongly suggests that if sufficient time and resources are available, multiple methods should be used in urine metabolomic studies.

If additional resources had been available, we would have liked to assess other technologies (GCxGC-MS, FT-MS, isotope labeled-LC-MS) and to compare the level of metabolite coverage and chemical diversity attainable with these methods. However, this study is not intended to be the “final” word on urine or the urine metabolome. Rather, it should be viewed as a starting point for future studies and future improvements in this field. Indeed, our primary objective for undertaking these studies and compiling these data was to help advance the field of quantitative metabolomics, especially with regard to clinically important biofluids such as urine. Experimentally, our data should serve as a useful benchmark against which to compare other technologies and to assess coming methodological improvements in human urine characterization. From a clinical standpoint, we think the information contained in the human urine metabolome database (UMDB) should provide metabolomic researchers as well as clinicians and clinical chemists with a convenient, centralized resource from which to learn more about human urine and its unique chemical constituents.

Methods

Ethics Statement

All samples were collected in accordance with the ethical guidelines mandated by the University of Alberta as approved by the University’s Health Research Ethics Board. All individuals were over 18 years of age. All were approached using approved ethical guidelines and those who agreed to participate in this study were required to sign consent forms. All participants provided written consent.

Collection of Urine Samples

Human urine samples (first pass, morning) were collected from 22 healthy adult volunteers (14 male, 8 female) in 120 mL sterile urine specimen cups. The average age of the volunteers was 30 (range 19–65) for females, and 32 (range 21–67) for males. Upon receipt (typically within 1 hour of collection), all samples were immediately treated with sodium azide to a final concentration of 2.5 mM. After centrifugation at 4000 rpm for 10 min to remove particulate matter, the urine samples were stored in 2 mL aliquots in Falcon tubes at −20°C until further use. Prior to each analysis, the samples were thawed at room temperature for 30 minutes and filtered for a second time via centrifugation.

NMR Compound Identification and Quantification

All 1H-NMR spectra were collected on a 500 MHz Inova (Varian Inc., Palo Alto, CA) spectrometer using the first transient of the tnnoesy-presaturation pulse sequence. The resulting 1H-NMR spectra were processed and analyzed using the Chenomx NMR Suite Professional software package version 7.0 (Chenomx Inc., Edmonton, AB), as previously described [17]. Additional NMR spectra for 39 compounds were added to the Chenomx Spectral Reference Library using the company’s recommended spectral acquisition and formatting protocols. Further details on the NMR sample preparation, NMR data acquisition and the customized spectral library are provided in Method S1.

GC-MS Compound Identification and Quantification

Twenty-two urine samples were extracted separately to obtain separate pools of polar, organic acid, bile acid (3 of the 22 urine samples were chosen for analysis and aliquots from these provided an additional “pooled normal” sample) and volatile metabolites using different protocols. The polar metabolites were extracted with cold HPLC grade methanol and double-distilled water after pretreatment with urease, followed by derivatization with MSTFA (N-methyl-N-(trimethylsilyl)trifluoroacetamide) with 1% TMCS (trimethylchlorosilane). For organic acids, the ketoacids were converted first to methoxime derivatives, followed by derivatization with BSTFA (N,O-Bis(trimethylsilyl)trifluoroacetamide) after two successive extractions by ethyl acetate and diethyl ether. The bile acids were eluted with methanol through a SPE column (Bond Elut C18), followed by two different derivatization steps. First the bile acid extracts were esterified using 2% sulfuric acid in methanol, then after phase separation, the esterified bile acids were converted to the corresponding methyl ester-trimethylsilyl ether derivatives using MSTFA with 1% TMCS. Each set of extracted/derivatized metabolites (polar metabolites, organic acids, bile acids) was separated and analyzed using an Agilent 5890 Series II GC-MS operating in electron impact (EI) ionization mode.

The volatile compound extraction and analysis by GC-MS differed substantially from the other protocols. The pooled urine samples were acidified using HCl and transferred to a SPME (solid phase microextraction) vial (75 µm Carboxen/PDMS from Supelco). The SPME fiber assembly was then introduced into the GC-MS injector port according to procedures described elsewhere [99], [100]. The fibers were conditioned prior to use according to the manufacturer’s instructions by inserting them into the GC injector port. Further details on the extraction, derivatization, separation and GC-MS data analysis for the 4 separate groups of urine metabolites are provided in Method S2.

To assess the performance of direct flow injection (DFI) MS/MS methods in urine metabolomics and to determine the concentration ranges of a number of metabolites not measurable by other methods, we used the commercially available AbsoluteIDQ p180 Kit (BIOCRATES Life Sciences AG, Austria). The kit, in combination with an ABI 4000 Q-Trap (Applied Biosystems/MDS Sciex) mass spectrometer, can be used for the targeted identification and quantification of 187 different metabolites or metabolite species including amino acids, biogenic amines, creatinine, acylcarnitines, glycerophospholipids, sphingolipids and hexoses. This method involves derivatization and extraction of analytes from the biofluid of interest, along with selective mass spectrometric detection and quantification via multiple reaction monitoring (MRM). Isotope-labeled internal standards are integrated into the kit plate filter to facilitate metabolite quantification. Metabolite concentrations were expressed as ratios relative to creatinine to correct for dilution, assuming a constant rate of creatinine excretion for each urine sample (see Method S3 for additional information).
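
Expressing a concentration relative to creatinine amounts to dividing the measured metabolite concentration by the creatinine concentration of the same sample. The sketch below illustrates this with hypothetical values; the function name, units and numbers are assumptions for illustration and are not taken from Method S3.

```python
def normalize_to_creatinine(metabolite_uM, creatinine_mM):
    """Return a metabolite concentration in µM per mM creatinine."""
    if creatinine_mM <= 0:
        raise ValueError("creatinine concentration must be positive")
    return metabolite_uM / creatinine_mM

# Hypothetical raw concentrations (µM) for a single urine sample.
raw_uM = {"glycine": 1500.0, "L-valine": 40.0}
creatinine_mM = 10.0  # hypothetical creatinine concentration of that sample

normalized = {
    name: normalize_to_creatinine(conc, creatinine_mM)
    for name, conc in raw_uM.items()
}
print(normalized)
```

Because the same divisor is applied to every metabolite in a sample, a dilute and a concentrated sample from the same individual should yield similar normalized values, provided the assumption of constant creatinine excretion holds.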

Trace Element Analysis Using ICP-MS

Before trace elemental analysis by ICP-MS was performed, 22 urine samples were processed as described previously [101]. The concentrations of trace elements were determined on a Perkin-Elmer Sciex Elan 6000 quadrupole ICP-MS operating in a dual detector mode. Blank subtraction was applied after internal standard correction (see Method S4 for additional information). The accuracy of the ICP-MS analytical protocol was periodically evaluated via the analysis of certified reference standard materials (whole rock powders) BE-N and DR-N available from the SARM laboratory at the CRPG (Centre de Recherches Pétrographiques et Géologiques).

Characterization of Isoflavones from Urine

We processed 22 urine samples as described previously [102], [103]. The isoflavones were isolated and concentrated by solid-phase extraction (Bond Elut C18 column). The eluates were hydrolyzed enzymatically because urinary isoflavones occur predominantly as glucuronate and sulfate conjugates. The analyses were performed on an Agilent 1100 HPLC system using a NovaPak C18 reversed-phase column connected to an Agilent G1315B diode array detector with signals scanned between 190 and 400 nm (see Method S5 for additional information).

Characterization of Thiols from Urine

To extract urinary thiols, we derivatized all 22 urine samples as described previously [104]. A mixture of reagents was used for the reduction and derivatization (with bromobimane) of thiols. The derivatized thiols were injected immediately onto a Hypersil ODS HPLC column connected to an Agilent fluorometer operating at an excitation wavelength of 485 nm and an emission wavelength of 510 nm (see Method S6 for additional information).

Acknowledgments

The authors wish to thank Kruti Chaudhary and Hetal Chaudhary for adding concentration values from the literature to the UMDB, Rolando Perez-Pineiro for the synthesis of HPHPA and Beomsoo Han for converting all figures to high resolution images.