This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY-NC) license (https://creativecommons.org/licenses/by-nc/4.0/). See http://ivyspring.com/terms for full terms and conditions.

Abstract

While insulin replacement therapy restores the health and prevents the onset of diabetic complications (DC) for many decades, some T1D patients have elevated hemoglobin A1c values suggesting poor glycemic control, a risk factor of DC. We surveyed the stool microbiome and urinary proteome of a cohort of 220 adolescents and children, half of which had lived with T1D for an average of 7 years and half of which were healthy siblings. Phylogenetic analysis of the 16S rRNA gene did not reveal significant differences in gut microbial alpha-diversity comparing the two cohorts. The urinary proteome of T1D patients revealed increased abundances of several lysosomal proteins that correlated with elevated HbA1c values. In silico protein network analysis linked such proteins to extracellular matrix components and the glycoprotein LRG1. LRG1 is a prominent inflammation and neovascularization biomarker. We hypothesize that these changes implicate aberrant glycation of macromolecules that alter lysosomal function and metabolism in renal tubular epithelial cells, cells that line part of the upper urinary tract.

Introduction

Type 1 Diabetes (T1D) is an autoimmune disease characterized by the loss of insulin-producing β-cells in the pancreatic islets in genetically susceptible individuals. The incidence of T1D is now twice as high among children as it was two decades ago, and 10 to 20 times more common than 100 years ago [1-5]. The prevalence of T1D in Americans under age 20 rose by 23 percent between 2001-2009 (www.jdrf.org; www.searchfordiabetes.org). A prevalent scientific hypothesis on the rising levels of childhood T1D is the 'hygiene hypothesis', which holds that lack of exposure to once-prevalent microbes results in autoimmune hypersensitivity [6]. Several studies in animal models and most recently studies of the microflora residing in the human gut have demonstrated a link between gut microbiota and energy homeostasis [7, 8]. Imbalance in this homeostasis triggered by environmental factors may lead to disturbances such as inflammation and ultimate manifestation of the autoimmune T1D. The evidence is emerging that environmental factors such as viral infections [9] and perturbations of physiologically normal gut microbiota play important roles in disease onset [10] [11, 12]. The gut functions as an immune organ to protect us from ingested, potentially pathogenic bacteria and the billions of microorganisms in it play a role in regulating the immune system [13]. It is now well established that resident gut microbiota regulates specific mucosal T helper cells crucial for the host defense and, when perturbed, play a role in the development of autoimmune diseases [14]. The interplay of the latter with genetic and non-genetic factors resulting from inflammation [15], suggests a more complex etiology driving T1D pathogenesis [16].

Animal studies have shown that the gut bacterial composition differs significantly at the time of diabetes onset compared to healthy controls [13] [17]. Studies in germ-free mice have linked the activity of the Toll-like receptor-MyD88 pathway, critical to innate immunity, to the onset of T1D [18]. The human gut microbiome may be altered at the time of T1D onset due to autoimmune processes and years after T1D onset due to suboptimal insulin treatment and frequent hypo- and/or hyperglycemic states in patients, thus contributing to adverse long-term health outcomes [19, 20]. Two human cohort studies associating gut microbial profiles with T1D have been performed. One study described a less diverse T1D autoimmune gut microbiome in infants [21]. Another study found that T1D patients with good glycemic control and high physical fitness have gut microbiota similar to healthy individuals [22]. Comprehensive studies using larger cohorts and matched genetic backgrounds for patients and controls could yield further insights into the relationship of gut microbiota with the onset or progression of T1D, for instance, a correlation of certain gut microbial genera with metabolic perturbations known to affect T1D patients. Such discoveries may initiate targeted investigations towards the use of probiotics. Physiological links between the abundance of the genus Bacteroides in the human intestine and an altered metabolism of small chain fatty acids, which are anaerobic bacterial fermentation products and mediators of inflammation, have indeed been discovered [23].

Metabolites and proteins can serve as surrogate biomarkers to assess the risk of disease onset and complications associated with T1D and type 2 diabetes years or decades after disease onset. Urinary albumin is a diagnostic biomarker used in the clinic to assess one such complication, diabetic nephropathy (DN) [24, 25]. HbA1c, a glycated form of hemoglobin, is a biomarker for glycemic control and considered predictive of long-term diabetic complications [26]. In accordance with the life span of erythrocytes, HbA1c levels allow plasma glucose concentration estimates over a three-month average. The aberrant hemoglobin glycation appears to be linked to frequent exposure of this protein and other cellular macromolecules to high glucose levels. While HbA1c is an early glycation product, advanced glycation end products (AGEs) form when diabetic complications become clinically evident [27]. The Diabetes Control and Complications Trial (DCCT) and the Epidemiology of Diabetes Interventions and Complications study (EDIC) assessed median HbA1c levels for an intensive therapy cohort treated with insulin over 6.5 years [28]. On average, the HbA1c levels were lower than in a cohort with standard insulin therapy. Once all T1D patients in that cohort had received intensive insulin therapy, differences in HbA1c levels declined over five years and were not discernable thereafter, even though the risk of diabetic complications in the DCCT cohort originally receiving intensive insulin therapy continued to be lower [28]. Metabolic memory has been discussed to describe the long-term nature of diabetic complications in the context of chronic hyperglycemia [29]. There is a need for improved biomarkers of the metabolic memory effects associated with diabetic complications.

Recently, we surveyed the urinary proteome for protein abundance changes comparing a cohort of 40 children and adolescents who were diagnosed with T1D, on average 6.3 years earlier, and a cohort of healthy siblings in the same age range [30]. While the T1D patients were asymptomatic for diabetic complications, quantitative protein differences may point to pathophysiological processes predictive of future disease. Our data revealed that 13 lysosomal enzymes were increased in abundance in the urine of the T1D cohort versus the control cohort [30], suggesting that more proteins from this subcellular organelle were released into the urinary tract from kidney filtration and reabsorption processes in T1D patients. This may or may not be linked to activation or damage of lysosomes in proximal renal cells. Lysosomes are subcellular organelles responsible for the catabolism of glycoproteins, glycosaminoglycans and glycolipids [31, 32]. Diabetes-associated aberrant glycation of such macromolecules is a reasonable explanation for the mobilization of lysosomal catabolic pathways in T1D patients. Indeed, lysosomal membrane permeabilization resulting in the cytoplasmic release of enzymes functionally active in lysosomes of renal tubular epithelial cells (TECs) was shown to be caused by DN-associated AGE formation [33]. The luminal surface of proximal TECs has densely packed microvilli allowing the cells to resorb and excrete proteins at a high capacity. The question arises as to whether T1D patients asymptomatic for DN develop lysosomes with compromised functions and membrane barriers linked to irreversible complications such as DN later in life. In that context, lysosomal enzymuria was described for patients in early stages of DN [34].

Here, we examined a larger cohort of young T1D patients and their healthy siblings using proteomics to verify previous results indicating moderately increased lysosomal protein excretion in the urine of T1D patients. We established protein-metabolite networks to support the hypothesis of lysosomal dysfunction. Finally, we surveyed the gut microbiota using 16S rDNA gene-based phylogenetic typing to assess gut microbial differences in T1D patients versus their healthy siblings.

Materials and Methods

Study Population and Specimen Collection

A case-control, prospective study was conducted in which typically one sibling of a sibling or twin pair developed T1D and the other sibling was healthy. In a few cases, several siblings were enrolled. Study subjects were recruited at Children's National Health System, Washington D.C., (CNHS) between 2011 and 2014. The study was approved by the Institutional Review Boards of CNHS and the J. Craig Venter Institute (JCVI). Participating human subjects or parents, if the participant was a child, provided informed consent under the IRB study no. Pro00001986. The criteria for inclusion comprised the range of 3-18 years of age and the American Diabetes Association's criteria for the diagnosis of T1D [35] with HbA1c levels of less than 15%. Subjects who reported antibiotic treatments within the three months prior to sampling were excluded due to the known impact of antibiotics on the composition of gut microbiota. All patients were treated with insulin, and their healthy sibling(s) served as controls. There was no evidence of renal function decline in any of the enrolled T1D patients. While the study did not allow gender and age matching of sibling pairs due to demographic limitations, the average age and frequency of each gender comparing disease and control groups were similar. Proteomics and metagenomics (omics) datasets ranging from 98 to 106 patients and their siblings were obtained. Information on gender, age, family medical history, date of T1D onset, Glutamic Acid Decarboxylase (GAD65) autoantibody test results, a one-time HbA1c value for subjects in the T1D cohort, occurrence of infections and antibiotic usage over three months prior to specimen collection were recorded (Supplementary Data, Table S1).

Stool and urine samples were obtained from every subject enrolled in the study and stored at -20°C at the clinical site for up to two weeks. A standard stool collection protocol developed by the Human Microbiome Project (HMP) initiative [36] was used in this study. Mid-stream urine specimens were provided with instructions for sample collection by the study coordinator when the human subjects visited the clinical site. Specimens were frozen immediately, kept frozen during transport and stored at -80°C until used for omics analyses.

Stool DNA Extraction

The DNA for 16S PCR was extracted from stool samples using a MoBIO powersoil purification kit. We aliquoted approximately 1 g of stool into a centrifuge tube and resuspended it in 800 µl of lysis buffer (1 M Tris-HCl, 2 mM EDTA, 1.2% Triton X-100). Stool samples were incubated at 75°C for 10 minutes, allowed to cool to room temperature, followed by the addition of 60 µl 200 mg/ml lysozyme and 5µl RNase A. Stool lysates were incubated overnight at 37°C, and then added to lysing tubes provided in the MoBIO powersoil purification kit. The DNA was extracted using the manufacturer's specifications and eluted in 100 µl of the solution [37].

16S rDNA Analysis by MiSeq Sequencing

DNA extracted from stool samples was amplified using primers that targeted the V1-V3 regions of the 16S rRNA gene [37]. These primers included the i5 and i7 adaptor sequences for Illumina MiSeq pyrosequencing as well as unique 8 bp indices incorporated onto both primers such that each sample receives its unique barcode pair. This method of incorporating the adaptors and index sequences onto the primers at the PCR stage provided minimal loss of sequence data when compared to previous methods that would ligate the adaptors to every amplicon after amplification. This method also allows generating sequence reads which are all in the same 5'-3' orientation. Using approximately 100 ng of extracted DNA, the amplicons were generated with Platinum Taq polymerase (Life Technologies, CA) and by using the following cycling conditions: 95°C for 5min for an initial denaturing step followed by 95°C for 30 sec, 55°C for 30 sec, 72°C for 30 sec for a total of 35 cycles followed by a final extension step of 72°C for 7 min then stored at 4°C. Once the PCR for each sample was completed, the amplicons were purified using the QIAquick PCR purification kit (Qiagen Valencia, CA), quantified fluorometrically using SYBR Gold Nucleic Acid Gel Stain (ThermoFisher Scientific), normalized, and pooled in preparation for bridge amplification followed by Illumina MiSEQ sequencing using the dual index 2x300 bp format (Roche, Branford, CT) following the manufacturer's protocol.

Processing and Filtering of Sequence Reads

Operational taxonomic units (OTUs) were generated de novo from raw Illumina sequence reads using the UPARSE pipeline [38]. Paired-end reads were trimmed of adapter sequences, barcodes and primers prior to assembly. Sequences of low quality and singletons were discarded. Sequences were subjected to a de-replication step and abundances were determined. Chimera filtering of the sequences occurred during the clustering step. We used the Wang classifier (method=wang) and bootstrapped using 100 iterations (iters=100). We set mothur to report full taxonomies only for sequences where 80 or more of the 100 iterations are the same (cutoff=80). Taxonomies were assigned to the OTUs with mothur [39] using version 123 of the SILVA 16S ribosomal RNA database [40] as the reference. Tables with OTUs and the corresponding taxonomy assignments were generated and used in subsequent analyses. The next step was to remove likely non-informative OTUs with an independent filtering process. Rare OTUs or taxa are strongly affected by MiSeq sequencing errors, and any statistical conclusions relying on them are typically unstable. Even in the univariate differential abundance analysis, the presence of such taxa increases the penalty from the multiple testing correction applied to the more abundant taxa. We used unbiased metadata-independent filtering at each level of the taxonomy by eliminating all features that did not pass these criteria. This included samples with less than 2000 reads and OTUs present in less than 10 samples.

Identification of Phylogenetic Groups in Gut Microbiota

The phyloseq package version 1.16.2 in R package version 3.2.3 was used for the microbiome census data analysis [41, 42]. The ordination analysis was performed using non-metric multidimensional scaling (NMDS) with the Bray-Curtis dissimilarity matrix [43]. The data output was used for the generation of a heatmap using the plot_heatmap function in the phyloseq package [41]. Differences in microbial richness (alpha diversity) were evaluated using different algorithms included in the phyloseq plot_richness function. For genus-level and OTU count matrices, we performed the following richness and diversity analyses using the R phyloseq package. The plot_richness function was used to create a plot of alpha diversity index estimates for each sample. Heatmaps of taxonomic profiles clustered hierarchically using the plot_heatmap function in phyloseq were constructed [44]. A beta diversity Bray-Curtis dissimilarity matrix was used to compute non-metric multidimensional scaling (NMDS). The ordination output was plotted in the form of heatmaps using the plot_heatmap function including a side bar where clinical variables associated with each sample were assigned to look for specific associations.

Shotgun Proteomics by LC-MS/MS Analysis

LC-MS/MS analysis was performed using an Ultimate 3000-nano LC system coupled to a Q-Exactive mass spectrometer (Thermo Scientific, USA). The LC-MS/MS experimental and data acquisition methods have been described in detail [46]. In brief, peptides were separated over 110 min gradient from 2% to 80% (over 100 min to 35% and over 10 min to 80%) in buffer B (0.1% formic acid in acetonitrile) at a flow rate of 200 nl/min. MS survey scans were acquired at a resolution of 70,000 over a mass range of m/z 250-1,800. In each cycle, the ten most intense ions were subjected to high-energy collisional dissociation (HCD) applying a normalized collision energy of 27%. The MS2 scans were performed at a resolution of 17,500. Typically, three replicates of LC-MS/MS experiments were run for a given sample. The human protein sequence database was downloaded from UniProtKB resource (version 2016_01; reviewed entries only, 26,050 sequences). Raw MS data were collected and analyzed with the Proteome Discoverer software tool (version 1.4, Thermo Scientific) to obtain protein identification data setting protein false discovery (FDR) rates at 1%. This method was used in initial analysis stages to ensure that raw datasets had sufficient proteome depth (≥ 150 protein IDs).

Quantification and Statistical Analysis of Urinary Proteomic Data

The MaxQuant software tool (version 1.5.1.0) [47] accepting most default settings was employed to identify and quantify proteins from raw MS data. Combining data files representing LC-MS/MS replicates for each sample, the database searches against the UniProt protein database mentioned above were conducted using the search engine Andromeda incorporated into the MaxQuant environment. Protein N-terminal acetylation, oxidation (M) and deamidation (NQ) were set as variable modifications, and cysteine carbamidomethylation was set as a fixed modification. Enzyme specificity was set to trypsin, and the minimal peptide length was set to seven amino acids. Proteins isoforms that could not be discriminated based on unique peptides were reported as a single protein group. The precursor and fragment ion masses were identified with an initial mass tolerance of 6 ppm and 20 ppm, respectively. The maximum false discovery rate (FDR) for both peptide and protein identifications was set to 1%. For label-free protein quantification, the MaxLFQ algorithm was used accepting a minimum number of two peptides for each quantified protein. Default settings for quantification based on MS1 peak integration and for normalization among all datasets were accepted. Proteins not present in at least 20% of the samples of either human subject group were removed. This was followed by the elimination of samples where quantitative information was available for less than 25% of the entire set of identified proteins. Missing values were imputed using the LSimpute algorithm with the LSimpute_adaptive option [48]. Histograms which are displayed for the Log2 transformed MaxLFQ values prior to and after imputation in the Supplementary Data, Dataset S2 show that data were more normalized after imputation. Therefore, data with imputations of missing values were the final input for statistical analyses.

Statistical Analyses for Gut Microbiome and Urinary Proteome Data

To detect differential abundances in the gut microbiota at a genus or species level the DESeq2 package version 1.12.3 in R was used. The phyloseq object are converted into a DESeq2 object using the function phyloseq_to_deseq2 function. DESeq2 [49] is a method for the differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve the stability and interpretability of estimates. The DESeq2 test uses a negative binomial model rather than simple proportion-based normalization or rarefaction to control for different sequencing depths, which may increase the power and also lower the false positive detection rate [50]. Default options of DESeq2 were used for its multiple testing adjustment applying Benjamini & Hochberg [51].

The differential proteomic analysis was performed using the Limma software package, which is part of the R package and used for the analysis of expression data using a linear model which is fit to gene and protein expression data [52]. The linear model allows analyses of the entire experiment as an integrated whole, sharing information between samples. Limma accepts log-ratios or log-intensity values of the expressed proteins as input. Multiple testing adjustments were described in the previous paragraph. The p-value cutoff for the selection of significant proteins and OTUs is 0.05 after FDR adjustment for multiple comparisons.

Gene Ontology, Clustering and Classification Analyses

Gene ontology information of proteins was obtained using the ClueGO version 2.2.5 and CluePedia version 1.2.5 plugin in Cytoscape version 3.3.0 [53-55]. The biological process, cellular component, molecular function and immune system process ontology data were recent database updates (31.03.2016). The KEGG pathway data were from a 10.02.2016 update. For GO enrichment analysis, p-values of less than 0.05 were considered. All experimental (EXP_IDA_IPI_IMP_IGI_IEP) evidence codes were selected. The GO terms were connected by the proteins that shared those GO terms. Differentially expressed proteins and samples were clustered using the dist function in R, which calculates Euclidean distance matrix between different proteins and samples. The distance matrix is used to generate hierarchical cluster using the hclust function in R and converted into a heatmap with the heatmap function in R. Proteomes of the T1D and healthy sibling (HS) groups were classified using the Random Forest package version 4.6 in R [56]. We used 1000 trees in Random Forest for the classification of the T1D and control group proteome data.

Protein Network Analysis

For a comprehensive human protein-protein interaction analysis, a list of known binary protein-protein interactions were compiled from IntAct and PINA2 databases [57, 58]. The IntAct and PINA2 databases contain experimentally derived interaction information collected from multiple resources. We considered only the interactions between human proteins and removed those considered to be indirect associations or to have a high noise level: interactions obtained from light scattering, fluorescence microscopy, imagining, confocal microscopy, tandem affinity purification, luminescence-based mammalian interactome mapping, and affinity chromatography-tandem mass spectrometry methods. This data filtering step ensured that only high confidence interactions were considered for the generation of a protein-protein interaction network. The network analyses were performed using the software tool Cytoscape [55]. Protein-ligand interactions were obtained from the ChEBI database; only those ligands interacting with differentially abundant proteins were considered [59].

Validation of Protein-Protein Interactions

The interactions of FN1, SDC4 and HSA with LRG1 were examined using a blot overlay assay. We used 200 ng each of purified FN1 (Cat no. P4955, Abnova), SDC4 (Cat no: H00006385-P02, Abnova) and HSA. Denatured proteins were separated by SDS-PAGE and electroblotted onto a polyvinylidene difluoride membrane. Residual SDS bound to the membrane was removed by washing with 30 mM Tris-HCl (pH 7.4) with 0.05% Tween 20 for 15 minutes. The membrane was blocked with 2% nonfat milk overnight at 4˚C. To identify protein-protein interactions, it was incubated with purified LRG1 (8 nM), a protein prepared using an in vitro wheat germ expression system (Cat no: H00116844-O01, Abnova). Binding of blotted proteins to LRG1 was detected using an anti-LRG1 monoclonal antibody (Cat no: H00116844-M01, Abnova) using a standard Western blotting procedure and chemiluminescent detection [30].

Results and Discussion

Summary of Metadata for the T1D and Healthy Sibling Cohorts

The 16S rDNA data for the survey of gut microbiota were obtained from analyses of 205 stool samples consisting of 99 T1D patients and 106 healthy siblings (HS) as the control group. For the T1D cohort, the average age and duration since disease onset were 14.7 and 7.1 years, respectively. The averaged HbA1c value for the cohort was 8.4. Urinary proteomes were profiled from 104 T1D and 111 HS subjects. Even though the subjects, associated with 102 families, were not perfectly matched for both data types, the overlaps were extensive and average age and duration since T1D onset were unchanged (Table 1). Demographic and clinical data are summarized in Supplementary Data, Table S1. To our knowledge, none of the siblings originally enrolled as healthy controls were diagnosed with T1D or seroconverted to a GAD65 autoantibody positive status during the active enrollment phase. However, seven HS subjects were GAD65 autoantibody positive when they were recruited for this study. Limited information was available with respect to the alleles HLA DR3/4-HLADQ8, which are considered high T1D risk alleles. Two sibling pairs for whom 16S rDNA and proteomic data were available had the HLA DR3/4-DQ8 allele. No HS case revealed simultaneous presence of the HLA DR3/4-DQ8 allele and the GAD65 autoantibody positive status.

No Major Changes in Diversity of GI Microbiota Comparing T1D Patients and Healthy Siblings

We compared the GI microbiota derived from T1D and HS stool samples via analysis of the 16S rRNA marker gene. The average number of annotated sequences per sample was 63,688 (2,221,617, maximum; 50,561, median; 8,872, minimum). The UPARSE pipeline identified 1747 OTUs and, after removing OTUs not present in more than ten samples, the number of OTUs was reduced to 1237. These OTUs were used to calculate the alpha diversity of microbiota in T1D patients compared to HS controls using the Shannon index, the Simpson index, and the InvSimpson alpha diversity measures (Supplementary Data, Dataset S1). Wilcoxon rank sum tests returned p-values of 0.61, 0.35 and 0.35, respectively, suggesting the absence of statistically significant differences in alpha diversity. The 1237 OTUs were collapsed using the tax_glom function to 608 species-level OTUs, 280 genus-level OTUs, and 87 family level OTUs. Since not all OTUs were assigned to species and genera and termed unclassified, uncultured and unidentified, they were not used in differential analyses.

To determine whether the cohorts could be separated in an NMDS plot, we performed NMDS ordination using Bray-Curtis dissimilarity and obtained a 0.22 stress level using two dimensions at the genus level. 16S rRNA profiles showed no evidence of differences in the composition of GI microbiota comparing the T1D and HS cohorts. NMDS ordination performed at the family level resulted in a 0.19 stress level. Family level and phylum level data were used to generate a heat map based on the NMDS ordination. The phyla and class abundance profiles across all samples did not reveal differences comparing the two cohorts (Supplementary Data, Dataset S1). Performing classification of the data using the Random Forest method, a ROC value of 0.49 was further evidence that the composition of GI microbiota associated with the T1D cohort was not different from the HS cohort.

To determine differentially abundant OTUs comparing the T1D and HS subjects, we applied a negative binomial model implemented in DESeq2. The data in Table 2 reveal differential family, genus and species level abundances. OTU_35 is present in all T1D and HS samples with an average of 398 and 148 read counts, respectively, and a 1.24 Log2-fold increase in the T1D cohort with an adjusted p-value of less than 0.01. This OTU was assigned to Streptococcus salivarius subsp. thermophilus. This subspecies is a component of oral probiotic supplement VSL#3 [60]. In addition, VSL#3 contains Bifidobacterium and Lactobacillus species. The probiotic supplement was reported to reduce the incidence and delay the onset of T1D in a rodent model and NOD mouse [61]. Another study reported that VSL#3 alone or in combination with retinoic acid (RA) protected NOD mice from T1D onset [62]. Oral administration of VSL#3 in the murine model was reported to reduce insulitis and decrease the destruction rate of β-cells in the pancreas. This probiotic supplement was also linked to a decrease of ulcerative colitis in patients suffering from mild to moderately active ulcerative colitis [63]. Mindful that 16S rRNA profiles only provided rough abundance estimates for taxa, Streptococcus salivarius subsp. thermophiles contributes only 1% and 0.38% to the total GI microbiota of the T1D and HS cohorts, respectively. Another microbial species more abundant in the T1D group vs. the HS group is Eubacterium eligens (ATCC_27750). Few reads were assigned to this species making a differential physiological impact of the species on host interactions in T1D patients compared to healthy subjects unlikely.

Analysis at the family level suggests the depletion of Lactobacillaceae in the T1D cohort under study. Interestingly, Lactobacillaceae are quantitatively decreased in MyD88-deficient non-obese diabetic NOD mice compared to mice resistant to the development of T1D [64]. Analysis at the genus level indicates that Enterococci are strongly decreased in abundance in the T1D vs. HS cohort, and the most highly represented phylogenetic group showing differential abundance in the 16S rRNA survey. Considering the observed abundance changes in opposite directions and the close phylogenetic relationship between Streptococci and Enterococci we speculate that the bacteria compete for the same niche in the gut microbiome, with Enterococci less able to adapt to the higher sugar concentrations in the diabetic host. In addition to Enterococci (genus level), the species Clostridium butyricum was decreased comparing the high HbA1c group (n > 9.0) of T1D patients with the family-matched HS group. This change was not statistically significant comparing the T1D group with moderately elevated HbA1c values (6.5 < n < 9.0) and the HS cohort. The biological significance of changes in Enterococcus and S. salivarius abundances in the gut microbiome remains to be elucidated.

Table 2

Gut Microbiome Changes Comparing T1D and Healthy Sibling Cohorts.

Taxonomic Rank

HS* (% composition)

T1D* (% composition)

Log FC

p-value

A: T1D patients versus HS subjects

Phylum

Family

Firmicutes

Streptococcaceae

250 (0.64)

582 (1.49)

0.98

0.02

Firmicutes

Lactobacillaceae

354 (0.90)

28 (0.07)

-1.96

0.03

Phylum

Genus

Firmicutes

Incertae sedis

25 (0.06)

68 (0.17)

1.19

0.03

Genus

Species

Enterococcus

Unclassified

720 (1.84)

67 (0.17)

-2.95

0.00

Streptococcus

Streptococcus salivarius subsp. thermophiles

148 (0.38)

398 (1.02)

1.2

0.00

Incertae sedis

Eubacterium eligens ATCC 27750

14 (0.04)

55 (0.14)

1.5

0.01

Bacteroides

Unclassified

278 (0.71)

184 (0.47)

-1.45

0.02

Ruminococcaceae UCG-002

Uncultured bacterium

109 (0.28)

54 (0.14)

-1.34

0.02

Blautia

Uncultured organism

126 (0.32)

238 (0.61)

0.8

0.02

Streptococcus

Unclassified

62 (0.16)

124 (0.32)

0.99

0.02

Eubacterium coprostanoligenes group

Uncultured organism

67 (0.17)

4 (0.01)

-2.41

0.02

B: T1D patients with high HbA1c values versus HS cohort

Genus

Species

Enterococcus

Unclassified

2870 (7.30)

148 (0.36)

-4.7

0.00

Ruminococcaceae_UCG-014

uncultured_Ruminococcaceae_bacterium

0 (0)

24 (0.06)

4.8

0.01

Clostridium_sensu_stricto_1

Clostridium_butyricum

1635 (4.16)

639 (1.54)

3.8

0.03

*Average number of sequence reads; number in brackets indicates contribution of a phylogenetic group in %. Log FC: Log-fold change with positive and negative values indicating higher and lower abundances in the T1D cohort, respectively. The p-value is adjusted for multiple testing (Benjamini-Hochberg test).

Increased Abundance of Proteins with Roles in Lysosomal Catabolism and Inflammation in the T1D Cohort vs. the HS Cohort

LC-MS/MS analyses allowed us to reliably quantify 618 proteins from urine concentrates of 215 human subjects. Comparing the T1D cohort with the HS cohort, 37 and 20 proteins are increased and decreased in abundance, respectively (n > 1.3; adjusted p-value < 0.05). The proteins with the strongest abundance changes are listed in the Table 3, and additional proteins in the Supplementary Data, Dataset S2.

Several proteins with abundance increases in the T1D cohort, compared to the HS cohort, have functions in inflammatory processes. The protein on average increased the most in abundance in T1D patients compared to healthy siblings is leucine-rich alpha-2-glycoprotein (LRG1). Functions of LRG1 include positive regulation of angiogenesis and granulocyte differentiation and a role in transforming growth factor beta (TGF-β) receptor signaling [65, 66]. Another protein increased in the T1D cohort and characterized as a mediator of inflammation is CD14. CD14 is a prominent adhesion protein present on leukocyte cell surfaces and secreted into plasma. It interacts with the lipopolysaccharide (LPS)-binding protein which, in turn, binds to cell envelope LPSs derived from Gram-negative bacteria prior to engagement of Toll-like receptors [67, 68]. It is also a receptor for lipoteichoic acid and cell envelope proteins of Gram-positive bacteria such as Streptococcus sanguis, Streptococcus pneumoniae and Staphylococcus aureus [69, 70]. Finally, CD14 modulates adipose tissue inflammation and is involved in insulin resistance [71]. AZGP1 is a MHC class I glycoprotein that stimulates lipid mobilization with a putative role in insulin resistance [72, 73]. CPE is a carboxypeptidase that processes pro-insulin and other hormones and is highly expressed in the pancreas. Some of these inflammatory proteins are highly expressed in monocytes and renal tissue according to data from the human proteome map [74, 75]. Their expression levels in various cells and tissues are included in the Supplementary Data, Dataset S2.

Most differently abundant proteins had higher ratios when only the T1D cohort with HbA1c values > 9 was selected for comparison with healthy subjects. LRG1 is 3.5 times more abundant, and CTSD (2.4), (2.2), A1BG (2.2), IGHM (2.0), AZGP1 (2.0), CD14 (2.0) and NAGA (2.0) also revealed higher quantitative ratios compared to the HS cohort. Twenty-four proteins have a fold change greater than 1.5 for the high HbA1c value group, while only ten proteins have a fold change greater than 1.5 for the group with moderately raised HbA1c values (6.5 < n < 9), each compared to the HS cohort. Among the fourteen proteins increased more than 1.5-fold when the high HbA1c group was selected are ceruloplasmin (CP), the glycoprotein A1BG, lumican, prostatic acid phosphatase, the complement factor C4A and angiotensin-converting enzyme 2 (ACE2). Detailed data are presented in Supplementary Data, Dataset S2.

Regarding the presence in distinct subcellular fractions, the fact that eight of the 15 proteins increased the most in the T1D cohort compared to the HS cohort were lysosomal is striking and verified our previous findings for a cohort consisting of 40 T1D and 41 HS subjects [30]. GM2A is a lysosomal activator for the breakdown of ganglioside GM2 and other molecules containing terminal N-acetyl hexosamines [76]. Enhanced activity of the lysosomal protease cathepsin D was linked to disease complications affecting diabetes patients [77, 78]. NAGA is a lysosomal glycosidase responsible for the breakdown of glycolipids [79]. Lysosomes are organelles with a critical role in the disposal of waste products. They return small molecules such as amino acids, fatty acids, nucleotides and sugars back into circulation. As presented in Table 3, many abundance-changed proteins are glycosylated, among them extracellular matrix (ECM) proteins (brevican, tenascin, fibronectin, syndecan-4 and lumican), cell surface determinants (CD14 and CD320), and proteins secreted into body fluids (LRG1, AZGP1 and corticosteroid-binding globulin). These proteins have cell binding and adherence functions. CP is an acute phase protein in blood plasma and a ferroxidase involved in peroxidation (Fe2+ to Fe3+). A mutation of the gene results in aceruloplasminemia, the absence of functional CP, which is also relevant in diabetes [80].

Thirty proteins significantly changed in abundance (Table 3) were used for a clustering analysis with the Euclidean distance metric (Figure 2). A cluster enriched in T1D patients near the bottom of the heat map is dominated by the high-level HbA1c group and, to a lesser extent, the medium level HbA1c group. Random Forest classification was performed using expression profiles of the entire set of proteins and the set of 30 differentially abundant proteins. ROC values of 0.81 and 0.85, respectively, suggest that they classify the cohort into T1D patients and healthy controls. While no single protein had a ROC value greater than 0.75, biomarker sets consisting of 3 proteins had ROC values of 0.80 or greater. This value was 0.84 combining LRG1, CPQ and MYLK. A graphic depicting the relevance of proteins for the classification and ROC values is included in Supplementary Data, Dataset S2.

Figure 1

GO terms and KEGG pathway of proteins quantitatively (A) increased and (B) decreased in the T1D cohort using the ClueGO Plugin with statistically significant p-values < 0.05. The GO terms were grouped based on highest significance of GO terms. The GO terms are linked based on medium kappa score and the size of nodes denote the enrichment significance.

(Click on the image to enlarge.)

Figure 2

Hierarchical clustering of T1D and HS samples using the expression profiles of 30 differentially regulated proteins (see Table 2). The x-axis displays protein names while the y-axis displays clusters of the 210 subjects. The bar on the right denotes subjects in a two-color code (green for healthy subjects; red for T1D patients) and a three-color code based on HbA1c values (green for healthy subjects; red for medium levels and purple for high levels of HbA1c). Protein abundances (log values) are displayed in a red-to-blue color code (high to low abundance). The proteins COL12A1, NAGA, FGFR2, LRG1, A1BG and CD14 drive the separation of the two main clusters.

Using Cytoscape [55] proteins interacting with those that had a fold change of more than +/-1.2 were retrieved to generate a protein network in silico. This network consisted of 860 nodes with 940 edges (Supplementary Data, Dataset S2). Hub proteins in the network consist of FN1 (320 interactions), CDH1 (72), SPP1 (62), CD44 (58), CP (44), FGFR2 (28), ASAH1 (28), MUC1 (28), CTSD (27), CTSB (24), IGHM (23), CPE (20), SDC4 (19), CD14 (16), and AZGP1 (11). Many hub proteins, which play essential roles in cellular control, were differentially abundant comparing the T1D and HS groups, supporting the notion of a complex system that contributes to increased inflammation in T1D patients. CTSB and CTSD are cathepsins. ASAH1 is an acid ceramidase degrading sphingolipid ceramide into sphingosine and free fatty acid. These lysosomal enzymes degrade cellular waste products. A particularly interesting sub-network is shown in Figure 3. It highlights the interaction of LRG1, strongly increased in abundance in the T1D cohort, with FN1, a highly connected protein mediating cell migration and extracellular matrix rearrangements under inflammatory conditions. Previous studies in a mouse model have shown that LRG1 interacts with FN1. Our far western blot experiment showed that human LRG1 and FN1 also specifically interact (Figure 3, insert). Furthermore, FN1 interacts with SDC4, an ECM protein that was decreased in abundance, and with the enzymes CP and CTSD, both increased in abundance in the T1D dataset. CTSD and FN1 interact with insulin. Glycosylated FN1 was described as a potential biomarker for the prediction of gestational diabetes [81]. CTSD proteolytically cleaves insulin, insulin-like growth factor and several IGFBPs [82, 83]. Another peptidase displayed in the network (CPE) processes pre-insulin and other hormone precursors [84]. Considering the central place of FN1, an extensively glycosylated protein, in the network we hypothesize that changes in proteolytic and glycolytic pathways affect FN1, perhaps directly via modification in peptide or glycosylated side chains, and contribute to inflammation in T1D patients. Variation in glycosylation was previously shown to mediate changes in protein-protein interactions [85].

Figure 3

Section of protein-protein interaction network including the interactions of 29 proteins with differential abundance in the T1D and HS cohorts (LRG1, CD14, AZGP1, NAGA, CTSD, CTSB, CPE, and CP). The shade of red indicates the fold change in abundance (light red = > 1.2 and < 1.5; medium red = >1.5 and < 2.0; dark red = >2.0). Only LRG1 has fold change above 2.0 in the above network. A green denotes downregulated proteins with fold change > -1.0 and < -1.5 and include highly connected SDC4 and FN1. A yellow color denotes the absence of statistically significant changes of connected proteins in the network. A grey color denotes protein absence in our proteome datasets. The connection of CTSD and CPE to insulin (INS) pertains to INS protein processing by these peptidases. The separation of subcellular locations in the graphic illustrates the concept of lysosomes as subcellular biomarkers of altered metabolism in individuals with T1D. An insert (top-right) depicts the physical interaction of LRG1 and FN1. Albumin (HSA) served as a negative control of the FN1/LRG1 interaction. The extracellular matrix protein SDC4 did not show an interaction with LRG1 in far-Western blot experiments.

(Click on the image to enlarge.)

Role of Exosome Formation

Proteins ranked by Random Forest analyses as most important to discern T1D from HS proteomic datasets and KEGG/GO pathway enrichment analyses support the notion that there is a T1D-associated process of enhanced lysosomal protein excretion, potentially triggered by lysosomal membrane damage that, in turn, is related to AGE formation and perturbation of lipid metabolism in T1D patients. The evidence is emerging that lysosomes are not only a cellular waste disposal system but also a control center for metabolism in general [86]. Gangliosides are important glycosphingolipids with ceramide as the basic lipid building block. Published data suggest that ceramide induces the release of exosomes into multi-vesicular endosomes and their delivery to lysosomes [87]. Based on the hypothesis that lipid metabolic changes in T1D patients result in larger ceramide quantities that, in turn, result in more exosome-endosome trafficking, we assume that higher lysosomal protein contents in urine of T1D patients might be related to an increase in the release of urinary exosomes. We generated pooled urine samples from T1D and HS patients enriched for exosomes via filtration on 0.02 micron filters and ultracentrifugation at 100,000 x g for 15 h and analyzed the proteins in these concentrates by LC-MS/MS. We did not find evidence for either exosome enrichment or lysosomal protein enrichment within the exosome fraction for the T1D sample pool. Two to three percent of the peptide-spectral matches pertained to lysosomal proteins suggesting that the latter are largely secreted as soluble proteins into urine for both the T1D and the HS cohort.

Concluding Remarks

We have shown in two separate proteomic studies, with overlapping cohorts comprised of 81 and 215 T1D patients and healthy siblings, that many proteins increased in abundance in the urine of T1D versus healthy subjects are associated with catabolic functions in lysosomes. Lysosomes are a cellular compartment responsible for the degradation of cellular waste including glycoproteins, glycolipids, and proteoglycans [31, 32]. Aberrantly glycated proteins and lipids and products of their catabolism, called AGEs, are one factor implicated in the etiology of clinical complications of T1D such as nephropathy, retinopathy and atherosclerosis. AGEs are formed due to chronically elevated sugar levels in diabetes patients. They activate the catabolism in lysosomes and disrupt the autophagy-lysosome pathway in renal TECs via analyses of human tissues from individuals diagnosed with DN and in renal TEC lines [33]. A consequence of this pathway's perturbation is lysosomal membrane permeabilization, a process resulting in cathepsin protease release and cell death [33]. The observation of lysosomal enzymuria is consistent with an early phase of this pathology as renal TECs interface with the urinary ducts that allow the physiological resorption and excretion of various molecules from the kidneys into the urinary tract and vice versa. A diagnostic biomarker for advanced DN, albumin, is measured in urine when the reabsorption capacity of renal TECs reaches its limit. We are intrigued by a potential subclinical renal pathology of diabetes that can be measured long before albuminuria begins: lysosomal enzymuria. Indeed, the loss of lysosomal enzymes in urine was suggested as an early stage biomarker of DN [88] [34]. Elevated plasma levels of lysosomal enzymes have also been described as biomarkers of diabetic microangiopathy correlating with glycemic control [89]. Together with our data, these findings support the concept that membrane damage in lysosomes of renal TECs occurs in T1D patients with frequently elevated glucose levels at a younger age than both clinical and histological data currently suggest. A simple diagnostic test based on the quantitative measurement of lysosomal enzyme contents in the urine of individuals diagnosed with T1D could be useful.

We have no evidence that the observed gut microbiome changes, affecting a relatively small subset of gut microbial phylogenetic groups according to 16S rRNA profiles in the comparison of ~ 100 patients and their healthy siblings, are physiologically important and linked to imperfect glycemic control or related to subclinical inflammation in young T1D patients. The changes in lysosomal function and structural integrity in T1D patients may be metabolic memory effects of frequent hyperglycemia. Our data do not support the notion that metabolic memory effects in T1D patients modulate the gut microbiome structure. Further research including longitudinally surveyed patient cohorts and the examination of metabolic capabilities of Streptococcus salivarius are needed to assess the role of gut microbial community members with potential benefits to the health of T1D patients.

Acknowledgements

This work was supported by National Institute of Diabetes and Digestive and Kidney grant 1DP3DK094343-01. We thank the Ruggles Family Foundation for a contribution to fund the mass spectrometer used for the proteomics experiments. We thank Ms. Kenya Platero for technical assistance in the laboratory. The funding bodies did not have a role in study design, data collection and analysis, the decision to publish, and preparation of the manuscript.

22.
Stewart CJ, Nelson A, Campbell MD, Walker M, Stevenson EJ, Shaw JA. et al. Gut microbiota of Type 1 diabetes patients with good glycaemic control and high physical fitness is similar to people without diabetes: an observational study. Diabetic medicine: a journal of the British Diabetic Association. 2016;34:127-134