Abstract

The human endogenous intestinal microflora is an essential "organ" in providing nourishment, regulating epithelial development, and instructing innate immunity; yet, surprisingly, basic features remain poorly described. We examined 13,355 prokaryotic ribosomal RNA gene sequences from multiple colonic mucosal sites and feces of healthy subjects to improve our understanding of gut microbial diversity. A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms. We discovered significant intersubject variability and differences between stool and mucosa community composition. Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease.

Number of sequences per phylotype for each sample. The y axis is a neighbor-joining phylogenetic tree containing one representative of each of the 395 phylotypes from this study; each row is a different phylotype. The phyla (Bacteroidetes, non-Alphaproteobacteria, unclassified near Cyanobacteria, Actinobacteria, Firmicutes, Fusobacteria, and Alphaproteobacteria, ordered top to bottom) are color coded as in fig. S1. Each column is labeled by subject (A, B, C) and anatomical site. For each phylotype, the clone abundance is indicated by a grayscale value.

Collector's curves of observed and estimated phylotype richness of pooled mucosal samples per subject. Each curve reflects the series of observed or estimated richness values obtained as clones are added to the data set in an arbitrary order. The curves rise less steeply as an increasing proportion of phylotypes have been encountered, but novel phylotypes continue to be identified to the end of sampling. The relatively constant estimates of the number of unobserved phylotypes in each subject as observed richness increases (the gap between observed and estimated richness) indicate that estimated richness is likely to increase further with additional sampling. The Chao1 estimator and the abundance-based coverage estimator (ACE) are similar, but the ACE is less volatile because it uses more information from the abundance distribution of observed phylotypes. Individual-based rarefaction curves are depicted in figs. S4 to S6.
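As an illustration of how such a collector's curve can be computed, the following minimal Python sketch adds clones one at a time in a shuffled order and records observed richness alongside a Chao1 estimate. It assumes the bias-corrected form of Chao1, which stays defined when no doubletons are present; the study's own software and choice of Chao1 variant are not stated here, and the function names and example labels are hypothetical.

```python
import random
from collections import Counter


def chao1(labels):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    labels: iterable of phylotype labels, one per sequenced clone.
    """
    counts = Counter(labels)
    s_obs = len(counts)                               # observed phylotypes
    f1 = sum(1 for c in counts.values() if c == 1)    # singletons
    f2 = sum(1 for c in counts.values() if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def collectors_curve(labels, seed=0):
    """Return (clones sampled, observed richness, Chao1) after each clone.

    Clones are added in an arbitrary (shuffled) order; the seed only
    makes that order reproducible.
    """
    order = list(labels)
    random.Random(seed).shuffle(order)
    curve = []
    for n in range(1, len(order) + 1):
        subset = order[:n]
        curve.append((n, len(set(subset)), chao1(subset)))
    return curve


if __name__ == "__main__":
    # Hypothetical clone library: each entry is the phylotype of one clone.
    clones = ["p1"] * 5 + ["p2"] * 2 + ["p3", "p4", "p5", "p5"]
    for n, observed, estimated in collectors_curve(clones):
        print(n, observed, round(estimated, 2))
```

The gap between the second and third columns of the output corresponds to the estimated number of unobserved phylotypes discussed in the caption.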

Double principal coordinate analysis (DPCoA) for (A) colonic mucosa (solid lines) and stool (dashed lines), (C) colonic mucosal sites alone, and (D) mucosal sites excluding Bacteroidetes phylotypes. Phylotypes are represented as open circles, colored according to phylum. Phylotype points are positioned in multidimensional space according to the square root of the distances between them. Ellipses indicate the distribution of phylotypes per sample site, except in (A), where all mucosal sites are represented by one ellipse. Percentages shown along the axes represent the proportion of total Rao dissimilarity captured by that axis. (A) is the best possible two-dimensional representation of the Rao dissimilarities between all samples. (B) is an enlarged view of (A), depicting the centroids of each site-specific ellipse. Subject ellipse distributions remain distinct after stool phylotypes (C) and Bacteroidetes phylotypes (D) are excluded from the analysis.
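The Rao dissimilarity between two samples can be sketched as the excess quadratic entropy of their pooled composition over the entropies of each sample alone. The Python sketch below is illustrative only: it assumes relative abundances per phylotype and a symmetric phylotype-by-phylotype distance matrix, omits the factor of 1/2 that some formulations include, and uses a hypothetical three-phylotype matrix rather than data from the study.

```python
def quadratic_entropy(p, d):
    """Rao's quadratic entropy: sum over i, j of p[i] * p[j] * d[i][j].

    p: relative abundances of phylotypes in one sample (sums to 1).
    d: symmetric matrix of distances between phylotypes.
    """
    n = len(p)
    return sum(p[i] * p[j] * d[i][j] for i in range(n) for j in range(n))


def rao_dissimilarity(p, q, d):
    """Dissimilarity between two samples as a Jensen difference:
    2 * H((p + q) / 2) - H(p) - H(q), with H the quadratic entropy."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return (2 * quadratic_entropy(mix, d)
            - quadratic_entropy(p, d)
            - quadratic_entropy(q, d))


if __name__ == "__main__":
    # Hypothetical distances between three phylotypes (squared Euclidean
    # distances of points at positions 0, 1, and 2 on a line).
    d = [[0, 1, 4],
         [1, 0, 1],
         [4, 1, 0]]
    sample_a = [1.0, 0.0, 0.0]   # only the first phylotype present
    sample_b = [0.0, 0.0, 1.0]   # only the third phylotype present
    print(rao_dissimilarity(sample_a, sample_b, d))
    print(rao_dissimilarity(sample_a, sample_a, d))
```

Under this scaling, two samples that each contain a single phylotype are exactly as dissimilar as the distance between those two phylotypes, and a sample compared with itself has dissimilarity zero.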