Additional file 1.

Figure S1. Distribution of unclassified genera across 22 habitats. Sequences that could not be classified at an RDP confidence threshold of 0.5 were assigned
to unclassified genera. Unclassified reads account for a relatively small proportion
of the total reads in the majority of the samples.

Figure S2. Phylum profiling of 22 human habitats. The average relative abundance of phyla in each habitat was measured by the fraction
of total 16S rRNA gene sequences. Each color represents a phylum. (A) Firmicutes, Actinobacteria, and Proteobacteria are the major phyla identified in
the human body. (B) Phyla accounting for <0.5% of the total phyla are shown. Preterm baby stool in this
dataset contains no phyla below the 0.5% threshold, so no data are plotted for it.
The total fractions of the phyla <0.5% in this figure are listed
at the top of the plot.

Figure S3. Accumulation curves at the genus level. The only difference between Figure S3A and Figure 1 is that all the samples were rarefied to 1,000 reads in Figure S3A. The accumulation
curves exhibit similar patterns in both figures. Figure S3B shows stool richness at
different sequencing depths. Sixty stool samples with >9,000 reads were rarefied to
1,000, 3,000, 6,000, and 9,000 reads. Both deep sequencing and a large number of subjects
are required to detect all the possible taxa.

Figure S4. The association between sequencing depth and sample frequency. The x-axis shows the rank abundance of each genus and the y-axis shows the number
of subjects who share the genus. Sixty stool samples with >9,000 reads were rarefied
to 1,000, 3,000, 6,000, and 9,000 reads. The points showing the abundance of each
genus at different depths are linked by line segments. With increased sequencing depth,
the number of subjects who share the same genus, including the minor genera, increases.
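The rarefaction used for Figures S3 and S4 can be illustrated with a minimal sketch, assuming it means random subsampling of reads without replacement to a fixed depth. The legends do not say which tool was used, so the function names and the read counts below are hypothetical and serve only as an illustration.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a genus -> read-count mapping."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    # Expand to one label per read, then draw reads without replacement.
    reads = [genus for genus, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))

def richness(counts):
    """Number of genera observed with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

# Hypothetical stool sample with >9,000 reads (made-up counts, not study data).
sample = {
    "Bacteroides": 6000,
    "Faecalibacterium": 2500,
    "Roseburia": 900,
    "Akkermansia": 95,
    "Sutterella": 5,
}

for depth in (1000, 3000, 6000, 9000):
    sub = rarefy(sample, depth)
    print(depth, richness(sub))
```

Rare genera such as the hypothetical 5-read entry are the ones most likely to drop out at shallow depths, which is the effect the legends describe for minor genera.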

Figure S5. The relative abundances of taxa in each habitat and dispersal among subjects. Dispersal of a given genus is indicated by the sample prevalence of that genus on the x-axis.
The average relative abundance (mean ± SE) of each genus is indicated on the y-axis. The
most abundant genera in general have the highest prevalence. However, low-abundance
genera can also be ubiquitous, and high-abundance genera can be confined to a limited
number of subjects. Also see Figure 4.

Figure S6. The correlation of abundant taxa between their dominant habitats and less dominant
sites. The abundances of Bacteroides from stool, Streptococcus from throat, and Lactobacillus from posterior fornix were compared with their abundances in the rest of the habitats.
Taxon abundance is not correlated between major habitats (oral, skin, vaginal, stool),
but it is moderately correlated within the oral and vaginal sites.

Figure S7. Dynamics of Lactobacillus in vaginal habitats between visits. The relative abundances of Lactobacillus change substantially between two visits. Vaginosis-related genera (Gardnerella, Prevotella, Atopobium) are over-represented after Lactobacillus loses its dominance. These subjects were asymptomatic and met the criteria of the HMP
study.

Figure S8. Dynamics of Moraxella in anterior nares between visits. The relative abundance of Moraxella (colored orange) varies from 0% to 70% between visits.

Figure S9. The relative abundances of Bacteroidetes and Firmicutes in HMP stool samples and twin study stool samples. The relative abundances of the two major phyla, Bacteroidetes and Firmicutes, are plotted. The relative abundance of Bacteroidetes in HMP stool samples is significantly higher than in the obese and lean groups of
the twin study (P < 0.001).

Figure S10. Cluster analysis of the HMP dataset with data from other studies. (A) Clustering analysis of HMP stool and twin study stool samples. Hierarchical clustering
was performed using Bray-Curtis dissimilarity and complete linkage. Red labels represent
the HMP samples and blue labels represent twin study samples. (B) Clustering analysis of HMP saliva and Chinese saliva samples. Red: HMP sample; green:
healthy controls of Chinese saliva samples; blue: Chinese saliva samples from subjects
with dental caries. The majority of the samples are clustered by project rather than
health status.
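
The clustering described for Figure S10 (Bray-Curtis dissimilarity with complete linkage) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the abundance profiles are made-up values, and the dendrogram is cut at a fixed number of clusters for simplicity.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two genus -> abundance mappings."""
    genera = set(a) | set(b)
    num = sum(abs(a.get(g, 0) - b.get(g, 0)) for g in genera)
    den = sum(a.get(g, 0) + b.get(g, 0) for g in genera)
    return num / den if den else 0.0

def complete_linkage(samples, n_clusters):
    """Agglomerative clustering where the distance between two clusters
    is the maximum pairwise dissimilarity between their members."""
    names = list(samples)
    dist = {
        frozenset((x, y)): bray_curtis(samples[x], samples[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[frozenset((x, y))]
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical relative-abundance profiles from two projects (made-up values).
profiles = {
    "hmp_1": {"Bacteroides": 0.70, "Faecalibacterium": 0.20, "Roseburia": 0.10},
    "hmp_2": {"Bacteroides": 0.65, "Faecalibacterium": 0.25, "Roseburia": 0.10},
    "twin_1": {"Bacteroides": 0.20, "Faecalibacterium": 0.50, "Roseburia": 0.30},
    "twin_2": {"Bacteroides": 0.25, "Faecalibacterium": 0.45, "Roseburia": 0.30},
}

print(complete_linkage(profiles, n_clusters=2))
```

In this toy input the samples group by project because the between-project dissimilarities are much larger than the within-project ones, which mirrors the pattern the legend reports.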