Abstract

The composition of the human microbiota is recognized as an important factor in human health and disease. Many of our cohabitating microbes belong to phylum-level divisions for which there are no cultivated representatives and are only represented by small subunit rRNA sequences. For one such taxon (SR1), which includes bacteria with elevated abundance in periodontitis, we provide a single-cell genome sequence from a healthy oral sample. SR1 bacteria use a unique genetic code. In-frame TGA (opal) codons are found in most genes (85%), often at loci normally encoding conserved glycine residues. UGA appears not to function as a stop codon and is in equilibrium with the canonical GGN glycine codons, displaying strain-specific variation across the human population. SR1 encodes a divergent tRNAGlyUCA with an opal-decoding anticodon. SR1 glycyl-tRNA synthetase acylates tRNAGlyUCA with glycine in vitro with similar activity compared with normal tRNAGlyUCC. Coexpression of SR1 glycyl-tRNA synthetase and tRNAGlyUCA in Escherichia coli yields significant β-galactosidase activity in vivo from a lacZ gene containing an in-frame TGA codon. Comparative genomic analysis with Human Microbiome Project data revealed that the human body harbors a striking diversity of SR1 bacteria. This is a surprising finding because SR1 is most closely related to bacteria that live in anoxic and thermal environments. Some of these bacteria share common genetic and metabolic features with SR1, including UGA to glycine reassignment and an archaeal-type ribulose-1,5-bisphosphate carboxylase (RubisCO) involved in AMP recycling. UGA codon reassignment renders SR1 genes untranslatable by other bacteria, which impacts horizontal gene transfer within the human microbiota.

Candidate phylum SR1 includes cosmopolitan bacteria that are found in marine and terrestrial high-temperature environments, fresh-water lakes, and subsurface aquifers (1). There are no cultivated representatives of SR1. Environmental sequencing of small subunit (SSU) rRNA first identified these bacteria in contaminated aquifers (2). These bacteria are usually found in sulfur-rich and oxygen-limited environments, suggesting a potential microaerophilic, sulfur-based metabolism (3). SR1 bacteria also associate with animals and exist in termite and mammalian digestive tracts (1, 4, 5), as well as in the human oral cavity (6). SR1 bacteria are present at low abundance in healthy oral microbiota (∼0.1% on average), but their abundance increases several-fold in patients with H2S-related malodor (7) and in periodontal disease (8). Uncultivated microbial taxa, especially those in low abundance in the environment, are refractory to typical genomic and microbiological techniques. Single-cell genomics and genomic reconstruction using metagenomic data have significantly advanced the understanding of uncultivated microbes (9). To gain insight into the biology of human-associated SR1 bacteria, we used single-cell genomic sequencing of an SR1 cell isolated from a human oral sample. Human Microbiome Project (HMP) data expanded genomic coverage, enabled evolutionary analyses of SR1 bacteria across the human population, and revealed evidence of a unique genetic code.

At the time of its elucidation, the genetic code was thought to be universal and invariant (10). However, we now know of natural code expansion with selenocysteine and pyrrolysine (11). One organism, Acetohalobium arabaticum, dynamically varies the number of amino acids it encodes between 20 and 21, depending on its carbon source (12). The first variant genetic code, observed in human mitochondria (13), showed that the code is able to change over evolutionary time. There are now 18 nonstandard codes in various organisms or organelles, and half of these decode UGA as tryptophan (Trp). We provide genomic and biochemical evidence showing that SR1 and related bacteria require reassignment of UGA to glycine (Gly) to produce an active proteome.

Results

Human-Associated SR1 Bacteria Are Diverse and Specific to Body Niches.

To determine the level of phylogenetic diversity of SR1 bacteria associated with the healthy human body, we used the SSU rRNA gene pyrosequencing data (V3–5 region) from the HMP consortium (14). Of ∼26 million sequences, 1,657 were assigned to the candidate phylum SR1. SR1 bacteria were identified in oral cavity samples and on the skin (15). Six major operational taxonomic units (OTUs) were identified based on sequences present in more than two copies, accounting for 99% of the data, with 17 additional OTUs being represented by one or two sequences. The SR1-OR1 genome data are from a single cell belonging to OTU1 (Fig. 1).

Fig. 1. Hierarchical clustering of bacterial SR1 OTU abundance (Bray-Curtis similarity matrices) by body sites and corresponding frequency distribution of the six major SR1 OTUs at those sites. The skin SR1 sequences (indicated by “s”) were combined for OTU abundance calculation due to their low numbers. SR1-OR1 belongs to OTU1.

The distributions of SR1 OTUs in saliva and supra- and subgingival plaques are similar, but partitioned differently from those in the rest of the mouth (Fig. 1). SR1 from the throat, tonsils, tongue, and hard palate were distinct from those in saliva and other oral sites. Because saliva is present in all areas of the oral cavity, this heterogeneity suggests a degree of niche specialization for the different SR1 phylotypes. The skin harbors an approximately equal distribution of the major OTUs found in the mouth. Phylogenetic analysis of host-associated and free-living SR1 bacteria indicates that all human phylotypes and other animal-associated taxa classify within the subgroup III (1) (Fig. S1). This phylogeny separates the free environmental members of the SR1 phylum from the host-associated ones and partitions the human lineages in two distinct groups, each including oral and skin types.
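The Bray-Curtis similarity underlying the clustering in Fig. 1 can be illustrated with a minimal sketch. The read counts below are hypothetical and are not HMP data; the function name and site labels are likewise illustrative assumptions, and the actual analysis pipeline is described only in SI Experimental Procedures.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (1 - dissimilarity) for two abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    # Shared abundance is the sum of the per-OTU minima.
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2 * shared / total


# Hypothetical read counts for six OTUs at three body sites (NOT real data).
sites = {
    "saliva": [40, 10, 5, 0, 3, 2],
    "subgingival_plaque": [38, 12, 4, 1, 3, 2],
    "tongue": [2, 1, 30, 20, 5, 2],
}

for name in ("subgingival_plaque", "tongue"):
    sim = bray_curtis_similarity(sites["saliva"], sites[name])
    print(f"saliva vs {name}: {sim:.3f}")
```

With these toy counts, saliva and subgingival plaque score as far more similar to each other than saliva and tongue, which is the kind of partitioning a hierarchical clustering of such a matrix would pick up.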

Oral SR1 Evolved from a Deep-Branching Bacterial Phylum with a Unique Genetic Code.

Based on the single-cell genomic sequence, the draft SR1-OR1 genome was assembled from 56 contigs totaling 0.46 Mbp. Initial gene prediction revealed abundant, short, contiguous ORFs with identical annotations but separated by canonical terminator TGA codons (Fig. S2). Sequence alignments of conserved proteins revealed a high frequency of interruptions of SR1-OR1 genes by TGA codons at highly conserved Gly positions across Bacteria. For example, the translation initiation factor IF2 is interrupted at three locations that are 95–100% Gly in over 600 homologs from all bacterial phyla. Both RNA polymerase β and β′ have TGA at invariant Gly positions (motifs GDK and GRFR). Most of those interruptions were also found in over 70 highly similar scaffolds identified in 13 HMP oral metagenomes (16). These data expanded the SR1-OR1 assembly to over 1.1 Mbp in 49 contigs (SI Results and Table S1).

In addition to a canonical tRNAGlyUCC, the SR1 genome encodes an unusual tRNAGly-like sequence with an opal decoding anticodon (5′-UCA-3′). Although only half of the molecule (T-arm, acceptor stem) is similar to normal tRNAGly sequences, tRNAGlyUCA includes most of the glycyl-tRNA synthetase (GlyRS) identity elements (17): that is, particular nucleotides (U73, G1:C72, C2:G71, G3:C70, C35, C36) required for GlyRS recognition and glycylation (Fig. S3). These data indicate that oral SR1 bacteria use the opal stop codon (UGA) for Gly. Identical tRNA genes were found in two SR1 oral HMP sequences, and similar tRNAGlyUCA sequences are evident in the genomes of related bacteria ACD78 and ACD80, which were reported to encode Trp with UGA (18). All three tRNAGlyUCA species lack the identity elements (G73, A1:U72, G2:C71, G3:C70, C34, C35, A36) required for tryptophanyl-tRNA synthetase (TrpRS) activity (Fig. S3). U73 and G1:C72 in tRNAGlyUCA are incompatible with TrpRS function (19).

Gene mapping based on reassignment of TGA to Gly resulted in the prediction of 994 protein genes, with an ORF size distribution typical of bacteria (Figs. S2 and S4). Based on Bacteria-specific conserved single-copy genes (Table S2), we estimated the SR1-OR1 genome is ∼56% complete. As is common with uncultivated phyla, ∼35% of SR1’s predicted proteins have no homologs. A significant fraction of the proteins (40%) are most similar to their counterparts in ACD80 (18) (Fig. S4). RNA polymerase phylogeny confirmed that SR1 is closely related to ACD80, and both appear as deep branches in the bacterial domain (Fig. 2 and Fig. S5). Fellow uncultivated phyla BD1-5 (including ACD78), PER, TM7, OD1, and OP11 are the next most closely related taxa, and Chloroflexi are the nearest cultured relatives.
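The effect of the stop-codon set on ORF calling can be sketched as follows. This is a deliberately simplified illustration, assuming a single reading frame and ignoring start codons; the toy sequence and the function name are hypothetical and do not come from the SR1-OR1 assembly or from the gene-calling software actually used.

```python
def orf_lengths(dna, stops):
    """Lengths (in codons) of stop-free stretches in reading frame 0."""
    lengths, run = [], 0
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3].upper()
        if codon in stops:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


STANDARD_STOPS = {"TAA", "TAG", "TGA"}
REASSIGNED_STOPS = {"TAA", "TAG"}  # assumption: TGA is read as Gly

# Toy gene: ATG AAA TGA GGT CTT TGA GCA GGA TAA
toy_gene = "ATGAAATGAGGTCTTTGAGCAGGATAA"

print(orf_lengths(toy_gene, STANDARD_STOPS))    # three short fragments
print(orf_lengths(toy_gene, REASSIGNED_STOPS))  # one contiguous ORF
```

Treating TGA as a terminator splits the toy gene into short adjacent fragments, whereas removing it from the stop set recovers a single ORF, mirroring the change in ORF size distribution described above.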

Fig. 2. Maximum-likelihood phylogenetic tree of SR1 and related bacteria based on RNA polymerase protein sequences (β-β′ subunits). A phylogeny representing all Bacteria is shown in Fig. S5. Node labels denote branch support. The cluster of candidate phyla including SR1 is highlighted.

SR1 Is a Fermentative Anaerobe with an Archaeal Metabolic Trait.

The incomplete genomic assembly combined with the absence of human-associated or environmental isolates limits our ability to experimentally characterize SR1 bacteria, yet genomic reconstruction provides initial inferences of the metabolism and lifestyle of these organisms. Genes encoding enzymes for several glycolysis steps were identified (phosphoglycerate mutase, phosphopyruvate hydratase, and pyruvate kinase) as well as for pyruvate formate lyase, which converts pyruvate to acetyl-CoA. Acetate kinase, potentially involved in substrate-level phosphorylation, is encoded in addition to subunits of an F1F0-type ATP synthase. As in its free-living relatives, SR1 show no evidence of a tricarboxylic acid cycle or electron transport chain components, suggesting SR1 are nonrespiring. The genome encodes over a dozen distinct peptidase families, a pectinase, and a glycosyl hydrolase, which may produce fermentable substrates.

SR1 bacteria encode a ribulose-1,5-bisphosphate carboxylase (RubisCO) gene classified as a distinct subfamily containing homologs from several methanogenic archaea (20) (Fig. S6 and Dataset S1). Even though this type of RubisCO was shown to fix CO2 in an in vivo complementation assay (21), its physiological role is in the AMP-recycling pathway involving AMP phosphorylase (DeoA) and ribose 1,5-bisphosphate isomerase (E2b2). The resulting 3-phosphoglycerate is available to glycolysis or gluconeogenesis (22).

This AMP-recycling pathway was previously unknown in human-associated bacteria. SR1-related ACD80 and PER subsurface bacteria also encode the archaeal RubisCO (18). An inserted loop is found in the SR1 and ACD80 genes (Fig. S7). The SR1-like RubisCO sequences are highly similar to the archaeal homologs (73–76% identity). The bacterial DeoA and E2b2 sequences are also highly similar to their archaeal homologs, indicating horizontal gene transfer (HGT) of the entire pathway. Although it is not possible to define the trajectory of HGT with certainty, the bacterial sequences are apparently derived from archaeal ancestors as they branch within a larger group of exclusively archaeal relatives (Figs. S6 and S8). Archaeal RubisCO is oxygen-sensitive and is found in strict anaerobes, suggesting that in order for the RubisCO pathway to function, oral SR1 may require anaerobic or microaerophilic conditions. SR1-OR1 encodes an alkyl hydroperoxide reductase and a superoxide dismutase that may protect the cell from oxidative damage. Genes for general stress response proteins, protein processing (DnaJ, GrpE, Clp protease), and assembly of iron-sulfur (Fe-S) clusters were also identified.

The SR1-OR1 genome includes genes for murein biosynthesis and a tripeptide synthase that was suggested to confer Gram-positive characteristics to candidate phylum TM7 (9). Genes that provide antibiotic resistance (hemolysin, capsule production protein) and possibly confer natural DNA competence (ComEC) are evident. There are no flagellum genes, but several genes for type II and type IV secretion systems were identified. Detection of pilus assembly genes (PilB, PilC, PilT, and TraX) suggests that limited motility (twitching) and cell-to-cell interactions are possible. An FtsZ homolog indicates that SR1 cells likely divide using a “z-ring” mechanism.

TGA Exchanges with Canonical Gly Codons Across the Human Population.

Inspection of the SR1-OR1 and SR1-type metagenomic scaffolds (i.e., SR1 “pangenome”) revealed that most TGAs are conserved across the HMP metagenome. In SR1-OR1, 85% of the predicted ORFs have at least one in-frame TGA and 24 genes (2.4%) encode over 20 TGAs (Fig. 3A). In some alleles, homologous sites contain canonical Gly codons (Table S3), and synonymous codon substitution of TGA for GGN is evident in comparing SR1 isolated from different individuals (Fig. 4). Analysis of ORFs with internal TGAs assigned to Gly did not reveal predicted peptides from two separate genes that were incorrectly joined and there was no unusual gene overlap. None of the ORFs appear to require UGA to stop translation, so it is not surprising that release factor 2, which directs termination at UGA, is absent.
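Statistics of this kind (the share of ORFs carrying at least one in-frame TGA) can be computed along the following lines. This is a minimal sketch on hypothetical toy ORFs, assuming each sequence is already in frame and ends with its terminator codon; the function names are illustrative and not taken from the study.

```python
def internal_tga_count(orf):
    """Count in-frame TGA codons, ignoring the final (terminator) codon."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3].upper() for i in range(0, usable, 3)]
    return sum(1 for codon in codons[:-1] if codon == "TGA")


def summarize(orfs):
    """Fraction of ORFs with >=1 internal TGA and the maximum count seen."""
    counts = [internal_tga_count(orf) for orf in orfs]
    with_tga = sum(1 for n in counts if n > 0)
    return {"fraction_with_tga": with_tga / len(orfs), "max_tga": max(counts)}


# Hypothetical toy ORFs (NOT SR1 sequences).
toy_orfs = [
    "ATGTGAAAATAA",     # ATG TGA AAA TAA      -> 1 internal TGA
    "ATGGGAAAATAG",     # ATG GGA AAA TAG      -> 0
    "ATGTGATGAGCTTAA",  # ATG TGA TGA GCT TAA  -> 2
    "ATGAAATGACCCTAA",  # ATG AAA TGA CCC TAA  -> 1
]

print(summarize(toy_orfs))
```

Only in-frame codons are counted, so a TGA triplet that happens to span two adjacent codons does not contribute.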

Fig. 3. Codon use in SR1. (A) Frequency of internal TGAs in protein coding genes from SR1-OR1 and SR1 metagenomic scaffolds. (B) Box-and-whisker plot comparison of Gly codon use in the SR1-OR1 (♦) with that of predicted genes in all SR1 HMP metagenomic scaffolds (SR1 “pangenome”). Outliers are indicated with an asterisk.

Fig. 4. Distribution of reprogrammed TGA codons in SR1 RubisCO genes. The type of Gly codon, TGA (black) or canonical GGA (white), at the four reprogrammed positions in the RubisCO SR1 pangenome is overlaid on the gene phylogeny, based on full-length sequences from HMP metagenomes and SR1-OR1. Each HMP number defines a different human donor. For four human subjects, two SR1 phylotypes were identified or genes were present in samples collected at different times (-2). Multiple sequences from donors 159814214 (*), 809635352 (grey rectangle), 370425937 (#), and 764447348 (oval) are indicated by superscript symbols. The average dN/dS values for the two main clades are indicated.

Among the codons read as Gly, TGA has a frequency of ∼24%. Canonical Gly codon use correlates with the low G+C content of the genome (GGA 42%, GGT 16%, GGG 13%, and GGC 3%). The same distribution was observed across the pangenome. By comparison, reprogramming in Mycoplasma capricolum is much more extensive, with 65% of the proteins containing exclusively UGA-encoded Trp (Fig. S2C). We observed evolution of TGA use across the SR1 pangenome. Except for elevated GGC use in SR1-OR1, Gly codon use for 2,083 alleles across 13 human donors is similar to that in the single cell (Fig. 3B and Table S4). At individual sites, TGA alternates primarily with GGA, but replacements with all GGNs occur. Among 71 oral SR1 RubisCO alleles, TGAs were found at four Gly loci, with a distribution that correlates with the phylogenetic grouping of oral SR1 OTUs (Fig. 4 and Dataset S1).
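A Gly codon-use profile of the kind summarized above can be tallied with a short script. The sketch below assumes that TGA is counted as a Gly codon and that each ORF ends with a terminator, which is skipped; the two toy ORFs and the function name are hypothetical.

```python
from collections import Counter

# Assumption for this sketch: TGA is treated as a fifth Gly codon.
GLY_CODONS = ("TGA", "GGA", "GGT", "GGG", "GGC")


def gly_codon_usage(orfs):
    """Relative use of each Gly codon across a set of in-frame ORFs."""
    counts = Counter()
    for orf in orfs:
        usable = len(orf) - len(orf) % 3
        # Stop before the last codon, which is the terminator.
        for i in range(0, usable - 3, 3):
            codon = orf[i:i + 3].upper()
            if codon in GLY_CODONS:
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in GLY_CODONS}


# Hypothetical toy ORFs (NOT SR1 sequences).
toy_orfs = [
    "ATGTGAGGAGGATAA",  # ATG TGA GGA GGA TAA
    "ATGGGTTGAGGGTAA",  # ATG GGT TGA GGG TAA
]

for codon, fraction in gly_codon_usage(toy_orfs).items():
    print(f"{codon}: {fraction:.2f}")
```

Running the same tally per donor or per allele would give the per-sample distributions that a box-and-whisker comparison such as Fig. 3B summarizes.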

The nonsynonymous to synonymous substitution rates ratio (dN/dS) across the SR1 pangenome is low (0.023 for RubisCO), indicating strong natural selection and suggesting that low G+C content and possibly TGA reassignment do not result from neutral mutational drift (SI Results), as in clonal bacterial pathogens (23). The pangenome is dominated by a fivefold overabundance of synonymous transitions (A-G and C-T). The transversion that most likely drives the Gly reprogramming (GGA > TGA) is less frequent and competes with purine excess at the first codon position (62%) compared with 50% at second and third positions. In Mycoplasma, a drastic loss of G+C content at the third codon position (9% vs. ∼30% at first and second position) contributed to near complete replacement of the TGG Trp codon with TGA.
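The distinction between transitions and transversions drawn above can be made concrete with a small classifier. This is an illustrative sketch only; the helper names are hypothetical, and it does not reproduce the substitution-rate analysis described in SI Results.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("no substitution: bases are identical")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def codon_changes(codon_a, codon_b):
    """List (codon position, type) for each base that differs."""
    pairs = zip(codon_a.upper(), codon_b.upper())
    return [
        (pos, substitution_type(x, y))
        for pos, (x, y) in enumerate(pairs, start=1)
        if x != y
    ]


print(codon_changes("GGA", "TGA"))  # first-position G>T change
print(codon_changes("GGA", "GGG"))  # third-position A>G change
```

The GGA > TGA change replaces a purine with a pyrimidine at the first codon position, so it is a transversion, whereas the synonymous GGA > GGG change is a transition.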

SR1 encodes a canonical glycyl-tRNA synthetase (α-dimeric type), which is similar to the well-characterized Thermus thermophilus enzyme (24). Because SR1 bacteria are uncultivated and no genetic system exists, we tested the in vivo activity of SR1 GlyRS and tRNAGlyUCA variants in E. coli. A lacZ gene with a TGA codon at position 3 serves to report the level of translational read-through of UGA by β-galactosidase activity. Compared with the wild-type β-galactosidase (Met3), endogenous Trp-tRNATrp significantly suppresses the stop codon function of UGA, leading to 15 ± 2% translational read-through of UGA (Fig. 5A) (25). Expression of SR1 tRNAGlyUCA leads to enhanced read-through of UGA (22 ± 2%), and expression of the SR1 GlyRS yields additional UGA translation (25 ± 3%). SR1 GlyRS and tRNAGlyUCA are active molecules when expressed in E. coli. When only tRNAGlyUCA is present, increased UGA translation is likely because of aminoacylation of the tRNA by E. coli’s native GlyRS. ACD78 (23 ± 1%) and ACD80 tRNAGlyUCA (52 ± 4%) both supported enhanced read-through of UGA compared with background (Fig. 5A). Although coexpression of SR1 GlyRS did not stimulate UGA suppression further for the ACD78 tRNA, glycylation of ACD80 tRNAGlyUCA by SR1 GlyRS leads to a high level (68 ± 2%) of UGA translation as Gly.

Fig. 5. Biochemical characterization of SR1 GlyRS and tRNAGlyUCA. (A) To assay UGA translation in vivo in E. coli, β-galactosidase was expressed from a lacZ reporter gene with either methionine 3 (wild-type) or M3 mutated to TGA. β-Galactosidase activities, shown relative to the wild-type activity level, were measured in the presence or absence of tRNAGlyUCA variants (SR1, ACD78, and ACD80) and SR1 GlyRS. (B) In vitro glycylation activity of SR1 GlyRS with tRNAGlyUCA (SR1: □; ACD78: +; ACD80: ♢) and canonical tRNAGly (SR1 ○, E. coli △) variants. The data are based on triplicate experiments with 10 μM GlyRS and 0.2 μM tRNA.

SR1 GlyRS Glycylates UGA-Decoding and Canonical tRNAGly Species.

To verify glycine-accepting activity of the atypical tRNAGlyUCA variants (Fig. S3), we investigated aminoacylation activity of recombinant SR1 GlyRS with several tRNAGly substrates in vitro (Fig. 5B). The enzyme reaches a plateau for glycylation (17.5 ± 0.2% of total tRNA) with its canonical tRNAGly that is similar to the amount of Gly-tRNAGlyUCA formed (15.1 ± 1.2%). SR1 GlyRS displays reduced but significant activity with E. coli tRNAGly (11.4 ± 0.4%). ACD78 (24.0 ± 2.6%) and ACD80 (22.1 ± 0.2%) are the most active substrates, but only the ACD80 variant promotes efficient UGA read-through in vivo (Fig. 5A). The ACD78 tRNA may be less compatible with EF-Tu or translocation on the E. coli ribosome. The ACD78 tRNA sequence is distinct from the SR1 and ACD80 tRNAs, yet only mutations G15A and U47C separate the SR1 and ACD80 variants (Fig. S3).

Discussion

Single-cell genomics and assemblies of metagenomes provide new opportunities to characterize uncultured constituents of complex microbial communities. Even though less diverse than many free-living communities, human and animal-associated microbiota harbor uncultured bacteria at all taxonomic levels (5, 14). We found that the human body harbors a surprising diversity of SR1 bacteria. ACD80 (18) are the closest relatives of SR1 (Fig. 2 and Figs. S4–S8) and part of a cluster of phyla without cultured representatives that are relatively deep branching in the bacterial domain (Fig. S5). The existence of human-associated species in phyla dominated by anaerobic thermophiles is intriguing. Conceivably, as with other bacterial phyla that predominantly consist of free-living species (e.g., candidate divisions TM7 and OP11, phyla Chloroflexi, Nitrospira) (6, 8, 26), SR1 may have come in contact with the human host, survived, and evolved as part of the resident microbiota.

SR1 are less phylogenetically diverse and abundant compared with more successful colonizers of the human microbiome (e.g., Firmicutes and Bacteroidetes). SR1 either colonized human and animal hosts relatively recently or they are less adaptable or competitive than other taxa. Nevertheless, human oral and skin-associated SR1 present signs of diversification and potential niche specialization, with strains or species preferring microenvironments that offer different levels of oxygen, nutrient, or biofilm opportunities. Periodontal disease may be linked to increased SR1 abundance (8) but their role, if any, in the etiology of disease is unknown. Cultivation and complete sequencing of multiple lineages from this group, including free-living representatives, will broaden our view of the biology, evolution, and role of SR1 bacteria in human health.

Molecular Mechanism of UGA Reassignment.

The most unexpected finding was that human oral SR1 bacteria use a novel variation of the genetic code, in which the UGA terminator has been reprogrammed to a Gly codon. SR1, ACD78, and ACD80 tRNAGlyUCA variants contain most of the identity elements required for GlyRS recognition (Fig. S3), but the opal decoding anticodon should inhibit GlyRS activity. T. thermophilus GlyRS is 5,000-fold less catalytically efficient with a tRNAGly mutant (C36A) with the UCA anticodon (27), yet SR1 GlyRS aminoacylates tRNAGlyUCC and opal decoding tRNAGly variants with similar activity. Some nucleotides in the divergent D-arm and acceptor stem of tRNAGlyUCA may compensate for the C36A mutation to enhance glycylation and optimize reassignment of UGA to Gly. ACD80 tRNAGlyUCA differs only at positions 15 and 47 from the SR1 tRNA, but shows increased Gly-tRNAGlyUCA production and threefold enhanced ribosomal decoding of UGA, yielding 70% translational read-through of UGA in vivo (Fig. 5).

Gly codon reassignment is rare and has only been detected in Pyura stolonifera mitochondria (28) and related genomes where Gly incorporation is directed by two arginine codons. Although SR1, ACD78, and ACD80 are the only organisms with UGA assigned to Gly, early work in E. coli produced tRNAGly variants that could translate UGA, other stop codons, and even sense codons (29). Replacing a conserved Gly in an essential gene with either a non-Gly sense codon or a stop codon provided a conditional lethal mutant. Selection experiments led to the isolation of tRNA variants that could suppress the lethal mutation. The cells were rescued by tRNAs that mistranslated sense codons as Gly (missense suppressor) or read through stop codons with Gly (nonsense suppressor). The tRNAGly UGA suppressors isolated in these experiments differ only by one or two mutations from canonical tRNAGly and are less efficient in translating UGA [2–45% (30)] compared with the ACD80 tRNA. In their native cellular context, nucleotide modifications (SI Discussion) may improve UGA decoding for the SR1 and ACD78 tRNAs.

In the absence of an opal-decoding tRNA, background translation of UGA in E. coli and possibly in SR1 results from near-cognate translation of Trp-tRNATrp. This result leads to insertion of Trp in response to UGA in reporter proteins (Fig. 5) (25), and likely to varying levels throughout the proteome. When SR1 GlyRS and tRNAGlyUCA are coexpressed, the proteome may contain some level of Trp and Gly at UGA-encoded loci. As in the case of the ACD80 tRNA, however, increasing the cellular concentration of Gly-tRNAGlyUCA leads to enhanced UGA read-through that may reach a sufficient level to completely outcompete Trp. Given these data, it is fascinating that a recent report identified peptides containing Trp in response to UGA in ACD80 and ACD78 samples (18). Although some level of Trp incorporation at UGA might be tolerated by the cell, many of the UGAs in SR1 and related bacteria encode essential or invariant Gly residues. Global replacement of these residues with Trp would likely lead to an inactive proteome. The GlyRS/tRNAGlyUCA pairs found in these organisms should produce sufficient Gly-tRNA to dominate UGA translation with Gly. Codon reassignment is a shared character of SR1, ACD78, and ACD80 that likely evolved before these deeply branching lineages diverged.

Evolutionary Consequences of Codon Reassignment.

Although a specialized tRNA is required for codon reassignment, the forces that induce genetic code evolution and selectively maintain UGA as a sense codon are less obvious. Reduction in genome size (31) and genomic AT or GC bias (32, 33) are two known evolutionary phenomena linked to codon reassignment. Based on our estimated coverage, the SR1 genome is likely not larger than 2 Mb, and SR1 and related bacteria that reassign UGA (18) all have small AT-rich genomes. Theories of genetic code evolution differ mainly on whether a codon reassignment event will lead to ambiguous codon reading and mistranslation. Ambiguous codons are tolerated in Candida (34), but other organisms, such as M. capricolum (35), display more complete codon reassignment.

Although we found no evidence in homologous ORFs of UGA-intended stop codons in SR1, it is not known if in vivo some of the UGAs in SR1 are read as stop or if they lead to different levels of read-through with Gly. In the human oral SR1 pangenome, we observed codon use variation across different strains and human hosts, showing how the code evolves in real time. The exchange of canonical Gly codons with TGA demonstrates that different SR1 strains interchangeably use these codons for Gly. In RubisCO and other genes, TGA for GGN exchange tracks with the phylogenetic separation of SR1 strains in distinct oral habitats (Figs. 1 and 4), resembling ecological differentiation in free-living bacteria (36).

Why do variant genetic codes evolve and why do lineages maintain code variations over evolutionary time? Differential fidelity or efficiency of Gly incorporation at UGA compared with canonical Gly codons could provide a selective “handle” for maintaining a variant code. Because it is unknown if any intrinsic phenotypic cost or benefit of this type exists, perhaps the advantage that SR1 derives from its variant code can only be understood in terms of its ecological context. This genetic code variation will bias the susceptibility of oral SR1 to phage predation and HGT. A highly diverse repertoire of phages is present in saliva, and by mediating HGT, phages significantly impact evolution of the microbial community and the distribution of pathogenicity islands (37). HGT is rampant within the human microbiota, outpacing that in free-living communities and enabling efficient adaptation to host niches, antibiotic resistance, and the emergence of pathogenicity (38). We hypothesize that SR1’s altered genetic code allows the organism to successfully acquire genes from its environment, but severely limits the ability of foreign hosts to translate genetic material from SR1. In SR1, any transcribed foreign genetic material lacking UGA codons would be translated normally, but expression of some genes could result in read-through of UGA stop codons, potentially impacting protein folding and enzymatic activity. Translation of most SR1 genes, on the other hand, will lead to prematurely terminated peptides and inactive enzymes in essentially any other member of the oral microbiota. This genetic incompatibility partially isolates SR1 from the human microbiome gene pool, which may limit its adaptability, but also prevents SR1’s competitors from sharing its genomic innovations. SR1 bacteria, therefore, evolve as a quasi-independent taxonomic island within the complex community of microbes that colonize the human body.
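The translational incompatibility proposed above can be illustrated by translating the same toy gene under two codon tables. The sketch below builds the standard table from the conventional TCAG ordering and derives an SR1-like table in which only TGA is changed to Gly; the gene sequence is hypothetical, and translation is assumed to start at the first base and halt at the first stop codon.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumption for this sketch: the only change is TGA -> Gly.
SR1_LIKE = dict(STANDARD, TGA="G")


def translate(dna, table):
    """Translate frame 0 until the first stop codon in the given table."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = table[dna[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy gene: ATG GGT TGA AAA GAT TGA CGT TAA
toy_sr1_gene = "ATGGGTTGAAAAGATTGACGTTAA"

print("standard-code host:", translate(toy_sr1_gene, STANDARD))
print("SR1-like code:     ", translate(toy_sr1_gene, SR1_LIKE))
```

Under the standard table the toy gene yields only a two-residue fragment, while the SR1-like table gives the full-length product. The converse case, a foreign gene that ends in TGA being read through in SR1, could be explored with the same function.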

Experimental Procedures

SR1 Single-Cell Genomics.

A bacterium representing the candidate phylum SR1 was identified among cells randomly isolated by flow cytometry sorting from a healthy oral-subgingival sample. The genomes of individual bacterial cells were amplified using multiple-displacement amplification (39) followed by taxonomic characterization using rRNA gene amplification and sequencing. The amplified SR1 genomic DNA was sequenced using 454 Titanium and Illumina HiSeq platforms. Following quality control and abundance normalization, sequences were assembled into a draft SR1-OR1 genome. See SI Experimental Procedures for experimental and computational details.

To identify scaffolds likely representing SR1 bacteria in HMP metagenomic data, sequence similarity searches were performed at protein and DNA levels in IMG_HMP (16). Scaffolds that had syntenic gene content with the SR1-OR1 contigs and that displayed high-sequence similarity (generally >95% at DNA level in coding regions) were used to expand the genomic coverage and to enable pangenomic analyses of SR1 genomic variability across the human microbiome. For taxonomic diversity analysis, the HMP SR1 SSU rRNA pyrosequence data were analyzed with respect to distributions at defined human body sites and abundance of individual species-level taxonomic units (SI Experimental Procedures).

Biochemical Assays.

In vitro aminoacylation and in vivo UGA translation assays were performed as previously described (25). Experimental details are in SI Experimental Procedures.

Acknowledgments

We thank S. Allman and Z. Yang (Oak Ridge National Laboratory) for technical assistance; T. Vishnivetskaya for providing Human Microbiome Project SR1 pyrosequences; Ilka Heinemann, Jiqiang Ling, and Laure Prat for inspired discussions; the Human Microbiome Project research community for providing sequence data; and the developers of Integrated Microbial Genomes for analysis platforms. This work was supported by National Institutes of Health Grants R01 HG004857 (to M.P.) and GM22854 (to D.S.); Defense Advanced Research Projects Agency Contract N660-12-C-4020 (to D.S.); the Oak Ridge National Laboratory (managed by the University of Tennessee-Battelle) via the US Department of Energy Contract DE-AC05-00OR22725; and US Department of Energy Joint Genome Institute and Department of Energy Contract DE-AC02-05CH11231 (to P.S., T.W., and A.S.).

Data deposition: The SR1-OR1 final assembly sequence data have been submitted to GenBank under BioProject (accession no. PRJNA189303). The sequence data are readily available with full annotations under the Integrated Microbial Genomes portal at http://img.jgi.doe.gov/cgi-bin/w/main.cgi (taxon ID: 2517572135).

References

(2009) Assessment of the diversity, abundance, and ecological distribution of members of candidate division SR1 reveals a high level of phylogenetic diversity but limited morphotypic diversity. Appl Environ Microbiol 75(12):4139–4148.
