Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data remains challenging: it is an ill-posed problem that is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method that offers a statistically rigorous approach to extracting novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% relative abundance. Our analyses of genome-level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.

Genomic heterogeneity of bacterial species is observed and studied in experimental evolution studies and clinical diagnostics, and occurs as micro-diversity in natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the short sequencing reads currently in use. Recent advances in NGS technologies have improved speed and coverage and thus allow deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low-frequency alleles or haplotypes. However, false positive variant predictions caused by sequencing errors and short-read mapping artifacts must be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants, even at low frequencies. To predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by taking the union of calls from different tools, but at the price of more false positives. We identified possible reasons for false predictions and used this knowledge to improve accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that the best precision was achieved by requiring each variant to be called by at least two tools (an intersection). This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for routine use in experimental evolution studies and clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap.
To provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
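The consensus step described above can be sketched in Python. The rule is to keep a variant only if at least two tools report it, then post-filter by frequency and coverage. This is a minimal illustration under stated assumptions, not VarCap's implementation. Only the two-tool intersection and the 2% frequency floor come from the text. The `Call` record, the caller names, averaging frequencies across tools and the `min_depth` coverage cutoff are hypothetical. The genomic-environment and co-localization filters are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call from one tool (hypothetical record, not a VarCap type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    freq: float  # relative allele frequency reported by the caller
    depth: int   # read coverage at the site


def consensus_filter(calls_by_tool, min_tools=2, min_freq=0.02, min_depth=50):
    """Keep variants reported by >= min_tools callers that pass frequency/coverage filters.

    min_tools=2 and min_freq=0.02 follow the text; min_depth=50 is an assumed value.
    """
    support = defaultdict(set)     # variant key -> names of tools that called it
    evidence = defaultdict(list)   # variant key -> all Call records for it
    for tool, calls in calls_by_tool.items():
        for c in calls:
            key = (c.chrom, c.pos, c.ref, c.alt)
            support[key].add(tool)
            evidence[key].append(c)

    kept = {}
    for key, tools in support.items():
        if len(tools) < min_tools:  # intersection rule: too few independent callers
            continue
        records = evidence[key]
        freq = sum(c.freq for c in records) / len(records)  # mean frequency (assumption)
        depth = min(c.depth for c in records)               # most conservative coverage
        if freq >= min_freq and depth >= min_depth:
            kept[key] = (freq, sorted(tools))
    return kept


# Hypothetical calls from three hypothetical callers.
calls = {
    "caller_a": [Call("chr", 100, "A", "G", 0.05, 200), Call("chr", 250, "C", "T", 0.01, 180)],
    "caller_b": [Call("chr", 100, "A", "G", 0.04, 210), Call("chr", 250, "C", "T", 0.012, 175)],
    "caller_c": [Call("chr", 400, "G", "A", 0.30, 90)],
}
result = consensus_filter(calls)
# Only chr:100 A>G survives: chr:250 falls below 2% and chr:400 has a single caller.
print(result)
```

Averaging frequencies and taking the minimum depth are conservative choices made for this sketch. A complete workflow would go on to write the retained variants and their frequencies to a VCF file, as VarCap does.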

Methanogenic archaea are major contributors to the global carbon cycle and were long thought to belong exclusively to the euryarchaeal phylum. Discovery of the methanogenesis gene cluster methyl-coenzyme M reductase (Mcr) in the Bathyarchaeota, and thereafter in the Verstraetearchaeota, led to a paradigm shift, pushing back the evolutionary origin of methanogenesis to predate that of the Euryarchaeota. The methylotrophic methanogens found in the non-Euryarchaeota differ from the predominantly hydrogenotrophic methanogens of euryarchaeal orders in that they do not couple methanogenesis to carbon fixation through the reductive acetyl-CoA [Wood-Ljungdahl pathway (WLP)]; this was interpreted as evidence for independent evolution of the two methanogenesis pathways. Here, we report the discovery of a complete and divergent hydrogenotrophic methanogenesis pathway in a thermophilic order of the Verstraetearchaeota, which we have named Candidatus Methanohydrogenales, as well as the presence of the WLP in the crenarchaeal order Desulfurococcales. Our findings support the ancient origin of hydrogenotrophic methanogenesis, suggest that methylotrophic methanogenesis might be a later adaptation of specific orders, and provide insight into how the transition from hydrogenotrophic to methylotrophic methanogenesis might have occurred.

Mealybugs (Insecta: Hemiptera: Pseudococcidae) maintain obligatory relationships with bacterial symbionts, which provide essential nutrients to their insect hosts. Most Pseudococcinae mealybugs harbor a unique symbiotic arrangement in which enlarged betaproteobacterial symbionts ('Candidatus Tremblaya princeps') themselves contain gammaproteobacterial symbionts. Here we investigated the symbiosis of the manna mealybug, Trabutina mannipara, using a metagenomic approach. Phylogenetic analyses revealed that the intrabacterial symbiont of T. mannipara represents a novel lineage within the Gammaproteobacteria, for which we propose the tentative name 'Candidatus Trabutinella endobia'. Combining our results with previous data available for the nested symbiosis of the citrus mealybug Planococcus citri, we show that the synthesis of essential amino acids and vitamins, as well as translation-related functions, is partitioned between the symbiotic partners in a highly similar manner in the two systems, despite the distinct evolutionary origins of the intrabacterial symbionts. Bacterial genes found in both mealybug genomes that complement missing functions in both symbioses were likely integrated in ancestral mealybugs before T. mannipara and P. citri diversified. The high level of correspondence between the two mealybug systems and their highly intertwined metabolic pathways are unprecedented. Our work contributes to a better understanding of the only known intracellular symbiosis between two bacteria and suggests that the evolution of this unique symbiosis included the replacement of intrabacterial symbionts in ancestral mealybugs.

The bacterial tree of life has recently undergone significant expansion, chiefly from candidate phyla retrieved through genome-resolved metagenomics. Bypassing the need for genome availability, we present a snapshot of bacterial phylogenetic diversity based on the recovery of high-quality SSU rRNA gene sequences extracted from nearly 7000 metagenomes and all available reference genomes. We illuminate taxonomic richness within established bacterial phyla together with environmental distribution patterns, providing a revised framework for future phylogeny-driven sequencing efforts.

Known giant virus diversity is currently skewed towards viruses isolated from aquatic environments and cultivated in the laboratory. Here, we employ cultivation-independent metagenomics and mini-metagenomics on soils from the Harvard Forest, leading to the discovery of 16 novel giant viruses, chiefly recovered by mini-metagenomics. The candidate viruses greatly expand the phylogenetic diversity of known giant viruses and either represent novel lineages or are affiliated with klosneuviruses, Cafeteria roenbergensis virus or tupanviruses. One assembled genome, 2.4 Mb in size, represents the largest currently known viral genome in the Mimiviridae, and others encode up to 80% orphan genes. In addition, we find more than 240 major capsid proteins encoded on unbinned metagenome fragments, further indicating that giant viruses are underexplored in soil ecosystems. The fact that most of these novel viruses evaded detection in bulk metagenomes suggests that mini-metagenomics could be a valuable approach to unearth viral giants.

Chlamydiales bacterium STE3 and Neochlamydia sp. strain AcF84 are obligate intracellular symbionts of Acanthamoeba spp. isolated from the biofilm of a littoral cave wall and from the gills of a striped tiger leaf fish, respectively. We report the draft genome sequences of these two environmental chlamydiae affiliated with the family Parachlamydiaceae.

Current knowledge about the nucleocytoplasmic large DNA viruses (NCLDV) is largely derived from viral isolates co-cultivated with protists and algae. Building on the rapidly increasing wealth of publicly available metagenome data, we reconstructed 2,074 NCLDV genomes from sampling sites spanning the globe. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysing 58,023 metagenomic major capsid proteins of large and giant viruses revealed global distribution patterns and underlined their cosmopolitan nature. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, revealing host reprogramming as a likely common strategy in the NCLDV. Furthermore, horizontal gene transfer inferences connected viral lineages to diverse eukaryotic hosts. We anticipate that the vast diversity of NCLDV revealed here on a global scale will establish giant viruses as key ecosystem players across Earth's biomes, associated with most major eukaryotic lineages.

BACKGROUND: Virophages are small viruses with double-stranded DNA genomes that replicate along with giant viruses and co-infect eukaryotic cells. Due to the paucity of virophage reference genomes, a collective understanding of global virophage diversity, distribution, and evolution is lacking. RESULTS: Here we screened a public collection of over 14,000 metagenomes using the virophage-specific major capsid protein (MCP) as "bait." We identified 44,221 assembled virophage sequences, of which 328 represent high-quality (complete or near-complete) genomes from diverse habitats including the human gut, plant rhizosphere, and terrestrial subsurface. Comparative genomic analysis confirmed the presence of four core genes in a conserved block. We used these genes to establish a revised virophage classification including 27 clades with consistent genome length, gene content, and habitat distribution. Moreover, for eight high-quality virophage genomes, we computationally predicted putative eukaryotic virus hosts. CONCLUSION: Overall, our approach has increased the number of known virophage genomes 10-fold and revealed patterns of genome evolution and global virophage distribution. We anticipate that the expanded diversity presented here will provide the backbone for further virophage studies.