Reconstructing bacterial and archaeal genomes has revolutionized our understanding of microbial metabolism and evolutionary processes, and has significantly accelerated discoveries in bioprospecting. The vast majority of bacterial and archaeal genomes sequenced to date are of rather limited phylogenetic diversity, as they were largely chosen on the basis of their physiology and/or medical importance. This observation gave rise to the Genomic Encyclopedia of Bacteria and Archaea (GEBA) Project (Wu et al. 2009), which populated the tree of life with phylogenetically diverse reference genomes. Following in the footsteps of the GEBA project, we initiated the Microbial Dark Matter (MDM) Project in 2011 to massively expand genomic representation by targeting 200 representatives of candidate phyla (phyla proposed on the basis of environmental sequences that have no, or very few, cultivated representatives) via high-throughput single-cell sequencing. The insights derived from the ~200 MDM genomes belonging to 28 major uncharted branches of the tree were invaluable (Rinke et al. 2013), providing a first glimpse into the coding potential and the phylogeny of candidate phyla. The single-cell data enabled us to resolve numerous intra- and inter-phylum level relationships and to propose new superphyla. In addition, we proposed names for 18 candidate phyla for which we greatly expanded sequence space. Here we propose phase II of the MDM project to further deepen our understanding of microbial dark matter by targeting 1000 genomes from candidate phyla. Habitats with the highest phylogenetic diversity (PD) values, based on SSU rRNA surveys, are selected for single-cell sorting and shotgun metagenomics. Thousands to tens of thousands of single cells will be anonymously sorted, and their whole genomes will be amplified and screened by SSU rRNA PCR and sequencing to allow taxonomic placement and selection for draft genome sequencing.
To maximize phylogenetic coverage, we propose to target only taxonomic groups within candidate phyla that have no or few sequenced representatives; the selection of these groups will in part be driven by the outcome of the single-cell sorts. We moreover propose to generate shotgun metagenome data for each sampling site undergoing single-cell sorting. Single-cell genomes will be assembled and quality-controlled with single-cell-specific pipelines previously developed at the JGI. Genomes will be assembled from metagenomes using algorithms developed in the Hugenholtz laboratory. The resulting genome sequences of candidate phyla representatives will then be used for metabolic reconstruction to assess functional diversity and to link it to the habitat metadata. Genomes will be mined for unique and unexpected functions. Population genomes will be used for pangenome analysis of natural populations to assess heterogeneity and deepen our insights into population structure. Lastly, with our genomes included in the reference database, we aim to carry out a large-scale phylogenetic anchoring analysis of all public metagenomes to study co-occurrence patterns in shotgun-sequenced habitats worldwide. Based on phylogenetic diversity analysis of existing SSU rRNA data, the DOE relevance of each ecosystem, and sample access and availability, we selected a target list of 12 environments. All samples harbour a high percentage of candidate phyla. Two of these environments (Sakinaw Lake and Etoliko Lagoon) were part of our previous work, and SAGs for sequencing as well as metagenomic sequence data are already in hand. For the additional target environments, various Co-PIs will be responsible for providing samples for both single-cell sequencing (cryopreserved frozen stocks) and metagenome sequencing (extracted DNA) according to Bigelow and JGI instructions. Samples will be processed at the DOE JGI with the assistance of the Woyke lab.