Abstract

To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, approximately 150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

Relative abundance of frequent microbial genomes among individuals of the cohort

Boxes denote the 25th and 75th percentiles; the black line in the box corresponds to the median; the “whiskers” indicate the interquartile range from either or both ends of the box; and the dots show the outliers beyond the ends of the whiskers (See for computation).
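The box-and-whisker elements described in this legend can be sketched as follows. This is a minimal illustration, not the authors' computation. The function name `box_stats`, the parameter `whisker_factor` and the quartile interpolation method are assumptions. The default of one interquartile range follows the wording of the legend, whereas many plotting tools use 1.5 by convention.

```python
from statistics import quantiles


def box_stats(values, whisker_factor=1.0):
    """Summarise one box: quartiles, whisker ends and outliers.

    whisker_factor is an assumed parameter: whiskers reach the most
    extreme data point lying within whisker_factor * IQR of the box.
    """
    data = sorted(values)
    # Quartiles with linear interpolation that includes the extremes.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # lowest point within the fence
        "whisker_high": inside[-1],  # highest point within the fence
        "outliers": outliers,        # drawn as dots
    }
```

Applied to the relative abundances of one genome across individuals, the returned dictionary holds every value needed to draw the corresponding box.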

Principal component analysis, based on the abundance of 155 species with ≥1% genome coverage by the Illumina reads in at least 1 individual of the cohort, was carried out with 14 healthy individuals and 25 IBD patients from Spain.
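The species selection criterion stated in this legend (≥1% genome coverage in at least 1 individual) can be illustrated with a short filter. This sketch covers only the selection step that precedes the principal component analysis. The function name `select_species` and the nested-dictionary input layout are hypothetical and chosen for illustration.

```python
def select_species(coverage, min_coverage=0.01):
    """Return species whose genome coverage reaches the threshold
    in at least one individual.

    coverage: {species: {individual: fraction of genome covered}}
    (an assumed layout; fractions run from 0 to 1, so 0.01 is 1%).
    """
    return sorted(
        species
        for species, per_individual in coverage.items()
        if any(frac >= min_coverage for frac in per_individual.values())
    )
```

The abundances of the retained species across individuals would then form the input matrix for the analysis.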

The clusters were ranked by the number of genes they contain, normalized by average length and copy number (see Supplementary Fig. 10), and the proportion of clusters with the essential B. subtilis genes was determined for successive groups of 100 clusters. Range indicates the part of the cluster distribution that contains 86% of the B. subtilis essential genes.
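The ranking and binning procedure described in this legend can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the tuple layout `(cluster_id, normalized_size, has_essential)` and the function name are hypothetical, the normalization is taken as already applied, and clusters are assumed to be ranked from largest to smallest.

```python
def essential_proportion_by_rank(clusters, group_size=100):
    """Proportion of clusters holding an essential gene, computed
    for successive groups of ranked clusters.

    clusters: iterable of (cluster_id, normalized_size, has_essential)
    tuples, an assumed layout; normalized_size is the gene count
    after normalization by length and copy number.
    """
    # Rank clusters by normalized size, largest first (assumption).
    ranked = sorted(clusters, key=lambda c: c[1], reverse=True)
    proportions = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        hits = sum(1 for _, _, has_essential in group if has_essential)
        proportions.append(hits / len(group))
    return proportions
```

Plotting the returned list against group rank gives a curve of the kind the legend describes. The final group may contain fewer than `group_size` clusters.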

a, Projection of the minimal gut genome on the KEGG pathways using the iPath tool. b, Functional composition of the minimal gut genome and metagenome. c, Estimation of the minimal gut metagenome size. Known orthologous groups (OGs; red), known + unknown OGs (blue) and OGs + novel gene families (>20 proteins; grey). Inset: Composition of the gut minimal microbiome. Large circle: Classification in the minimal metagenome according to OG occurrence in STRING7 bacterial genomes. Common (25%), uncommon (35%) and rare (45%) are present in >50%, <50% but >10% and <10% of genomes, respectively. Small circle: composition of the rare OGs. Unknown (80%) have no annotation or are poorly characterized, while known bacterial (19%) and phage-related (1%) OGs have a functional description.
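The occurrence classes used in the inset (common, uncommon, rare) can be expressed as a simple threshold rule. The sketch below is illustrative. The function names and the input mapping are hypothetical, and because the legend does not say how values of exactly 50% or 10% are treated, they are assigned to the lower class here as an assumption.

```python
from collections import Counter


def classify_og(genome_fraction):
    """Class of an orthologous group from the fraction of reference
    genomes in which it occurs (0 to 1)."""
    if genome_fraction > 0.5:
        return "common"
    if genome_fraction > 0.1:
        return "uncommon"
    return "rare"  # boundary values fall here by assumption


def composition(fractions):
    """Share of each class among the OGs.

    fractions: {og_id: fraction of genomes containing the OG},
    an assumed input layout.
    """
    labels = ("common", "uncommon", "rare")
    counts = Counter(classify_og(f) for f in fractions.values())
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: counts[label] / total for label in labels}
```

The three shares returned by `composition` correspond to the sectors of the large circle.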