Microbiome sample collection and DNA extraction methods should be determined on a per-project basis, and metagenome sequencing can be conducted on the Illumina, PacBio, or another sequencing platform. Sequencing reads are trimmed based on quality scores (e.g. using Sickle; ref. 18) and filtered for contamination (e.g. removal of human genome sequences). High-quality reads are then assembled (e.g. using IDBA_UD; ref. 19), and the resulting scaffolds are binned either manually (e.g. based on GC content, taxonomic affiliation, coverage), and/or using a clustering algorithm such as ESOM (refs. 20, 29, 30), or using an automated binning program (e.g. MaxBin (ref. 21), CONCOCT (ref. 22), or ABAWACA (ref. 15)). Genome bins can then be assessed for completion and contamination based on inventory of expected single copy genes (SCGs), either based on identification of these genes from genome annotations (see refs. 15, 29, 55), or using software such as CheckM (ref. 23). High-quality genomes are then compared with one another and grouped into clusters based on average nucleotide identity (ANI; e.g., based on sharing 98% ANI determined using Mash; ref. 54). A representative of each cluster should be included in a genome database that will be used for iRep analysis, along with genomes from other projects that may be appropriate for the analysis. Reads from each metagenome are then mapped to the genome database (e.g. using Bowtie2; ref. 47), and iRep is calculated from the read mapping data (see Online Methods).
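The ANI-based grouping step can be illustrated with a minimal sketch. It assumes that pairwise ANI values have already been computed (for example from Mash output) and that genomes are grouped by single-linkage clustering at the 98% threshold; the clustering rule, the function name `cluster_by_ani`, and the bin names are illustrative assumptions, not details given in the text.

```python
from itertools import combinations


def cluster_by_ani(genomes, ani, threshold=0.98):
    """Single-linkage clustering of genomes that share at least `threshold` ANI.

    `ani` maps unordered genome pairs (as tuples) to ANI fractions in [0, 1].
    Pairs missing from `ani` are treated as unrelated (assumption).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        score = ani.get((a, b), ani.get((b, a), 0.0))
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical bins and ANI values, for illustration only.
genomes = ["bin_A", "bin_B", "bin_C", "bin_D"]
ani = {
    ("bin_A", "bin_B"): 0.991,
    ("bin_B", "bin_C"): 0.985,
    ("bin_A", "bin_C"): 0.972,
    ("bin_C", "bin_D"): 0.900,
}
clusters = cluster_by_ani(genomes, ani)
```

With single linkage, `bin_A` and `bin_C` end up in the same cluster through `bin_B` even though their direct ANI is below the threshold. A stricter rule (for example complete linkage) would split them, so the choice of rule affects which representatives enter the genome database.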

(a) Gamma distribution used to simulate genome fragmentation for genome completeness analyses. The frequency of genome fragment sizes from all genomes analyzed in this study is compared with the frequency of genome fragment sizes simulated using a gamma distribution with parameters: alpha = 0.1, beta = 21,000, min. = 5,000, max. = 200,000. These parameters were first estimated by fitting to the genome data, and then manually adjusted. Similarity between the two distributions shows that this gamma distribution can be used to approximate the level of genome fragmentation expected for draft-quality genome sequences.
(b) iRep was calculated from random genome fragmentation simulations in order to survey a range of fragmentation levels (Supplementary Table 1). The analysis was conducted for an L. gasseri sample from the Korem et al. (ref. 8) study in which iRep was determined to be 2.01 using the complete genome with 25x sequencing coverage. This known iRep value was then compared with iRep values determined from each genome fragmentation simulation after subsampling to 75% of the genome and using only 5x sequencing coverage. This enabled analysis of the influence of fragmentation on iRep calculations at the completeness and coverage limits of the method. Results show that 91.8% of iRep values are within the expected range of 0.15 when genomes have fewer than 175 fragments/Mbp of genome sequence.
(c) Four L. gasseri samples from the Korem et al. (ref. 8) study that represent iRep values between 1.50 and 2.01 were selected in order to test different coverage sliding window calculation methods (see Online Methods for a description of each method) and window sizes. For each sample, 100 random genome fragmentations and subsets were conducted in order to assess each method based on various levels of genome completion. The results show that the “iRep” and “median iRep” methods using 5 Kbp windows exhibited the least amount of variation.
(d) Because the iRep method involves randomly combining coverage data from different genome fragments prior to calculating coverage sliding windows, some sliding windows will include coverage values from different locations on the complete genome sequence. In order to evaluate the variation introduced by the (random) order in which scaffolds are combined, iRep calculations were conducted for ten random orderings of 100 random genome fragmentations conducted using the sample set described in (c). Results show minimal variation in iRep values, as described by the difference between the lowest and highest values determined from each of the ten orderings (“iRep range”). Because of this, we chose not to implement the “median iRep” strategy.
(e) Using the sample set described in (c), the iRep method was implemented using 5 Kbp windows with different window slide values in order to test whether the slide value would change the results. Because both 10 and 100 bp window slides produced similar results, we implemented the iRep method using a 100 bp window slide.
(f) iRep is not as strongly correlated with bPTR without the GC sequencing bias correction for five genome sequences assembled from premature infant metagenomes (Supplementary Table 4; compare with GC corrected data in Fig. 2e).
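The fragmentation simulation in panel (a) can be sketched as follows, using the reported parameters. Several details are assumptions rather than statements from the caption: beta is treated as the scale parameter (in bp), the minimum and maximum are enforced by rejection sampling, and fragments are drawn until less than one minimum-size fragment of sequence remains. The function name `simulate_fragments` is hypothetical.

```python
import random


def simulate_fragments(genome_length, alpha=0.1, beta=21_000,
                       min_size=5_000, max_size=200_000, seed=None):
    """Split a genome of `genome_length` bp into simulated fragment sizes.

    Sizes are drawn from a gamma distribution (shape `alpha`, scale `beta`)
    and redrawn whenever they fall outside [min_size, max_size].
    """
    rng = random.Random(seed)
    fragments = []
    remaining = genome_length
    while remaining >= min_size:
        size = int(rng.gammavariate(alpha, beta))
        if size < min_size or size > max_size:
            continue  # rejection sampling: discard out-of-range draws
        size = min(size, remaining)  # do not run past the end of the genome
        fragments.append(size)
        remaining -= size
    # Any leftover shorter than min_size is dropped (assumption).
    return fragments


genome_length = 2_000_000  # hypothetical 2 Mbp genome
fragments = simulate_fragments(genome_length, seed=1)
fragments_per_mbp = len(fragments) / (genome_length / 1_000_000)
```

The `fragments_per_mbp` value is the quantity that panel (b) relates to iRep accuracy (fewer than 175 fragments/Mbp).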

(a-e) Read mapping was conducted using sequences from the sample used for genome recovery. bPTR was calculated after determining the origin and terminus of replication based on cumulative GC skew. Coverage was calculated for 10 Kbp windows every 100 bp (extremely low and high coverage windows were filtered out; see Online Methods). bPTR was calculated as the ratio between the coverage at the origin and terminus after applying a median filter. Cumulative GC skew and coverage patterns confirm the ordering of genome fragments.
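As an illustration of how the origin and terminus can be located from cumulative GC skew, the sketch below uses the common convention that cumulative skew reaches its minimum near the origin and its maximum near the terminus. The window size, the non-overlapping windows, and the function names are assumptions; the caption does not specify them.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return window midpoints and cumulative GC skew, (G - C) / (G + C)."""
    positions, cumulative = [], []
    total = 0.0
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        positions.append(start + window // 2)
        cumulative.append(total)
    return positions, cumulative


def origin_and_terminus(sequence, window=1000):
    """Estimate origin (skew minimum) and terminus (skew maximum) positions."""
    positions, cumulative = cumulative_gc_skew(sequence, window)
    indices = range(len(cumulative))
    origin = positions[min(indices, key=cumulative.__getitem__)]
    terminus = positions[max(indices, key=cumulative.__getitem__)]
    return origin, terminus
```

For a fragmented draft genome, this only gives a meaningful answer once the fragments have been ordered correctly, which is why the skew pattern also serves as a check on fragment ordering.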

Reads were mapped to both reconstructed genomes and closely related reference genomes (Supplementary Table 4), and the percent of each genome covered by sequencing reads is reported. Average nucleotide identity (ANI) is reported between each reconstructed genome and the paired reference genome. The large fractions of reference genomes not represented by metagenome sequencing show that extensive genomic variation is present between surveyed and reference genomes, despite high ANI values in some cases.

(a-e) Reads from the adult human microbiome were mapped to complete reference genome sequences. Coverage was calculated for 10 Kbp windows every 100 bp (extremely low and high coverage windows were filtered out; see Online Methods). The origin and terminus of replication were determined based on coverage. bPTR was calculated as the ratio between the coverage at the origin and terminus after applying a median filter. Cumulative GC skew and coverage patterns suggest the presence of genomic variation or assembly errors for some genomes (b-c, e).
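The coverage-based calculation can be sketched in a simplified form. This version assumes a per-base depth list for a complete genome, takes the origin as the smoothed coverage maximum and the terminus as the smoothed minimum, and uses an arbitrary median filter width. It leaves out the filtering of extreme windows and the circularity of the genome, so it illustrates the idea rather than reproducing the method in Online Methods. All function names are hypothetical.

```python
from statistics import median


def window_coverage(depth, window=10_000, slide=100):
    """Mean coverage of `window`-bp windows taken every `slide` bp."""
    prefix = [0]
    for d in depth:
        prefix.append(prefix[-1] + d)  # prefix sums give O(1) window totals
    return [(prefix[s + window] - prefix[s]) / window
            for s in range(0, len(depth) - window + 1, slide)]


def median_filter(values, width=5):
    """Replace each value with the median of its neighbourhood."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def bptr_from_coverage(depth, window=10_000, slide=100, filter_width=5):
    """Ratio of smoothed coverage at the peak (origin) and trough (terminus)."""
    smoothed = median_filter(window_coverage(depth, window, slide), filter_width)
    lowest = min(smoothed)
    if lowest == 0:
        raise ValueError("zero-coverage window; filter low-coverage windows first")
    return max(smoothed) / lowest
```

The median filter damps isolated coverage spikes and dips before the ratio is taken, so a single aberrant window is less likely to be mistaken for the origin or terminus.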

The five days following antibiotic administration are indicated using a color gradient (DOL = day of life). Half of the infants in the study developed necrotizing enterocolitis (NEC; dotted red lines) during the study period.