Abstract

The applications of whole-metagenome shotgun sequencing (WMGS) in routine clinical analysis are still limited. A combination of a DNA extraction procedure, sequencing, and bioinformatics tools is essential for the removal of human DNA and for improving bacterial species identification in a timely manner. We tackled these issues with a broncho-alveolar lavage (BAL) sample from an immunocompromised patient who had developed severe chronic pneumonia. We extracted DNA from the BAL sample with protocols based either on sequential lysis of human and bacterial cells or on the mechanical disruption of all cells. Metagenomic libraries were sequenced on Illumina HiSeq platforms. Microbial community composition was determined by k-mer analysis or by mapping to taxonomic markers. Results were compared to those obtained by conventional clinical culture and molecular methods. Compared to mechanical cell disruption, the sequential lysis protocol resulted in a significantly higher proportion of bacterial DNA relative to human DNA and higher sequence coverage of Mycobacterium abscessus, Corynebacterium jeikeium and Rothia dentocariosa, the bacteria reported by clinical microbiology tests. In addition, we identified anaerobic bacteria not searched for by the clinical laboratory. Our results further support the implementation of WMGS in routine clinical diagnostics for bacterial identification.

Figures

Figure 1

(A) Schematic representation of metagenomic sequencing: BAL was split into two aliquots which were independently treated with two DNA extraction protocols: Ultra-Deep Microbiome Prep (Molzym) and NucleoSpin Soil (MN). Metagenomic libraries were independently sequenced with 2 × 250 HiSeq 2500 and 2 × 100 HiSeq 2000, respectively; (B) Schematic representation of bioinformatics analyses: forward and reverse raw reads of Molzym- (HiSeq Molzym (2 × 250)) and MN-treated (HiSeq MN (2 × 100)) samples were used for taxonomic analysis with CLARK, Kraken and MetaPhlAn2. In addition, before taxonomic analyses, read pairs of the HiSeq Molzym (2 × 250) dataset were either quality-filtered and merged (HiSeq Molzym merged) or trimmed to a length of 100 nt (HiSeq Molzym trimmed to 2 × 100) as described in Materials and Methods; (C) Pie charts representing the proportions (%) of sequencing reads classified as human (blue), prokaryotic (red) or unclassified (grey) by CLARK (top row) and Kraken (bottom row). Proportions were computed over the total number of reads present in a given dataset. Viral- and fungal-classified reads are not shown; they contributed less than 0.02% of total reads in all sequencing datasets.
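The trimming step in panel (B) can be illustrated with a minimal sketch. It assumes each read is held as a 4-tuple (header, sequence, separator, quality string), as in a FASTQ record. The function names and the example read are hypothetical, and the tool the authors actually used is described in their Materials and Methods, not here.

```python
# Illustrative sketch only: not the authors' pipeline.
# A read is assumed to be a FASTQ-style 4-tuple: (header, sequence, separator, quality).

def trim_record(record, length=100):
    """Keep only the first `length` bases and their quality scores."""
    header, seq, sep, qual = record
    return (header, seq[:length], sep, qual[:length])


def trim_pair(forward, reverse, length=100):
    """Trim both mates of a read pair to the same maximum length."""
    return trim_record(forward, length), trim_record(reverse, length)


# Hypothetical 250-nt read pair, standing in for a 2 x 250 library.
fwd = ("@read1/1", "A" * 250, "+", "I" * 250)
rev = ("@read1/2", "C" * 250, "+", "I" * 250)
trimmed_fwd, trimmed_rev = trim_pair(fwd, rev)
print(len(trimmed_fwd[1]), len(trimmed_rev[1]))
```

Reads already shorter than the target length pass through unchanged, so the trimmed dataset has reads of at most 100 nt rather than exactly 100 nt.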

Figure 2

Heat maps reporting the log2-transformed relative abundance of the 15 most abundant bacterial genera (A) and the 16 most abundant bacterial species (B) identified by CLARK and Kraken in all sequencing datasets. Relative abundance, expressed as a percentage, is computed as the number of reads assigned to a given genus or species divided by the total number of sequencing reads present in a given dataset; (C) Pie charts representing the bacterial proportions of identified species as determined by CLARK (top) and Kraken (bottom). Bacterial proportions are defined as the percentage of reads mapped to a given bacterial species over the total number of reads assigned to prokaryotes; (D) Barplots showing relative abundance (%) of bacterial and viral taxa identified by MetaPhlAn2 in HiSeq Molzym and HiSeq MN sequencing data. Percentage values are reported at the top of each bar; (E) Heat map representing the number of reads assigned to the most abundant fungi. Numbers indicate read counts instead of proportions.
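The two quantities defined in this caption differ only in their denominator, which a short sketch can make concrete. All read counts below are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
import math

# Illustrative sketch only: the counts below are made up, not study data.

def relative_abundance(taxon_reads, total_reads):
    """Percentage of ALL reads in the dataset assigned to each taxon (panels A, B)."""
    return {taxon: 100.0 * n / total_reads for taxon, n in taxon_reads.items()}


def bacterial_proportion(taxon_reads, prokaryote_reads):
    """Percentage of PROKARYOTE-assigned reads mapped to each species (panel C)."""
    return {taxon: 100.0 * n / prokaryote_reads for taxon, n in taxon_reads.items()}


def log2_transform(abundances):
    """log2 of each non-zero abundance; zero values are skipped (log2 undefined)."""
    return {taxon: math.log2(v) for taxon, v in abundances.items() if v > 0}


# Hypothetical dataset: 1,000,000 reads, of which 200,000 are prokaryotic.
counts = {"species_A": 80_000, "species_B": 40_000}
rel = relative_abundance(counts, total_reads=1_000_000)
prop = bacterial_proportion(counts, prokaryote_reads=200_000)
log_rel = log2_transform(rel)
print(rel, prop, log_rel)
```

With these invented counts, species_A makes up 8% of all reads but 40% of prokaryotic reads, which is why the heat maps and the pie charts are not directly comparable.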

Figure 3

(A) Pie chart representing the relative abundance of the antibiotic resistance-determinant (ARD) classes in the metagenome of the enriched BAL sample after quality filtering and merging of sequencing reads. Relative abundance is expressed as a percentage of total reads mapped to the ARD ResFinder database (847); (B) Barplot reporting the number of reads mapping to the identified ARDs. Colors indicate whether the taxonomic assignment of ARD reads to a given taxon was performed with CLARK (blue violet), with Kraken (coral) or confirmed by both (magenta). Reads that mapped to the cfxA and cfxA2 genes remained unclassified; (C) Coverage plot of reads mapping to the erm(X) locus on the C. jeikeium K411 genome. From top to bottom: density plot of read depth versus genome position; chromosomal position of the erm(X) gene, with the open reading frame colored in sky blue; positions of the three main single-nucleotide polymorphisms (SNPs) detected; schematic representation of mapped reads grouped in stacks, where red dots and lines on stacks represent a nucleotide variant in a given read sequence; amino acid substitutions in the ERM(X) protein. The pink-colored area marks the chromosomal boundaries of the gene in the density plot and in the representation of read mapping; (D) Coverage plot of reads mapping to the erm(41)-MAB_2297 locus on the M. abscessus ATCC 19977 genome. For further explanation, see panel (C). No single-nucleotide variants were detected.