Abstract

Background

Microbial communities present in environmental waters constitute a reservoir for antibiotic-resistant pathogens that impact human health. For this reason, a diverse variety of water environments are being analyzed using metagenomics to uncover public health threats. However, the composition of these communities along the coastal environment of a whole city, where sewage and beach waters are mixed, is poorly understood.

Results

We shotgun-sequenced 20 coastal areas from the city of Montevideo (capital of Uruguay) including beach and sewage water samples to characterize bacterial communities and their virulence and antibiotic resistance repertories. As expected, we found that sewage and beach environments present significantly different bacterial communities. This baseline allowed us to detect a higher prevalence and a more diverse repertory of virulence and antibiotic-resistant genes in sewage samples. Many of these genes come from well-known enterobacteria and represent carbapenemases and extended-spectrum betalactamases reported in hospital infections in Montevideo. Additionally, we were able to genotype the presence of both globally disseminated pathogenic clones and emerging antibiotic-resistant bacteria in sewage waters.

Conclusions

Our study represents the first in using metagenomics to jointly analyze beaches and the sewage system from an entire city, allowing us to characterize antibiotic-resistant pathogens circulating in urban waters. The data generated in this initial study represent a baseline metagenomic exploration to guide future longitudinal (time-wise) studies, whose systematic implementation will provide useful epidemiological information to improve public health surveillance.

Introduction

Human activity shapes the microbial communities residing in urban environments. In particular, urban sewage systems are designed to evacuate human wastes from the houses to areas of low human exposure and gradually reinstate them into natural watercourses such as creeks, beaches, or the sea. This cycle is of tremendous importance for public health as waste waters can be a reservoir and vehicle for the transmission of pathogenic bacteria and antibiotic resistance mechanisms. Indeed, the rapid emergence and spread of pathogenic bacteria with extensive antibiotic resistance has been recognized by the World Health Organization as a top health issue [1], since water can easily move microorganisms between humans and other animal species. Accordingly, the analysis of environmental waters is being adopted as an effective method to monitor the dynamics of antibiotic-resistant pathogens [2], as this kind of environments can play a role as important as clinical settings for the selection of antibiotic resistance [3].

Recent advances in high-throughput sequencing (HTS) and computational biology now allow the exploration of microbial communities based on culture-independent approaches using metagenomics. This enables us to quantify and functionally characterize environmental microbiomes with unprecedented precision and comprehensiveness [4]. Indeed, the very recent implementation of this methodology to explore the microbial diversity in the urban environment is providing a completely new layer of information to be integrated in the management of cities, potentially assisting decisions that range from urban design to public health [5, 6]. In particular, urban sewage or beach water systems have been previously characterized using metagenomics not only aiming to uncover ecological patterns [7] but also to characterize pathogenic and antibiotic-resistant bacteria [8, 9]. However, the joint analysis of bacterial communities present at the same time in the sewage and beach waters from the same metropolitan area remains to be explored in depth.

Sewage waters have been shown to accurately reflect the population’s gut microbiota composition [10], raising the possibility of using metagenomics to directly gain information about infection dynamics [11]. Additionally, beaches not only are important for recreational use but also are frequently recognized as risky environments for the contagion and transmission of bacterial infections [12], particularly if they are constantly or sporadically impacted by sewage spillovers. Accordingly, we performed a cross-sectional shotgun metagenomic analysis along the urban coast of Montevideo, the capital of Uruguay, aiming to characterize bacterial communities present in the sewage and beach water due to the important role of these environments for human health. Our study is centered in the assessment of antibiotic-resistant bacteria, representing the first of that kind in a South American city and the kickoff towards the incorporation of metagenomics in the surveillance of microbiological risks at city scale.

Results

Composition of sewage and beach communities

First, we explored the structure of microbial communities present in our beach and sewage samples using a multiset k-mer counting approach. This strategy provides an unbiased view that is not affected by taxonomic or functional assignment; conversely, it just evaluates the differential abundance of unique DNA segments [13]. Figure 1a shows a clustering analysis based on this methodology that shows a complete discrimination between sewage and beach samples, suggesting substantial differences in the composition of communities in these environments. Second, we confirmed the observed discrimination from a taxonomic point of view by calculating relative abundances of bacterial species present in each sample using an approach based on the identification and quantification of marker genes [14]. Figure 1b shows a dendrogram based on beta diversities (between samples) calculated using the Bray-Curtis dissimilarity distance from the taxonomic profiles, showing a complete discrimination between beach and sewage. Beta diversity (Bray-Curtis dissimilarity) was 0.42 within sewage samples (SD = 0.23) and 0.41 within beach samples (SD = 0.22) but increased to 0.63 (SD = 0.12) when comparing sewage against beach samples. Alpha diversity (within samples) was calculated using the Shannon index and averaged 3.65 (SD = 0.64) for sewage and 3.71 (SD = 0.42) for beach samples (Additional file 1: Figure S1). As expected, these results indicate that taxonomic composition of bacterial communities from these environments are substantially different and can discriminate between beach or sewage origin (geographic location of each sample along the coast of Montevideo is displayed in Fig. 1c). This provided us the baseline to compare their repertories of virulence and antibiotic resistance genes.

Fig. 1

Community composition of beach and sewage waters of Montevideo. a Heatmap showing a clustering analysis based on k-mer distances evidencing a complete separation between sewage (red) and beach (blue) samples. b Clustering analysis separating sewage (red) from beach (blue) samples obtained by comparing beta diversities (dissimilarity between samples) calculated from relative abundance profiles of bacterial species. c Sampling points along the coast of Montevideo (gray shade). Sewage water samples are in red and beach water samples are in blue

Occurrence of antibiotic resistance genes (ARGs)

Due to the impact of fecal contamination in sewage water, we hypothesized that these samples were more rich and diverse in antibiotic resistance mechanisms than beach water. To test this, metagenomic assemblies were screened against the Comprehensive Antibiotic Resistance Database (CARD) [15], because it is currently the most up-to-date and manually curated resource for ARGs detection. We found that 108 out of 2177 (~ 5%) ARGs had hits in our samples and they belong to 10 different antibiotic classes (Additional file 2: Table S2). The clinically relevant TEM-4 and TEM-33 betalactamases were the most frequent genes and aminoglycoside-modifying enzymes (like acetyltransferases or phosphotransferases) the most diversely represented class of ARGs. In particular, a significant difference (p = 0.002, Mann-Whitney U test) in ARG occurrence and a significantly higher diversity (p = 0.0024, Mann-Whitney U test) of ARGs according to the Simpson’s index were found in favor of sewage compared to beach samples (Additional file 1: Figure S2). Furthermore, when inspecting antibiotic classes, we observed that sewage samples encoded ARGs belonging to 90% of antibiotic classes found in these environments while beach samples only encoded 40% of antibiotic classes, evidencing a more complex composition of antibiotic resistance mechanisms in the urban sewage waters (Fig. 2a). Indeed, only elfamycin resistance genes were present in beach but absent in sewage samples. On the other side, the occurrence of ARGs for aminoglycosides, betalactams, tetracyclines, sulfonamides, macrolides, and streptogramins was significantly greater (p < 0.01, Mann-Whitney U test) in sewage compared to beach samples (Fig. 2b).

Fig. 2

Occurrence of antibiotic resistance mechanisms. a Circos representation showing the presence of ARGs across sewage (red) or beach (blue) samples. Links are drawn when a certain ARG (right blocks) is found in a certain sample (left blocks). Genes are colored according to antibiotic classes. Barplots above each left side block indicate the alpha diversity of ARGs within each sample. b Boxplots showing the number of detected ARGs for different antibiotic classes in beach (blue) and sewage (red) samples

Distribution of ARGs in mobile elements

As a general trend, we found that ARGs present in our samples were more prevalent in plasmids than in bacterial chromosomes. Specifically, ARGs for sulfonamides, betalactams, aminoglycosides, phenicols, macrolides and streptogramins were more prevalent in plasmids than in bacterial chromosomes (Fig. 3a). Additionally, these plasmids carrying ARGs are extensively distributed among many clinically relevant enterobacteria like E. coli, Salmonella, Klebsiella, Enterobacter, Citrobacter, and Acinetobacter, among others (Fig. 3b). ARGs for tetracyclines, lincosamides, fluoroquinolones, and elfamycins were more frequently encoded in chromosomes.

We also looked for integrons and found a higher prevalence of them in sewage (74%) than in beach (24%) samples. Furthermore, clinically relevant integron classes 1, 2, and 3 were almost exclusively found in sewage samples (~ 90%) (Fig. 3c). Also, we were able to identify cassette ARGs with conserved attC sites associated to 24 out of 39 integrons (61%). These cassette genes mostly coded for multidrug efflux pumps, but also we found carbapenemases (OXA family), GES extended-spectrum betalactamases (ESBLs), aadA1 aminoglycoside nucleotidyltransferases, cat chloramphenicol acetyltransferases, and aac6-Ib amikacin resistance genes. These cassette genes were exclusively found in sewage integrons.

Occurrence of virulence factors (VFs)

To complement the characterization of ARGs, we also screened metagenomic assemblies against the Virulence Factor Database (VFdb) [16]. Ninety-nine out of 451 (~ 22%) VFs were detected in our samples. Specifically, VFs were found in 7 out of 8 (87.5%) sewage samples and 4 out of 12 (33%) beach samples. Sewage samples also presented a higher occurrence and diversity of VFs compared to beach samples (Additional file 1: Figure S3A). The functional classification of these VFs showed that those involved in bacterial motility, cell adherence, iron uptake, and secretion were predominant among sewage samples. Interestingly, we found the presence of the ShET2 enterotoxin (senB) in a single sewage sample. We also examined the taxonomic distribution of these VFs and found that they are predominantly distributed in Pseudomonas, Salmonella, Escherichia, Yersinia, and Shigella, among others (Additional file 1: Figure S3B). These results highlight the importance of urban waters as a reservoir and vehicle for VFs responsible of determining well-known pathogenic mechanisms in clinically relevant bacteria.

Taxa carrying mobile ARGs and VFs are sewage biomarkers

We also aimed to identify bacterial taxa associated to sewage or beach that can explain the observed differences in the composition of the overall bacterial community and their ARGs and VFs repertories. Then, we applied a linear discriminant analysis (LDA) and effect size estimation [17] to determine statistically significant taxa associated to beach or sewage. We summarized the results at the genus level and found 6 genera associated to the beach environment while 47 genera were characteristic of the sewage environment (Fig. 4a). Interestingly, 10 out of 47 (~ 21%) sewage-associated genera comprise bacterial species that are well-known human pathogens, including Aeromonas, Acinetobacter, Arcobacter, Citrobacter, Enterobacter, Klebsiella, Pseudomonas, Sphingobacterium, Stenotrophomonas and Streptococcus (Fig. 4b). Most of these taxa match with those found carrying mobile ARGs and VFs.

Identification of pathogenic genotypes

To gain further insight on the microbiological risks in these environments, we attempted to resolve the presence of previously reported pathogens at the strain level using metagenome-derived multilocus sequence typing (MLST) [18, 19]. Despite the fact that this method was unable to identify complete previously reported sequence types (STs), the detection of partial allele combinations allowed us to infer the presence of several genotypes of clinical importance. In sewage samples, we found three alleles of Citrobacter freundii whose combination determines the ST-209, which has been detected from isolates recovered from diarrheal patients [20]. Also, we detected three alleles from Escherichia coli defining the clonal complex ST-131, which is a globally disseminated multidrug-resistant clone associated with human extraintestinal (urinary and bloodstream) infections [21]. Furthermore, the detection of two alleles of Arcobacter cryaerophilus defined the presence of ST-392, a recently characterized genotype causative of persistent diarrhea [22]. We also identified several Salmonella alleles that diverged from known genotypes, preventing the inference of putative STs. Many unknown alleles of Pseudomonas fluorescens were also identified both in sewage and beach samples. This species is mainly associated to food spoilage but has been sporadically reported as an opportunistic human pathogen causing systemic infections associated to the consumption of animal byproducts [23]. Overall, these results indicate the presence of pathogenic genotypes in urban waters of Montevideo.

Discussion

Pharmaceutical products, including antibiotics, can be only partially metabolized by humans, so these compounds or their derived metabolites are excreted [24] lastly reaching the environment. Sewage pipes have been largely considered as passive water transport systems, but recent studies uncovered hotspots of microbial diversity and activity in these urban environments. Indeed, the sewage system is the first step of city wastewater cycle and thus the most likely place where excreted antibiotic residues can induce the emergence of antibiotic-resistant bacteria [25]. This is particularly relevant since recent studies have demonstrated that microbial communities present in the sewage recapitulate those found in the human gut microbiome [10]. Consequently, the exposure of human-derived bacteria to environmental pressures facilitates the emergence and spread of antibiotic-resistant pathogens that can be transmitted and impact population’s health [26].

The baseline characterization of urban waters in our capital city revealed an expected distinct taxonomic composition of bacterial communities found in sewage and beach environments, suggesting that hazardous bacteria present in sewage waters are rapidly diluted when mixed with environmental freshwater. Indeed, the vast majority of virulence and antibiotic resistance mechanisms associated to clinically relevant pathogens were found in the sewage but not in beach samples. However, a more dense and longitudinal sampling is necessary to further characterize the dynamics of hazardous microorganisms circulating in these environments. Also, the comparison with metagenomes from hospital effluents along the city would provide a detailed picture of how nosocomial pathogens are being dispersed through the environment.

Indeed, many clinically relevant ARGs that we found in the city environment, such as carbapenemases and ESBLs, have been frequently reported in nosocomial infections in Uruguay during the last decade [27,28,29,30]. This indicates that important antibiotic-resistant pathogens are being somehow transmitted among clinical settings and the urban environment, representing a public health threat. However, other ARGs such as metallo-β-lactamases that have also been reported in Uruguay [31,32,33,34] were not found in the sewage or beaches. Beyond technical biases that are possible, this can be attributed to a differential capacity between distinct antibiotic-resistant clones to survive and spread in the environment; given that selective fitness of antibiotic-resistant pathogens (adapted to high antibiotic pressures in hospital settings) may be lower in less-exposed environmental waters [35].

In this sense, despite the urban environment not being directly exposed to similar concentrations of antibiotics than those used to treat infections, sewage systems have been recognized as ARG reservoirs. So, considering that we found most environmental ARGs typically widespread in enterobacterial plasmids and that clinically relevant integrons were fundamentally recovered from sewage samples, genetic platforms for horizontal gene transfer can be playing a relevant role as a reservoir of ARGs. Additionally, plasmids and integrons are prone to recombination [36] and genetic plasticity of certain bacteria has been proved to increase under subinhibitory pressures (as those probably found in the environment) with certain antibiotics [37]. So, the city sewage should be also considered as a birthplace for new antibiotic resistance mosaics mediated by recombination and horizontal gene transfer.

Regarding this, we were able to identify multidrug-resistant genotypes with great capacity of recruiting new resistance genes like the internationally disseminated E. coli ST-131 clonal complex. Also, we uncovered the presence of clinically underestimated bacteria like Arcobacter cryaerophilus, which is today considered an emerging waterborne pathogen and whose resistance to third-generation cephalosporins has been already reported [38]. So, the compositional complexity of urban waters where different genotypes and gene repertories can coexist within a fluctuating bacterial community opens the possibility of using environmental samples to monitor population’s health. However, the detection of environmental DNA by metagenomics does not necessarily indicate the presence of viable bacteria, so the extrapolation to antibiotic-resistant phenotypes from biomarker sequences should be validated with traditional microbiology approaches. For this reason, we are also creating the Bacterial Biobank of the Urban Environment (BBUE), which consists in phenotyped and whole-genome sequenced isolates obtained from the same sampling points that we performed metagenomics [39]. Accordingly, the integration of these complementary data will provide the framework to support the possibility of using high-resolution metagenomics to study the epidemiological dynamics of antibiotic-resistant pathogens using urban waters as a proxy at the population level.

Conclusion

Our study represents a cross-sectional analysis of a metropolitan area encompassing more than 2.2 million inhabitants and, to the best of our knowledge, constitutes the first work using metagenomics to jointly characterize bacterial communities found in the sewage and beach waters of an entire city.

Our approach demonstrated its usefulness to identify antibiotic resistance determinants which were known to be present in nosocomial infections, as well as to uncover the presence of globally-widespread or underestimated pathogens with strain-level resolution. Future longitudinal studies (time-wise) will be useful to monitor the fluctuations of bacterial communities, allowing the development of associative models with relevant metadata like outbreak information, rainfall, or antibiotic prescription and stewardship.

The data generated in this initial study represent a baseline metagenomic characterization of environmental waters of Montevideo, which will be useful to guide future efforts to implement systematic studies aiming to evaluate antibiotic-resistant pathogen dynamics through time and space across different cities. This information can be later incorporated to improve public health surveillance for antibiotic-resistant pathogens.

Methods

Sample collection

The coastal city of Montevideo spans over a 20-km long system of sandy beaches. The municipal sewage system of the city connects over 90% of houses in the urban area, whose wastes are finally pumped to a single treatment facility placed in the peninsula of Punta Carretas (34°56′15.3″ S 56°09′36.4″ W) which filters out most macroscopic particles before delivering wastewater 5 km away inside the estuary Rio de la Plata. However, the western region of the city is still not connected to this system and sewage pipes pour wastewater directly in the shore. We collected 20 water samples from 12 beaches and 8 sewage pipes. Beach samples are distributed along the whole coastal line of Montevideo, while sewage samples represent those points in the western region where wastewater is directly poured into the shore (Fig. 1c, Additional file 3: Table S1). The samples were collected all on the same day (around 3 h elapsed from the first to the last). All samples were collected using sterile 200 mL plastic bottles and preserved in ice until they were processed within the same day.

DNA purification, metagenomic sequencing and quality analysis

Each sample was centrifuged at 10,000×g for 15 min at 4 °C. The supernatants we discarded and pellets were processed using the FastDNA™ Spin Kit (MP Biomedicals) following the manufacturer’s protocol. Extraction controls were included but as they resulted negative in a PCR using 16S universal primers, they were not included for subsequent shotgun metagenomic sequencing. Paired end (2 × 125 bp) sequencing reads were generated on the Illumina HiSeq2500 machine, yielding in average 39.2 M (SD ± 5.4 M) reads per sample. Initial data quality inspection was performed with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc) and then reads were filtered and trimmed using Trimmomatic [40] with the following parameters: LEADING:20, TRAILING:20, SLIDINGWINDOW:5:20, AVGQUAL:20, and MINLEN:90. On average, 30.85 M (SD ± 4.07 M) reads passed this filter and were used in all further analyses. Sequencing data was deposited at the Sequence Read Archive (SRA) repository under BioProject number PRJNA515946.

Metagenomic data analysis

First, an unbiased description of the variability among communities in sewage and beach was obtained by running Simka [13] with default parameters. This method builds k-mer abundance vectors from raw reads that are subsequently used to calculate classical ecological distances to compare between samples. Second, MetaPhlan2 [14] was used to identify species and to determine their relative abundances across samples. Beta diversities were calculated over taxonomic profiles using the Bray-Curtis distance as implemented in the Vegan R package [41], and alpha diversities were calculated using the Shannon index in the base R package [42]. The identification of environment-associated biomarker taxa was performed with LEfSe, an approach that calculates effect sizes using a linear discriminant analysis from relative abundances (of taxa in this case) [17]. Characterization of bacterial pathogens at the strain level was performed with metaMLST [19], which tries to identify multilocus sequence typing alleles directly from metagenomic sequences by mapping reads against reference genes.

Metagenomes were de novo assembled for each sample with MEGAHIT [43]. Then, contigs over 1 kb were retained and merged at 99% of identity using CD-HIT-EST [44]. Resulting contigs were secondary assembled using Minimus2 [45] requiring a minimum overlap of 100 bp with at least 95% of identity at contig boundaries. A mean of 19,065 (SD = 6500) contigs longer than 1000 bp were obtained per sample. Mean N50 for these contigs was 2051 bp (SD = 336) and mean L50 was 5111 (SD = 1996). Additional assembly statistics can be found at Additional file 3: Table S1. Genes were predicted on the resulting contigs using Prodigal [46]. Antibiotic resistance and virulence genes were identified with Abricate (https://github.com/tseemann/abricate) by comparing contigs against CARD [15] and VFdb [16], respectively. Only hits with query coverage > 90% and sequence identity > 70% were kept.

The occurrence of ARGs in chromosomes or plasmids was determined using BLAST+ blastx [47] against 10,393 complete plasmid sequences from the NCBI RefSeq database (updated on 20 September 2017), and against the representative set of 2064 closed bacterial chromosomes from the PATRIC database [48]. Hits with > 70% of query coverage and > 70% of amino acidic identity were kept. Taxonomic classification was extracted from the description header of both plasmids and chromosomes.

Integrons were identified using IntegronFinder [49]. We used MAFFT [50] (with the L-INS-i option) to align the amino acid sequences of IntI genes recovered from metagenomes together with reference sequences of class 1 (IntI1, AAQ16665.1), class 2 (IntI2, AAT72891.1), class 3 (IntI3, AAO32355.1), class 4 (IntI4, AAC 38424), and class 5 (IntI5, AAD 55407.2) integrases. The resulting alignment was used to build a phylogenetic tree with RAxML [51]. Scripts used to process metagenomic data are freely available at http://github.com/giraola/metagenomics_montevideo.

Acknowledgments

We thank technicians from Intendencia de Montevideo for valuable assistance during sampling and data collection. We thank Hugo Naya for providing lab space and computational resources at the Bioinformatics Unit in the Institut Pasteur Montevideo during the development of this work.

Funding

This work was funded by the Agencia Nacional de Investigación e Innovación (ANII) grant number OPR_X_2016_1_1006944.

Availability of data and materials

All data generated during this study is available at the Sequence Read Archive (SRA) under BioProject number PRJNA515946.

Search for Ebrahim Afshinnekoo in:

Search for Christopher Mason in:

Search for Gastón H. Gonnet in:

Search for Gregorio Iraola in:

Contributions

GI and GG conceived the idea. BD, EA, and CM generated the data. PF, VA, CS, MG, and GI analyzed and interpreted the data. PF, VA, CS, GG, and GI wrote the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.