Bottom Line:
Here we use the pyro-sequencing of PCR amplicons derived from both nonribosomal peptide adenylation domains and polyketide ketosynthase domains to compare biosynthetic diversity in soil microbiomes from around the globe. We see large differences in domain populations from all except the most proximal and biome-similar samples, suggesting that most microbiomes will encode largely distinct collections of bacterial secondary metabolites. Our data indicate a correlation between two factors, geographic distance and biome-type, and the biosynthetic diversity found in soil environments.

ABSTRACT
Recent bacterial (meta)genome sequencing efforts suggest the existence of an enormous untapped reservoir of natural-product-encoding biosynthetic gene clusters in the environment. Here we use the pyro-sequencing of PCR amplicons derived from both nonribosomal peptide adenylation domains and polyketide ketosynthase domains to compare biosynthetic diversity in soil microbiomes from around the globe. We see large differences in domain populations from all except the most proximal and biome-similar samples, suggesting that most microbiomes will encode largely distinct collections of bacterial secondary metabolites. Our data indicate a correlation between two factors, geographic distance and biome-type, and the biosynthetic diversity found in soil environments. By assigning reads to known gene clusters we identify hotspots of biomedically relevant biosynthetic diversity. These observations not only provide new insights into the natural world but also provide a road map for guiding future natural products discovery efforts.

fig2: Biomedically relevant natural product hotspots and diversity. Hotspot analysis of natural product biosynthetic diversity to identify samples with a high total proportion of reads corresponding to a natural product family of interest (A and D), the maximum unique OTUs corresponding to a natural product family of interest (B and D), or the estimated sample biodiversity (C and D). In A and B samples are arranged by longitude and hemisphere as is shown in the Sample Key. (A) For each sample, sequence reads assigned by eSNaPD are expressed as a percentage of total reads obtained for that sample. A sample is designated a hotspot if more than one percent (0.01; horizontal line) of its reads map to a specific gene cluster. Fractional observance data for five representative gene clusters or gene cluster families (zorbamycin, oocydin, tiacumicin B, epoxomicin, glycopeptides) that show significant sample dependent difference in read frequency are shown. (B) Hotspots of elevated gene cluster family diversity can be identified by determining the number of unique OTUs occurring in each sample that, by eSNaPD, map to a natural product gene cluster of interest. Sample specific OTU counts for nocardicin, rifamycin, bleomycin, and daptomycin clusters are shown. Samples containing greater than 50% of the maximum observed OTU value are colored and mapped in (C). OTU diversity measurements do not predict the abundance of a specific cluster in a metagenome [as predicted in (A)], but instead are used to identify locations where the largest number of congener-encoding clusters may be found. These sites are predicted to be most useful for increasing the structural diversity and therefore potential clinical utility of these medically important families of natural products. (C) Estimated diversity of AD/KS reads by sample. AD and KS OTU tables were combined and for each sample the Chao1 diversity metric was calculated at 5000 reads, providing a baseline metric for comparing sample biosynthetic diversity. The average number of unique OTUs observed over 10 rarefaction analyses is shown (also see Supplementary file 7). (D) Hotspot map of samples identified in A, B and C. (E) Representative structures of target molecule families highlighted in A and B.
DOI: http://dx.doi.org/10.7554/eLife.05048.004
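The diversity estimate described for panel C can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`rarefy`, `chao1`, `mean_rarefied`) are hypothetical, the per-sample OTU table is assumed to be a plain `{otu_id: read_count}` mapping, and the bias-corrected form of Chao1 is an assumption, since the caption does not say which variant was used.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, rng):
    """Subsample `depth` reads without replacement from a {otu: count} table."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return Counter(rng.sample(reads, depth))


def chao1(otu_counts):
    """Bias-corrected Chao1 (assumed variant): S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    counts = [n for n in otu_counts.values() if n > 0]
    s_obs = len(counts)
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def mean_rarefied(otu_counts, depth=5000, n_rarefactions=10, seed=0):
    """Average Chao1 and observed OTU count over repeated rarefactions."""
    rng = random.Random(seed)
    chao_values, observed = [], []
    for _ in range(n_rarefactions):
        subsample = rarefy(otu_counts, depth, rng)
        chao_values.append(chao1(subsample))
        observed.append(len(subsample))  # unique OTUs in this subsample
    return sum(chao_values) / n_rarefactions, sum(observed) / n_rarefactions
```

The defaults mirror the caption (5000 reads, 10 rarefactions). Rarefying every sample to the same depth before estimating richness keeps samples with very different sequencing depths comparable.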

Mentions:
Interestingly, eSNaPD analysis of the data from all sites reveals two distinct types of biomedically relevant natural product gene cluster ‘hot spots’ within our data (Figure 2A,B,D). These include ‘specific gene cluster hotspots’ and ‘gene cluster family hotspots’. Metagenomes from ‘specific gene cluster hotspots’ are predicted to be enriched for a gene cluster that encodes a congener of the target natural product, while metagenomes from ‘gene cluster family hotspots’ are predicted to encode multiple congeners related to the target natural product. Figure 2A shows several of the strongest examples of ‘specific gene cluster hotspots’ where reads falling into an OTU related to a specific biomedically relevant gene cluster or gene cluster family are disproportionately represented in the sequence data from individual microbiomes. These examples highlight the different enrichment patterns that we observe in the environment: hotspots are either local in nature, consisting of only one or two samples containing sequence reads mapping to the target (epoxomicin, oocydin); regional (tiacumicin B); or global with punctuated increases in diversity (glycopeptides). We would predict that ‘specific gene cluster hotspots’ (Figure 2D) are naturally enriched for bacteria that encode congeners of the biomedically relevant target metabolites, thereby potentially simplifying the discovery of new congeners. Figure 2B shows examples of ‘gene cluster family hotspots’, where metagenomes having a disproportionately high number of OTUs mapping to a specific biomedically relevant target molecule family (e.g., nocardicin, rifamycin, bleomycin, and daptomycin families are shown) are highlighted. This analysis identifies specific sample sites, from among those surveyed, that are predicted to contain the most diverse collection of gene clusters associated with a target molecule of interest (Figure 2B). Both types of hotspots should represent productive starting points for future natural product discovery efforts aimed at expanding the structural diversity and potential utility of specific biomedically relevant natural product families.
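The two hotspot definitions can be made concrete with a short sketch. It assumes the eSNaPD read assignments have already been made and summarized per sample; the function names and input layouts below are hypothetical and only the two thresholds (more than 1% of reads, more than 50% of the maximum OTU count) come from the Figure 2 caption.

```python
def specific_cluster_hotspots(mapped_reads, total_reads, threshold=0.01):
    """Samples where more than `threshold` of all reads map to the target cluster.

    mapped_reads: {sample: reads assigned to the target gene cluster}
    total_reads:  {sample: total reads obtained for that sample}
    Returns {sample: fraction of reads mapped} for the hotspot samples only.
    """
    hotspots = {}
    for sample, mapped in mapped_reads.items():
        total = total_reads[sample]
        if total > 0 and mapped / total > threshold:
            hotspots[sample] = mapped / total
    return hotspots


def family_hotspots(otus_by_sample, fraction_of_max=0.5):
    """Samples whose unique-OTU count exceeds `fraction_of_max` of the maximum.

    otus_by_sample: {sample: iterable of OTU ids mapping to the target family}
    Returns {sample: unique OTU count} for the hotspot samples only.
    """
    counts = {s: len(set(otus)) for s, otus in otus_by_sample.items()}
    if not counts:
        return {}
    cutoff = fraction_of_max * max(counts.values())
    return {s: n for s, n in counts.items() if n > cutoff}
```

The first function measures how abundant one cluster is within a sample, while the second counts how many distinct related OTUs a sample contains, so a site can qualify under one definition without qualifying under the other.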

