Bibliography on: Pangenome

Robert J. Robbins is a biologist, an educator, a science administrator, a
publisher, an information technologist, and an IT leader and manager who
specializes in advancing biomedical knowledge and supporting education
through the application of information technology.

RJR: Recommended Bibliography
Created: 20 Jan 2019 at 01:31

Pangenome

Although the enforced stability of genomic content is ubiquitous among
multicellular eukaryotes (MCEs), the opposite is proving to be the case
among prokaryotes, which
exhibit remarkable and adaptive plasticity of genomic content. Early
bacterial whole-genome sequencing efforts discovered that whenever a
particular "species" was re-sequenced, new genes were found that had
not been detected earlier — entirely new genes, not merely new alleles.
This led to the concepts of the bacterial core-genome, the set of
genes found in all members of a particular "species", and the
flex-genome, the set of genes found in some, but not all members of
the "species". Together these make up the species' pan-genome.
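
The definitions above amount to simple set operations, which the following minimal sketch illustrates. The strain names and gene names are hypothetical placeholders chosen for illustration, not data from any paper listed here, and real pan-genome tools cluster genes by sequence similarity rather than comparing names.

```python
# Minimal sketch: core-, flex- and pan-genome as set operations.
# All strain and gene names below are hypothetical illustration data.

strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetA"},
    "strain_C": {"dnaA", "gyrB", "recA", "blaTEM", "mcr1"},
}


def partition_pangenome(gene_sets):
    """Return (core, flex, pan) for a mapping of strain name -> set of genes."""
    sets = list(gene_sets.values())
    if not sets:
        return set(), set(), set()
    pan = set().union(*sets)         # genes found in at least one strain
    core = set.intersection(*sets)   # genes found in every strain
    flex = pan - core                # genes found in some, but not all, strains
    return core, flex, pan


core, flex, pan = partition_pangenome(strains)
print(sorted(core))  # ['dnaA', 'gyrB', 'recA']
print(sorted(flex))  # ['blaTEM', 'mcr1', 'tetA']
print(len(pan))      # 6
```

Adding a newly sequenced strain can only shrink or preserve the core set and grow or preserve the pan set, which is the pattern the early re-sequencing efforts observed.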

Created with PubMed® Query:
pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion
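
For readers who want to rerun this query programmatically, one possible route is the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL and does not send it. The function name `build_esearch_url` and the `retmax` value are illustrative assumptions, and this is not necessarily how the bibliography on this page was generated.

```python
# Minimal sketch: build an NCBI E-utilities ESearch URL for the query above.
# The helper name and the retmax default are assumptions for illustration.
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = 'pangenome or "pan-genome" or "pan genome" NOT pmcbook NOT ispreviousversion'


def build_esearch_url(term, retmax=100):
    """Return an ESearch URL that asks PubMed for up to `retmax` matching PMIDs."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # the query string, URL-encoded by urlencode
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # request a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching PubMed IDs. NCBI's usage guidelines on request rates and API keys apply to any real use.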

Citations
The Papers
(from PubMed®)

RevDate: 2019-01-19

Zhu D, He J, Yang Z, et al (2019)

Comparative analysis reveals the Genomic Islands in Pasteurella multocida population genetics: on Symbiosis and adaptability.

BMC genomics, 20(1):63 pii:10.1186/s12864-018-5366-6.

BACKGROUND: Pasteurella multocida (P. multocida) is a widespread opportunistic pathogen that infects human and various animals. Genomic Islands (GIs) are one of the most important mobile components that quickly help bacteria acquire large fragments of foreign genes. However, the effects of GIs on P. multocida are unknown in the evolution of bacterial populations.

RESULTS: Ten avian-sourced P. multocida obtained through high-throughput sequencing together with 104 publicly available P. multocida genomes were used to analyse their population genetics, thus constructed a pan-genome containing 3948 protein-coding genes. Through the pan-genome, the open evolutionary pattern of P. multocida was revealed, and the functional components of 944 core genes, 2439 accessory genes and 565 unique genes were analysed. In addition, a total of 280 GIs were predicted in all strains. Combined with the pan-genome of P. multocida, the GIs accounted for 5.8% of the core genes in the pan-genome, mainly related to functional metabolic activities; the accessory genes accounted for 42.3%, mainly for the enrichment of adaptive genes; and the unique genes accounted for 35.4%, containing some defence mechanism-related genes.

CONCLUSIONS: The effects of GIs on the population genetics of P. multocida evolution and adaptation to the environment are reflected by the proportion and function of the pan-genome acquired from GIs, and the large quantities of GI data will aid in additional population genetics studies.

@article {pmid30658579,
year = {2019},
author = {Zhu, D and He, J and Yang, Z and Wang, M and Jia, R and Chen, S and Liu, M and Zhao, X and Yang, Q and Wu, Y and Zhang, S and Liu, Y and Zhang, L and Yu, Y and You, Y and Chen, X and Cheng, A},
title = {Comparative analysis reveals the Genomic Islands in Pasteurella multocida population genetics: on Symbiosis and adaptability.},
journal = {BMC genomics},
volume = {20},
number = {1},
pages = {63},
doi = {10.1186/s12864-018-5366-6},
pmid = {30658579},
issn = {1471-2164},
support = {No. 2017YFD050080//the National Key Research and Development Program of China/ ; No. CARS-42-17//China Agricultural Research System/ ; No. 2016JPT0004//Special Fund for Key Laboratory of Animal Disease and Human Health of Sichuan Province/ ; CARS-SVDIP//Sichuan Veterinary Medicine and Drug Innovation Group of China Agricultural Research System/ ; },
}

RevDate: 2019-01-16

Sherman RM, Forman J, Antonescu V, et al (2019)

Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention. The error has been corrected in the HTML and PDF versions of the article.

@article {pmid30647471,
year = {2019},
author = {Sherman, RM and Forman, J and Antonescu, V and Puiu, D and Daya, M and Rafaels, N and Boorgula, MP and Chavan, S and Vergara, C and Ortega, VE and Levin, AM and Eng, C and Yazdanbakhsh, M and Wilson, JG and Marrugo, J and Lange, LA and Williams, LK and Watson, H and Ware, LB and Olopade, CO and Olopade, O and Oliveira, RR and Ober, C and Nicolae, DL and Meyers, DA and Mayorga, A and Knight-Madden, J and Hartert, T and Hansel, NN and Foreman, MG and Ford, JG and Faruque, MU and Dunston, GM and Caraballo, L and Burchard, EG and Bleecker, ER and Araujo, MI and Herrera-Paz, EF and Campbell, M and Foster, C and Taub, MA and Beaty, TH and Ruczinski, I and Mathias, RA and Barnes, KC and Salzberg, SL},
title = {Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41588-018-0335-1},
pmid = {30647471},
issn = {1546-1718},
}

RevDate: 2019-01-14

Blaustein RA, McFarland AG, Ben Maamar S, et al (2019)

Pangenomic Approach To Understanding Microbial Adaptations within a Model Built Environment, the International Space Station, Relative to Human Hosts and Soil.

mSystems, 4(1): pii:mSystems00281-18.

Understanding underlying mechanisms involved in microbial persistence in the built environment (BE) is essential for strategically mitigating potential health risks. To test the hypothesis that BEs impose selective pressures resulting in characteristic adaptive responses, we performed a pangenomics meta-analysis leveraging 189 genomes (accessed from GenBank) of two epidemiologically important taxa, Bacillus cereus and Staphylococcus aureus, isolated from various origins: the International Space Station (ISS; a model BE), Earth-based BEs, soil, and humans. Our objectives were to (i) identify differences in the pangenomic composition of generalist and host-associated organisms, (ii) characterize genes and functions involved in BE-associated selection, and (iii) identify genomic signatures of ISS-derived strains of potential relevance for astronaut health. The pangenome of B. cereus was more expansive than that of S. aureus, which had a dominant core component. Genomic contents of both taxa significantly correlated with isolate origin, demonstrating an importance for biogeography and potential niche adaptations. ISS/BE-enriched functions were often involved in biosynthesis, catabolism, materials transport, metabolism, and stress response. Multiple origin-enriched functions also overlapped across taxa, suggesting conserved adaptive processes. We further characterized two mobile genetic elements with local neighborhood genes encoding biosynthesis and stress response functions that distinctively associated with B. cereus from the ISS. Although antibiotic resistance genes were present in ISS/BE isolates, they were also common in counterparts elsewhere. Overall, despite differences in microbial lifestyle, some functions appear common to remaining viable in the BE, and those functions are not typically associated with direct impacts on human health. 
IMPORTANCE The built environment contains a variety of microorganisms, some of which pose critical human health risks (e.g., hospital-acquired infection, antibiotic resistance dissemination). We uncovered a combination of complex biological functions that may play a role in bacterial survival under the presumed selective pressures in a model built environment-the International Space Station-by using an approach to compare pangenomes of bacterial strains from two clinically relevant species (B. cereus and S. aureus) isolated from both built environments and humans. Our findings suggest that the most crucial bacterial functions involved in this potential adaptive response are specific to bacterial lifestyle and do not appear to have direct impacts on human health.

@article {pmid30637341,
year = {2019},
author = {Blaustein, RA and McFarland, AG and Ben Maamar, S and Lopez, A and Castro-Wallace, S and Hartmann, EM},
title = {Pangenomic Approach To Understanding Microbial Adaptations within a Model Built Environment, the International Space Station, Relative to Human Hosts and Soil.},
journal = {mSystems},
volume = {4},
number = {1},
pages = {},
doi = {10.1128/mSystems.00281-18},
pmid = {30637341},
issn = {2379-5077},
}

RevDate: 2019-01-11

Abreo E, N Altier (2019)

Pangenome of Serratia marcescens strains from nosocomial and environmental origins reveals different populations and the links between them.

Scientific reports, 9(1):46 pii:10.1038/s41598-018-37118-0.

Serratia marcescens is a Gram-negative bacterial species that can be found in a wide range of environments like soil, water and plant surfaces, while it is also known as an opportunistic human pathogen in hospitals and as a plant growth promoting bacteria (PGPR) in crops. We have used a pangenome-based approach, based on publicly available genomes, to apply whole genome multilocus sequence type schemes to assess whether there is an association between source and genotype, aiming at differentiating between isolates from nosocomial sources and the environment, and between strains reported as PGPR from other environmental strains. Most genomes from a nosocomial setting and environmental origin could be assigned to the proposed nosocomial or environmental MLSTs, which is indicative of an association between source and genotype. The fact that a few genomes from a nosocomial source showed an environmental MLST suggests that a minority of nosocomial strains have recently derived from the environment. PGPR strains were assigned to different environmental types and clades but only one clade comprised strains accumulating a low number of known virulence and antibiotic resistance determinants and was exclusively from environmental sources. This clade is envisaged as a group of promissory MLSTs for selecting prospective PGPR strains.

@article {pmid30631083,
year = {2019},
author = {Abreo, E and Altier, N},
title = {Pangenome of Serratia marcescens strains from nosocomial and environmental origins reveals different populations and the links between them.},
journal = {Scientific reports},
volume = {9},
number = {1},
pages = {46},
doi = {10.1038/s41598-018-37118-0},
pmid = {30631083},
issn = {2045-2322},
}

Hisham Y, Y Ashhab (2018)

Identification of Cross-Protective Potential Antigens against Pathogenic Brucella spp. through Combining Pan-Genome Analysis with Reverse Vaccinology.

Journal of immunology research, 2018:1474517.

Brucellosis is a zoonotic infectious disease caused by bacteria of the genus Brucella. Brucella melitensis, Brucella abortus, and Brucella suis are the most pathogenic species of this genus causing the majority of human and domestic animal brucellosis. There is a need to develop a safe and potent subunit vaccine to overcome the serious drawbacks of the live attenuated Brucella vaccines. The aim of this work was to discover antigen candidates conserved among the three pathogenic species. In this study, we employed a reverse vaccinology strategy to compute the core proteome of 90 completed genomes: 55 B. melitensis, 17 B. abortus, and 18 B. suis. The core proteome was analyzed by a metasubcellular localization prediction pipeline to identify surface-associated proteins. The identified proteins were thoroughly analyzed using various in silico tools to obtain the most potential protective antigens. The number of core proteins obtained from analyzing the 90 proteomes was 1939 proteins. The surface-associated proteins were 177. The number of potential antigens was 87; those with adhesion score ≥ 0.5 were considered antigen with "high potential," while those with a score of 0.4-0.5 were considered antigens with "intermediate potential." According to a cumulative score derived from protein antigenicity, density of MHC-I and MHC-II epitopes, MHC allele coverage, and B-cell epitope density scores, a final list of 34 potential antigens was obtained. Remarkably, most of the 34 proteins are associated with bacterial adhesion, invasion, evasion, and adaptation to the hostile intracellular environment of macrophages which is adjusted to deprive Brucella of required nutrients. Our results provide a manageable list of potential protective antigens for developing a potent vaccine against brucellosis. Moreover, our elaborated analysis can provide further insights into novel Brucella virulence factors. 
Our next step is to test some of these antigens using an appropriate antigen delivery system.

@article {pmid30622973,
year = {2018},
author = {Hisham, Y and Ashhab, Y},
title = {Identification of Cross-Protective Potential Antigens against Pathogenic Brucella spp. through Combining Pan-Genome Analysis with Reverse Vaccinology.},
journal = {Journal of immunology research},
volume = {2018},
number = {},
pages = {1474517},
doi = {10.1155/2018/1474517},
pmid = {30622973},
issn = {2314-7156},
}

Livingstone PG, Morphew RM, DE Whitworth (2018)

Genome Sequencing and Pan-Genome Analysis of 23 Corallococcus spp. Strains Reveal Unexpected Diversity, With Particular Plasticity of Predatory Gene Sets.

Frontiers in microbiology, 9:3187.

Corallococcus is an abundant genus of predatory soil myxobacteria, containing two species, C. coralloides (for which a genome sequence is available) and C. exiguus. To investigate the genomic basis of predation, we genome-sequenced 23 Corallococcus strains. Genomic similarity metrics grouped the sequenced strains into at least nine distinct genomospecies, divided between two major sub-divisions of the genus, encompassing previously described diversity. The Corallococcus pan-genome was found to be open, with strains exhibiting highly individual gene sets. On average, only 30.5% of each strain's gene set belonged to the core pan-genome, while more than 75% of the accessory pan-genome genes were present in less than four of the 24 genomes. The Corallococcus accessory pan-proteome was enriched for the COG functional category "Secondary metabolism," with each genome containing on average 55 biosynthetic gene clusters (BGCs), of which only 20 belonged to the core pan-genome. Predatory activity was assayed against ten prey microbes and found to be mostly incongruent with phylogeny or BGC complement. Thus, predation seems multifactorial, depending partially on BGC complement, but also on the accessory pan-genome - genes most likely acquired horizontally. These observations encourage further exploration of Corallococcus as a source for novel bioactive secondary metabolites and predatory proteins.

@article {pmid30619233,
year = {2018},
author = {Livingstone, PG and Morphew, RM and Whitworth, DE},
title = {Genome Sequencing and Pan-Genome Analysis of 23 Corallococcus spp. Strains Reveal Unexpected Diversity, With Particular Plasticity of Predatory Gene Sets.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {3187},
doi = {10.3389/fmicb.2018.03187},
pmid = {30619233},
issn = {1664-302X},
}

RevDate: 2019-01-08

Wu Y, Zaiden N, B Cao (2018)

The Core- and Pan-Genomic Analyses of the Genus Comamonas: From Environmental Adaptation to Potential Virulence.

Frontiers in microbiology, 9:3096.

Comamonas is often reported to be one of the major members of microbial communities in various natural and engineered environments. Versatile catabolic capabilities of Comamonas have been studied extensively in the last decade. In contrast, little is known about the ecological roles and adaptation of Comamonas to different environments as well as the virulence of potentially pathogenic Comamonas strains. In this study, we provide genomic insights into the potential ecological roles and virulence of Comamonas by analysing the entire gene set (pangenome) and the genes present in all genomes (core genome) using 34 genomes of 11 different Comamonas species. The analyses revealed that the metabolic pathways enabling Comamonas to acquire energy from various nutrient sources are well conserved. Genes for denitrification and ammonification are abundant in Comamonas, suggesting that Comamonas plays an important role in the nitrogen biogeochemical cycle. They also encode sophisticated redox sensory systems and diverse c-di-GMP controlling systems, allowing them to be able to effectively adjust their biofilm lifestyle to changing environments. The virulence factors in Comamonas were found to be highly species-specific. The conserved strategies used by potentially pathogenic Comamonas for surface adherence, motility control, nutrient acquisition and stress tolerance were also revealed.

@article {pmid30619175,
year = {2018},
author = {Wu, Y and Zaiden, N and Cao, B},
title = {The Core- and Pan-Genomic Analyses of the Genus Comamonas: From Environmental Adaptation to Potential Virulence.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {3096},
doi = {10.3389/fmicb.2018.03096},
pmid = {30619175},
issn = {1664-302X},
}

diCenzo GC, Mengoni A, E Perrin (2019)

Chromids aid genome expansion and functional diversification in the family Burkholderiaceae.

Molecular biology and evolution pii:5273485 [Epub ahead of print].

Multipartite genomes, containing at least two large replicons, are found in diverse bacteria; however, the advantage of this genome structure remains incompletely understood. Here, we perform comparative genomics of hundreds of finished β-proteobacterial genomes to gain insights into the role and emergence of multipartite genomes. Nearly all essential secondary replicons (chromids) of the β-proteobacteria are found in the family Burkholderiaceae. These replicons arose from just two plasmid acquisition events, and they were likely stabilized early in their evolution by the presence of core genes. On average, Burkholderiaceae genera with multipartite genomes had a larger total genome size, but smaller chromosome, than genera without secondary replicons. Pangenome-level functional enrichment analyses suggested that inter-replicon functional biases are partially driven by the enrichment of secondary replicons in the accessory pangenome fraction. Nevertheless, the small overlap in orthologous groups present in each replicon's pangenome indicates a clear functional separation of the replicons. Chromids appeared biased to environmental adaptation, as the functional categories enriched on chromids were also over-represented on the chromosomes of the environmental genera (Paraburkholderia, Cupriavidus) compared to the pathogenic genera (Burkholderia, Ralstonia). Using ancestral state reconstruction, it was predicted that the rate of accumulation of modern-day genes by chromids was more rapid than the rate of gene accumulation by the chromosomes. Overall, the data are consistent with a model where the primary advantage of secondary replicons is in facilitating increased rates of gene acquisition through horizontal gene transfer, consequently resulting in a replicon enriched in genes associated with adaptation to novel environments.

@article {pmid30608550,
year = {2019},
author = {diCenzo, GC and Mengoni, A and Perrin, E},
title = {Chromids aid genome expansion and functional diversification in the family Burkholderiaceae.},
journal = {Molecular biology and evolution},
volume = {},
number = {},
pages = {},
doi = {10.1093/molbev/msy248},
pmid = {30608550},
issn = {1537-1719},
}

RevDate: 2019-01-04

Dillon MM, Thakur S, Almeida RND, et al (2019)

Recombination of ecologically and evolutionarily significant loci maintains genetic cohesion in the Pseudomonas syringae species complex.

Genome biology, 20(1):3 pii:10.1186/s13059-018-1606-y.

BACKGROUND: Pseudomonas syringae is a highly diverse bacterial species complex capable of causing a wide range of serious diseases on numerous agronomically important crops. We examine the evolutionary relationships of 391 agricultural and environmental strains using whole-genome sequencing and evolutionary genomic analyses.

RESULTS: We describe the phylogenetic distribution of all 77,728 orthologous gene families in the pan-genome, reconstruct the core genome phylogeny using the 2410 core genes, hierarchically cluster the accessory genome, identify the diversity and distribution of type III secretion systems and their effectors, predict ecologically and evolutionarily relevant loci, and establish the molecular evolutionary processes operating on gene families. Phylogenetic and recombination analyses reveal that the species complex is subdivided into primary and secondary phylogroups, with the former primarily comprised of agricultural isolates, including all of the well-studied P. syringae strains. In contrast, the secondary phylogroups include numerous environmental isolates. These phylogroups also have levels of genetic diversity typically found among distinct species. An analysis of rates of recombination within and between phylogroups revealed a higher rate of recombination within primary phylogroups than between primary and secondary phylogroups. We also find that "ecologically significant" virulence-associated loci and "evolutionarily significant" loci under positive selection are over-represented among loci that undergo inter-phylogroup genetic exchange.

CONCLUSIONS: While inter-phylogroup recombination occurs relatively rarely, it is an important force maintaining the genetic cohesion of the species complex, particularly among primary phylogroup strains. This level of genetic cohesion, and the shared plant-associated niche, argues for considering the primary phylogroups as a single biological species.


Hübner S, Bercovich N, Todesco M, et al (2018)

Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance.

Domesticated plants and animals often display dramatic responses to selection, but the origins of the genetic diversity underlying these responses remain poorly understood. Despite domestication and improvement bottlenecks, the cultivated sunflower remains highly variable genetically, possibly due to hybridization with wild relatives. To characterize genetic diversity in the sunflower and to quantify contributions from wild relatives, we sequenced 287 cultivated lines, 17 Native American landraces and 189 wild accessions representing 11 compatible wild species. Cultivar sequences failing to map to the sunflower reference were assembled de novo for each genotype to determine the gene repertoire, or 'pan-genome', of the cultivated sunflower. Assembled genes were then compared to the wild species to estimate origins. Results indicate that the cultivated sunflower pan-genome comprises 61,205 genes, of which 27% vary across genotypes. Approximately 10% of the cultivated sunflower pan-genome is derived through introgression from wild sunflower species, and 1.5% of genes originated solely through introgression. Gene ontology functional analyses further indicate that genes associated with biotic resistance are over-represented among introgressed regions, an observation consistent with breeding records. Analyses of allelic variation associated with downy mildew resistance provide an example in which such introgressions have contributed to resistance to a globally challenging disease.

@article {pmid30598532,
year = {2018},
author = {Hübner, S and Bercovich, N and Todesco, M and Mandel, JR and Odenheimer, J and Ziegler, E and Lee, JS and Baute, GJ and Owens, GL and Grassa, CJ and Ebert, DP and Ostevik, KL and Moyers, BT and Yakimowski, S and Masalia, RR and Gao, L and Ćalić, I and Bowers, JE and Kane, NC and Swanevelder, DZH and Kubach, T and Muños, S and Langlade, NB and Burke, JM and Rieseberg, LH},
title = {Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance.},
journal = {Nature plants},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41477-018-0329-0},
pmid = {30598532},
issn = {2055-0278},
abstract = {Domesticated plants and animals often display dramatic responses to selection, but the origins of the genetic diversity underlying these responses remain poorly understood. Despite domestication and improvement bottlenecks, the cultivated sunflower remains highly variable genetically, possibly due to hybridization with wild relatives. To characterize genetic diversity in the sunflower and to quantify contributions from wild relatives, we sequenced 287 cultivated lines, 17 Native American landraces and 189 wild accessions representing 11 compatible wild species. Cultivar sequences failing to map to the sunflower reference were assembled de novo for each genotype to determine the gene repertoire, or 'pan-genome', of the cultivated sunflower. Assembled genes were then compared to the wild species to estimate origins. Results indicate that the cultivated sunflower pan-genome comprises 61,205 genes, of which 27% vary across genotypes. Approximately 10% of the cultivated sunflower pan-genome is derived through introgression from wild sunflower species, and 1.5% of genes originated solely through introgression. Gene ontology functional analyses further indicate that genes associated with biotic resistance are over-represented among introgressed regions, an observation consistent with breeding records. Analyses of allelic variation associated with downy mildew resistance provide an example in which such introgressions have contributed to resistance to a globally challenging disease.},
}

RevDate: 2018-12-30

Tao Y, Zhao X, Mace E, et al (2018)

Exploring and exploiting pan-genomics for crop improvement.

Molecular plant pii:S1674-2052(18)30383-6 [Epub ahead of print].

Genetic variation ranging from single nucleotide polymorphisms (SNPs) to large structural variants (SVs) can cause variation of gene content among individuals within the same species. There is an increasing appreciation that a single reference genome is insufficient to capture the full landscape of genetic diversity of a species. Pan-genome analysis offers a platform to evaluate genetic diversity of a species via investigation of its entire genome repertoire. Although a recent wave of pan-genomic studies has shed new light on crop diversity and improvement using advanced sequencing technology, the potential applications of crop pan-genomics in crop improvement are yet to be fully exploited. In this review, we highlight the progress achieved in understanding crop pan-genomics, discuss biological activities that cause SVs, review important agronomical traits affected by SVs and present our perspective on the application of pan-genomics in crop improvement.

@article {pmid30594655,
year = {2018},
author = {Tao, Y and Zhao, X and Mace, E and Henry, R and Jordan, D},
title = {Exploring and exploiting pan-genomics for crop improvement.},
journal = {Molecular plant},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.molp.2018.12.016},
pmid = {30594655},
issn = {1752-9867},
abstract = {Genetic variation ranging from single nucleotide polymorphisms (SNPs) to large structural variants (SVs) can cause variation of gene content among individuals within the same species. There is an increasing appreciation that a single reference genome is insufficient to capture the full landscape of genetic diversity of a species. Pan-genome analysis offers a platform to evaluate genetic diversity of a species via investigation of its entire genome repertoire. Although a recent wave of pan-genomic studies has shed new light on crop diversity and improvement using advanced sequencing technology, the potential applications of crop pan-genomics in crop improvement are yet to be fully exploited. In this review, we highlight the progress achieved in understanding crop pan-genomics, discuss biological activities that cause SVs, review important agronomical traits affected by SVs and present our perspective on the application of pan-genomics in crop improvement.},
}

Brankovics B, Kulik T, Sawicki J, et al (2018)

First steps towards mitochondrial pan-genomics: detailed analysis of Fusarium graminearum mitogenomes.

There is a gradual shift from representing a species' genome by a single reference genome sequence to a pan-genome representation. Pan-genomes are the abstract representations of the genomes of all the strains that are present in the population or species. In this study, we employed a pan-genomic approach to analyze the intraspecific mitochondrial genome diversity of Fusarium graminearum. We present an improved reference mitochondrial genome for F. graminearum with an intron-exon annotation that was verified using RNA-seq data. Each of the 24 studied isolates had a distinct mitochondrial sequence. Length variation in the F. graminearum mitogenome was found to be largely due to variation of intron regions (99.98%). The "intronless" mitogenome length was found to be quite stable and could be informative when comparing species. The coding regions showed high conservation, while the variability of intergenic regions was highest. However, the most important variable parts are the intron regions, because they contain approximately half of the variable sites, make up more than half of the mitogenome, and show presence/absence variation. Furthermore, our analyses show that the mitogenome of F. graminearum is recombining, as was previously shown in F. oxysporum, indicating that mitogenome recombination is a common phenomenon in Fusarium. The majority of mitochondrial introns in F. graminearum belongs to group I introns, which are associated with homing endonuclease genes (HEGs). Mitochondrial introns containing HE genes may spread within populations through homing, where the endonuclease recognizes and cleaves the recognition site in the target gene. After cleavage of the "host" gene, it is replaced by the gene copy containing the intron with HEG. We propose to use introns unique to a population for tracking the spread of the given population, because introns can spread through vertical inheritance, recombination as well as via horizontal transfer. 
We demonstrate how pooled sequencing of strains can be used for mining mitogenome data. The usage of pooled sequencing offers a scalable solution for population analysis and for species level comparisons studies. This study may serve as a basis for future mitochondrial genome variability studies and representations.

@article {pmid30588394,
year = {2018},
author = {Brankovics, B and Kulik, T and Sawicki, J and Bilska, K and Zhang, H and de Hoog, GS and van der Lee, TA and Waalwijk, C and van Diepeningen, AD},
title = {First steps towards mitochondrial pan-genomics: detailed analysis of Fusarium graminearum mitogenomes.},
journal = {PeerJ},
volume = {6},
number = {},
pages = {e5963},
doi = {10.7717/peerj.5963},
pmid = {30588394},
issn = {2167-8359},
abstract = {There is a gradual shift from representing a species' genome by a single reference genome sequence to a pan-genome representation. Pan-genomes are the abstract representations of the genomes of all the strains that are present in the population or species. In this study, we employed a pan-genomic approach to analyze the intraspecific mitochondrial genome diversity of Fusarium graminearum. We present an improved reference mitochondrial genome for F. graminearum with an intron-exon annotation that was verified using RNA-seq data. Each of the 24 studied isolates had a distinct mitochondrial sequence. Length variation in the F. graminearum mitogenome was found to be largely due to variation of intron regions (99.98%). The "intronless" mitogenome length was found to be quite stable and could be informative when comparing species. The coding regions showed high conservation, while the variability of intergenic regions was highest. However, the most important variable parts are the intron regions, because they contain approximately half of the variable sites, make up more than half of the mitogenome, and show presence/absence variation. Furthermore, our analyses show that the mitogenome of F. graminearum is recombining, as was previously shown in F. oxysporum, indicating that mitogenome recombination is a common phenomenon in Fusarium. The majority of mitochondrial introns in F. graminearum belongs to group I introns, which are associated with homing endonuclease genes (HEGs). Mitochondrial introns containing HE genes may spread within populations through homing, where the endonuclease recognizes and cleaves the recognition site in the target gene. After cleavage of the "host" gene, it is replaced by the gene copy containing the intron with HEG. We propose to use introns unique to a population for tracking the spread of the given population, because introns can spread through vertical inheritance, recombination as well as via horizontal transfer. 
We demonstrate how pooled sequencing of strains can be used for mining mitogenome data. The usage of pooled sequencing offers a scalable solution for population analysis and for species level comparisons studies. This study may serve as a basis for future mitochondrial genome variability studies and representations.},
}

BACKGROUND: The genus Burkholderia consists of species that occupy remarkably diverse ecological niches. Its best known members are important pathogens, B. mallei and B. pseudomallei, which cause glanders and melioidosis, respectively. Burkholderia genomes are unusual due to their multichromosomal organization, generally comprised of 2-3 chromosomes.

RESULTS: We performed integrated genomic analysis of 127 Burkholderia strains. The pan-genome is open with the saturation to be reached between 86,000 and 88,000 genes. The reconstructed rearrangements indicate a strong avoidance of intra-replichore inversions that is likely caused by selection against the transfer of large groups of genes between the leading and the lagging strands. Translocated genes also tend to retain their position in the leading or the lagging strand, and this selection is stronger for large syntenies. Integrated reconstruction of chromosome rearrangements in the context of strains phylogeny reveals parallel rearrangements that may indicate inversion-based phase variation and integration of new genomic islands. In particular, we detected parallel inversions in the second chromosomes of B. pseudomallei with breakpoints formed by genes encoding membrane components of multidrug resistance complex, that may be linked to a phase variation mechanism. Two genomic islands, spreading horizontally between chromosomes, were detected in the B. cepacia group.

CONCLUSIONS: This study demonstrates the power of integrated analysis of pan-genomes, chromosome rearrangements, and selection regimes. Non-random inversion patterns indicate selective pressure, inversions are particularly frequent in a recent pathogen B. mallei, and, together with periods of positive selection at other branches, may indicate adaptation to new niches. One such adaptation could be a possible phase variation mechanism in B. pseudomallei.


RevDate: 2018-12-27

Tyakht AV, Manolov AI, Kanygina AV, et al (2018)

Genetic diversity of Escherichia coli in gut microbiota of patients with Crohn's disease discovered using metagenomic and genomic analyses.

BMC genomics, 19(1):968 pii:10.1186/s12864-018-5306-5.

BACKGROUND: Crohn's disease is associated with gut dysbiosis. Independent studies have shown an increase in the abundance of certain bacterial species, particularly Escherichia coli with the adherent-invasive pathotype, in the gut. The role of these species in this disease needs to be elucidated.

METHODS: We performed a metagenomic study investigating the gut microbiota of patients with Crohn's disease. A metagenomic reconstruction of the consensus genome content of the species was used to assess the genetic variability.

RESULTS: The abnormal shifts in the microbial community structures in Crohn's disease were heterogeneous among the patients. The metagenomic data suggested the existence of multiple E. coli strains within individual patients. We discovered that the genetic diversity of the species was high and that only a few samples manifested similarity to the adherent-invasive varieties. The other species demonstrated genetic diversity comparable to that observed in the healthy subjects. Our results were supported by a comparison of the sequenced genomes of isolates from the same microbiota samples and a meta-analysis of published gut metagenomes.

CONCLUSIONS: The genomic diversity of Crohn's disease-associated E. coli within and among the patients paves the way towards an understanding of the microbial mechanisms underlying the onset and progression of the Crohn's disease and the development of new strategies for the prevention and treatment of this disease.

@article {pmid30587114,
year = {2018},
author = {Tyakht, AV and Manolov, AI and Kanygina, AV and Ischenko, DS and Kovarsky, BA and Popenko, AS and Pavlenko, AV and Elizarova, AV and Rakitina, DV and Baikova, JP and Ladygina, VG and Kostryukova, ES and Karpova, IY and Semashko, TA and Larin, AK and Grigoryeva, TV and Sinyagina, MN and Malanin, SY and Shcherbakov, PL and Kharitonova, AY and Khalif, IL and Shapina, MV and Maev, IV and Andreev, DN and Belousova, EA and Buzunova, YM and Alexeev, DG and Govorun, VM},
title = {Genetic diversity of Escherichia coli in gut microbiota of patients with Crohn's disease discovered using metagenomic and genomic analyses.},
journal = {BMC genomics},
volume = {19},
number = {1},
pages = {968},
doi = {10.1186/s12864-018-5306-5},
pmid = {30587114},
issn = {1471-2164},
support = {16-15-00258//Russian Science Foundation/ ; },
abstract = {BACKGROUND: Crohn's disease is associated with gut dysbiosis. Independent studies have shown an increase in the abundance of certain bacterial species, particularly Escherichia coli with the adherent-invasive pathotype, in the gut. The role of these species in this disease needs to be elucidated.

METHODS: We performed a metagenomic study investigating the gut microbiota of patients with Crohn's disease. A metagenomic reconstruction of the consensus genome content of the species was used to assess the genetic variability.

RESULTS: The abnormal shifts in the microbial community structures in Crohn's disease were heterogeneous among the patients. The metagenomic data suggested the existence of multiple E. coli strains within individual patients. We discovered that the genetic diversity of the species was high and that only a few samples manifested similarity to the adherent-invasive varieties. The other species demonstrated genetic diversity comparable to that observed in the healthy subjects. Our results were supported by a comparison of the sequenced genomes of isolates from the same microbiota samples and a meta-analysis of published gut metagenomes.

CONCLUSIONS: The genomic diversity of Crohn's disease-associated E. coli within and among the patients paves the way towards an understanding of the microbial mechanisms underlying the onset and progression of the Crohn's disease and the development of new strategies for the prevention and treatment of this disease.},
}

RevDate: 2018-12-26

Cheleuitte-Nieves C, Gulvik CA, McQuiston JR, et al (2018)

Genotypic differences between strains of the opportunistic pathogen Corynebacterium bovis isolated from humans, cows, and rodents.

PloS one, 13(12):e0209231 pii:PONE-D-18-24681.

Corynebacterium bovis is an opportunistic bacterial pathogen shown to cause eye and prosthetic joint infections as well as abscesses in humans, mastitis in dairy cattle, and skin disease in laboratory mice and rats. Little is known about the genetic characteristics and genomic diversity of C. bovis because only a single draft genome is available for the species. The overall aim of this study was to sequence and compare the genome of C. bovis isolates obtained from different species, locations, and time points. Whole-genome sequencing was conducted on 20 C. bovis isolates (six human, four bovine, nine mouse and one rat) using the Illumina MiSeq platform and submitted to various comparative analysis tools. Sequencing generated high-quality contigs (over 2.53 Mbp) that were comparable to the only reported assembly using C. bovis DSM 20582T (97.8 ± 0.36% completeness). The number of protein-coding DNA sequences (2,174 ± 12.4) was similar among all isolates. A Corynebacterium genus neighbor-joining tree was created, which revealed Corynebacterium falsenii as the nearest neighbor to C. bovis (95.87% similarity), although the reciprocal comparison shows Corynebacterium jeikeium as closest neighbor to C. falsenii. Interestingly, the average nucleotide identity demonstrated that the C. bovis isolates clustered by host, with human and bovine isolates clustering together, and the mouse and rat isolates forming a separate group. The average number of genomic islands and putative virulence factors were significantly higher (p<0.001) in the mouse and rat isolates as compared to human/bovine isolates. Corynebacterium bovis' pan-genome contained a total of 3,067 genes of which 1,354 represented core genes. The known core genes of all isolates were primarily related to "metabolism" and "information storage/processing." However, most genes were classified as "function unknown" or "unclassified". Surprisingly, no intact prophages were found in any isolate; however, almost all isolates had at least one complete CRISPR-Cas system.

@article {pmid30586440,
year = {2018},
author = {Cheleuitte-Nieves, C and Gulvik, CA and McQuiston, JR and Humrighouse, BW and Bell, ME and Villarma, A and Fischetti, VA and Westblade, LF and Lipman, NS},
title = {Genotypic differences between strains of the opportunistic pathogen Corynebacterium bovis isolated from humans, cows, and rodents.},
journal = {PloS one},
volume = {13},
number = {12},
pages = {e0209231},
doi = {10.1371/journal.pone.0209231},
pmid = {30586440},
issn = {1932-6203},
abstract = {Corynebacterium bovis is an opportunistic bacterial pathogen shown to cause eye and prosthetic joint infections as well as abscesses in humans, mastitis in dairy cattle, and skin disease in laboratory mice and rats. Little is known about the genetic characteristics and genomic diversity of C. bovis because only a single draft genome is available for the species. The overall aim of this study was to sequence and compare the genome of C. bovis isolates obtained from different species, locations, and time points. Whole-genome sequencing was conducted on 20 C. bovis isolates (six human, four bovine, nine mouse and one rat) using the Illumina MiSeq platform and submitted to various comparative analysis tools. Sequencing generated high-quality contigs (over 2.53 Mbp) that were comparable to the only reported assembly using C. bovis DSM 20582T (97.8 ± 0.36% completeness). The number of protein-coding DNA sequences (2,174 ± 12.4) was similar among all isolates. A Corynebacterium genus neighbor-joining tree was created, which revealed Corynebacterium falsenii as the nearest neighbor to C. bovis (95.87% similarity), although the reciprocal comparison shows Corynebacterium jeikeium as closest neighbor to C. falsenii. Interestingly, the average nucleotide identity demonstrated that the C. bovis isolates clustered by host, with human and bovine isolates clustering together, and the mouse and rat isolates forming a separate group. The average number of genomic islands and putative virulence factors were significantly higher (p<0.001) in the mouse and rat isolates as compared to human/bovine isolates. Corynebacterium bovis' pan-genome contained a total of 3,067 genes of which 1,354 represented core genes. The known core genes of all isolates were primarily related to "metabolism" and "information storage/processing." However, most genes were classified as "function unknown" or "unclassified". Surprisingly, no intact prophages were found in any isolate; however, almost all isolates had at least one complete CRISPR-Cas system.},
}

RevDate: 2018-12-21

Velsko IM, Chakraborty B, Nascimento MM, et al (2018)

Species Designations Belie Phenotypic and Genotypic Heterogeneity in Oral Streptococci.

mSystems, 3(6): pii:mSystems00158-18.

Health-associated oral Streptococcus species are promising probiotic candidates to protect against dental caries. Ammonia production through the arginine deiminase system (ADS), which can increase the pH of oral biofilms, and direct antagonism of caries-associated bacterial species are desirable properties for oral probiotic strains. ADS and antagonistic activities can vary dramatically among individuals, but the genetic basis for these differences is unknown. We sequenced whole genomes of a diverse set of clinical oral Streptococcus isolates and examined the genetic basis of variability in ADS and antagonistic activities. A total of 113 isolates were included and represented 10 species: Streptococcus australis, A12-like, S. cristatus, S. gordonii, S. intermedius, S. mitis, S. oralis including S. oralis subsp. dentisani, S. parasanguinis, S. salivarius, and S. sanguinis. Mean ADS activity and antagonism on Streptococcus mutans UA159 were measured for each isolate, and each isolate was whole genome shotgun sequenced on an Illumina MiSeq. Phylogenies were built of genes known to be involved in ADS activity and antagonism. Several approaches to correlate the pan-genome with phenotypes were performed. Phylogenies of genes previously identified in ADS activity and antagonism grouped isolates by species, but not by phenotype. A genome-wide association study (GWAS) identified additional genes potentially involved in ADS activity or antagonism across all the isolates we sequenced as well as within several species. Phenotypic heterogeneity in oral streptococci is not necessarily reflected by genotype and is not species specific. Probiotic strains must be carefully selected based on characterization of each strain and not based on inclusion within a certain species.

IMPORTANCE: Representative type strains are commonly used to characterize bacterial species, yet species are phenotypically and genotypically heterogeneous. Conclusions about strain physiology and activity based on a single strain therefore may be inappropriate and misleading. When selecting strains for probiotic use, the assumption that all strains within a species share the same desired probiotic characteristics may result in selection of a strain that lacks the desired traits, and therefore makes a minimally effective or ineffective probiotic. Health-associated oral streptococci are promising candidates for anticaries probiotics, but strains need to be carefully selected based on observed phenotypes. We characterized the genotypes and anticaries phenotypes of strains from 10 species of oral streptococci and demonstrate poor correlation between genotype and phenotype across all species.

@article {pmid30574560,
year = {2018},
author = {Velsko, IM and Chakraborty, B and Nascimento, MM and Burne, RA and Richards, VP},
title = {Species Designations Belie Phenotypic and Genotypic Heterogeneity in Oral Streptococci.},
journal = {mSystems},
volume = {3},
number = {6},
pages = {},
doi = {10.1128/mSystems.00158-18},
pmid = {30574560},
issn = {2379-5077},
abstract = {Health-associated oral Streptococcus species are promising probiotic candidates to protect against dental caries. Ammonia production through the arginine deiminase system (ADS), which can increase the pH of oral biofilms, and direct antagonism of caries-associated bacterial species are desirable properties for oral probiotic strains. ADS and antagonistic activities can vary dramatically among individuals, but the genetic basis for these differences is unknown. We sequenced whole genomes of a diverse set of clinical oral Streptococcus isolates and examined the genetic basis of variability in ADS and antagonistic activities. A total of 113 isolates were included and represented 10 species: Streptococcus australis, A12-like, S. cristatus, S. gordonii, S. intermedius, S. mitis, S. oralis including S. oralis subsp. dentisani, S. parasanguinis, S. salivarius, and S. sanguinis. Mean ADS activity and antagonism on Streptococcus mutans UA159 were measured for each isolate, and each isolate was whole genome shotgun sequenced on an Illumina MiSeq. Phylogenies were built of genes known to be involved in ADS activity and antagonism. Several approaches to correlate the pan-genome with phenotypes were performed. Phylogenies of genes previously identified in ADS activity and antagonism grouped isolates by species, but not by phenotype. A genome-wide association study (GWAS) identified additional genes potentially involved in ADS activity or antagonism across all the isolates we sequenced as well as within several species. Phenotypic heterogeneity in oral streptococci is not necessarily reflected by genotype and is not species specific. Probiotic strains must be carefully selected based on characterization of each strain and not based on inclusion within a certain species. IMPORTANCE Representative type strains are commonly used to characterize bacterial species, yet species are phenotypically and genotypically heterogeneous. 
Conclusions about strain physiology and activity based on a single strain therefore may be inappropriate and misleading. When selecting strains for probiotic use, the assumption that all strains within a species share the same desired probiotic characteristics may result in selection of a strain that lacks the desired traits, and therefore makes a minimally effective or ineffective probiotic. Health-associated oral streptococci are promising candidates for anticaries probiotics, but strains need to be carefully selected based on observed phenotypes. We characterized the genotypes and anticaries phenotypes of strains from 10 species of oral streptococci and demonstrate poor correlation between genotype and phenotype across all species.},
}

RevDate: 2018-12-19

Potter RF, Lainhart W, Twentyman J, et al (2018)

Population Structure, Antibiotic Resistance, and Uropathogenicity of Klebsiella variicola.

mBio, 9(6): pii:mBio.02481-18.

Klebsiella variicola is a member of the Klebsiella genus and often misidentified as Klebsiella pneumoniae or Klebsiella quasipneumoniae. The importance of K. pneumoniae human infections has been known; however, a dearth of relative knowledge exists for K. variicola. Despite its growing clinical importance, comprehensive analyses of K. variicola population structure and mechanistic investigations of virulence factors and antibiotic resistance genes have not yet been performed. To address this, we utilized in silico, in vitro, and in vivo methods to study a cohort of K. variicola isolates and genomes. We found that the K. variicola population structure has two distant lineages composed of two and 143 genomes, respectively. Ten of 145 K. variicola genomes harbored carbapenem resistance genes, and 6/145 contained complete virulence operons. While the β-lactam blaLEN and quinolone oqxAB antibiotic resistance genes were generally conserved within our institutional cohort, unexpectedly 11 isolates were nonresistant to the β-lactam ampicillin and only one isolate was nonsusceptible to the quinolone ciprofloxacin. K. variicola isolates have variation in ability to cause urinary tract infections in a newly developed murine model, but importantly a strain had statistically significant higher bladder CFU than the model uropathogenic K. pneumoniae strain TOP52. Type 1 pilus and genomic identification of altered fim operon structure were associated with differences in bladder CFU for the tested strains. Nine newly reported types of pilus genes were discovered in the K. variicola pan-genome, including the first identified P-pilus in Klebsiella spp. IMPORTANCE Infections caused by antibiotic-resistant bacterial pathogens are a growing public health threat. Understanding of pathogen relatedness and biology is imperative for tracking outbreaks and developing therapeutics. Here, we detail the phylogenetic structure of 145 K. variicola genomes from different continents. Our results have important clinical ramifications as high-risk antibiotic resistance genes are present in K. variicola genomes from a variety of geographic locations and as we demonstrate that K. variicola clinical isolates can establish higher bladder titers than K. pneumoniae. Differential presence of these pilus genes in K. variicola isolates may indicate adaption for specific environmental niches. Therefore, due to the potential of multidrug resistance and pathogenic efficacy, identification of K. variicola and K. pneumoniae to a species level should be performed to optimally improve patient outcomes during infection. This work provides a foundation for our improved understanding of K. variicola biology and pathogenesis.

@article {pmid30563902,
year = {2018},
author = {Potter, RF and Lainhart, W and Twentyman, J and Wallace, MA and Wang, B and Burnham, CA and Rosen, DA and Dantas, G},
title = {Population Structure, Antibiotic Resistance, and Uropathogenicity of Klebsiella variicola.},
journal = {mBio},
volume = {9},
number = {6},
pages = {},
doi = {10.1128/mBio.02481-18},
pmid = {30563902},
issn = {2150-7511},
abstract = {Klebsiella variicola is a member of the Klebsiella genus and often misidentified as Klebsiella pneumoniae or Klebsiella quasipneumoniae. The importance of K. pneumoniae human infections has been known; however, a dearth of relative knowledge exists for K. variicola. Despite its growing clinical importance, comprehensive analyses of K. variicola population structure and mechanistic investigations of virulence factors and antibiotic resistance genes have not yet been performed. To address this, we utilized in silico, in vitro, and in vivo methods to study a cohort of K. variicola isolates and genomes. We found that the K. variicola population structure has two distant lineages composed of two and 143 genomes, respectively. Ten of 145 K. variicola genomes harbored carbapenem resistance genes, and 6/145 contained complete virulence operons. While the β-lactam blaLEN and quinolone oqxAB antibiotic resistance genes were generally conserved within our institutional cohort, unexpectedly 11 isolates were nonresistant to the β-lactam ampicillin and only one isolate was nonsusceptible to the quinolone ciprofloxacin. K. variicola isolates have variation in ability to cause urinary tract infections in a newly developed murine model, but importantly a strain had statistically significant higher bladder CFU than the model uropathogenic K. pneumoniae strain TOP52. Type 1 pilus and genomic identification of altered fim operon structure were associated with differences in bladder CFU for the tested strains. Nine newly reported types of pilus genes were discovered in the K. variicola pan-genome, including the first identified P-pilus in Klebsiella spp. IMPORTANCE Infections caused by antibiotic-resistant bacterial pathogens are a growing public health threat. Understanding of pathogen relatedness and biology is imperative for tracking outbreaks and developing therapeutics. Here, we detail the phylogenetic structure of 145 K. variicola genomes from different continents. Our results have important clinical ramifications as high-risk antibiotic resistance genes are present in K. variicola genomes from a variety of geographic locations and as we demonstrate that K. variicola clinical isolates can establish higher bladder titers than K. pneumoniae. Differential presence of these pilus genes in K. variicola isolates may indicate adaption for specific environmental niches. Therefore, due to the potential of multidrug resistance and pathogenic efficacy, identification of K. variicola and K. pneumoniae to a species level should be performed to optimally improve patient outcomes during infection. This work provides a foundation for our improved understanding of K. variicola biology and pathogenesis.},
}

RevDate: 2018-12-19

Pang TY, Lercher MJ (2018)

Each of 3,323 metabolic innovations in the evolution of E. coli arose through the horizontal transfer of a single DNA segment.

Proceedings of the National Academy of Sciences of the United States of America pii:1718997115 [Epub ahead of print].

Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations (those requiring the independent acquisition of multiple new genes) can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53 Escherichia coli strains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of the E. coli clade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combined E. coli pangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of the E. coli lineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.

@article {pmid30563853,
year = {2018},
author = {Pang, TY and Lercher, MJ},
title = {Each of 3,323 metabolic innovations in the evolution of E. coli arose through the horizontal transfer of a single DNA segment.},
journal = {Proceedings of the National Academy of Sciences of the United States of America},
volume = {},
number = {},
pages = {},
doi = {10.1073/pnas.1718997115},
pmid = {30563853},
issn = {1091-6490},
abstract = {Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations (those requiring the independent acquisition of multiple new genes) can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53 Escherichia coli strains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of the E. coli clade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combined E. coli pangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of the E. coli lineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.},
}

Jatuponwiphat T, Chumnanpuen P, Othman S, et al (2018)

Iron-associated protein interaction networks reveal the key functional modules related to survival and virulence of Pasteurella multocida.

Microbial pathogenesis.

Pasteurella multocida causes respiratory infectious diseases in a multitude of birds and mammals. A number of virulence-associated genes were reported across different strains of P. multocida, including those involved in the iron transport and metabolism. Comparative iron-associated genes of P. multocida among different animal hosts towards their interaction networks have not been fully revealed. Therefore, this study aimed to identify the iron-associated genes from core- and pan-genomes of fourteen P. multocida strains and to construct iron-associated protein interaction networks using genome-scale network analysis which might be associated with the virulence. Results showed that these fourteen strains had 1587 genes in the core-genome and 3400 genes constituting their pan-genome. Out of these, 2651 genes associated with iron transport and metabolism were selected to construct the protein interaction networks and 361 genes were incorporated into the iron-associated protein interaction network (iPIN) consisting of nine different iron-associated functional modules. After comparing with the virulence factor database (VFDB), 21 virulence-associated proteins were determined and 11 of these belonged to the heme biosynthesis module. From this study, the core heme biosynthesis module and the core outer membrane hemoglobin receptor HgbA were proposed as candidate targets to design novel antibiotics and vaccines for preventing pasteurellosis across the serotypes or animal hosts for enhanced precision agriculture to ensure sustainability in food security.
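
The core-genome and pan-genome counts reported in this abstract rest on a simple set partition of gene families across strains. The following is a minimal sketch of that partition, not the authors' pipeline; the strain names, gene identifiers, and the function name partition_pangenome are hypothetical, and it assumes gene families have already been clustered into shared IDs.

```python
from collections import Counter

def partition_pangenome(strain_genes):
    """Split a pan-genome into core, accessory, and unique gene sets.

    strain_genes maps a strain name to the set of gene-family IDs it carries.
    """
    n_strains = len(strain_genes)
    # Count in how many strains each gene family occurs.
    counts = Counter(g for genes in strain_genes.values() for g in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique

# Toy input (hypothetical strains and genes, not data from the study).
toy = {
    "strainA": {"gene1", "gene2", "gene3", "gene4"},
    "strainB": {"gene1", "gene2", "gene3", "gene5"},
    "strainC": {"gene1", "gene2", "gene5"},
}
core, accessory, unique = partition_pangenome(toy)
pan = core | accessory | unique  # the pan-genome is the union of all three
print(len(core), len(accessory), len(unique), len(pan))
```

In this toy input, gene1 and gene2 are core, gene3 and gene5 are accessory, and gene4 is unique to one strain.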

@article {pmid30550841,
year = {2018},
author = {Jatuponwiphat, T and Chumnanpuen, P and Othman, S and E-Kobon, T and Vongsangnak, W},
title = {Iron-associated protein interaction networks reveal the key functional modules related to survival and virulence of Pasteurella multocida.},
journal = {Microbial pathogenesis},
volume = {},
number = {},
pages = {},
doi = {10.1016/j.micpath.2018.12.013},
pmid = {30550841},
issn = {1096-1208},
abstract = {Pasteurella multocida causes respiratory infectious diseases in a multitude of birds and mammals. A number of virulence-associated genes were reported across different strains of P. multocida, including those involved in the iron transport and metabolism. Comparative iron-associated genes of P. multocida among different animal hosts towards their interaction networks have not been fully revealed. Therefore, this study aimed to identify the iron-associated genes from core- and pan-genomes of fourteen P. multocida strains and to construct iron-associated protein interaction networks using genome-scale network analysis which might be associated with the virulence. Results showed that these fourteen strains had 1587 genes in the core-genome and 3400 genes constituting their pan-genome. Out of these, 2651 genes associated with iron transport and metabolism were selected to construct the protein interaction networks and 361 genes were incorporated into the iron-associated protein interaction network (iPIN) consisting of nine different iron-associated functional modules. After comparing with the virulence factor database (VFDB), 21 virulence-associated proteins were determined and 11 of these belonged to the heme biosynthesis module. From this study, the core heme biosynthesis module and the core outer membrane hemoglobin receptor HgbA were proposed as candidate targets to design novel antibiotics and vaccines for preventing pasteurellosis across the serotypes or animal hosts for enhanced precision agriculture to ensure sustainability in food security.},
}

Moradigaravand D, Palm M, Farewell A, et al (2018)

Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data.

PLoS computational biology, 14(12):e1006258.

The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.90 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.
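
The idea of predicting resistance from gene content with gradient boosted trees can be illustrated on a toy scale. The sketch below is not the authors' model: it assumes a binary gene presence/absence matrix, uses depth-one trees (stumps) fitted to the log-loss gradient, and all data, function names, and parameter values are hypothetical. A real analysis would more likely use an established library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return (feature index, leaf values) of the one-split tree that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        groups = {0: [], 1: []}
        for row, r in zip(X, residuals):
            groups[row[j]].append(r)
        # Each leaf predicts the mean residual of the rows that fall into it.
        leaves = {v: (sum(rs) / len(rs) if rs else 0.0) for v, rs in groups.items()}
        sse = sum((r - leaves[row[j]]) ** 2 for row, r in zip(X, residuals))
        if best is None or sse < best[0]:
            best = (sse, j, leaves)
    return best[1], best[2]

def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for log-loss: each round fits a stump to y - p."""
    scores = [0.0] * len(y)
    model = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, leaves = fit_stump(X, residuals)
        step = {v: learning_rate * val for v, val in leaves.items()}
        model.append((j, step))
        scores = [s + step[row[j]] for s, row in zip(scores, X)]
    return model

def predict(model, row):
    """Predict 1 (resistant) when the summed stump outputs are positive."""
    score = sum(step[row[j]] for j, step in model)
    return 1 if score > 0 else 0

# Hypothetical data: columns are genes (1 = present), labels are 1 = resistant.
X_train = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
y_train = [1, 1, 0, 0, 1, 0]
X_test = [[1, 0, 0], [0, 0, 1]]  # held-out strains
y_test = [1, 0]

model = fit_boosted_stumps(X_train, y_train)
accuracy = sum(predict(model, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(accuracy)
```

Accuracy is computed only on strains excluded from training, which mirrors the held-out evaluation the abstract describes.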

@article {pmid30550564,
year = {2018},
author = {Moradigaravand, D and Palm, M and Farewell, A and Mustonen, V and Warringer, J and Parts, L},
title = {Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data.},
journal = {PLoS computational biology},
volume = {14},
number = {12},
pages = {e1006258},
doi = {10.1371/journal.pcbi.1006258},
pmid = {30550564},
issn = {1553-7358},
abstract = {The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.90 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.},
}

RevDate: 2018-12-12

Colson P, Levasseur A, La Scola B, et al (2018)

Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes.

Frontiers in microbiology, 9:2668.

Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named 'TRUC' (for "Things Resisting Uncompleted Classifications") alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. Comparison of the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes was also performed. Overall, both cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures as well as phenetic analyses were congruent regarding the delineation of a fourth branch of microbes comprised by giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacteria), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed a substantial proportion of Tupanvirus genes that overlap with those of the cellular microbes. In addition, a substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching with viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers. 
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also point out that best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.
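
Hierarchical clustering on a binary presence/absence matrix, as mentioned in this abstract, can be sketched in a few lines. The abstract does not state which distance or linkage was used, so the Jaccard distance and average linkage below are assumptions, and the profiles and genome labels are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - (features present in both) / (features present in either) for two 0/1 profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

def average_linkage(profiles):
    """Agglomerative clustering; returns the merge history as (cluster_a, cluster_b, distance)."""
    def dist(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(jaccard_distance(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical presence/absence profiles over five gene families.
profiles = {
    "A": [1, 1, 1, 0, 0],
    "B": [1, 1, 1, 1, 0],
    "C": [0, 0, 1, 1, 1],
    "D": [0, 0, 0, 1, 1],
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge history is the information a dendrogram would display: the most similar profiles join first, at the smallest distance.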

@article {pmid30538677,
year = {2018},
author = {Colson, P and Levasseur, A and La Scola, B and Sharma, V and Nasir, A and Pontarotti, P and Caetano-Anollés, G and Raoult, D},
title = {Ancestrality and Mosaicism of Giant Viruses Supporting the Definition of the Fourth TRUC of Microbes.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2668},
doi = {10.3389/fmicb.2018.02668},
pmid = {30538677},
issn = {1664-302X},
abstract = {Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named 'TRUC' (for "Things Resisting Uncompleted Classifications") alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. Comparison of the gene content of a giant virus with those of a bacterium, an archaeon, and a eukaryote with small genomes was also performed. Overall, both cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures as well as phenetic analyses were congruent regarding the delineation of a fourth branch of microbes comprised by giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacteria), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed a substantial proportion of Tupanvirus genes that overlap with those of the cellular microbes. In addition, a substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching with viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers. 
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also point out that best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.},
}

Lees JA, Galardini M, Bentley SD, et al (2018)

pyseer: a comprehensive tool for microbial pangenome-wide association studies.

Bioinformatics (Oxford, England), 34(24):4310-4312.

Summary: Genome-wide association studies (GWAS) in microbes have different challenges to GWAS in eukaryotes. These have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.

pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://pyseer.readthedocs.io.

Supplementary information: Supplementary data are available at Bioinformatics online.

@article {pmid30535304,
year = {2018},
author = {Lees, JA and Galardini, M and Bentley, SD and Weiser, JN and Corander, J},
title = {pyseer: a comprehensive tool for microbial pangenome-wide association studies.},
journal = {Bioinformatics (Oxford, England)},
volume = {34},
number = {24},
pages = {4310-4312},
doi = {10.1093/bioinformatics/bty539},
pmid = {30535304},
issn = {1367-4811},
abstract = {Summary: Genome-wide association studies (GWAS) in microbes have different challenges to GWAS in eukaryotes. These have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results.

pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://pyseer.readthedocs.io.

Supplementary information: Supplementary data are available at Bioinformatics online.},
}

RevDate: 2018-12-11

Fritsch L, Felten A, Palma F, et al (2018)

Insights from genome-wide approaches to identify variants associated to phenotypes at pan-genome scale: Application to L. monocytogenes' ability to grow in cold conditions.

International journal of food microbiology, 291:181-188.

Intraspecific variability of the behavior of most foodborne pathogens is well described and taken into account in Quantitative Microbial Risk Assessment (QMRA), but factors (strain origin, serotype, …) explaining these differences are scarce or contradictory between studies. Nowadays, Whole Genome Sequencing (WGS) offers new opportunities to explain intraspecific variability of food pathogens, based on various recently published bioinformatics tools. The objective of this study is to get a better insight into different existing bioinformatics approaches to associate bacterial phenotype(s) and genotype(s). Therefore, a dataset of 51 L. monocytogenes strains, isolated from multiple sources (i.e. different food matrices and environments) and belonging to 17 clonal complexes (CC), were selected to represent large population diversity. Furthermore, the phenotypic variability of growth at low temperature was determined (i.e. qualitative phenotype), and the whole genomes of selected strains were sequenced. The almost exhaustive gene content, as well as the core genome SNPs based phylogenetic reconstruction, were derived from the whole sequenced genomes. A Bayesian inference method was applied to identify the branches on which the phenotype distribution evolves within sub-lineages. Two different Genome Wide Association Studies (i.e. gene- and SNP-based GWAS) were independently performed in order to link genetic mutations to the phenotype of interest. The genomic analyses presented in this study were successfully applied on the selected dataset. The Bayesian phylogenetic approach emphasized an association with "slow" growth ability at 2 °C of the lineage I, as well as CC9 of the lineage II. Moreover, both gene- and SNP-GWAS approaches displayed significant statistical associations with the tested phenotype. A list of 114 significantly associated genes, including genes already known to be involved in the cold adaption mechanism of L. 
monocytogenes and genes associated to mobile genetic elements (MGE), resulted from the gene-GWAS. On the other hand, a group of 184 highly associated SNPs were highlighted by SNP-GWAS, including SNPs detected in genes which were already likely involved in cold adaption; hypothetical proteins; and intergenic regions where for example promotors and regulators can be located. The successful application of combined bioinformatics approaches associating WGS-genotypes and specific phenotypes, could contribute to improve prediction of microbial behaviors in food. The implementation of this information in hazard identification and exposure assessment processes will open new possibilities to feed QMRA-models.
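
A gene-based association test of the kind described here asks whether a gene's presence is distributed unevenly between phenotype groups. The abstract does not name the statistical test used, so the two-sided Fisher exact test below is an assumption chosen for illustration; the counts are hypothetical, and the sketch ignores population structure and multiple-testing correction, both of which a real GWAS must handle.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: gene present, slow growth    b: gene present, fast growth
    c: gene absent,  slow growth    d: gene absent,  fast growth
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x gene-present strains among the slow growers.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: the gene is found in 8 of 10 slow growers and 1 of 10 fast growers.
p_value = fisher_exact_two_sided(8, 1, 2, 9)
print(p_value)
```

In a gene-GWAS this test would be repeated for every accessory gene, with the resulting p-values corrected for the number of genes tested.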

@article {pmid30530095,
year = {2018},
author = {Fritsch, L and Felten, A and Palma, F and Mariet, JF and Radomski, N and Mistou, MY and Augustin, JC and Guillier, L},
title = {Insights from genome-wide approaches to identify variants associated to phenotypes at pan-genome scale: Application to L. monocytogenes' ability to grow in cold conditions.},
journal = {International journal of food microbiology},
volume = {291},
number = {},
pages = {181-188},
doi = {10.1016/j.ijfoodmicro.2018.11.028},
pmid = {30530095},
issn = {1879-3460},
abstract = {Intraspecific variability of the behavior of most foodborne pathogens is well described and taken into account in Quantitative Microbial Risk Assessment (QMRA), but factors (strain origin, serotype, …) explaining these differences are scarce or contradictory between studies. Nowadays, Whole Genome Sequencing (WGS) offers new opportunities to explain intraspecific variability of food pathogens, based on various recently published bioinformatics tools. The objective of this study is to get a better insight into different existing bioinformatics approaches to associate bacterial phenotype(s) and genotype(s). Therefore, a dataset of 51 L. monocytogenes strains, isolated from multiple sources (i.e. different food matrices and environments) and belonging to 17 clonal complexes (CC), were selected to represent large population diversity. Furthermore, the phenotypic variability of growth at low temperature was determined (i.e. qualitative phenotype), and the whole genomes of selected strains were sequenced. The almost exhaustive gene content, as well as the core genome SNPs based phylogenetic reconstruction, were derived from the whole sequenced genomes. A Bayesian inference method was applied to identify the branches on which the phenotype distribution evolves within sub-lineages. Two different Genome Wide Association Studies (i.e. gene- and SNP-based GWAS) were independently performed in order to link genetic mutations to the phenotype of interest. The genomic analyses presented in this study were successfully applied on the selected dataset. The Bayesian phylogenetic approach emphasized an association with "slow" growth ability at 2 °C of the lineage I, as well as CC9 of the lineage II. Moreover, both gene- and SNP-GWAS approaches displayed significant statistical associations with the tested phenotype. A list of 114 significantly associated genes, including genes already known to be involved in the cold adaption mechanism of L. 
monocytogenes and genes associated to mobile genetic elements (MGE), resulted from the gene-GWAS. On the other hand, a group of 184 highly associated SNPs were highlighted by SNP-GWAS, including SNPs detected in genes which were already likely involved in cold adaption; hypothetical proteins; and intergenic regions where for example promotors and regulators can be located. The successful application of combined bioinformatics approaches associating WGS-genotypes and specific phenotypes, could contribute to improve prediction of microbial behaviors in food. The implementation of this information in hazard identification and exposure assessment processes will open new possibilities to feed QMRA-models.},
}

RevDate: 2018-12-04

Timms VJ, Nguyen T, Crighton T, et al (2018)

Genome-wide comparison of Corynebacterium diphtheriae isolates from Australia identifies differences in the Pan-genomes between respiratory and cutaneous strains.

BMC genomics, 19(1):869 pii:10.1186/s12864-018-5147-2.

BACKGROUND: Corynebacterium diphtheriae is the main etiological agent of diphtheria, a global disease causing life-threatening infections, particularly in infants and children. Vaccination with diphtheria toxoid protects against infection with potent toxin producing strains. However a growing number of apparently non-toxigenic but potentially invasive C. diphtheriae strains are identified in countries with low prevalence of diphtheria, raising key questions about genomic structures and population dynamics of the species. This study examined genomic diversity among 48 C. diphtheriae isolates collected in Australia over a 12-year period using whole genome sequencing. Phylogeny was determined using SNP-based mapping and genome wide analysis.

RESULTS: C. diphtheriae sequence type (ST) 32, a non-toxigenic clone with evidence of enhanced virulence that has been also circulating in Europe, appears to be endemic in Australia. Isolates from temporospatially related patients displayed the same ST and similarity in their core genomes. The genome-wide analysis highlighted a role of pilins, adhesion factors and iron utilization in infections caused by non-toxigenic strains.

CONCLUSIONS: The genomic diversity of toxigenic and non-toxigenic strains of C. diphtheriae in Australia suggests multiple sources of infection and colonisation. Genomic surveillance of co-circulating toxigenic and non-toxigenic C. diphtheriae offer new insights into the evolution and virulence of pathogenic clones and can inform targeted public health actions and policy. The genomes presented in this investigation will contribute to the global surveillance of C. diphtheriae both for the monitoring of antibiotic resistance genes and virulent strains such as those belonging to ST32.


RevDate: 2018-11-30

Bonnici V, Giugno R, Manca V (2018)

PanDelos: a dictionary-based method for pan-genome content discovery.

BMC bioinformatics, 19(Suppl 15):437 pii:10.1186/s12859-018-2417-6.

BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

RESULTS: We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos .

@article {pmid30497358,
year = {2018},
author = {Bonnici, V and Giugno, R and Manca, V},
title = {PanDelos: a dictionary-based method for pan-genome content discovery.},
journal = {BMC bioinformatics},
volume = {19},
number = {Suppl 15},
pages = {437},
doi = {10.1186/s12859-018-2417-6},
pmid = {30497358},
issn = {1471-2105},
abstract = {BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

RESULTS: We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos .},
}
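
The alignment-free idea described in the PanDelos abstract above, comparing genes by k-mer multiplicity rather than by sequence alignment, can be illustrated with a small sketch. This is not the PanDelos implementation: the function names, the fixed k = 3 and the generalized Jaccard formula are assumptions chosen for illustration, whereas the abstract states that PanDelos derives the k-mer length and its thresholds from the data.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (its k-mer multiplicities)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def generalized_jaccard(a: str, b: str, k: int = 3) -> float:
    """Sum of per-k-mer minimum counts divided by the sum of maximum counts."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    keys = ca.keys() | cb.keys()
    # Counter returns 0 for k-mers that are absent from one sequence.
    shared = sum(min(ca[x], cb[x]) for x in keys)
    total = sum(max(ca[x], cb[x]) for x in keys)
    return shared / total if total else 0.0
```

Identical sequences score 1.0 and sequences with no k-mer in common score 0.0. A full pipeline would still need a rule for turning pairwise scores into homology candidates, which this sketch leaves out.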

RevDate: 2018-11-29

Freschi L, Vincent AT, Jeukens J, et al (2018)

The Pseudomonas aeruginosa pan-genome provides new insights on its population structure, horizontal gene transfer and pathogenicity.

Genome biology and evolution pii:5215156 [Epub ahead of print].

The huge increase in the availability of bacterial genomes led us to a point in which we can investigate and query pan-genomes, i.e. the full set of genes of a given bacterial species or clade. Here, we used a dataset of 1,311 high-quality genomes from the human pathogen Pseudomonas aeruginosa, 619 of which were newly sequenced, to show that a pan-genomic approach can greatly refine the population structure of bacterial species, provide new insights to define species boundaries, and generate hypotheses on the evolution of pathogenicity. The 665-gene P. aeruginosa core genome presented here, which constitutes only 1% of the entire pan-genome, is the first to be in the same order of magnitude as the minimal bacterial genome and represents a conservative estimate of the actual core genome. Moreover, the phylogeny based on this core genome provides strong evidence for a five-group population structure that includes two previously undescribed groups of isolates. Comparative genomics focusing on antimicrobial resistance and virulence genes showed that variation among isolates was partly linked to this population structure. Finally, we hypothesized that horizontal gene transfer had an important role in this respect, and found a total of 3,010 putative complete and fragmented plasmids, 5 and 12% of which contained resistance or virulence genes, respectively. This work provides data and strategies to study the evolutionary trajectories of resistance and virulence in Pseudomonas aeruginosa.

@article {pmid30496396,
year = {2018},
author = {Freschi, L and Vincent, AT and Jeukens, J and Emond-Rheault, JG and Kukavica-Ibrulj, I and Dupont, MJ and Charette, SJ and Boyle, B and Levesque, RC},
title = {The Pseudomonas aeruginosa pan-genome provides new insights on its population structure, horizontal gene transfer and pathogenicity.},
journal = {Genome biology and evolution},
volume = {},
number = {},
pages = {},
doi = {10.1093/gbe/evy259},
pmid = {30496396},
issn = {1759-6653},
abstract = {The huge increase in the availability of bacterial genomes led us to a point in which we can investigate and query pan-genomes, i.e. the full set of genes of a given bacterial species or clade. Here, we used a dataset of 1,311 high-quality genomes from the human pathogen Pseudomonas aeruginosa, 619 of which were newly sequenced, to show that a pan-genomic approach can greatly refine the population structure of bacterial species, provide new insights to define species boundaries, and generate hypotheses on the evolution of pathogenicity. The 665-gene P. aeruginosa core genome presented here, which constitutes only 1% of the entire pan-genome, is the first to be in the same order of magnitude as the minimal bacterial genome and represents a conservative estimate of the actual core genome. Moreover, the phylogeny based on this core genome provides strong evidence for a five-group population structure that includes two previously undescribed groups of isolates. Comparative genomics focusing on antimicrobial resistance and virulence genes showed that variation among isolates was partly linked to this population structure. Finally, we hypothesized that horizontal gene transfer had an important role in this respect, and found a total of 3,010 putative complete and fragmented plasmids, 5 and 12% of which contained resistance or virulence genes, respectively. This work provides data and strategies to study the evolutionary trajectories of resistance and virulence in Pseudomonas aeruginosa.},
}

Wang R, Li L, Huang T, et al (2018)

Capsular Switching and ICE Transformation Occurred in Human Streptococcus agalactiae ST19 With High Pathogenicity to Fish.

Frontiers in veterinary science, 5:281.

Although Streptococcus agalactiae (GBS) cross-infection between human and fish has been confirmed in experimental and clinical studies, the mechanisms underlying GBS cross-species infection remain largely unclear. We have found different human GBS ST19 strains exhibiting strong or weak pathogenic to fish (sGBS and wGBS). In this study, our objective was to identify the genetic elements responsible for GBS cross species infection based on genome sequence data and comparative genomics. The genomes of 11 sGBS strains and 11 wGBS strains were sequenced, and the genomic analysis was performed base on pan-genome, CRISPRs, phylogenetic reconstruction and genome comparison. The results from the pan-genome, CRISPRs analysis and phylogenetic reconstruction indicated that genomes between sGBS were more conservative than that of wGBS. The genomic differences between sGBS and wGBS were primarily in the Cps region (about 111 kb) and its adjacent ICE region (about 106 kb). The Cps region included the entire cps operon, and all sGBS were capsular polysaccharide (CPS) type V, while all wGBS were CPS type III. The ICE region of sGBS contained integrative and conjugative elements (ICE) with IQ element and erm(TR), and was very conserved, whereas the ICE region of wGBS contained ICE with mega elements and the variation was large. The capsular switching (III-V) and transformation of ICE adjacent to the Cps region occurred in human GBS ST19 with different pathogenicity to fish, which may be related to the capability of GBS cross-infection.

@article {pmid30483518,
year = {2018},
author = {Wang, R and Li, L and Huang, T and Huang, W and Lei, A and Chen, M},
title = {Capsular Switching and ICE Transformation Occurred in Human Streptococcus agalactiae ST19 With High Pathogenicity to Fish.},
journal = {Frontiers in veterinary science},
volume = {5},
number = {},
pages = {281},
doi = {10.3389/fvets.2018.00281},
pmid = {30483518},
issn = {2297-1769},
abstract = {Although Streptococcus agalactiae (GBS) cross-infection between human and fish has been confirmed in experimental and clinical studies, the mechanisms underlying GBS cross-species infection remain largely unclear. We have found different human GBS ST19 strains exhibiting strong or weak pathogenic to fish (sGBS and wGBS). In this study, our objective was to identify the genetic elements responsible for GBS cross species infection based on genome sequence data and comparative genomics. The genomes of 11 sGBS strains and 11 wGBS strains were sequenced, and the genomic analysis was performed base on pan-genome, CRISPRs, phylogenetic reconstruction and genome comparison. The results from the pan-genome, CRISPRs analysis and phylogenetic reconstruction indicated that genomes between sGBS were more conservative than that of wGBS. The genomic differences between sGBS and wGBS were primarily in the Cps region (about 111 kb) and its adjacent ICE region (about 106 kb). The Cps region included the entire cps operon, and all sGBS were capsular polysaccharide (CPS) type V, while all wGBS were CPS type III. The ICE region of sGBS contained integrative and conjugative elements (ICE) with IQ element and erm(TR), and was very conserved, whereas the ICE region of wGBS contained ICE with mega elements and the variation was large. The capsular switching (III-V) and transformation of ICE adjacent to the Cps region occurred in human GBS ST19 with different pathogenicity to fish, which may be related to the capability of GBS cross-infection.},
}

RESULTS: The DSM 29614 strain genome was sequenced and analysed by a combination of in silico procedures. Comparative genomics, performed between 85 K. oxytoca representatives and K. oxytoca DSM 29614, revealed that this bacterial group has an open pangenome, characterized by a very small core genome (1009 genes, about 2%), a high fraction of unique (43,808 genes, about 87%) and accessory genes (5559 genes, about 11%). Proteins belonging to COG categories "Carbohydrate transport and metabolism" (G), "Amino acid transport and metabolism" (E), "Coenzyme transport and metabolism" (H), "Inorganic ion transport and metabolism" (P), and "membrane biogenesis-related proteins" (M) are particularly abundant in the predicted proteome of DSM 29614 strain. The results of a protein functional enrichment analysis - based on a previous proteomic analysis - revealed metabolic optimization during Fe(III)-citrate anaerobic utilization. In this growth condition, the observed high levels of Fe(II) may be due to different flavin metal reductases and siderophores as inferred form genome analysis. The presence of genes responsible for the synthesis of exopolysaccharide and for the tolerance to heavy metals was highlighted too. The inferred genomic insights were confirmed by a set of phenotypic tests showing specific metabolic capability in terms of i) Fe2+ and exopolysaccharide production and ii) phosphatase activity involved in precipitation of metal ion-phosphate salts.

CONCLUSION: The K. oxytoca DSM 29614 unique capabilities of using Fe(III)-citrate as sole carbon and energy source in anaerobiosis and tolerating diverse metals coincides with the presence at the genomic level of specific genes that can support i) energy metabolism optimization, ii) cell protection by the biosynthesis of a peculiar exopolysaccharide armour entrapping metal ions and iii) general and metal-specific detoxifying activities by different proteins and metabolites.
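
The core, accessory and unique fractions quoted in the abstract above can be related to a gene presence/absence table with a short sketch. The strain and gene names below are hypothetical toy data, not results from the study, and the thresholds are an assumption: core means present in every genome, unique means present in exactly one, and accessory is everything in between. Real pipelines first cluster genes into families, a step omitted here.

```python
from collections import Counter


def partition_pangenome(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core, accessory and unique sets."""
    n = len(genomes)
    # Number of genomes in which each gene family occurs.
    occurrence = Counter(g for genes in genomes.values() for g in genes)
    parts = {"core": set(), "accessory": set(), "unique": set()}
    for gene, count in occurrence.items():
        if count == n:
            parts["core"].add(gene)
        elif count == 1:
            parts["unique"].add(gene)
        else:
            parts["accessory"].add(gene)
    return parts


# Hypothetical toy input, not data from the study.
toy = {
    "strain_A": {"dnaA", "gyrB", "merA"},
    "strain_B": {"dnaA", "gyrB", "czcD"},
    "strain_C": {"dnaA", "merA"},
}
parts = partition_pangenome(toy)
```

Applied to the counts quoted in the abstract, 1009 + 5559 + 43,808 = 50,376 gene families, which gives shares of about 2%, 11% and 87%, consistent with the reported percentages.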


RevDate: 2018-11-22

Abudahab K, Prada JM, Yang Z, et al (2018)

PANINI: Pangenome Neighbour Identification for Bacterial Populations.

Microbial genomics [Epub ahead of print].

The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at http://panini.pathogen.watch and code at http://gitlab.com/cgps/panini.

@article {pmid30465642,
year = {2018},
author = {Abudahab, K and Prada, JM and Yang, Z and Bentley, SD and Croucher, NJ and Corander, J and Aanensen, DM},
title = {PANINI: Pangenome Neighbour Identification for Bacterial Populations.},
journal = {Microbial genomics},
volume = {},
number = {},
pages = {},
doi = {10.1099/mgen.0.000220},
pmid = {30465642},
issn = {2057-5858},
abstract = {The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at http://panini.pathogen.watch and code at http://gitlab.com/cgps/panini.},
}

Wang D, Li J, Wang L (2018)

Comprehensive study of instable regions in Pseudomonas aeruginosa and Mycobacterium tuberculosis.

Biomedical engineering online, 17(Suppl 1):133.

BACKGROUND: Pseudomonas aeruginosa is a common bacterium which is recognized for its association with hospital-acquired infections and its advanced antibiotic resistance mechanisms. Tuberculosis, one of the major causes of mortality, is initiated by the deposition of Mycobacterium tuberculosis. Accessory sequences shared by a subset of strains of a species play an important role in a species' evolution, antibiotic resistance and infectious potential.

RESULTS: Here, with a multiple sequence aligner, we segmented 25 P. aeruginosa genomes and 28 M. tuberculosis genomes into core blocks (include sequences shared by all the input genomes) and dispensable blocks (include sequences shared by a subset of the input genomes), respectively. For each input genome, we then constructed a scaffold consisting of its core and dispensable blocks sorted by blocks' locations on the chromosomes. Consecutive dispensable blocks on these scaffold formed instable regions. After a comprehensive study of these instable regions, three characteristics of instable regions are summarized: instable regions were short, site specific and varied in different strains. Three DNA elements (directed repeats (DRs), transposons and integrons) were then studied to see whether these DNA elements are associated with the variation of instable regions. A pipeline was developed to search for DR pairs on the flank of every instable sequence. 27 DR pairs in P. aeruginosa strains and 6 pairs in M. tuberculosis strains were found to exist in the instable regions. On the average, 14% and 12% of instable regions in P. aeruginosa strains covered transposase genes and integrase genes, respectively. In M. tuberculosis strains, an average of 43% and 8% of instable regions contain transposase genes and integrase genes, respectively.

CONCLUSIONS: Instable regions were short, site specific and varied in different strains for both P. aeruginosa and M. tuberculosis. Our experimental results showed that DRs, transposons and integrons may be associated with variation of instable regions.

@article {pmid30458797,
year = {2018},
author = {Wang, D and Li, J and Wang, L},
title = {Comprehensive study of instable regions in Pseudomonas aeruginosa and Mycobacterium tuberculosis.},
journal = {Biomedical engineering online},
volume = {17},
number = {Suppl 1},
pages = {133},
doi = {10.1186/s12938-018-0563-8},
pmid = {30458797},
issn = {1475-925X},
abstract = {BACKGROUND: Pseudomonas aeruginosa is a common bacterium which is recognized for its association with hospital-acquired infections and its advanced antibiotic resistance mechanisms. Tuberculosis, one of the major causes of mortality, is initiated by the deposition of Mycobacterium tuberculosis. Accessory sequences shared by a subset of strains of a species play an important role in a species' evolution, antibiotic resistance and infectious potential.

RESULTS: Here, with a multiple sequence aligner, we segmented 25 P. aeruginosa genomes and 28 M. tuberculosis genomes into core blocks (include sequences shared by all the input genomes) and dispensable blocks (include sequences shared by a subset of the input genomes), respectively. For each input genome, we then constructed a scaffold consisting of its core and dispensable blocks sorted by blocks' locations on the chromosomes. Consecutive dispensable blocks on these scaffold formed instable regions. After a comprehensive study of these instable regions, three characteristics of instable regions are summarized: instable regions were short, site specific and varied in different strains. Three DNA elements (directed repeats (DRs), transposons and integrons) were then studied to see whether these DNA elements are associated with the variation of instable regions. A pipeline was developed to search for DR pairs on the flank of every instable sequence. 27 DR pairs in P. aeruginosa strains and 6 pairs in M. tuberculosis strains were found to exist in the instable regions. On the average, 14% and 12% of instable regions in P. aeruginosa strains covered transposase genes and integrase genes, respectively. In M. tuberculosis strains, an average of 43% and 8% of instable regions contain transposase genes and integrase genes, respectively.

CONCLUSIONS: Instable regions were short, site specific and varied in different strains for both P. aeruginosa and M. tuberculosis. Our experimental results showed that DRs, transposons and integrons may be associated with variation of instable regions.},
}
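
The scaffold construction described in the abstract above, in which blocks are sorted by chromosomal position and consecutive dispensable blocks form an instable region, can be sketched as follows. The block representation, the coordinates and the function name are assumptions made for illustration, and the sketch also assumes that a single dispensable block flanked by core blocks counts as a region, which the abstract does not state.

```python
from itertools import groupby


def instable_regions(scaffold: list[tuple[int, int, bool]]) -> list[tuple[int, int]]:
    """Merge runs of consecutive dispensable blocks into (start, end) regions.

    Each block is (start, end, is_core); blocks are sorted by start first.
    """
    ordered = sorted(scaffold, key=lambda block: block[0])
    regions = []
    for is_core, run in groupby(ordered, key=lambda block: block[2]):
        if not is_core:
            blocks = list(run)
            regions.append((blocks[0][0], blocks[-1][1]))
    return regions


# Hypothetical coordinates for one genome, for illustration only.
scaffold = [
    (0, 100, True),
    (100, 150, False),
    (150, 180, False),
    (180, 400, True),
    (400, 420, False),
]
regions = instable_regions(scaffold)
```

Each resulting region could then be screened for flanking direct repeats or for transposase and integrase genes, as the study describes.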

RevDate: 2018-11-24

Sherman RM, Forman J, Antonescu V, et al (2018)

Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Nature genetics pii:10.1038/s41588-018-0273-y [Epub ahead of print].

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

@article {pmid30455414,
year = {2018},
author = {Sherman, RM and Forman, J and Antonescu, V and Puiu, D and Daya, M and Rafaels, N and Boorgula, MP and Chavan, S and Vergara, C and Ortega, VE and Levin, AM and Eng, C and Yazdanbakhsh, M and Wilson, JG and Marrugo, J and Lange, LA and Williams, LK and Watson, H and Ware, LB and Olopade, CO and Olopade, O and Oliveira, RR and Ober, C and Nicolae, DL and Meyers, DA and Mayorga, A and Knight-Madden, J and Hartert, T and Hansel, NN and Foreman, MG and Ford, JG and Faruque, MU and Dunston, GM and Caraballo, L and Burchard, EG and Bleecker, ER and Araujo, MI and Herrera-Paz, EF and Campbell, M and Foster, C and Taub, MA and Beaty, TH and Ruczinski, I and Mathias, RA and Barnes, KC and Salzberg, SL},
title = {Assembly of a pan-genome from deep sequencing of 910 humans of African descent.},
journal = {Nature genetics},
volume = {},
number = {},
pages = {},
doi = {10.1038/s41588-018-0273-y},
pmid = {30455414},
issn = {1546-1718},
support = {R01 HG006677/HG/NHGRI NIH HHS/United States ; R01 HL129239/HL/NHLBI NIH HHS/United States ; },
abstract = {We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.},
}

The genomes of Streptococcus agalactiae (group B streptococcus; GBS) collected from diseased fish in Thailand and Vietnam over a nine-year period (2008-2016) were sequenced and compared (n = 21). Based on capsular serotype and multilocus sequence typing (MLST), GBS isolates are divided into 2 groups comprised of i) serotype Ia; sequence type (ST)7 and ii) serotype III; ST283. Population structure inferred by core genome (cg)MLST and Bayesian clustering analysis also strongly indicated distribution of two GBS populations in both Thailand and Vietnam. Deep phylogenetic analysis implied by CRISPR array's spacer diversity was able to cluster GBS isolates according to their temporal and geographic origins, though ST7 has varying CRISPR1-spacer profiles when compared to ST283 strains. Based on overall genotypic features, Thai ST283 strains were closely related to the Singaporean ST283 strain causing foodborne illness in humans in 2015, thus, signifying zoonotic potential of this GBS population in the country.

Monat C, Schreiber M, Stein N, et al (2018)

Prospects of pan-genomics in barley.

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik.

The concept of a pan-genome refers to intraspecific diversity in genome content and structure, encompassing both genes and intergenic space. Pan-genomic studies employ a combination of de novo sequence assembly and reference-based alignment to discover and genotype structural variants. The large size and complex structure of Triticeae genomes were for a long time an obstacle for genomic research in barley and its relatives. Now that a reference genome is available, computational pipelines for high-quality sequence assembly are in place, and sequence costs continue to drop, investigations into the structural diversity of the barley genome seem within reach. Here, we review the recent progress on pan-genomics in the model grass Brachypodium distachyon, and the cereal crops rice and maize, and devise a multi-tiered strategy for a pan-genome project in barley. Our design involves: (1) the construction of high-quality de novo sequence assemblies for a small core set of representative genotypes, (2) short-read sequencing of a large diversity panel of genebank accessions to medium coverage and (3) the use of complementary methods such as chromosome-conformation capture sequencing and k-mer-based association genetics. The in silico representation of the barley pan-genome may inform about the mechanisms of structural genome evolution in the Triticeae and supplement quantitative genetics models of crop performance for better accuracy and predictive ability.

@article {pmid30446793,
year = {2018},
author = {Monat, C and Schreiber, M and Stein, N and Mascher, M},
title = {Prospects of pan-genomics in barley.},
journal = {TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik},
volume = {},
number = {},
pages = {},
doi = {10.1007/s00122-018-3234-z},
pmid = {30446793},
issn = {1432-2242},
support = {031B0190A//Bundesministerium für Bildung und Forschung/ ; SAW-2015-IPK-1//Leibniz-Gemeinschaft/ ; },
abstract = {The concept of a pan-genome refers to intraspecific diversity in genome content and structure, encompassing both genes and intergenic space. Pan-genomic studies employ a combination of de novo sequence assembly and reference-based alignment to discover and genotype structural variants. The large size and complex structure of Triticeae genomes were for a long time an obstacle for genomic research in barley and its relatives. Now that a reference genome is available, computational pipelines for high-quality sequence assembly are in place, and sequence costs continue to drop, investigations into the structural diversity of the barley genome seem within reach. Here, we review the recent progress on pan-genomics in the model grass Brachypodium distachyon, and the cereal crops rice and maize, and devise a multi-tiered strategy for a pan-genome project in barley. Our design involves: (1) the construction of high-quality de novo sequence assemblies for a small core set of representative genotypes, (2) short-read sequencing of a large diversity panel of genebank accessions to medium coverage and (3) the use of complementary methods such as chromosome-conformation capture sequencing and k-mer-based association genetics. The in silico representation of the barley pan-genome may inform about the mechanisms of structural genome evolution in the Triticeae and supplement quantitative genetics models of crop performance for better accuracy and predictive ability.},
}

The Lactobacillus casei/Lactobacillus paracasei group of species contains strains adapted to a wide range of environments, from dairy products to the intestinal tract of animals and fermented vegetables. Understanding the gene acquisitions and losses that induced such different adaptations implies a comparison between complete genomes, since evolutionary differences are spread over the whole sequence. This study compared 12 complete genomes of L. casei/paracasei dairy-niche isolates and 7 genomes of L. casei/paracasei isolated from other habitats (i.e., corn silage, human intestine, sauerkraut, beef, congee). Phylogenetic tree construction and the average nucleotide identity (ANI) metric showed a clustering of the two dairy L. casei strains ATCC393 and LC5, indicating a lower genetic relatedness in comparison to the other strains. Genomic analysis revealed a core of 313 genes shared by dairy and non-dairy lactic acid bacteria (LAB), within a pan-genome of 9,462 genes. Functional category analyses highlighted the evolutionary gene decay of dairy isolates, particularly in carbohydrate and amino acid metabolism. Specifically, dairy L. casei/paracasei strains lost the ability to metabolize myo-inositol and taurine (i.e., iol and tau gene clusters). However, gene acquisitions by dairy strains were also highlighted, mostly related to defense mechanisms and host-pathogen interactions (i.e., yueB, esaA, and sle1). This study aimed to be a preliminary investigation on dairy and non-dairy marker genes that could be further characterized for probiotics or food applications.
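
The core and pan-genome counts quoted in this abstract come from partitioning gene families by how many genomes carry them. The following is only a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `partition_pangenome`, the strain labels, and the gene-family IDs are all hypothetical, and real studies first cluster orthologous genes with dedicated tools before a step like this.

```python
from collections import Counter


def partition_pangenome(gene_sets):
    """Split gene families into pan, core, accessory, and unique sets.

    gene_sets maps a strain name to the set of gene-family IDs it carries
    (hypothetical input format, assumed for this sketch).
    """
    if not gene_sets:
        raise ValueError("need at least one genome")
    n = len(gene_sets)
    # Count in how many genomes each gene family occurs.
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n}            # present in all genomes
    unique = {g for g, c in counts.items() if c == 1 and n > 1}  # present in exactly one
    accessory = set(counts) - core - unique                     # present in some, not all
    return {"pan": set(counts), "core": core,
            "accessory": accessory, "unique": unique}


# Hypothetical toy data: three strains, six gene families.
toy = {
    "dairy_1": {"gA", "gB", "gC"},
    "dairy_2": {"gA", "gB", "gD"},
    "silage_1": {"gA", "gC", "gE", "gF"},
}
result = partition_pangenome(toy)
for name in ("pan", "core", "accessory", "unique"):
    print(name, sorted(result[name]))
```

In this toy case the pan-genome is the union of all six families, and only the family carried by every strain counts as core. Whether "unique" genes are reported separately from the accessory set varies between studies.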

The Gram positive bacterium Streptococcus pneumoniae (pneumococcus) is a major human pathogen. It is a common colonizer of the human host, and in the nasopharynx, sinus, and middle ear it survives as a biofilm. This mode of growth is optimal for multi-strain colonization and genetic exchange. Over the last decades, the far-reaching use of antibiotics and the widespread implementation of pneumococcal multivalent conjugate vaccines have posed considerable selective pressure on pneumococci. This scenario provides an exceptional opportunity to study the evolution of the pangenome of a clinically important bacterium, and has the potential to serve as a case study for other species. The goal of this review is to highlight key findings in the studies of pneumococcal genomic diversity and plasticity.

@article {pmid30425695,
year = {2018},
author = {Hiller, NL and Sá-Leão, R},
title = {Puzzling Over the Pneumococcal Pangenome.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2580},
doi = {10.3389/fmicb.2018.02580},
pmid = {30425695},
issn = {1664-302X},
abstract = {The Gram positive bacterium Streptococcus pneumoniae (pneumococcus) is a major human pathogen. It is a common colonizer of the human host, and in the nasopharynx, sinus, and middle ear it survives as a biofilm. This mode of growth is optimal for multi-strain colonization and genetic exchange. Over the last decades, the far-reaching use of antibiotics and the widespread implementation of pneumococcal multivalent conjugate vaccines have posed considerable selective pressure on pneumococci. This scenario provides an exceptional opportunity to study the evolution of the pangenome of a clinically important bacterium, and has the potential to serve as a case study for other species. The goal of this review is to highlight key findings in the studies of pneumococcal genomic diversity and plasticity.},
}

Plant-beneficial Pseudomonas spp. competitively colonize the rhizosphere and display plant-growth promotion and/or disease-suppression activities. Some strains within the P. fluorescens species complex produce phenazine derivatives, such as phenazine-1-carboxylic acid. These antimicrobial compounds are broadly inhibitory to numerous soil-dwelling plant pathogens and play a role in the ecological competence of phenazine-producing Pseudomonas spp. We assembled a collection encompassing 63 strains representative of the worldwide diversity of plant-beneficial phenazine-producing Pseudomonas spp. In this study, we report the sequencing of 58 complete genomes using PacBio RS II sequencing technology. Distributed among four subgroups within the P. fluorescens species complex, the diversity of our collection is reflected by the large pangenome which accounts for 25,413 protein-coding genes. We identified genes and clusters encoding for numerous phytobeneficial traits, including antibiotics, siderophores and cyclic lipopeptides biosynthesis, some of which were previously unknown in these microorganisms. Finally, we gained insight into the evolutionary history of the phenazine biosynthetic operon. Given its diverse genomic context, it is likely that this operon was relocated several times during Pseudomonas evolution. Our findings acknowledge the tremendous diversity of plant-beneficial phenazine-producing Pseudomonas spp., paving the way for comparative analyses to identify new genetic determinants involved in biocontrol, plant-growth promotion and rhizosphere competence.

@article {pmid30421490,
year = {2018},
author = {Biessy, A and Novinscak, A and Blom, J and Léger, G and Thomashow, LS and Cazorla, FM and Josic, D and Filion, M},
title = {Diversity of phytobeneficial traits revealed by whole-genome analysis of worldwide-isolated phenazine-producing Pseudomonas spp.},
journal = {Environmental microbiology},
volume = {},
number = {},
pages = {},
doi = {10.1111/1462-2920.14476},
pmid = {30421490},
issn = {1462-2920},
abstract = {Plant-beneficial Pseudomonas spp. competitively colonize the rhizosphere and display plant-growth promotion and/or disease-suppression activities. Some strains within the P. fluorescens species complex produce phenazine derivatives, such as phenazine-1-carboxylic acid. These antimicrobial compounds are broadly inhibitory to numerous soil-dwelling plant pathogens and play a role in the ecological competence of phenazine-producing Pseudomonas spp. We assembled a collection encompassing 63 strains representative of the worldwide diversity of plant-beneficial phenazine-producing Pseudomonas spp. In this study, we report the sequencing of 58 complete genomes using PacBio RS II sequencing technology. Distributed among four subgroups within the P. fluorescens species complex, the diversity of our collection is reflected by the large pangenome which accounts for 25,413 protein-coding genes. We identified genes and clusters encoding for numerous phytobeneficial traits, including antibiotics, siderophores and cyclic lipopeptides biosynthesis, some of which were previously unknown in these microorganisms. Finally, we gained insight into the evolutionary history of the phenazine biosynthetic operon. Given its diverse genomic context, it is likely that this operon was relocated several times during Pseudomonas evolution. Our findings acknowledge the tremendous diversity of plant-beneficial phenazine-producing Pseudomonas spp., paving the way for comparative analyses to identify new genetic determinants involved in biocontrol, plant-growth promotion and rhizosphere competence.},
}

RevDate: 2018-11-23

Nanayakkara BS, O'Brien CL, DM Gordon (2018)

Diversity and distribution of Klebsiella capsules in Escherichia coli.

Environmental microbiology reports [Epub ahead of print].

E. coli strains responsible for elevated counts (blooms) in freshwater reservoirs in Australia carry a capsule originating from Klebsiella. The occurrence of Klebsiella capsules in E. coli was about 7% overall and 23 different capsule types were detected. Capsules were observed in strains from phylogroups A, B1 and C, but were absent from phylogroup B2, D, E and F strains. In general, few A, B1 or C lineages were capsule-positive, but when a lineage was encapsulated multiple different capsule types were present. All Klebsiella capsule-positive strains were of serogroups O8, O9 and O89. Regardless of the phylogroup, O9 strains were more likely to be capsule-positive than O8 strains. Given the sequence similarity, it appears that both the capsule region and the O-antigen gene region are transferred to E. coli from Klebsiella as a single block via horizontal gene transfer events. Pan genome analysis indicated that there were only modest differences between encapsulated and non-encapsulated strains belonging to phylogroup A. The possession of a Klebsiella capsule, but not the type of capsule, is likely a key determinant of the bloom status of a strain.

@article {pmid30411512,
year = {2018},
author = {Nanayakkara, BS and O'Brien, CL and Gordon, DM},
title = {Diversity and distribution of Klebsiella capsules in Escherichia coli.},
journal = {Environmental microbiology reports},
volume = {},
number = {},
pages = {},
doi = {10.1111/1758-2229.12710},
pmid = {30411512},
issn = {1758-2229},
support = {LP120100327//Australian Research Council/ ; //Water Research Australia/ ; },
abstract = {E. coli strains responsible for elevated counts (blooms) in freshwater reservoirs in Australia carry a capsule originating from Klebsiella. The occurrence of Klebsiella capsules in E. coli was about 7% overall and 23 different capsule types were detected. Capsules were observed in strains from phylogroups A, B1 and C, but were absent from phylogroup B2, D, E and F strains. In general, few A, B1 or C lineages were capsule-positive, but when a lineage was encapsulated multiple different capsule types were present. All Klebsiella capsule-positive strains were of serogroups O8, O9 and O89. Regardless of the phylogroup, O9 strains were more likely to be capsule-positive than O8 strains. Given the sequence similarity, it appears that both the capsule region and the O-antigen gene region are transferred to E. coli from Klebsiella as a single block via horizontal gene transfer events. Pan genome analysis indicated that there were only modest differences between encapsulated and non-encapsulated strains belonging to phylogroup A. The possession of a Klebsiella capsule, but not the type of capsule, is likely a key determinant of the bloom status of a strain.},
}

RevDate: 2018-11-14

Al-Bassam MM, Haist J, Neumann SA, et al (2018)

Expression Patterns, Genomic Conservation and Input Into Developmental Regulation of the GGDEF/EAL/HD-GYP Domain Proteins in Streptomyces.

Frontiers in microbiology, 9:2524.

To proliferate, antibiotic-producing Streptomyces undergo a complex developmental transition from vegetative growth to the production of aerial hyphae and spores. This morphological switch is controlled by the signaling molecule cyclic bis-(3',5') di-guanosine-mono-phosphate (c-di-GMP) that binds to the master developmental regulator, BldD, leading to repression of key sporulation genes during vegetative growth. However, a systematical analysis of all the GGDEF/EAL/HD-GYP proteins that control c-di-GMP levels in Streptomyces is still lacking. Here, we have FLAG-tagged all 10 c-di-GMP turnover proteins in Streptomyces venezuelae and characterized their expression patterns throughout the life cycle, revealing that the diguanylate cyclase (DGC) CdgB and the phosphodiesterase (PDE) RmdB are the most abundant GGDEF/EAL proteins. Moreover, we have deleted all the genes coding for c-di-GMP turnover enzymes individually and analyzed morphogenesis of the mutants in macrocolonies. We show that the composite GGDEF-EAL protein CdgC is an active DGC and that deletion of the DGCs cdgB and cdgC enhance sporulation whereas deletion of the PDEs rmdA and rmdB delay development in S. venezuelae. By comparing the pan genome of 93 fully sequenced Streptomyces species we show that the DGCs CdgA, CdgB, and CdgC, and the PDE RmdB represent the most conserved c-di-GMP-signaling proteins in the genus Streptomyces.

@article {pmid30405580,
year = {2018},
author = {Al-Bassam, MM and Haist, J and Neumann, SA and Lindenberg, S and Tschowri, N},
title = {Expression Patterns, Genomic Conservation and Input Into Developmental Regulation of the GGDEF/EAL/HD-GYP Domain Proteins in Streptomyces.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2524},
doi = {10.3389/fmicb.2018.02524},
pmid = {30405580},
issn = {1664-302X},
abstract = {To proliferate, antibiotic-producing Streptomyces undergo a complex developmental transition from vegetative growth to the production of aerial hyphae and spores. This morphological switch is controlled by the signaling molecule cyclic bis-(3',5') di-guanosine-mono-phosphate (c-di-GMP) that binds to the master developmental regulator, BldD, leading to repression of key sporulation genes during vegetative growth. However, a systematical analysis of all the GGDEF/EAL/HD-GYP proteins that control c-di-GMP levels in Streptomyces is still lacking. Here, we have FLAG-tagged all 10 c-di-GMP turnover proteins in Streptomyces venezuelae and characterized their expression patterns throughout the life cycle, revealing that the diguanylate cyclase (DGC) CdgB and the phosphodiesterase (PDE) RmdB are the most abundant GGDEF/EAL proteins. Moreover, we have deleted all the genes coding for c-di-GMP turnover enzymes individually and analyzed morphogenesis of the mutants in macrocolonies. We show that the composite GGDEF-EAL protein CdgC is an active DGC and that deletion of the DGCs cdgB and cdgC enhance sporulation whereas deletion of the PDEs rmdA and rmdB delay development in S. venezuelae. By comparing the pan genome of 93 fully sequenced Streptomyces species we show that the DGCs CdgA, CdgB, and CdgC, and the PDE RmdB represent the most conserved c-di-GMP-signaling proteins in the genus Streptomyces.},
}

RevDate: 2018-11-29

Pinto M, González-Díaz A, Machado MP, et al (2018)

Insights into the population structure and pan-genome of Haemophilus influenzae.

Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 67:126-135.

The human-restricted bacterium Haemophilus influenzae is responsible for respiratory infections in both children and adults. While colonization begins in the upper airways, it can spread throughout the respiratory tract potentially leading to invasive infections. Although the spread of H. influenzae serotype b (Hib) has been prevented by vaccination, the emergence of infections by other serotypes as well as by non-typeable isolates (NTHi) has been observed, prompting the need for novel prevention strategies. Here, we aimed to study the population structure of H. influenzae and to get some insights into its pan-genome. We studied 305 H. influenzae strains, enrolling 217 publicly available genomes, as well as 88 newly sequenced H. influenzae invasive strains isolated in Portugal, spanning a 24-year period. NTHi isolates presented a core-SNP-based genetic diversity about 10-fold higher than the one observed for Hib. The analysis of key factors involved in pathogenesis, such as lipooligosaccharides, hemagglutinating pili and High Molecular Weight-adhesins, suggests that NTHi shapes its virulence repertoire, either by acquisition and loss of genes or by SNP-based diversification, likely towards host immune evasion and persistence. Discrete NTHi subpopulation structures are proposed based on core-genome supported with 17 candidate genetic markers identified in the accessory genome. Additionally, this study provides two bioinformatics tools for in silico rapid identification of H. influenzae serotypes and NTHi clades previously proposed, obviating laboratory-based demanding procedures. The present study constitutes an important genomic framework that could pave the way for future studies on the genetic determinants underlying invasiveness and disease and population structure of H. influenzae.

@article {pmid30391557,
year = {2018},
author = {Pinto, M and González-Díaz, A and Machado, MP and Duarte, S and Vieira, L and Carriço, JA and Marti, S and Bajanca-Lavado, MP and Gomes, JP},
title = {Insights into the population structure and pan-genome of Haemophilus influenzae.},
journal = {Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases},
volume = {67},
number = {},
pages = {126-135},
doi = {10.1016/j.meegid.2018.10.025},
pmid = {30391557},
issn = {1567-7257},
abstract = {The human-restricted bacterium Haemophilus influenzae is responsible for respiratory infections in both children and adults. While colonization begins in the upper airways, it can spread throughout the respiratory tract potentially leading to invasive infections. Although the spread of H. influenzae serotype b (Hib) has been prevented by vaccination, the emergence of infections by other serotypes as well as by non-typeable isolates (NTHi) has been observed, prompting the need for novel prevention strategies. Here, we aimed to study the population structure of H. influenzae and to get some insights into its pan-genome. We studied 305 H. influenzae strains, enrolling 217 publicly available genomes, as well as 88 newly sequenced H. influenzae invasive strains isolated in Portugal, spanning a 24-year period. NTHi isolates presented a core-SNP-based genetic diversity about 10-fold higher than the one observed for Hib. The analysis of key factors involved in pathogenesis, such as lipooligosaccharides, hemagglutinating pili and High Molecular Weight-adhesins, suggests that NTHi shapes its virulence repertoire, either by acquisition and loss of genes or by SNP-based diversification, likely towards host immune evasion and persistence. Discrete NTHi subpopulation structures are proposed based on core-genome supported with 17 candidate genetic markers identified in the accessory genome. Additionally, this study provides two bioinformatics tools for in silico rapid identification of H. influenzae serotypes and NTHi clades previously proposed, obviating laboratory-based demanding procedures. The present study constitutes an important genomic framework that could pave the way for future studies on the genetic determinants underlying invasiveness and disease and population structure of H. influenzae.},
}

RevDate: 2018-11-14

Wüthrich D, Irmler S, Berthoud H, et al (2018)

Conversion of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile cysK-ctl-cysE Gene Cluster.

Frontiers in microbiology, 9:2415.

Milk and dairy products are rich in nutrients and are therefore habitats for various microbiomes. However, the composition of nutrients can be quite diverse, in particular among the sulfur containing amino acids. In milk, methionine is present in a 25-fold higher abundance than cysteine. Interestingly, a fraction of strains of the species L. paracasei - a flavor-enhancing adjunct culture species - can grow in medium with methionine as the sole sulfur source. In this study, we focus on genomic and evolutionary aspects of sulfur dependence in L. paracasei strains. From 24 selected L. paracasei strains, 16 strains can grow in medium with methionine as sole sulfur source. We sequenced these strains to perform gene-trait matching. We found that one gene cluster - consisting of a cysteine synthase, a cystathionine lyase, and a serine acetyltransferase - is present in all strains that grow in medium with methionine as sole sulfur source. In contrast, strains that depend on other sulfur sources do not have this gene cluster. We expanded the study and searched for this gene cluster in other species and detected it in the genomes of many bacteria species used in the food production. The comparison to these species showed that two different versions of the gene cluster exist in L. paracasei which were likely gained in two distinct events of horizontal gene transfer. Additionally, the comparison of 62 L. paracasei genomes and the two versions of the gene cluster revealed that this gene cluster is mobile within the species.

@article {pmid30386310,
year = {2018},
author = {Wüthrich, D and Irmler, S and Berthoud, H and Guggenbühl, B and Eugster, E and Bruggmann, R},
title = {Conversion of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile cysK-ctl-cysE Gene Cluster.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2415},
doi = {10.3389/fmicb.2018.02415},
pmid = {30386310},
issn = {1664-302X},
abstract = {Milk and dairy products are rich in nutrients and are therefore habitats for various microbiomes. However, the composition of nutrients can be quite diverse, in particular among the sulfur containing amino acids. In milk, methionine is present in a 25-fold higher abundance than cysteine. Interestingly, a fraction of strains of the species L. paracasei - a flavor-enhancing adjunct culture species - can grow in medium with methionine as the sole sulfur source. In this study, we focus on genomic and evolutionary aspects of sulfur dependence in L. paracasei strains. From 24 selected L. paracasei strains, 16 strains can grow in medium with methionine as sole sulfur source. We sequenced these strains to perform gene-trait matching. We found that one gene cluster - consisting of a cysteine synthase, a cystathionine lyase, and a serine acetyltransferase - is present in all strains that grow in medium with methionine as sole sulfur source. In contrast, strains that depend on other sulfur sources do not have this gene cluster. We expanded the study and searched for this gene cluster in other species and detected it in the genomes of many bacteria species used in the food production. The comparison to these species showed that two different versions of the gene cluster exist in L. paracasei which were likely gained in two distinct events of horizontal gene transfer. Additionally, the comparison of 62 L. paracasei genomes and the two versions of the gene cluster revealed that this gene cluster is mobile within the species.},
}

Clinical infectious diseases : an official publication of the Infectious Diseases Society of America pii:5146342 [Epub ahead of print].

Background: Shiga toxin-producing Escherichia coli O157:H7 is a zoonotic pathogen which causes numerous food and waterborne disease outbreaks. It is globally distributed but its origin and the temporal sequence of its geographical spread are unknown.

Methods: We analysed Whole Genome Sequencing data of 757 isolates from 4 continents and performed a pan genome analysis to identify the core genome and from this extracted single nucleotide polymorphisms. Timed phylogeographic analysis was performed on a subset of the isolates to investigate its worldwide spread.

Results: The common ancestor of this set of isolates occurred around 1890 (1845-1925) and originated from the Netherlands. Phylogeographic analysis identified 34 major transmission events. The earliest were predominantly intercontinental from Europe to Australia around 1937 (1909-1958), to USA in 1941 (1921-1962), to Canada in 1960 (1943-1979), and from Australia to New Zealand in 1966 (1943-1982). This pre-dates the first reported human case of E. coli O157:H7 in 1975 from the USA.

Conclusions: Inter- and intra- continental transmission events have resulted in the current international distribution of E. coli O157:H7 and it is likely that these events were facilitated by animal movements (e.g. Holstein Friesian cattle). These findings will inform policy on action that is crucial to reduce further spread of E. coli O157:H7 and other (emerging) STEC strains globally.

Methods: We analysed Whole Genome Sequencing data of 757 isolates from 4 continents and performed a pan genome analysis to identify the core genome and from this extracted single nucleotide polymorphisms. Timed phylogeographic analysis was performed on a subset of the isolates to investigate its worldwide spread.

Results: The common ancestor of this set of isolates occurred around 1890 (1845-1925) and originated from the Netherlands. Phylogeographic analysis identified 34 major transmission events. The earliest were predominantly intercontinental from Europe to Australia around 1937 (1909-1958), to USA in 1941 (1921-1962), to Canada in 1960 (1943-1979), and from Australia to New Zealand in 1966 (1943-1982). This pre-dates the first reported human case of E. coli O157:H7 in 1975 from the USA.

Conclusions: Inter- and intra- continental transmission events have resulted in the current international distribution of E. coli O157:H7 and it is likely that these events were facilitated by animal movements (e.g. Holstein Friesian cattle). These findings will inform policy on action that is crucial to reduce further spread of E. coli O157:H7 and other (emerging) STEC strains globally.},
}

Microbial spoilage of raw meat causes huge economic losses every year. Understanding the microbial ecology associated to the spoilage and its dynamics during refrigerated storage of meat can help in preventing and delaying the spoilage-related activities. Raw meat microbiota is usually complex but only few members will develop during storage and cause spoilage, upon the pressure of several external factors, such as temperature and oxygen availability. We characterized the metagenome of beef packed aerobically or under-vacuum during refrigerated storage to explore how different packaging conditions may influence microbial composition and potential spoilage-associated activities. Different population dynamics and spoilage-associated genomic repertoires occurred in beef stored in air or vacuum-packaging. Moreover, pangenomics of Pseudomonas fragi strains extracted from metagenomes was carried out. We demonstrated the presence of specific, storage-driven strain-level profiles of Pseudomonas fragi, characterized by a different gene repertoire, thus potentially able to act differently during meat spoilage. The results provide new knowledge on strain-level microbial ecology associated to meat spoilage and can be of value for future strategies of spoilage prevention and food waste reduction. IMPORTANCE: This work provides insights on the mechanisms involved in raw beef spoilage during refrigerated storage and on the selective pressure exerted by the packaging conditions. We highlighted the presence of different microbial metagenomes during spoilage of beef packaged aerobically or under-vacuum. The packaging condition was able to select specific Pseudomonas fragi strains, with a distinctive genomic repertoire. This study may help in deciphering the behaviour of different biomes directly in-situ in food and in understanding the specific contribution of different strains to food spoilage.

@article {pmid30366996,
year = {2018},
author = {De Filippis, F and La Storia, A and Villani, F and Ercolini, D},
title = {Strain-level diversity analysis of Pseudomonas fragi after in situ pangenome reconstruction shows distinctive spoilage-associated metabolic traits clearly selected by different storage conditions.},
journal = {Applied and environmental microbiology},
volume = {},
number = {},
pages = {},
doi = {10.1128/AEM.02212-18},
pmid = {30366996},
issn = {1098-5336},
abstract = {Microbial spoilage of raw meat causes huge economic losses every year. Understanding the microbial ecology associated to the spoilage and its dynamics during refrigerated storage of meat can help in preventing and delaying the spoilage-related activities. Raw meat microbiota is usually complex but only few members will develop during storage and cause spoilage, upon the pressure of several external factors, such as temperature and oxygen availability. We characterized the metagenome of beef packed aerobically or under-vacuum during refrigerated storage to explore how different packaging conditions may influence microbial composition and potential spoilage-associated activities. Different population dynamics and spoilage-associated genomic repertoires occurred in beef stored in air or vacuum-packaging. Moreover, pangenomics of Pseudomonas fragi strains extracted from metagenomes was carried out. We demonstrated the presence of specific, storage-driven strain-level profiles of Pseudomonas fragi, characterized by a different gene repertoire, thus potentially able to act differently during meat spoilage. The results provide new knowledge on strain-level microbial ecology associated to meat spoilage and can be of value for future strategies of spoilage prevention and food waste reduction. IMPORTANCE: This work provides insights on the mechanisms involved in raw beef spoilage during refrigerated storage and on the selective pressure exerted by the packaging conditions. We highlighted the presence of different microbial metagenomes during spoilage of beef packaged aerobically or under-vacuum. The packaging condition was able to select specific Pseudomonas fragi strains, with a distinctive genomic repertoire. This study may help in deciphering the behaviour of different biomes directly in-situ in food and in understanding the specific contribution of different strains to food spoilage.},
}

RevDate: 2018-11-14

Hii SYF, Ahmad N, Hashim R, et al (2018)

A SNP-based phylogenetic analysis of Corynebacterium diphtheriae in Malaysia.

BMC research notes, 11(1):760 pii:10.1186/s13104-018-3868-6.

OBJECTIVE: There is a lack of studies on Corynebacterium diphtheriae isolates in Malaysia. The alarming surge of cases in 2016 led us to evaluate the local clinical C. diphtheriae strains in Malaysia. We conducted single nucleotide polymorphism phylogenetic analysis on the core and pan-genome as well as toxin and diphtheria toxin repressor (DtxR) genes of Malaysian C. diphtheriae isolates from the years 1986-2016.

RESULTS: Comparison between the core and pan-genome showed variation in the distribution of C. diphtheriae. The local isolates portrayed a heterogeneous trait, and a close relationship between Malaysia's and Belarus's, Africa's and India's strains was observed. A toxigenic C. diphtheriae clone was noted to be circulating in the Malaysian population for nearly 30 years and, from our study, the non-toxigenic and toxigenic C. diphtheriae strains can be differentiated significantly into two large clusters, A and B respectively. Analysis against the vaccine strain PW8 showed that the amino acid composition of toxin and DtxR in Malaysia's local strains is well-conserved and no functional defect was noted. Hence, a change in efficacy of the currently used toxoid vaccine is unlikely to occur.

@article {pmid30359301,
year = {2018},
author = {Hii, SYF and Ahmad, N and Hashim, R and Liow, YL and Abd Wahab, MA and Mohd Khalid, MKN},
title = {A SNP-based phylogenetic analysis of Corynebacterium diphtheriae in Malaysia.},
journal = {BMC research notes},
volume = {11},
number = {1},
pages = {760},
doi = {10.1186/s13104-018-3868-6},
pmid = {30359301},
issn = {1756-0500},
support = {NMRR id: 16-1421-32070 (JPP-IMR: 16-056)//Ministry of Health Malaysia/ ; },
abstract = {OBJECTIVE: There is a lack of studies on Corynebacterium diphtheriae isolates in Malaysia. The alarming surge of cases in 2016 led us to evaluate the local clinical C. diphtheriae strains in Malaysia. We conducted single nucleotide polymorphism phylogenetic analysis on the core and pan-genome as well as toxin and diphtheria toxin repressor (DtxR) genes of Malaysian C. diphtheriae isolates from the years 1986-2016.

RESULTS: Comparison between the core and pan-genome showed variation in the distribution of C. diphtheriae. The local isolates portrayed a heterogeneous trait, and a close relationship between Malaysia's and Belarus's, Africa's and India's strains was observed. A toxigenic C. diphtheriae clone was noted to be circulating in the Malaysian population for nearly 30 years and, from our study, the non-toxigenic and toxigenic C. diphtheriae strains can be differentiated significantly into two large clusters, A and B respectively. Analysis against the vaccine strain PW8 showed that the amino acid composition of toxin and DtxR in Malaysia's local strains is well-conserved and no functional defect was noted. Hence, a change in efficacy of the currently used toxoid vaccine is unlikely to occur.},
}

The large and complex genome of Pseudomonas aeruginosa, which consists of significant portions (up to 20%) of transferable genetic elements contributes to the rapid development of antibiotic resistance. The whole genome sequences of 22 strains isolated from eye and cystic fibrosis patients in Australia and India between 1992 and 2007 were used to compare genomic divergence and phylogenetic relationships as well as genes for antibiotic resistance and virulence factors. Analysis of the pangenome indicated a large variation in the size of accessory genome amongst 22 strains and the size of the accessory genome correlated with number of genomic islands, insertion sequences and prophages. The strains were diverse in terms of sequence type and dissimilar to that of global epidemic P. aeruginosa clones. Of the eye isolates, 62% clustered together within a single lineage. Indian eye isolates possessed genes associated with resistance to aminoglycoside, beta-lactams, sulphonamide, quaternary ammonium compounds, tetracycline, trimethoprims and chloramphenicols. These genes were, however, absent in Australian isolates regardless of source. Overall, our results provide valuable information for understanding the genomic diversity of P. aeruginosa isolated from two different infection types and countries.

@article {pmid30353070,
year = {2018},
author = {Subedi, D and Vijay, AK and Kohli, GS and Rice, SA and Willcox, M},
title = {Comparative genomics of clinical strains of Pseudomonas aeruginosa strains isolated from different geographic sites.},
journal = {Scientific reports},
volume = {8},
number = {1},
pages = {15668},
doi = {10.1038/s41598-018-34020-7},
pmid = {30353070},
issn = {2045-2322},
abstract = {The large and complex genome of Pseudomonas aeruginosa, which consists of significant portions (up to 20%) of transferable genetic elements contributes to the rapid development of antibiotic resistance. The whole genome sequences of 22 strains isolated from eye and cystic fibrosis patients in Australia and India between 1992 and 2007 were used to compare genomic divergence and phylogenetic relationships as well as genes for antibiotic resistance and virulence factors. Analysis of the pangenome indicated a large variation in the size of accessory genome amongst 22 strains and the size of the accessory genome correlated with number of genomic islands, insertion sequences and prophages. The strains were diverse in terms of sequence type and dissimilar to that of global epidemic P. aeruginosa clones. Of the eye isolates, 62% clustered together within a single lineage. Indian eye isolates possessed genes associated with resistance to aminoglycoside, beta-lactams, sulphonamide, quaternary ammonium compounds, tetracycline, trimethoprims and chloramphenicols. These genes were, however, absent in Australian isolates regardless of source. Overall, our results provide valuable information for understanding the genomic diversity of P. aeruginosa isolated from two different infection types and countries.},
}

RevDate: 2018-11-14

Chaudhari NM, Gautam A, Gupta VK, et al (2018)

PanGFR-HM: A Dynamic Web Resource for Pan-Genomic and Functional Profiling of Human Microbiome With Comparative Features.

Frontiers in microbiology, 9:2322.

The conglomerate of microorganisms inhabiting various body-sites of human, known as the human microbiome, is one of the key determinants of human health and disease. Comprehensive pan-genomic and functional analysis approach for human microbiome components can enrich our understanding about impact of microbiome on human health. By utilizing this approach we developed PanGFR-HM (http://www.bioinfo.iicb.res.in/pangfr-hm/) - a novel dynamic web-resource that integrates genomic and functional characteristics of 1293 complete microbial genomes available from Human Microbiome Project. The resource allows users to explore genomic/functional diversity and genome-based phylogenetic relationships between human associated microbial genomes, not provided by any other resource. The key features implemented here include pan-genome and functional analysis of organisms based on taxonomy or body-site, and comparative analysis between groups of organisms. The first feature can also identify probable gene-loss events and significantly over/under represented KEGG/COG categories within pan-genome. The unique second feature can perform comparative genomic, functional and pathways analysis between 4 groups of microbes. The dynamic nature of this resource enables users to define parameters for orthologous clustering and to select any set of organisms for analysis. As an application for comparative feature of PanGFR-HM, we performed a comparative analysis with 67 Lactobacillus genomes isolated from human gut, oral cavity and urogenital tract, and therefore characterized the body-site specific genes, enzymes and pathways. Altogether, PanGFR-HM, being unique in its content and functionality, is expected to provide a platform for microbiome-based comparative functional and evolutionary genomics.

@article {pmid30349509,
year = {2018},
author = {Chaudhari, NM and Gautam, A and Gupta, VK and Kaur, G and Dutta, C and Paul, S},
title = {PanGFR-HM: A Dynamic Web Resource for Pan-Genomic and Functional Profiling of Human Microbiome With Comparative Features.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2322},
doi = {10.3389/fmicb.2018.02322},
pmid = {30349509},
issn = {1664-302X},
abstract = {The conglomerate of microorganisms inhabiting various body-sites of human, known as the human microbiome, is one of the key determinants of human health and disease. Comprehensive pan-genomic and functional analysis approach for human microbiome components can enrich our understanding about impact of microbiome on human health. By utilizing this approach we developed PanGFR-HM (http://www.bioinfo.iicb.res.in/pangfr-hm/) - a novel dynamic web-resource that integrates genomic and functional characteristics of 1293 complete microbial genomes available from Human Microbiome Project. The resource allows users to explore genomic/functional diversity and genome-based phylogenetic relationships between human associated microbial genomes, not provided by any other resource. The key features implemented here include pan-genome and functional analysis of organisms based on taxonomy or body-site, and comparative analysis between groups of organisms. The first feature can also identify probable gene-loss events and significantly over/under represented KEGG/COG categories within pan-genome. The unique second feature can perform comparative genomic, functional and pathways analysis between 4 groups of microbes. The dynamic nature of this resource enables users to define parameters for orthologous clustering and to select any set of organisms for analysis. As an application for comparative feature of PanGFR-HM, we performed a comparative analysis with 67 Lactobacillus genomes isolated from human gut, oral cavity and urogenital tract, and therefore characterized the body-site specific genes, enzymes and pathways. Altogether, PanGFR-HM, being unique in its content and functionality, is expected to provide a platform for microbiome-based comparative functional and evolutionary genomics.},
}

The fluoroquinolone-resistant ST1193 clonal group of Escherichia coli, from the ST14 clonal complex (STc14) within phylogenetic group B2, has appeared recently as an important cause of extraintestinal disease in humans. Although this emerging lineage has been characterized to some extent using conventional methods, it has not been studied extensively at the genomic level. Here, we used whole genome sequence analysis to compare 355 ST1193 isolates with 72 isolates from other STs within STc14. Using core genome phylogeny, the ST1193 isolates formed a tightly clustered clade with many genotypic similarities, as compared to ST14 isolates. All ST1193 isolates possessed the same set of three chromosomal mutations conferring fluoroquinolone resistance, carried the fimH64 allele, and were lactose non-fermenting. Analysis revealed an evolutionary progression from K1 to K5 capsular types and acquisition of an F-type virulence plasmid followed by changes in plasmid structure congruent with genome phylogeny. In contrast, the numerous identified antimicrobial resistance genes were distributed incongruently with the underlying phylogeny, suggesting frequent gain or loss of the corresponding resistance gene cassettes despite retention of the presumed carrier plasmids. Pangenome analysis revealed gains and losses of genetic loci occurring during the transition from ST14 to ST1193, and from the K1 to K5 capsular types. Using time-scaled phylogenetic analysis, we estimated that current ST1193 clades first emerged approximately 25 years ago. Overall, ST1193 appears to be a recently emerged clone in which both stepwise and mosaic evolution likely have contributed to epidemiologic success.

@article {pmid30348668,
year = {2018},
author = {Johnson, TJ and Elnekave, E and Miller, EA and Munoz-Aguayo, J and Flores Figueroa, C and Johnston, B and Nielson, DW and Logue, CM and Johnson, JR},
title = {Phylogenomic analysis of extraintestinal pathogenic Escherichia coli ST1193, an emerging multidrug-resistant clonal group.},
journal = {Antimicrobial agents and chemotherapy},
volume = {},
number = {},
pages = {},
doi = {10.1128/AAC.01913-18},
pmid = {30348668},
issn = {1098-6596},
abstract = {The fluoroquinolone-resistant ST1193 clonal group of Escherichia coli, from the ST14 clonal complex (STc14) within phylogenetic group B2, has appeared recently as an important cause of extraintestinal disease in humans. Although this emerging lineage has been characterized to some extent using conventional methods, it has not been studied extensively at the genomic level. Here, we used whole genome sequence analysis to compare 355 ST1193 isolates with 72 isolates from other STs within STc14. Using core genome phylogeny, the ST1193 isolates formed a tightly clustered clade with many genotypic similarities, as compared to ST14 isolates. All ST1193 isolates possessed the same set of three chromosomal mutations conferring fluoroquinolone resistance, carried the fimH64 allele, and were lactose non-fermenting. Analysis revealed an evolutionary progression from K1 to K5 capsular types and acquisition of an F-type virulence plasmid followed by changes in plasmid structure congruent with genome phylogeny. In contrast, the numerous identified antimicrobial resistance genes were distributed incongruently with the underlying phylogeny, suggesting frequent gain or loss of the corresponding resistance gene cassettes despite retention of the presumed carrier plasmids. Pangenome analysis revealed gains and losses of genetic loci occurring during the transition from ST14 to ST1193, and from the K1 to K5 capsular types. Using time-scaled phylogenetic analysis, we estimated that current ST1193 clades first emerged approximately 25 years ago. Overall, ST1193 appears to be a recently emerged clone in which both stepwise and mosaic evolution likely have contributed to epidemiologic success.},
}

KEY MESSAGE: The large and complex genomes of many cereals hindered cloning efforts in the past. Advances in genomics now allow the rapid cloning of genes from humanity's most valuable crops. The past two decades were characterized by a genomics revolution that entailed profound changes to crop research, plant breeding, and agriculture. Today, high-quality reference sequences are available for all major cereal crop species. Large resequencing and pan-genome projects start to reveal a more comprehensive picture of the genetic makeup and the diversity among domesticated cereals and their wild relatives. These technological advancements will have a dramatic effect on dissecting genotype-phenotype associations and on gene cloning. In this review, we will highlight the status of the genomic resources available for various cereal crops and we will discuss their implications for gene cloning. A particular focus will be given to the cereal species barley and wheat, which are characterized by very large and complex genomes that have been inaccessible to rapid gene cloning until recently. With the advancements in genomics and the development of several rapid gene-cloning methods, it has now become feasible to tackle the cloning of most agriculturally important genes, even in wheat and barley.

@article {pmid30341495,
year = {2018},
author = {Bettgenhaeuser, J and Krattinger, SG},
title = {Rapid gene cloning in cereals.},
journal = {TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik},
volume = {},
number = {},
pages = {},
doi = {10.1007/s00122-018-3210-7},
pmid = {30341495},
issn = {1432-2242},
abstract = {KEY MESSAGE: The large and complex genomes of many cereals hindered cloning efforts in the past. Advances in genomics now allow the rapid cloning of genes from humanity's most valuable crops. The past two decades were characterized by a genomics revolution that entailed profound changes to crop research, plant breeding, and agriculture. Today, high-quality reference sequences are available for all major cereal crop species. Large resequencing and pan-genome projects start to reveal a more comprehensive picture of the genetic makeup and the diversity among domesticated cereals and their wild relatives. These technological advancements will have a dramatic effect on dissecting genotype-phenotype associations and on gene cloning. In this review, we will highlight the status of the genomic resources available for various cereal crops and we will discuss their implications for gene cloning. A particular focus will be given to the cereal species barley and wheat, which are characterized by very large and complex genomes that have been inaccessible to rapid gene cloning until recently. With the advancements in genomics and the development of several rapid gene-cloning methods, it has now become feasible to tackle the cloning of most agriculturally important genes, even in wheat and barley.},
}

Lactobacillus (L.) brevis represents a versatile, ubiquitistic species of lactic acid bacteria, occurring in various foods, as well as plants and intestinal tracts. The ability to deal with considerably differing environmental conditions in the respective ecological niches implies a genomic adaptation to the particular requirements to use it as a habitat beyond a transient state. Given the isolation source, 24 L. brevis genomes were analyzed via comparative genomics to get a broad view of the genomic complexity and ecological versatility of this species. This analysis showed L. brevis being a genetically diverse species possessing a remarkably large pan genome. As anticipated, it proved difficult to draw a correlation between chromosomal settings and isolation source. However, on plasmidome level, brewery- and insect-derived strains grouped into distinct clusters, referable to a noteworthy gene sharing between both groups. The brewery-specific plasmidome is characterized by several genes, which support a life in the harsh environment beer, but 40% of the brewery plasmidome were found in insect-derived strains as well. This suggests a close interaction between these habitats. Further analysis revealed the presence of a truncated horC cluster version in brewery- and insect-associated strains. This disproves horC, the major contributor to survival in beer, as brewery isolate specific. We conclude that L. brevis does not perform rigorous chromosomal changes to live in different habitats. Rather it appears that the species retains a certain genetic diversity in the plasmidome and meets the requirements of a particular ecological niche with the acquisition of appropriate plasmids.

@article {pmid30341451,
year = {2018},
author = {Fraunhofer, ME and Geißler, AJ and Behr, J and Vogel, RF},
title = {Comparative Genomics of Lactobacillus brevis Reveals a Significant Plasmidome Overlap of Brewery and Insect Isolates.},
journal = {Current microbiology},
volume = {},
number = {},
pages = {},
doi = {10.1007/s00284-018-1581-2},
pmid = {30341451},
issn = {1432-0991},
support = {AiF 18194 N//German Ministry of Economics/ ; },
abstract = {Lactobacillus (L.) brevis represents a versatile, ubiquitistic species of lactic acid bacteria, occurring in various foods, as well as plants and intestinal tracts. The ability to deal with considerably differing environmental conditions in the respective ecological niches implies a genomic adaptation to the particular requirements to use it as a habitat beyond a transient state. Given the isolation source, 24 L. brevis genomes were analyzed via comparative genomics to get a broad view of the genomic complexity and ecological versatility of this species. This analysis showed L. brevis being a genetically diverse species possessing a remarkably large pan genome. As anticipated, it proved difficult to draw a correlation between chromosomal settings and isolation source. However, on plasmidome level, brewery- and insect-derived strains grouped into distinct clusters, referable to a noteworthy gene sharing between both groups. The brewery-specific plasmidome is characterized by several genes, which support a life in the harsh environment beer, but 40% of the brewery plasmidome were found in insect-derived strains as well. This suggests a close interaction between these habitats. Further analysis revealed the presence of a truncated horC cluster version in brewery- and insect-associated strains. This disproves horC, the major contributor to survival in beer, as brewery isolate specific. We conclude that L. brevis does not perform rigorous chromosomal changes to live in different habitats. Rather it appears that the species retains a certain genetic diversity in the plasmidome and meets the requirements of a particular ecological niche with the acquisition of appropriate plasmids.},
}

Legionella spp. are the cause of a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. In order to characterize the ST1 population, we sequenced 289 outbreak and non-outbreak associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level; 98% of core nucleotide positions were invariant and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity compared to clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than other subgroups and was enriched for transposition and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations.

@article {pmid30335848,
year = {2018},
author = {Mercante, JW and Caravas, JA and Ishaq, MK and Kozak-Muiznieks, NA and Raphael, BH and Winchell, JM},
title = {Genomic heterogeneity differentiates clinical and environmental subgroups of Legionella pneumophila sequence type 1.},
journal = {PloS one},
volume = {13},
number = {10},
pages = {e0206110},
doi = {10.1371/journal.pone.0206110},
pmid = {30335848},
issn = {1932-6203},
abstract = {Legionella spp. are the cause of a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. In order to characterize the ST1 population, we sequenced 289 outbreak and non-outbreak associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level; 98% of core nucleotide positions were invariant and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity compared to clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than other subgroups and was enriched for transposition and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations.},
}

BACKGROUND: Pectobacterium parmentieri is a newly established species within the plant pathogenic family Pectobacteriaceae. Bacteria belonging to this species are causative agents of diseases in economically important crops (e.g. potato) in a wide range of different environmental conditions, encountered in Europe, North America, Africa, and New Zealand. Severe disease symptoms result from the activity of P. parmentieri virulence factors, such as plant cell wall degrading enzymes. Interestingly, we observe significant phenotypic differences among P. parmentieri isolates regarding virulence factors production and the abilities to macerate plants. To establish the possible genomic basis of these differences, we sequenced 12 genomes of P. parmentieri strains (10 isolated in Poland, 2 in Belgium) with the combined use of Illumina and PacBio approaches. De novo genome assembly was performed with the use of SPAdes software, while annotation was conducted by NCBI Prokaryotic Genome Annotation Pipeline.

RESULTS: The pan-genome study was performed on 15 genomes (12 de novo assembled and three reference strains: P. parmentieri CFBP 8475T, P. parmentieri SCC3193, P. parmentieri WPP163). The pan-genome includes 3706 core genes, a high number of accessory (1468) genes, and numerous unique (1847) genes. We identified the presence of well-known genes encoding virulence factors in the core genome fraction, but some of them were located in the dispensable genome. A significant fraction of horizontally transferred genes, virulence-related gene duplications, as well as different CRISPR arrays were found, which can explain the observed phenotypic differences. Finally, we found also, for the first time, the presence of a plasmid in one of the tested P. parmentieri strains isolated in Poland.

CONCLUSIONS: We can hypothesize that a large number of the genes in the dispensable genome and significant genomic variation among P. parmentieri strains could be the basis of the potential wide host range and widespread diffusion of P. parmentieri. The obtained data on the structure and gene content of P. parmentieri strains enabled us to speculate on the importance of high genomic plasticity for P. parmentieri adaptation to different environments.

@article {pmid30326842,
year = {2018},
author = {Zoledowska, S and Motyka-Pomagruk, A and Sledz, W and Mengoni, A and Lojkowska, E},
title = {High genomic variability in the plant pathogenic bacterium Pectobacterium parmentieri deciphered from de novo assembled complete genomes.},
journal = {BMC genomics},
volume = {19},
number = {1},
pages = {751},
doi = {10.1186/s12864-018-5140-9},
pmid = {30326842},
issn = {1471-2164},
support = {2014/14/M/NZ8/00501//Narodowe Centrum Nauki/ ; },
abstract = {BACKGROUND: Pectobacterium parmentieri is a newly established species within the plant pathogenic family Pectobacteriaceae. Bacteria belonging to this species are causative agents of diseases in economically important crops (e.g. potato) in a wide range of different environmental conditions, encountered in Europe, North America, Africa, and New Zealand. Severe disease symptoms result from the activity of P. parmentieri virulence factors, such as plant cell wall degrading enzymes. Interestingly, we observe significant phenotypic differences among P. parmentieri isolates regarding virulence factors production and the abilities to macerate plants. To establish the possible genomic basis of these differences, we sequenced 12 genomes of P. parmentieri strains (10 isolated in Poland, 2 in Belgium) with the combined use of Illumina and PacBio approaches. De novo genome assembly was performed with the use of SPAdes software, while annotation was conducted by NCBI Prokaryotic Genome Annotation Pipeline.

RESULTS: The pan-genome study was performed on 15 genomes (12 de novo assembled and three reference strains: P. parmentieri CFBP 8475T, P. parmentieri SCC3193, P. parmentieri WPP163). The pan-genome includes 3706 core genes, a high number of accessory (1468) genes, and numerous unique (1847) genes. We identified the presence of well-known genes encoding virulence factors in the core genome fraction, but some of them were located in the dispensable genome. A significant fraction of horizontally transferred genes, virulence-related gene duplications, as well as different CRISPR arrays were found, which can explain the observed phenotypic differences. Finally, we found also, for the first time, the presence of a plasmid in one of the tested P. parmentieri strains isolated in Poland.

CONCLUSIONS: We can hypothesize that a large number of the genes in the dispensable genome and significant genomic variation among P. parmentieri strains could be the basis of the potential wide host range and widespread diffusion of P. parmentieri. The obtained data on the structure and gene content of P. parmentieri strains enabled us to speculate on the importance of high genomic plasticity for P. parmentieri adaptation to different environments.},
}
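
The core / accessory / unique split reported in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline: the function name `partition_pangenome`, the input layout (gene cluster mapped to the set of genomes carrying it), and the toy strain names are all assumptions made for illustration. A cluster is called core when every genome carries it, unique when exactly one genome carries it, and accessory otherwise.

```python
def partition_pangenome(presence):
    """Split gene clusters into core, accessory and unique fractions.

    presence: dict mapping a gene-cluster id to the set of genome ids
    that carry it (a hypothetical layout, chosen for this sketch).
    """
    # Every genome that appears anywhere in the table.
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)

    parts = {"core": [], "accessory": [], "unique": []}
    for cluster, carriers in presence.items():
        if len(carriers) == n_genomes:
            parts["core"].append(cluster)       # present in all genomes
        elif len(carriers) == 1:
            parts["unique"].append(cluster)     # present in a single genome
        else:
            parts["accessory"].append(cluster)  # present in some, not all
    return parts


# Toy data with three invented strains; not taken from the study.
toy = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
    "geneD": {"s1", "s2", "s3"},
}
result = partition_pangenome(toy)
print(result)
```

Real pan-genome tools apply additional rules (for example, tolerance thresholds for draft assemblies), so strict all-or-one cutoffs like these are only the simplest possible definition.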

RevDate: 2018-10-13

Yu J, Golicz AA, Lu K, et al (2018)

Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars.

Plant biotechnology journal [Epub ahead of print].

Sesame (Sesamum indicum L.) is an important oil crop renowned for its high oil content and quality. Recently, genome assemblies for five sesame varieties including two landraces (S. indicum cv. Baizhima and Mishuozhima) and three modern cultivars (S. indicum var. Zhongzhi13, Yuzhi11 and Swetha), have become available providing a rich resource for comparative genomic analyses and gene discovery. Here, we employed a reference-assisted assembly approach to improve the draft assemblies of four of the sesame varieties. We then constructed a sesame pan-genome of 554.05 Mb. The pan-genome contained 26,472 orthologous gene clusters; 15,409 (58.21%) of them were core (present across all five sesame genomes), whereas the remaining 41.79% (11,063) clusters and the 15,890 variety-specific genes were dispensable. Comparisons between varieties suggest that modern cultivars from China and India display significant genomic variation. The gene families unique to the sesame modern cultivars contain genes mainly related to yield and quality, while those unique to the landraces contain genes involved in environmental adaptation. Comparative evolutionary analysis indicates that several genes involved in plant-pathogen interaction and lipid metabolism are under positive selection, which may be associated with sesame environmental adaption and selection for high seed oil content. This study of the sesame pan-genome provides insights into the evolution and genomic characteristics of this important oilseed and constitutes a resource for further sesame crop improvement.

@article {pmid30315621,
year = {2018},
author = {Yu, J and Golicz, AA and Lu, K and Dossa, K and Zhang, Y and Chen, J and Wang, L and You, J and Fan, D and Edwards, D and Zhang, X},
title = {Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars.},
journal = {Plant biotechnology journal},
volume = {},
number = {},
pages = {},
doi = {10.1111/pbi.13022},
pmid = {30315621},
issn = {1467-7652},
abstract = {Sesame (Sesamum indicum L.) is an important oil crop renowned for its high oil content and quality. Recently, genome assemblies for five sesame varieties including two landraces (S. indicum cv. Baizhima and Mishuozhima) and three modern cultivars (S. indicum var. Zhongzhi13, Yuzhi11 and Swetha), have become available providing a rich resource for comparative genomic analyses and gene discovery. Here, we employed a reference-assisted assembly approach to improve the draft assemblies of four of the sesame varieties. We then constructed a sesame pan-genome of 554.05 Mb. The pan-genome contained 26,472 orthologous gene clusters; 15,409 (58.21%) of them were core (present across all five sesame genomes), whereas the remaining 41.79% (11,063) clusters and the 15,890 variety-specific genes were dispensable. Comparisons between varieties suggest that modern cultivars from China and India display significant genomic variation. The gene families unique to the sesame modern cultivars contain genes mainly related to yield and quality, while those unique to the landraces contain genes involved in environmental adaptation. Comparative evolutionary analysis indicates that several genes involved in plant-pathogen interaction and lipid metabolism are under positive selection, which may be associated with sesame environmental adaption and selection for high seed oil content. This study of the sesame pan-genome provides insights into the evolution and genomic characteristics of this important oilseed and constitutes a resource for further sesame crop improvement.},
}
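
As a quick arithmetic check of the cluster counts quoted in the abstract above, the sketch below recomputes the core and dispensable shares from the reported numbers. The variable names and the helper `share` are hypothetical; only the three counts come from the abstract.

```python
# Counts as reported in the abstract above.
total_clusters = 26472
core_clusters = 15409
dispensable_clusters = 11063


def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


core_pct = share(core_clusters, total_clusters)
dispensable_pct = share(dispensable_clusters, total_clusters)

# The two fractions should account for every orthologous cluster.
clusters_add_up = core_clusters + dispensable_clusters == total_clusters

print(core_pct, dispensable_pct, clusters_add_up)
```

The recomputed shares agree with the 58.21% and 41.79% given in the abstract. The 15,890 variety-specific genes are counted separately from the orthologous clusters and are not part of this check.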

RevDate: 2018-11-14

Bobay LM, H Ochman (2018)

Factors driving effective population size and pan-genome evolution in bacteria.

BMC evolutionary biology, 18(1):153 pii:10.1186/s12862-018-1272-4.

BACKGROUND: Knowledge of population-level processes is essential to understanding the efficacy of selection operating within a species. However, attempts at estimating effective population sizes (Ne) are particularly challenging in bacteria due to their extremely large census populations sizes, varying rates of recombination and arbitrary species boundaries.

RESULTS: In this study, we estimated Ne for 153 species (152 bacteria and one archaeon) defined under a common framework and found that ecological lifestyle and growth rate were major predictors of Ne; and that contrary to theoretical expectations, Ne was unaffected by recombination rate. Additionally, we found that Ne shapes the evolution and diversity of total gene repertoires of prokaryotic species.

CONCLUSION: Together, these results point to a new model of genome architecture evolution in prokaryotes, in which pan-genome sizes, not individual genome sizes, are governed by drift-barrier evolution.


RevDate: 2018-10-09

Chun BH, Kim KH, Jeong SE, et al (2019)

Genomic and metabolic features of the Bacillus amyloliquefaciens group- B. amyloliquefaciens, B. velezensis, and B. siamensis- revealed by pan-genome analysis.

Food microbiology, 77:146-157.

The genomic and metabolic features of the Bacillus amyloliquefaciens group comprising B. amyloliquefaciens, B. velezensis, and B. siamensis were investigated through a pan-genome analysis combined with an experimental verification of some of the functions identified. All B. amyloliquefaciens group genomes were retrieved from GenBank and their phylogenetic relatedness was subsequently investigated. Genome comparisons of B. amyloliquefaciens, B. siamensis, and B. velezensis showed that their genomic and metabolic features were similar; however species-specific features were also identified. Energy metabolism-related genes are more enriched in B. amyloliquefaciens, whereas secondary metabolite biosynthesis-related genes are enriched in B. velezensis. Compared to B. amyloliquefaciens and B. siamensis, B. velezensis harbors more genes in its core-genome which are involved in the biosynthesis of antimicrobial compounds, as well as genes involved in d-galacturonate and d-fructuronate metabolism. B. amyloliquefaciens, B. siamensis, and B. velezensis all harbor a xanthine oxidase gene cluster (xoABCDE) in their core-genomes that is involved in metabolizing xanthine and uric acid to glycine and oxalureate. A reconstruction of B. amyloliquefaciens group metabolic pathways using their individual pan-genomes revealed that the B. amyloliquefaciens group strains have the ability to metabolize diverse carbon sources aerobically, or anaerobically, and can produce various metabolites such as lactate, ethanol, acetate, CO2, xylitol, diacetyl, acetoin, and 2,3-butanediol. This study therefore provides insights into the genomic and metabolic features of the B. amyloliquefaciens group.

@article {pmid30297045,
year = {2019},
author = {Chun, BH and Kim, KH and Jeong, SE and Jeon, CO},
title = {Genomic and metabolic features of the Bacillus amyloliquefaciens group- B. amyloliquefaciens, B. velezensis, and B. siamensis- revealed by pan-genome analysis.},
journal = {Food microbiology},
volume = {77},
number = {},
pages = {146-157},
doi = {10.1016/j.fm.2018.09.001},
pmid = {30297045},
issn = {1095-9998},
abstract = {The genomic and metabolic features of the Bacillus amyloliquefaciens group comprising B. amyloliquefaciens, B. velezensis, and B. siamensis were investigated through a pan-genome analysis combined with an experimental verification of some of the functions identified. All B. amyloliquefaciens group genomes were retrieved from GenBank and their phylogenetic relatedness was subsequently investigated. Genome comparisons of B. amyloliquefaciens, B. siamensis, and B. velezensis showed that their genomic and metabolic features were similar; however species-specific features were also identified. Energy metabolism-related genes are more enriched in B. amyloliquefaciens, whereas secondary metabolite biosynthesis-related genes are enriched in B. velezensis. Compared to B. amyloliquefaciens and B. siamensis, B. velezensis harbors more genes in its core-genome which are involved in the biosynthesis of antimicrobial compounds, as well as genes involved in d-galacturonate and d-fructuronate metabolism. B. amyloliquefaciens, B. siamensis, and B. velezensis all harbor a xanthine oxidase gene cluster (xoABCDE) in their core-genomes that is involved in metabolizing xanthine and uric acid to glycine and oxalureate. A reconstruction of B. amyloliquefaciens group metabolic pathways using their individual pan-genomes revealed that the B. amyloliquefaciens group strains have the ability to metabolize diverse carbon sources aerobically, or anaerobically, and can produce various metabolites such as lactate, ethanol, acetate, CO2, xylitol, diacetyl, acetoin, and 2,3-butanediol. This study therefore provides insights into the genomic and metabolic features of the B. amyloliquefaciens group.},
}

BACKGROUND: The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization.

RESULTS: Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species.

CONCLUSIONS: The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.

@article {pmid30285620,
year = {2018},
author = {Wright, ES and Baum, DA},
title = {Exclusivity offers a sound yet practical species criterion for bacteria despite abundant gene flow.},
journal = {BMC genomics},
volume = {19},
number = {1},
pages = {724},
doi = {10.1186/s12864-018-5099-6},
pmid = {30285620},
issn = {1471-2164},
abstract = {BACKGROUND: The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization.

RESULTS: Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species.

CONCLUSIONS: The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.},
}

RevDate: 2018-11-29

Peng Y, Tang S, Wang D, et al (2018)

MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks.

GigaScience, 7(11): pii:5114262.

Pangenome analyses facilitate the interpretation of genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need to develop new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome in a topological network, consisting of genes (nodes) and gene-gene genomic adjacencies (edges) of which biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa.

@article {pmid30277499,
year = {2018},
author = {Peng, Y and Tang, S and Wang, D and Zhong, H and Jia, H and Cai, X and Zhang, Z and Xiao, M and Yang, H and Wang, J and Kristiansen, K and Xu, X and Li, J},
title = {MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks.},
journal = {GigaScience},
volume = {7},
number = {11},
pages = {},
doi = {10.1093/gigascience/giy121},
pmid = {30277499},
issn = {2047-217X},
abstract = {Pangenome analyses facilitate the interpretation of genetic diversity and evolutionary history of a taxon. However, there is an urgent and unmet need to develop new tools for advanced pangenome construction and visualization, especially for metagenomic data. Here, we present an integrated pipeline, named MetaPGN, for construction and graphical visualization of pangenome networks from either microbial genomes or metagenomes. Given either isolated genomes or metagenomic assemblies coupled with a reference genome of the targeted taxon, MetaPGN generates a pangenome in a topological network, consisting of genes (nodes) and gene-gene genomic adjacencies (edges) of which biological information can be easily updated and retrieved. MetaPGN also includes a self-developed Cytoscape plugin for layout of and interaction with the resulting pangenome network, providing an intuitive and interactive interface for full exploration of genetic diversity. We demonstrate the utility of MetaPGN by constructing Escherichia coli pangenome networks from five E. coli pathogenic strains and 760 human gut microbiomes, revealing extensive genetic diversity of E. coli within both isolates and gut microbial populations. With the ability to extract and visualize gene contents and gene-gene physical adjacencies of a specific taxon from large-scale metagenomic data, MetaPGN provides advantages in expanding pangenome analysis to uncultured microbial taxa.},
}

Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. Besides, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory- and unique-components of this pan-genome were found to be under the operation of Darwinian selection pressure. Further, their genome-size variation was predicted to be attributed to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core-genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and Sec-secretion system, whereas the core probiotic traits are imparted by the factors involved in carbohydrate- and protein-metabolism and host-immunomodulations.

@article {pmid30275399,
year = {2018},
author = {Sharma, V and Mobeen, F and Prakash, T},
title = {Exploration of Survival Traits, Probiotic Determinants, Host Interactions, and Functional Evolution of Bifidobacterial Genomes Using Comparative Genomics.},
journal = {Genes},
volume = {9},
number = {10},
pages = {},
doi = {10.3390/genes9100477},
pmid = {30275399},
issn = {2073-4425},
abstract = {Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. Besides, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory- and unique-components of this pan-genome were found to be under the operation of Darwinian selection pressure. Further, their genome-size variation was predicted to be attributed to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core-genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and Sec-secretion system, whereas the core probiotic traits are imparted by the factors involved in carbohydrate- and protein-metabolism and host-immunomodulations.},
}

BACKGROUND: Aeromonas hydrophila is a potential zoonotic pathogen and primary fish pathogen. With overlapping characteristics, multiple isolates are often mislabelled and misclassified. Moreover, the potential pathogenic factors among the publicly available genomes in A. hydrophila strains of different origins have not yet been investigated.

RESULTS: To identify the valid strains of A. hydrophila and their pathogenic factors, we performed a pan-genomic study. It revealed that there were 13 mislabelled strains and 49 valid strains that were further verified by average nucleotide identity (ANI), digital DNA-DNA hybridization (dDDH) and in silico multilocus sequence typing (MLST). Multiple phages were detected among the strains, and among them Aeromonas phi 018 was frequently present. The diversity in type III secretion system (T3SS) and conservation of type II and type VI secretion systems (T2SS and T6SS, respectively) among all the strains are important to study for designing future strategies. The most prevalent antibiotic resistances were found to be beta-lactamase, polymyxin and colistin resistances. The comparative analyses of sequence type (ST) 251 and other ST groups revealed that there were higher numbers of virulence factors in ST-251 than in other ST groups.

CONCLUSION: Publicly available genomes have 13 mislabelled organisms, and there are only 49 valid A. hydrophila strains. This valid pan-genome identifies multiple prophages that can be further utilized. Different A. hydrophila strains harbour multiple virulence factors and antibiotic resistance genes. Identification of such factors is important for designing future treatment regimes.

RevDate: 2018-11-14
CmpDate: 2018-10-29

Sheikhizadeh Anari S, de Ridder D, Schranz ME, et al (2018)

Efficient inference of homologs in large eukaryotic pan-proteomes.

BMC bioinformatics, 19(1):340 pii:10.1186/s12859-018-2362-4.

BACKGROUND: Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data.

RESULTS: To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa.

CONCLUSIONS: We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at https://github.com/sheikhizadeh/pantools as an extension to our pan-genomic analysis tool, PanTools.

Multi-Year Persistence of Verotoxigenic Escherichia coli (VTEC) in a Closed Canadian Beef Herd: A Cohort Study.

Frontiers in microbiology, 9:2040.

In this study, fecal samples were collected from a closed beef herd in Alberta, Canada from 2012 to 2015. To limit serotype bias, which was observed in enrichment broth cultures, Verotoxigenic Escherichia coli (VTEC) were isolated directly from samples using a hydrophobic grid-membrane filter verotoxin immunoblot assay. Overall VTEC isolation rates were similar for three different cohorts of yearling heifers on both an annual (68.5 to 71.8%) and seasonal basis (67.3 to 76.0%). Across all three cohorts, O139:H19 (37.1% of VTEC-positive samples), O22:H8 (15.8%) and O?(O108):H8 (15.4%) were among the most prevalent serotypes. However, isolation rates for serotypes O139:H19, O130:H38, O6:H34, O91:H21, and O113:H21 differed significantly between cohort-years, as did isolation rates for some serotypes within a single heifer cohort. There was a high level of VTEC serotype diversity with an average of 4.3 serotypes isolated per heifer and 65.8% of the heifers classified as "persistent shedders" of VTEC based on the criteria of >50% of samples positive and ≥4 consecutive samples positive. Only 26.8% (90/336) of the VTEC isolates from yearling heifers belonged to the human disease-associated seropathotypes A (O157:H7), B (O26:H11, O111:NM), and C (O22:H8, O91:H21, O113:H21, O137:H41, O2:H6). Conversely, seropathotypes B (O26:NM, O111:NM) and C (O91:H21, O2:H29) strains were dominant (76.0%, 19/25) among VTEC isolates from month-old calves from this herd. Among VTEC from heifers, carriage rates of vt1, vt2, vt1+vt2, eae, and hlyA were 10.7, 20.8, 68.5, 3.9, and 88.7%, respectively. The adhesin gene saa was present in 82.7% of heifer strains but absent from all of 13 eae+ve strains (from serotypes/intimin types O157:H7/γ1, O26:H11/β1, O111:NM/θ, O84:H2/ζ, and O182:H25/ζ). Phylogenetic relationships inferred from wgMLST and pan genome-derived core SNP analysis showed that strains clustered by phylotype and serotype. Further, VTEC strains of the same serotype usually shared the same suite of antibiotic resistance and virulence genes, suggesting the circulation of dominant clones within this distinct herd. This study provides insight into the diverse and dynamic nature of VTEC populations within groups of cattle and points to a broad spectrum of human health risks associated with these E. coli strains.

@article {pmid30233526,
year = {2018},
author = {Wang, LYR and Jokinen, CC and Laing, CR and Johnson, RP and Ziebell, K and Gannon, VPJ},
title = {Multi-Year Persistence of Verotoxigenic Escherichia coli (VTEC) in a Closed Canadian Beef Herd: A Cohort Study.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {2040},
doi = {10.3389/fmicb.2018.02040},
pmid = {30233526},
issn = {1664-302X},
abstract = {In this study, fecal samples were collected from a closed beef herd in Alberta, Canada from 2012 to 2015. To limit serotype bias, which was observed in enrichment broth cultures, Verotoxigenic Escherichia coli (VTEC) were isolated directly from samples using a hydrophobic grid-membrane filter verotoxin immunoblot assay. Overall VTEC isolation rates were similar for three different cohorts of yearling heifers on both an annual (68.5 to 71.8%) and seasonal basis (67.3 to 76.0%). Across all three cohorts, O139:H19 (37.1% of VTEC-positive samples), O22:H8 (15.8%) and O?(O108):H8 (15.4%) were among the most prevalent serotypes. However, isolation rates for serotypes O139:H19, O130:H38, O6:H34, O91:H21, and O113:H21 differed significantly between cohort-years, as did isolation rates for some serotypes within a single heifer cohort. There was a high level of VTEC serotype diversity with an average of 4.3 serotypes isolated per heifer and 65.8% of the heifers classified as "persistent shedders" of VTEC based on the criteria of >50% of samples positive and ≥4 consecutive samples positive. Only 26.8% (90/336) of the VTEC isolates from yearling heifers belonged to the human disease-associated seropathotypes A (O157:H7), B (O26:H11, O111:NM), and C (O22:H8, O91:H21, O113:H21, O137:H41, O2:H6). Conversely, seropathotypes B (O26:NM, O111:NM) and C (O91:H21, O2:H29) strains were dominant (76.0%, 19/25) among VTEC isolates from month-old calves from this herd. Among VTEC from heifers, carriage rates of vt1, vt2, vt1+vt2, eae, and hlyA were 10.7, 20.8, 68.5, 3.9, and 88.7%, respectively. The adhesin gene saa was present in 82.7% of heifer strains but absent from all of 13 eae+ve strains (from serotypes/intimin types O157:H7/γ1, O26:H11/β1, O111:NM/θ, O84:H2/ζ, and O182:H25/ζ). Phylogenetic relationships inferred from wgMLST and pan genome-derived core SNP analysis showed that strains clustered by phylotype and serotype. Further, VTEC strains of the same serotype usually shared the same suite of antibiotic resistance and virulence genes, suggesting the circulation of dominant clones within this distinct herd. This study provides insight into the diverse and dynamic nature of VTEC populations within groups of cattle and points to a broad spectrum of human health risks associated with these E. coli strains.},
}

RevDate: 2018-11-14

Golanowska M, Potrykus M, Motyka-Pomagruk A, et al (2018)

Comparison of Highly and Weakly Virulent Dickeya solani Strains, With a View on the Pangenome and Panregulon of This Species.

Frontiers in microbiology, 9:1940.

Bacteria belonging to the genera Dickeya and Pectobacterium are responsible for significant economic losses in a wide variety of crops and ornamentals. In recent years, increasing losses in potato production have been attributed to the appearance of Dickeya solani. The D. solani strains investigated so far share genetic homogeneity, although different virulence levels were observed among strains of various origins. The purpose of this study was to investigate the genetic traits possibly related to the diverse virulence levels by means of comparative genomics. First, we developed a new genome assembly pipeline which allowed us to complete the D. solani genomes. Four de novo sequenced and ten publicly available genomes were used to identify the structure of the D. solani pangenome, in which 74.8 and 25.2% of genes were grouped into the core and dispensable genome, respectively. For D. solani panregulon analysis, we performed a binding site prediction for four transcription factors, namely CRP, KdgR, PecS and Fur, to detect the regulons of these virulence regulators. Most of the D. solani potential virulence factors were predicted to belong to the accessory regulons of CRP, KdgR, and PecS. Thus, some differences in gene expression could exist between D. solani strains. The comparison between a highly virulent and a weakly virulent strain, IFB0099 and IFB0223, respectively, disclosed only small differences between their genomes but significant differences in the production of virulence factors like pectinases, cellulases and proteases, and in their mobility. The D. solani strains also diverge in the number and size of prophages present in their genomes. Another relevant difference is the disruption of the adhesin gene fhaB2 in the highly virulent strain. Strain IFB0223, which has a complete adhesin gene, is less mobile and less aggressive than IFB0099. This suggests that in this case, mobility rather than adherence is needed in order to trigger disease symptoms. This study highlights the utility of comparative genomics in predicting D. solani traits involved in the aggressiveness of this emerging plant pathogen.

@article {pmid30233505,
year = {2018},
author = {Golanowska, M and Potrykus, M and Motyka-Pomagruk, A and Kabza, M and Bacci, G and Galardini, M and Bazzicalupo, M and Makalowska, I and Smalla, K and Mengoni, A and Hugouvieux-Cotte-Pattat, N and Lojkowska, E},
title = {Comparison of Highly and Weakly Virulent Dickeya solani Strains, With a View on the Pangenome and Panregulon of This Species.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1940},
doi = {10.3389/fmicb.2018.01940},
pmid = {30233505},
issn = {1664-302X},
abstract = {Bacteria belonging to the genera Dickeya and Pectobacterium are responsible for significant economic losses in a wide variety of crops and ornamentals. In recent years, increasing losses in potato production have been attributed to the appearance of Dickeya solani. The D. solani strains investigated so far share genetic homogeneity, although different virulence levels were observed among strains of various origins. The purpose of this study was to investigate the genetic traits possibly related to the diverse virulence levels by means of comparative genomics. First, we developed a new genome assembly pipeline which allowed us to complete the D. solani genomes. Four de novo sequenced and ten publicly available genomes were used to identify the structure of the D. solani pangenome, in which 74.8 and 25.2% of genes were grouped into the core and dispensable genome, respectively. For D. solani panregulon analysis, we performed a binding site prediction for four transcription factors, namely CRP, KdgR, PecS and Fur, to detect the regulons of these virulence regulators. Most of the D. solani potential virulence factors were predicted to belong to the accessory regulons of CRP, KdgR, and PecS. Thus, some differences in gene expression could exist between D. solani strains. The comparison between a highly virulent and a weakly virulent strain, IFB0099 and IFB0223, respectively, disclosed only small differences between their genomes but significant differences in the production of virulence factors like pectinases, cellulases and proteases, and in their mobility. The D. solani strains also diverge in the number and size of prophages present in their genomes. Another relevant difference is the disruption of the adhesin gene fhaB2 in the highly virulent strain. Strain IFB0223, which has a complete adhesin gene, is less mobile and less aggressive than IFB0099. This suggests that in this case, mobility rather than adherence is needed in order to trigger disease symptoms. This study highlights the utility of comparative genomics in predicting D. solani traits involved in the aggressiveness of this emerging plant pathogen.},
}

RevDate: 2018-10-16

Bayer PE, Golicz AA, Tirnaz S, et al (2018)

Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome.

Plant biotechnology journal [Epub ahead of print].

Brassica oleracea is an important agricultural species encompassing many vegetable crops including cabbage, cauliflower, broccoli and kale; however, it can be susceptible to a variety of fungal diseases such as clubroot, blackleg, leaf spot and downy mildew. Resistance to these diseases is mediated by specific disease resistance gene analogs (RGAs) which are differently distributed across B. oleracea lines. The sequenced reference cultivar does not contain all B. oleracea genes due to gene presence/absence variation between individuals, which makes it necessary to search for RGA candidates in the B. oleracea pangenome. Here we present a comparative analysis of RGA candidates in the pangenome of B. oleracea. We show that the presence of RGA candidates differs between lines and suggest that in B. oleracea, SNPs and presence/absence variation drive RGA diversity using separate mechanisms. We identified 59 RGA candidates linked to Sclerotinia, clubroot, and Fusarium wilt resistance QTL, and these findings have implications for crop breeding in B. oleracea, which may also be applicable in other crop species.

@article {pmid30230187,
year = {2018},
author = {Bayer, PE and Golicz, AA and Tirnaz, S and Chan, CK and Edwards, D and Batley, J},
title = {Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome.},
journal = {Plant biotechnology journal},
volume = {},
number = {},
pages = {},
doi = {10.1111/pbi.13015},
pmid = {30230187},
issn = {1467-7652},
support = {DP1601004497//Australian Research Council/ ; FT130100604//Australian Research Council/ ; LP130100925//Australian Research Council/ ; LP140100537//Australian Research Council/ ; LP160100030//Australian Research Council/ ; //Government of Western Australia/ ; },
abstract = {Brassica oleracea is an important agricultural species encompassing many vegetable crops including cabbage, cauliflower, broccoli and kale; however, it can be susceptible to a variety of fungal diseases such as clubroot, blackleg, leaf spot and downy mildew. Resistance to these diseases is mediated by specific disease resistance gene analogs (RGAs) which are differently distributed across B. oleracea lines. The sequenced reference cultivar does not contain all B. oleracea genes due to gene presence/absence variation between individuals, which makes it necessary to search for RGA candidates in the B. oleracea pangenome. Here we present a comparative analysis of RGA candidates in the pangenome of B. oleracea. We show that the presence of RGA candidates differs between lines and suggest that in B. oleracea, SNPs and presence/absence variation drive RGA diversity using separate mechanisms. We identified 59 RGA candidates linked to Sclerotinia, clubroot, and Fusarium wilt resistance QTL, and these findings have implications for crop breeding in B. oleracea, which may also be applicable in other crop species.},
}

RevDate: 2018-10-19

Checcucci A, diCenzo GC, Ghini V, et al (2018)

Creation and Characterization of a Genomically Hybrid Strain in the Nitrogen-Fixing Symbiotic Bacterium Sinorhizobium meliloti.

ACS synthetic biology, 7(10):2365-2378.

Many bacteria, often associated with eukaryotic hosts and of relevance for biotechnological applications, harbor a multipartite genome composed of more than one replicon. Biotechnologically relevant phenotypes are often encoded by genes residing on the secondary replicons. A synthetic biology approach to developing enhanced strains for biotechnological purposes could therefore involve merging pieces or entire replicons from multiple strains into a single genome. Here we report the creation of a genomic hybrid strain in a model multipartite genome species, the plant-symbiotic bacterium Sinorhizobium meliloti. We term this strain as cis-hybrid, since it is produced by genomic material coming from the same species' pangenome. In particular, we moved the secondary replicon pSymA (accounting for nearly 20% of total genome content) from a donor S. meliloti strain to an acceptor strain. The cis-hybrid strain was screened for a panel of complex phenotypes (carbon/nitrogen utilization phenotypes, intra- and extracellular metabolomes, symbiosis, and various microbiological tests). Additionally, metabolic network reconstruction and constraint-based modeling were employed for in silico prediction of metabolic flux reorganization. Phenotypes of the cis-hybrid strain were in good agreement with those of both parental strains. Interestingly, the symbiotic phenotype showed a marked cultivar-specific improvement with the cis-hybrid strains compared to both parental strains. These results provide a proof-of-principle for the feasibility of genome-wide replicon-based remodelling of bacterial strains for improved biotechnological applications in precision agriculture.

@article {pmid30223644,
year = {2018},
author = {Checcucci, A and diCenzo, GC and Ghini, V and Bazzicalupo, M and Becker, A and Decorosi, F and Döhlemann, J and Fagorzi, C and Finan, TM and Fondi, M and Luchinat, C and Turano, P and Vignolini, T and Viti, C and Mengoni, A},
title = {Creation and Characterization of a Genomically Hybrid Strain in the Nitrogen-Fixing Symbiotic Bacterium Sinorhizobium meliloti.},
journal = {ACS synthetic biology},
volume = {7},
number = {10},
pages = {2365-2378},
doi = {10.1021/acssynbio.8b00158},
pmid = {30223644},
issn = {2161-5063},
abstract = {Many bacteria, often associated with eukaryotic hosts and of relevance for biotechnological applications, harbor a multipartite genome composed of more than one replicon. Biotechnologically relevant phenotypes are often encoded by genes residing on the secondary replicons. A synthetic biology approach to developing enhanced strains for biotechnological purposes could therefore involve merging pieces or entire replicons from multiple strains into a single genome. Here we report the creation of a genomic hybrid strain in a model multipartite genome species, the plant-symbiotic bacterium Sinorhizobium meliloti. We term this strain as cis-hybrid, since it is produced by genomic material coming from the same species' pangenome. In particular, we moved the secondary replicon pSymA (accounting for nearly 20% of total genome content) from a donor S. meliloti strain to an acceptor strain. The cis-hybrid strain was screened for a panel of complex phenotypes (carbon/nitrogen utilization phenotypes, intra- and extracellular metabolomes, symbiosis, and various microbiological tests). Additionally, metabolic network reconstruction and constraint-based modeling were employed for in silico prediction of metabolic flux reorganization. Phenotypes of the cis-hybrid strain were in good agreement with those of both parental strains. Interestingly, the symbiotic phenotype showed a marked cultivar-specific improvement with the cis-hybrid strains compared to both parental strains. These results provide a proof-of-principle for the feasibility of genome-wide replicon-based remodelling of bacterial strains for improved biotechnological applications in precision agriculture.},
}

Le KK, Whiteside MD, Hopkins JE, et al (2018)

Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses.

Database : the journal of biological databases and curation, 2018:1-10.

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analyses approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.

@article {pmid30212910,
year = {2018},
author = {Le, KK and Whiteside, MD and Hopkins, JE and Gannon, VPJ and Laing, CR},
title = {Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses.},
journal = {Database : the journal of biological databases and curation},
volume = {2018},
number = {},
pages = {1-10},
doi = {10.1093/database/bay086},
pmid = {30212910},
issn = {1758-0463},
mesh = {Computational Biology ; *Databases as Topic ; Escherichia coli/genetics/pathogenicity/*physiology ; Genome, Bacterial ; Internet ; Phenotype ; *Software ; Virulence Factors/genetics ; },
abstract = {Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analyses approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.},
}

Kavya VNS, Tayal K, Srinivasan R, et al (2018)

Sequence Alignment on Directed Graphs.

Journal of computational biology : a journal of computational molecular cell biology [Epub ahead of print].

Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions could have considerable blowup in their size and in the worst case the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm V-ALIGN that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.

@article {pmid30204489,
year = {2018},
author = {Kavya, VNS and Tayal, K and Srinivasan, R and Sivadasan, N},
title = {Sequence Alignment on Directed Graphs.},
journal = {Journal of computational biology : a journal of computational molecular cell biology},
volume = {},
number = {},
pages = {},
doi = {10.1089/cmb.2017.0264},
pmid = {30204489},
issn = {1557-8666},
abstract = {Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Furthermore, such graph extensions could have considerable blowup in their size and in the worst case the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm V-ALIGN that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming (DP) formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With the proposed refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph, and its feedback vertex set. We conducted experiments to compare the proposed algorithm against the existing POA-based techniques. We also performed alignment experiments on the genome variation graphs constructed from the 1000 Genomes data. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high similarity to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.},
}
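
The abstract above contrasts V-ALIGN with partial order alignment (POA) techniques that work on directed acyclic graphs. As a rough illustration of that DAG-based baseline only (not of V-ALIGN itself, which aligns directly on graphs with cycles and supports affine gaps), the following minimal Python sketch scores a global alignment of a sequence against the best source-to-sink path of a small DAG. Single-character vertices, linear gap costs, the scoring values, and the name `align_to_dag` are all assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def align_to_dag(seq, labels, edges, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of seq against any source-to-sink path.

    labels: dict mapping vertex -> single character
    edges:  iterable of (u, v) directed edges; the graph must be acyclic
    """
    preds = {v: [] for v in labels}
    succs = {v: [] for v in labels}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Visit every vertex after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())

    n = len(seq)
    # Virtual start row: j sequence characters consumed before any vertex.
    start = [j * gap for j in range(n + 1)]
    score = {}
    for v in order:
        prev_rows = [score[p] for p in preds[v]] or [start]
        row = [0] * (n + 1)
        # Column 0: vertex v is aligned to a gap.
        row[0] = max(r[0] for r in prev_rows) + gap
        for j in range(1, n + 1):
            sub = match if labels[v] == seq[j - 1] else mismatch
            diag = max(r[j - 1] for r in prev_rows) + sub  # align v with seq[j-1]
            up = max(r[j] for r in prev_rows) + gap        # v aligned to a gap
            left = row[j - 1] + gap                        # seq[j-1] aligned to a gap
            row[j] = max(diag, up, left)
        score[v] = row

    sinks = [v for v in labels if not succs[v]]
    return max(score[v][n] for v in sinks)


# Toy variation graph (hypothetical): A -> {C | G} -> T, i.e. one SNP bubble.
labels = {0: "A", 1: "C", 2: "G", 3: "T"}
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(align_to_dag("AGT", labels, edges))  # 3: matches the path A-G-T exactly
print(align_to_dag("AT", labels, edges))   # 1: two matches and one gap
```

Each vertex keeps one row of the DP table, and every cell takes the maximum over all predecessor rows. This is why a cyclic graph first has to be unrolled into a DAG for this style of algorithm, which is the step the abstract says V-ALIGN avoids.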

RevDate: 2018-11-14

Chen X, Zhang Y, Zhang Z, et al (2018)

PGAweb: A Web Server for Bacterial Pan-Genome Analysis.

Frontiers in microbiology, 9:1910.

An astronomical increase in microbial genome data in recent years has led to strong demand for bioinformatic tools for pan-genome analysis within and across species. Here, we present PGAweb, a user-friendly, web-based tool for bacterial pan-genome analysis, which is composed of two main pan-genome analysis modules, PGAP and PGAP-X. PGAweb provides key interactive and customizable functions that include orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. PGAweb presents features of genomic structural dynamics and sequence diversity with different visualization methods that are helpful for intuitively understanding the dynamics and evolution of bacterial genomes. PGAweb has an intuitive interface with one-click setting of parameters and is freely available at http://PGAweb.vlcc.cn/.

Syme RA, Tan KC, Rybak K, et al (2018)

Pan-Parastagonospora Comparative Genome Analysis-Effector Prediction and Genome Evolution.

Genome biology and evolution, 10(9):2443-2457.

We report a fungal pan-genome study involving Parastagonospora spp., including 21 isolates of the wheat (Triticum aestivum) pathogen Parastagonospora nodorum, 10 of the grass-infecting Parastagonospora avenae, and 2 of a closely related undefined sister species. We observed substantial variation in the distribution of polymorphisms across the pan-genome, including repeat-induced point mutations, diversifying selection and gene gains and losses. We also discovered chromosome-scale inter and intraspecific presence/absence variation of some sequences, suggesting the occurrence of one or more accessory chromosomes or regions that may play a role in host-pathogen interactions. The presence of known pathogenicity effector loci SnToxA, SnTox1, and SnTox3 varied substantially among isolates. Three P. nodorum isolates lacked functional versions for all three loci, whereas three P. avenae isolates carried one or both of the SnTox1 and SnTox3 genes, indicating previously unrecognized potential for discovering additional effectors in the P. nodorum-wheat pathosystem. We utilized the pan-genomic comparative analysis to improve the prediction of pathogenicity effector candidates, recovering the three confirmed effectors among our top-ranked candidates. We propose applying this pan-genomic approach to identify the effector repertoire involved in other host-microbe interactions involving necrotrophic pathogens in the Pezizomycotina.

@article {pmid30184068,
year = {2018},
author = {Syme, RA and Tan, KC and Rybak, K and Friesen, TL and McDonald, BA and Oliver, RP and Hane, JK},
title = {Pan-Parastagonospora Comparative Genome Analysis-Effector Prediction and Genome Evolution.},
journal = {Genome biology and evolution},
volume = {10},
number = {9},
pages = {2443-2457},
doi = {10.1093/gbe/evy192},
pmid = {30184068},
issn = {1759-6653},
abstract = {We report a fungal pan-genome study involving Parastagonospora spp., including 21 isolates of the wheat (Triticum aestivum) pathogen Parastagonospora nodorum, 10 of the grass-infecting Parastagonospora avenae, and 2 of a closely related undefined sister species. We observed substantial variation in the distribution of polymorphisms across the pan-genome, including repeat-induced point mutations, diversifying selection and gene gains and losses. We also discovered chromosome-scale inter and intraspecific presence/absence variation of some sequences, suggesting the occurrence of one or more accessory chromosomes or regions that may play a role in host-pathogen interactions. The presence of known pathogenicity effector loci SnToxA, SnTox1, and SnTox3 varied substantially among isolates. Three P. nodorum isolates lacked functional versions for all three loci, whereas three P. avenae isolates carried one or both of the SnTox1 and SnTox3 genes, indicating previously unrecognized potential for discovering additional effectors in the P. nodorum-wheat pathosystem. We utilized the pan-genomic comparative analysis to improve the prediction of pathogenicity effector candidates, recovering the three confirmed effectors among our top-ranked candidates. We propose applying this pan-genomic approach to identify the effector repertoire involved in other host-microbe interactions involving necrotrophic pathogens in the Pezizomycotina.},
}

RevDate: 2018-11-14

Yang T, Zhong J, Zhang J, et al (2018)

Pan-Genomic Study of Mycobacterium tuberculosis Reflecting the Primary/Secondary Genes, Generality/Individuality, and the Interconversion Through Copy Number Variations.

Frontiers in microbiology, 9:1886.

Tuberculosis (TB) has surpassed HIV as the leading infectious disease killer worldwide since 2014. The main pathogen, Mycobacterium tuberculosis (Mtb), contains ~4,000 genes that account for ~90% of the genome. However, it is still unclear which of these genes are primary/secondary, which are responsible for generality/individuality, and which interconvert during evolution. Here we utilized a pan-genomic analysis of 36 Mtb genomes to address these questions. We identified 3,679 Mtb core (i.e., primary) genes, determining their phenotypic generality (e.g., virulence, slow growth, dormancy). We also observed 1,122 dispensable and 964 strain-specific secondary genes, reflecting partially shared and lineage-/strain-specific individualities. Among which, five L2 lineage-specific genes might be related to the increased virulence of the L2 lineage. Notably, we discovered 28 Mtb "Super Core Genes" (SCGs: more than a copy in at least 90% strains), which might be of increased importance, and reflected the "super phenotype generality." Most SCGs encode PE/PPE, virulence factors, antigens, and transposases, and have been verified as playing crucial roles in Mtb pathogenicity. Further investigation of the 28 SCGs demonstrated the interconversion among SCGs, single-copy core, dispensable, and strain-specific genes through copy number variations (CNVs) during evolution; different mutations on different copies highlight the delicate adaptive-evolution regulation amongst Mtb lineages. This reflects that the importance of genes varied through CNVs, which might be driven by selective pressure from environment/host-adaptation. In addition, compared with Mycobacterium bovis (Mbo), Mtb possesses 48 specific single core genes that partially reflect the differences between Mtb and Mbo individuality.

@article {pmid30177918,
year = {2018},
author = {Yang, T and Zhong, J and Zhang, J and Li, C and Yu, X and Xiao, J and Jia, X and Ding, N and Ma, G and Wang, G and Yue, L and Liang, Q and Sheng, Y and Sun, Y and Huang, H and Chen, F},
title = {Pan-Genomic Study of Mycobacterium tuberculosis Reflecting the Primary/Secondary Genes, Generality/Individuality, and the Interconversion Through Copy Number Variations.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1886},
doi = {10.3389/fmicb.2018.01886},
pmid = {30177918},
issn = {1664-302X},
abstract = {Tuberculosis (TB) has surpassed HIV as the leading infectious disease killer worldwide since 2014. The main pathogen, Mycobacterium tuberculosis (Mtb), contains ~4,000 genes that account for ~90% of the genome. However, it is still unclear which of these genes are primary/secondary, which are responsible for generality/individuality, and which interconvert during evolution. Here we utilized a pan-genomic analysis of 36 Mtb genomes to address these questions. We identified 3,679 Mtb core (i.e., primary) genes, determining their phenotypic generality (e.g., virulence, slow growth, dormancy). We also observed 1,122 dispensable and 964 strain-specific secondary genes, reflecting partially shared and lineage-/strain-specific individualities. Among which, five L2 lineage-specific genes might be related to the increased virulence of the L2 lineage. Notably, we discovered 28 Mtb "Super Core Genes" (SCGs: more than a copy in at least 90% strains), which might be of increased importance, and reflected the "super phenotype generality." Most SCGs encode PE/PPE, virulence factors, antigens, and transposases, and have been verified as playing crucial roles in Mtb pathogenicity. Further investigation of the 28 SCGs demonstrated the interconversion among SCGs, single-copy core, dispensable, and strain-specific genes through copy number variations (CNVs) during evolution; different mutations on different copies highlight the delicate adaptive-evolution regulation amongst Mtb lineages. This reflects that the importance of genes varied through CNVs, which might be driven by selective pressure from environment/host-adaptation. In addition, compared with Mycobacterium bovis (Mbo), Mtb possesses 48 specific single core genes that partially reflect the differences between Mtb and Mbo individuality.},
}
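
The gene categories named in this abstract (core, dispensable, strain-specific, and "Super Core Genes" with more than one copy in at least 90% of strains) can be illustrated on a small copy-number table. The sketch below is a minimal Python example under stated assumptions: the gene names and counts are invented, "core" is taken to mean present in every strain, "strain-specific" to mean present in exactly one strain, and the name `classify_genes` is hypothetical. The abstract does not describe the clustering pipeline or exact thresholds the authors used for the other categories.

```python
def classify_genes(copies, super_core_percent=90):
    """Label each gene from its per-strain copy numbers.

    copies: dict mapping gene -> list of copy numbers, one per strain
            (all lists in the same strain order)
    """
    classes = {}
    for gene, counts in copies.items():
        n = len(counts)
        present = sum(c > 0 for c in counts)  # strains carrying the gene
        multi = sum(c > 1 for c in counts)    # strains with more than one copy
        if present == 0:
            label = "absent"
        elif multi * 100 >= super_core_percent * n:
            label = "super core"              # >1 copy in at least 90% of strains
        elif present == n:
            label = "core"
        elif present == 1:
            label = "strain-specific"
        else:
            label = "dispensable"
        classes[gene] = label
    return classes


# Hypothetical copy numbers for four genes across ten strains.
toy = {
    "geneA": [2, 2, 3, 2, 2, 2, 2, 2, 2, 1],
    "geneB": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "geneC": [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
    "geneD": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}
print(classify_genes(toy))
```

Because the labels depend only on copy numbers, a gain or loss of copies in a few strains is enough to move a gene from one category to another, which is the kind of interconversion through copy number variation that the abstract describes.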

RevDate: 2018-11-14

Asaf S, Khan AL, Khan MA, et al (2018)

Complete genome sequencing and analysis of endophytic Sphingomonas sp. LK11 and its potential in plant growth.

3 Biotech, 8(9):389.

Our study aimed to elucidate the plant growth-promoting characteristics and the structure and composition of Sphingomonas sp. LK11 genome using the single molecule real-time (SMRT) sequencing technology of Pacific Biosciences. The results revealed that LK11 produces different types of gibberellins (GAs) in pure culture and significantly improves soybean plant growth by influencing endogenous GAs compared with non-inoculated control plants. Detailed genomic analyses revealed that the Sphingomonas sp. LK11 genome consists of a circular chromosome (3.78 Mbp; 66.2% G+C content) and two circular plasmids (122,975 bps and 34,160 bps; 63 and 65% G+C content, respectively). Annotation showed that the LK11 genome consists of 3656 protein-coding genes, 59 tRNAs, and 4 complete rRNA operons. Functional analyses predicted that LK11 encodes genes for phosphate solubilization and nitrate/nitrite ammonification, which are beneficial for promoting plant growth. Genes for production of catalases, superoxide dismutase, and peroxidases that confer resistance to oxidative stress in plants were also identified in LK11. Moreover, genes for trehalose and glycine betaine biosynthesis were also found in LK11 genome. Similarly, Sphingomonas spp. analysis revealed an open pan-genome and a total of 8507 genes were identified in the Sphingomonas spp. pan-genome and about 1356 orthologous genes were found to comprise the core genome. However, the number of genomes analyzed was not enough to describe complete gene sets. Our findings indicated that the genetic makeup of Sphingomonas sp. LK11 can be utilized as an eco-friendly bioresource for cleaning contaminated sites and promoting growth of plants confronted with environmental perturbations.

Summary: The JCVI Pan-Genome Pipeline is a collection of programs to run PanOCT and tools that support and extend the capabilities of PanOCT. PanOCT (Pan-genome Ortholog Clustering Tool) is a tool for pan-genome analysis of closely related prokaryotic species or strains. The JCVI Pan-Genome Pipeline wrapper invokes command-line utilities that prepare input genomes, invoke third-party tools such as NCBI Blast+, run PanOCT, generate a consensus pan-genome, annotate features of the pan-genome, detect sets of genes of interest such as antimicrobial resistance (AMR) genes, and generate figures, tables, and html pages to visualize the results. The pipeline can run in a hierarchical mode, lowering the RAM and compute resources used.

Availability: Source code, demo data, and detailed documentation are freely available at https://github.com/JCVenterInstitute/PanGenomePipeline.

BACKGROUND: Coagulase negative staphylococci (CoNS) are commensal bacteria on human skin. Staphylococcus lugdunensis is a unique CoNS which produces various virulence factors and may, like S. aureus, cause severe infections, particularly in hospital settings. Unlike other staphylococci, it remains highly susceptible to antimicrobials, and genome-based phylogenetic studies have evidenced a highly conserved genome that distinguishes it from all other staphylococci.

RESULTS: We demonstrate that S. lugdunensis possesses a closed pan-genome with a very limited number of new genes, in contrast to other staphylococci that have an open pan-genome. Whole-genome nucleotide and amino acid identity levels are also higher than in other staphylococci. We identified numerous genetic barriers to horizontal gene transfer that might explain this result. The S. lugdunensis genome has multiple operons encoding for restriction-modification, CRISPR/Cas and toxin/antitoxin systems. We also identified a new PIN-like domain-associated protein that might belong to a larger operon, comprising a metalloprotease, that could function as a new toxin/antitoxin or detoxification system.

CONCLUSION: We show that S. lugdunensis has a unique genome profile within staphylococci, with a closed pan-genome and several systems to prevent horizontal gene transfer. Its virulence in clinical settings does not rely on its ability to acquire and exchange antibiotic resistance genes or other virulence factors as shown for other staphylococci.

RESULTS: We demonstrate that S. lugdunensis possesses a closed pan-genome with a very limited number of new genes, in contrast to other staphylococci that have an open pan-genome. Whole-genome nucleotide and amino acid identity levels are also higher than in other staphylococci. We identified numerous genetic barriers to horizontal gene transfer that might explain this result. The S. lugdunensis genome has multiple operons encoding for restriction-modification, CRISPR/Cas and toxin/antitoxin systems. We also identified a new PIN-like domain-associated protein that might belong to a larger operon, comprising a metalloprotease, that could function as a new toxin/antitoxin or detoxification system.

CONCLUSION: We show that S. lugdunensis has a unique genome profile within staphylococci, with a closed pan-genome and several systems to prevent horizontal gene transfer. Its virulence in clinical settings does not rely on its ability to acquire and exchange antibiotic resistance genes or other virulence factors as shown for other staphylococci.},
}

Pena-Gonzalez A, Rodriguez-R LM, Marston CK, et al (2018)

Genomic Characterization and Copy Number Variation of Bacillus anthracis Plasmids pXO1 and pXO2 in a Historical Collection of 412 Strains.

mSystems, 3(4): pii:mSystems00065-18.

Bacillus anthracis plasmids pXO1 and pXO2 carry the main virulence factors responsible for anthrax. However, the extent of copy number variation within the species and how the plasmids are related to pXO1/pXO2-like plasmids in other species of the Bacillus cereus sensu lato group remain unclear. To gain new insights into these issues, we sequenced 412 B. anthracis strains representing the total phylogenetic and ecological diversity of the species. Our results revealed that B. anthracis genomes carried, on average, 3.86 and 2.29 copies of pXO1 and pXO2, respectively, and also revealed a positive linear correlation between the copy numbers of pXO1 and pXO2. No correlation between the plasmid copy number and the phylogenetic relatedness of the strains was observed. However, genomes of strains isolated from animal tissues generally maintained a higher plasmid copy number than genomes of strains from environmental sources (P < 0.05 [Welch two-sample t test]). Comparisons against B. cereus genomes carrying complete or partial pXO1-like and pXO2-like plasmids showed that the plasmid-based phylogeny recapitulated that of the main chromosome, indicating limited plasmid horizontal transfer between or within these species. Comparisons of gene content revealed a closed pXO1 and pXO2 pangenome; e.g., plasmids encode <8 unique genes, on average, and a single large fragment deletion of pXO1 in one B. anthracis strain (2000031682) was detected. Collectively, our results provide a more complete view of the genomic diversity of B. anthracis plasmids, their copy number variation, and the virulence potential of other Bacillus species carrying pXO1/pXO2-like plasmids. IMPORTANCE Bacillus anthracis microorganisms are of historical and epidemiological importance and are among the most homogenous bacterial groups known, even though the B. anthracis genome is rich in mobile elements. 
Mobile elements can trigger the diversification of lineages; therefore, characterizing the extent of genomic variation in a large collection of strains is critical for a complete understanding of the diversity and evolution of the species. Here, we sequenced a large collection of B. anthracis strains (>400) that were recovered from human, animal, and environmental sources around the world. Our results confirmed the remarkable stability of gene content and synteny of the anthrax plasmids and revealed no signal of plasmid exchange between B. anthracis and pathogenic B. cereus isolates but rather predominantly vertical descent. These findings advance our understanding of the biology and pathogenomic evolution of B. anthracis and its plasmids.

@article {pmid30116789,
year = {2018},
author = {Pena-Gonzalez, A and Rodriguez-R, LM and Marston, CK and Gee, JE and Gulvik, CA and Kolton, CB and Saile, E and Frace, M and Hoffmaster, AR and Konstantinidis, KT},
title = {Genomic Characterization and Copy Number Variation of Bacillus anthracis Plasmids pXO1 and pXO2 in a Historical Collection of 412 Strains.},
journal = {mSystems},
volume = {3},
number = {4},
pages = {},
doi = {10.1128/mSystems.00065-18},
pmid = {30116789},
issn = {2379-5077},
abstract = {Bacillus anthracis plasmids pXO1 and pXO2 carry the main virulence factors responsible for anthrax. However, the extent of copy number variation within the species and how the plasmids are related to pXO1/pXO2-like plasmids in other species of the Bacillus cereus sensu lato group remain unclear. To gain new insights into these issues, we sequenced 412 B. anthracis strains representing the total phylogenetic and ecological diversity of the species. Our results revealed that B. anthracis genomes carried, on average, 3.86 and 2.29 copies of pXO1 and pXO2, respectively, and also revealed a positive linear correlation between the copy numbers of pXO1 and pXO2. No correlation between the plasmid copy number and the phylogenetic relatedness of the strains was observed. However, genomes of strains isolated from animal tissues generally maintained a higher plasmid copy number than genomes of strains from environmental sources (P < 0.05 [Welch two-sample t test]). Comparisons against B. cereus genomes carrying complete or partial pXO1-like and pXO2-like plasmids showed that the plasmid-based phylogeny recapitulated that of the main chromosome, indicating limited plasmid horizontal transfer between or within these species. Comparisons of gene content revealed a closed pXO1 and pXO2 pangenome; e.g., plasmids encode <8 unique genes, on average, and a single large fragment deletion of pXO1 in one B. anthracis strain (2000031682) was detected. Collectively, our results provide a more complete view of the genomic diversity of B. anthracis plasmids, their copy number variation, and the virulence potential of other Bacillus species carrying pXO1/pXO2-like plasmids. IMPORTANCE Bacillus anthracis microorganisms are of historical and epidemiological importance and are among the most homogenous bacterial groups known, even though the B. anthracis genome is rich in mobile elements. 
Mobile elements can trigger the diversification of lineages; therefore, characterizing the extent of genomic variation in a large collection of strains is critical for a complete understanding of the diversity and evolution of the species. Here, we sequenced a large collection of B. anthracis strains (>400) that were recovered from human, animal, and environmental sources around the world. Our results confirmed the remarkable stability of gene content and synteny of the anthrax plasmids and revealed no signal of plasmid exchange between B. anthracis and pathogenic B. cereus isolates but rather predominantly vertical descent. These findings advance our understanding of the biology and pathogenomic evolution of B. anthracis and its plasmids.},
}

BACKGROUND: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the dynamics of wheat genomes on a megabase scale.

RESULTS: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes-the old landrace Chinese Spring and the elite Swiss spring wheat line 'CH Campala Lr22a'. Both chromosomes were assembled into megabase-sized scaffolds. There is a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations reveals four large indels of more than 100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the molecular mechanisms that caused these indels. Three of the large indels affect copy number of NLRs, a gene family involved in plant immunity. Analysis of SNP density reveals four haploblocks of 4, 8, 9 and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Gene content across the two chromosomes was highly conserved. Ninety-nine percent of the genic sequences were present in both genotypes and the fraction of unique genes ranged from 0.4 to 0.7%.

CONCLUSIONS: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations and gene content. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

We consider the problem of identifying regions within a pan-genome De Bruijn graph that are traversed by many sequence paths. We define such regions and the subpaths that traverse them as frequented regions (FRs). In this work, we formalize the FR problem and describe an efficient algorithm for finding FRs. Subsequently, we propose some applications of FRs based on machine-learning and pan-genome graph simplification. We demonstrate the effectiveness of these applications using data sets for the organisms Staphylococcus aureus (bacterium) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs such as identifying introgressions in yeast that aid in alcohol tolerance, and show that FRs are useful for classification of yeast strains by industrial use and visualizing pan-genomic space.

@article {pmid30106690,
year = {2018},
author = {Cleary, A and Ramaraj, T and Kahanda, I and Mudge, J and Mumey, B},
title = {Exploring Frequented Regions in Pan-Genomic Graphs.},
journal = {IEEE/ACM transactions on computational biology and bioinformatics},
volume = {},
number = {},
pages = {},
doi = {10.1109/TCBB.2018.2864564},
pmid = {30106690},
issn = {1557-9964},
abstract = {We consider the problem of identifying regions within a pan-genome De Bruijn graph that are traversed by many sequence paths. We define such regions and the subpaths that traverse them as frequented regions (FRs). In this work, we formalize the FR problem and describe an efficient algorithm for finding FRs. Subsequently, we propose some applications of FRs based on machine-learning and pan-genome graph simplification. We demonstrate the effectiveness of these applications using data sets for the organisms Staphylococcus aureus (bacterium) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs such as identifying introgressions in yeast that aid in alcohol tolerance, and show that FRs are useful for classification of yeast strains by industrial use and visualizing pan-genomic space.},
}
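
As a rough illustration of the idea in the abstract above, the following Python sketch counts how many input sequences pass through each node of a De Bruijn graph and keeps the nodes visited by at least a chosen fraction of them. It is a deliberate simplification and not the authors' algorithm: it scores single k-mer nodes rather than regions and subpaths, and the function names, the toy sequences, and the `min_fraction` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def kmer_support(sequences, k):
    """Map each k-mer (a De Bruijn graph node) to the set of
    sequence indices whose path visits that node."""
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            support[seq[i:i + k]].add(idx)
    return support


def frequented_nodes(sequences, k, min_fraction):
    """Return the k-mers visited by at least min_fraction of the sequences.

    Hypothetical helper: a single-node stand-in for a frequented region.
    """
    support = kmer_support(sequences, k)
    needed = min_fraction * len(sequences)
    return {kmer for kmer, paths in support.items() if len(paths) >= needed}


# Toy input, invented for illustration only.
toy = ["ACGTAC", "CGTACG", "TTGTAC"]
shared_by_all = frequented_nodes(toy, k=3, min_fraction=1.0)
shared_by_half = frequented_nodes(toy, k=3, min_fraction=0.5)
```

Lowering `min_fraction` admits nodes that only some sequences traverse, which loosely mirrors the trade-off between strict and tolerant definitions of a shared region.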

Mycobacterium marinum is the causative agent for the tuberculosis-like disease mycobacteriosis in fish and skin lesions in humans. Ubiquitous in its geographical distribution, M. marinum is known to occupy diverse fish as hosts. However, information about its genomic diversity is limited. Here, we provide the genome sequences for 15 M. marinum strains isolated from infected humans and fish. Comparative genomic analysis of these and four available genomes of the M. marinum strains M, E11, MB2 and Europe reveal high genomic diversity among the strains, leading to the conclusion that M. marinum should be divided into two different clusters, the "M"- and the "Aronson"-type. We suggest that these two clusters should be considered to represent two M. marinum subspecies. Our data also show that the M. marinum pan-genome for both groups is open and expanding and we provide data showing high number of mutational hotspots in M. marinum relative to other mycobacteria such as Mycobacterium tuberculosis. This high genomic diversity might be related to the ability of M. marinum to occupy different ecological niches.

@article {pmid30104693,
year = {2018},
author = {Das, S and Pettersson, BMF and Behra, PRK and Mallick, A and Cheramie, M and Ramesh, M and Shirreff, L and DuCote, T and Dasgupta, S and Ennis, DG and Kirsebom, LA},
title = {Extensive genomic diversity among Mycobacterium marinum strains revealed by whole genome sequencing.},
journal = {Scientific reports},
volume = {8},
number = {1},
pages = {12040},
doi = {10.1038/s41598-018-30152-y},
pmid = {30104693},
issn = {2045-2322},
support = {RD01-A-38//Louisiana Board of Regents (Board of Regents)/ ; 222-2012-492//Svenska Forskningsrådet Formas (Swedish Research Council Formas)/ ; },
abstract = {Mycobacterium marinum is the causative agent for the tuberculosis-like disease mycobacteriosis in fish and skin lesions in humans. Ubiquitous in its geographical distribution, M. marinum is known to occupy diverse fish as hosts. However, information about its genomic diversity is limited. Here, we provide the genome sequences for 15 M. marinum strains isolated from infected humans and fish. Comparative genomic analysis of these and four available genomes of the M. marinum strains M, E11, MB2 and Europe reveal high genomic diversity among the strains, leading to the conclusion that M. marinum should be divided into two different clusters, the "M"- and the "Aronson"-type. We suggest that these two clusters should be considered to represent two M. marinum subspecies. Our data also show that the M. marinum pan-genome for both groups is open and expanding and we provide data showing high number of mutational hotspots in M. marinum relative to other mycobacteria such as Mycobacterium tuberculosis. This high genomic diversity might be related to the ability of M. marinum to occupy different ecological niches.},
}

RevDate: 2018-11-28

Pluta R, M Espinosa (2018)

Antisense and yet sensitive: Copy number control of rolling circle-replicating plasmids by small RNAs.

Wiley interdisciplinary reviews. RNA, 9(6):e1500.

Bacterial plasmids constitute a wealth of shared DNA amounting to about 20% of the total prokaryotic pangenome. Plasmids replicate autonomously and control their replication by maintaining a fairly constant number of copies within a given host. Plasmids should acquire a good fitness to their hosts so that they do not constitute a genetic load. Here we review some basic concepts in plasmid biology, pertaining to the control of replication and distribution of plasmid copies among daughter cells. A particular class of plasmids is constituted by those that replicate by the rolling circle mode (rolling circle-replicating [RCR]-plasmids). They are small double-stranded DNA molecules, with a rather high number of copies in the original host. RCR-plasmids control their replication by means of a small short-lived antisense RNA, alone or in combination with a plasmid-encoded transcriptional repressor protein. Two plasmid prototypes have been studied in depth, namely the staphylococcal plasmid pT181 and the streptococcal plasmid pMV158, each corresponding to the two types of replication control circuits, respectively. We further discuss possible applications of the plasmid-encoded antisense RNAs and address some future directions that, in our opinion, should be pursued in the study of these small molecules. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems.

@article {pmid30074293,
year = {2018},
author = {Pluta, R and Espinosa, M},
title = {Antisense and yet sensitive: Copy number control of rolling circle-replicating plasmids by small RNAs.},
journal = {Wiley interdisciplinary reviews. RNA},
volume = {9},
number = {6},
pages = {e1500},
doi = {10.1002/wrna.1500},
pmid = {30074293},
issn = {1757-7012},
abstract = {Bacterial plasmids constitute a wealth of shared DNA amounting to about 20% of the total prokaryotic pangenome. Plasmids replicate autonomously and control their replication by maintaining a fairly constant number of copies within a given host. Plasmids should acquire a good fitness to their hosts so that they do not constitute a genetic load. Here we review some basic concepts in plasmid biology, pertaining to the control of replication and distribution of plasmid copies among daughter cells. A particular class of plasmids is constituted by those that replicate by the rolling circle mode (rolling circle-replicating [RCR]-plasmids). They are small double-stranded DNA molecules, with a rather high number of copies in the original host. RCR-plasmids control their replication by means of a small short-lived antisense RNA, alone or in combination with a plasmid-encoded transcriptional repressor protein. Two plasmid prototypes have been studied in depth, namely the staphylococcal plasmid pT181 and the streptococcal plasmid pMV158, each corresponding to the two types of replication control circuits, respectively. We further discuss possible applications of the plasmid-encoded antisense RNAs and address some future directions that, in our opinion, should be pursued in the study of these small molecules. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems.},
}

The halophilic bacterium Salinibacter ruber is an abundant and ecologically important member of halophilic communities worldwide. Given its broad distribution and high intraspecific genetic diversity, S. ruber is considered one of the main models for ecological and evolutionary studies of bacterial adaptation to hypersaline environments. However, current insights on the genomic diversity of this species is limited to the comparison of the genomes of two co-isolated strains. Here, we present a comparative genomic analysis of eight S. ruber strains isolated at two different time points in each of two different Mediterranean solar salterns. Our results show an open pangenome with contrasting evolutionary patterns in the core and accessory genomes. We found that the core genome is shaped by extensive homologous recombination (HR), which results in limited sequence variation within population clusters. In contrast, the accessory genome is modulated by horizontal gene transfer (HGT), with genomic islands and plasmids acting as gateways to the rest of the genome. In addition, both types of genetic exchange are modulated by restriction and modification (RM) or CRISPR-Cas systems. Finally, genes differentially impacted by such processes reveal functional processes potentially relevant for environmental interactions and adaptation to extremophilic conditions. Altogether, our results support scenarios that conciliate "Neutral" and "Constant Diversity" models of bacterial evolution.

@article {pmid30072959,
year = {2018},
author = {González-Torres, P and Gabaldón, T},
title = {Genome Variation in the Model Halophilic Bacterium Salinibacter ruber.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1499},
doi = {10.3389/fmicb.2018.01499},
pmid = {30072959},
issn = {1664-302X},
abstract = {The halophilic bacterium Salinibacter ruber is an abundant and ecologically important member of halophilic communities worldwide. Given its broad distribution and high intraspecific genetic diversity, S. ruber is considered one of the main models for ecological and evolutionary studies of bacterial adaptation to hypersaline environments. However, current insights on the genomic diversity of this species is limited to the comparison of the genomes of two co-isolated strains. Here, we present a comparative genomic analysis of eight S. ruber strains isolated at two different time points in each of two different Mediterranean solar salterns. Our results show an open pangenome with contrasting evolutionary patterns in the core and accessory genomes. We found that the core genome is shaped by extensive homologous recombination (HR), which results in limited sequence variation within population clusters. In contrast, the accessory genome is modulated by horizontal gene transfer (HGT), with genomic islands and plasmids acting as gateways to the rest of the genome. In addition, both types of genetic exchange are modulated by restriction and modification (RM) or CRISPR-Cas systems. Finally, genes differentially impacted by such processes reveal functional processes potentially relevant for environmental interactions and adaptation to extremophilic conditions. Altogether, our results support scenarios that conciliate "Neutral" and "Constant Diversity" models of bacterial evolution.},
}

RevDate: 2018-08-31

Springer NM, Anderson SN, Andorf CM, et al (2018)

The maize W22 genome provides a foundation for functional genomics and transposon biology.

Nature genetics, 50(9):1282-1288.

The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.

@article {pmid30061736,
year = {2018},
author = {Springer, NM and Anderson, SN and Andorf, CM and Ahern, KR and Bai, F and Barad, O and Barbazuk, WB and Bass, HW and Baruch, K and Ben-Zvi, G and Buckler, ES and Bukowski, R and Campbell, MS and Cannon, EKS and Chomet, P and Dawe, RK and Davenport, R and Dooner, HK and Du, LH and Du, C and Easterling, KA and Gault, C and Guan, JC and Hunter, CT and Jander, G and Jiao, Y and Koch, KE and Kol, G and Köllner, TG and Kudo, T and Li, Q and Lu, F and Mayfield-Jones, D and Mei, W and McCarty, DR and Noshay, JM and Portwood, JL and Ronen, G and Settles, AM and Shem-Tov, D and Shi, J and Soifer, I and Stein, JC and Stitzer, MC and Suzuki, M and Vera, DL and Vollbrecht, E and Vrebalov, JT and Ware, D and Wei, S and Wimalanathan, K and Woodhouse, MR and Xiong, W and Brutnell, TP},
title = {The maize W22 genome provides a foundation for functional genomics and transposon biology.},
journal = {Nature genetics},
volume = {50},
number = {9},
pages = {1282-1288},
doi = {10.1038/s41588-018-0158-0},
pmid = {30061736},
issn = {1546-1718},
abstract = {The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.},
}

BACKGROUND: Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium that colonizes the gastrointestinal and genitourinary tract of humans. This bacterium has also been isolated from various animals, such as fish and cattle. Non-coding RNAs (ncRNAs) can act as regulators of gene expression in bacteria, such as Streptococcus pneumoniae and Streptococcus pyogenes. However, little is known about the genomic distribution of ncRNAs and RNA families in S. agalactiae.

RESULTS: Comparative genome analysis of 27 S. agalactiae strains showed more than 5 thousand genomic regions identified and classified as Core, Exclusive, and Shared genome sequences. We identified 27 to 89 RNA families per genome distributed over these regions, from these, 25 were in Core regions while Shared and Exclusive regions showed variations amongst strains. We propose that the amount and type of ncRNA present in each genome can provide a pattern to contribute in the identification of the clonal types.

CONCLUSIONS: The identification of RNA families provides an insight over ncRNAs, sRNAs and ribozymes function, that can be further explored as targets for antibiotic development or studied in gene regulation of cellular processes. RNA families could be considered as markers to determine infection capabilities of different strains. Lastly, pan-genome analysis of GBS including the full range of functional transcripts provides a broader approach in the understanding of this pathogen.

@article {pmid30055586,
year = {2018},
author = {Wolf, IR and Paschoal, AR and Quiroga, C and Domingues, DS and de Souza, RF and Pretto-Giordano, LG and Vilas-Boas, LA},
title = {Functional annotation and distribution overview of RNA families in 27 Streptococcus agalactiae genomes.},
journal = {BMC genomics},
volume = {19},
number = {1},
pages = {556},
doi = {10.1186/s12864-018-4951-z},
pmid = {30055586},
issn = {1471-2164},
abstract = {BACKGROUND: Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium that colonizes the gastrointestinal and genitourinary tract of humans. This bacterium has also been isolated from various animals, such as fish and cattle. Non-coding RNAs (ncRNAs) can act as regulators of gene expression in bacteria, such as Streptococcus pneumoniae and Streptococcus pyogenes. However, little is known about the genomic distribution of ncRNAs and RNA families in S. agalactiae.

RESULTS: Comparative genome analysis of 27 S. agalactiae strains showed more than 5 thousand genomic regions identified and classified as Core, Exclusive, and Shared genome sequences. We identified 27 to 89 RNA families per genome distributed over these regions, from these, 25 were in Core regions while Shared and Exclusive regions showed variations amongst strains. We propose that the amount and type of ncRNA present in each genome can provide a pattern to contribute in the identification of the clonal types.

CONCLUSIONS: The identification of RNA families provides an insight over ncRNAs, sRNAs and ribozymes function, that can be further explored as targets for antibiotic development or studied in gene regulation of cellular processes. RNA families could be considered as markers to determine infection capabilities of different strains. Lastly, pan-genome analysis of GBS including the full range of functional transcripts provides a broader approach in the understanding of this pathogen.},
}

Paenibacillus polymyxa (formerly known as Bacillus polymyxa) has been extensively studied for agricultural applications as a plant-growth-promoting rhizobacterium and is also an important biocontrol agent. Our team has developed the P. polymyxa strain HY96-2 from the tomato rhizosphere as the first microbial biopesticide based on P. polymyxa for controlling plant diseases around the world, leading to the commercialization of this microbial biopesticide in China. However, further research is essential for understanding its precise biocontrol mechanisms. In this paper, we report the complete genome sequence of HY96-2 and the results of a comparative genomic analysis between different P. polymyxa strains. The complete genome size of HY96-2 was found to be 5.75 Mb and 5207 coding sequences were predicted. HY96-2 was compared with seven other P. polymyxa strains for which complete genome sequences have been published, using phylogenetic tree, pan-genome, and nucleic acid co-linearity analysis. In addition, the genes and gene clusters involved in biofilm formation, antibiotic synthesis, and systemic resistance inducer production were compared between strain HY96-2 and two other strains, namely, SC2 and E681. The results revealed that all three of the P. polymyxa strains have the ability to control plant diseases via the mechanisms of colonization (biofilm formation), antagonism (antibiotic production), and induced resistance (systemic resistance inducer production). However, the variation of the corresponding genes or gene clusters between the three strains may lead to different antimicrobial spectra and biocontrol efficacies. Two possible pathways of biofilm formation in P. polymyxa were reported for the first time after searching the KEGG database. This study provides a scientific basis for the further optimization of the field applications and quality standards of industrial microbial biopesticides based on HY96-2. 
It may also serve as a reference for studying the differences in antimicrobial spectra and biocontrol capability between different biocontrol agents.

@article {pmid30050512,
year = {2018},
author = {Luo, Y and Cheng, Y and Yi, J and Zhang, Z and Luo, Q and Zhang, D and Li, Y},
title = {Complete Genome Sequence of Industrial Biocontrol Strain Paenibacillus polymyxa HY96-2 and Further Analysis of Its Biocontrol Mechanism.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1520},
doi = {10.3389/fmicb.2018.01520},
pmid = {30050512},
issn = {1664-302X},
abstract = {Paenibacillus polymyxa (formerly known as Bacillus polymyxa) has been extensively studied for agricultural applications as a plant-growth-promoting rhizobacterium and is also an important biocontrol agent. Our team has developed the P. polymyxa strain HY96-2 from the tomato rhizosphere as the first microbial biopesticide based on P. polymyxa for controlling plant diseases around the world, leading to the commercialization of this microbial biopesticide in China. However, further research is essential for understanding its precise biocontrol mechanisms. In this paper, we report the complete genome sequence of HY96-2 and the results of a comparative genomic analysis between different P. polymyxa strains. The complete genome size of HY96-2 was found to be 5.75 Mb and 5207 coding sequences were predicted. HY96-2 was compared with seven other P. polymyxa strains for which complete genome sequences have been published, using phylogenetic tree, pan-genome, and nucleic acid co-linearity analysis. In addition, the genes and gene clusters involved in biofilm formation, antibiotic synthesis, and systemic resistance inducer production were compared between strain HY96-2 and two other strains, namely, SC2 and E681. The results revealed that all three of the P. polymyxa strains have the ability to control plant diseases via the mechanisms of colonization (biofilm formation), antagonism (antibiotic production), and induced resistance (systemic resistance inducer production). However, the variation of the corresponding genes or gene clusters between the three strains may lead to different antimicrobial spectra and biocontrol efficacies. Two possible pathways of biofilm formation in P. polymyxa were reported for the first time after searching the KEGG database. This study provides a scientific basis for the further optimization of the field applications and quality standards of industrial microbial biopesticides based on HY96-2. 
It may also serve as a reference for studying the differences in antimicrobial spectra and biocontrol capability between different biocontrol agents.},
}

RevDate: 2018-11-14

Aherfi S, Andreani J, Baptiste E, et al (2018)

A Large Open Pangenome and a Small Core Genome for Giant Pandoraviruses.

Frontiers in microbiology, 9:1486.

Giant viruses of amoebae are distinct from classical viruses by the giant size of their virions and genomes. Pandoraviruses are the record holders in size of genomes and number of predicted genes. Three strains, P. salinus, P. dulcis, and P. inopinatum, have been described to date. We isolated three new ones, namely P. massiliensis, P. braziliensis, and P. pampulha, from environmental samples collected in Brazil. We describe here their genomes, the transcriptome and proteome of P. massiliensis, and the pangenome of the group encompassing the six pandoravirus isolates. Genome sequencing was performed with an Illumina MiSeq instrument. Genome annotation was performed using GeneMarkS and Prodigal softwares and comparative genomic analyses. The core genome and pangenome were determined using notably ProteinOrtho and CD-HIT programs. Transcriptomics was performed for P. massiliensis with the Illumina MiSeq instrument; proteomics was also performed for this virus using 1D/2D gel electrophoresis and mass spectrometry on a Synapt G2Si Q-TOF traveling wave mobility spectrometer. The genomes of the three new pandoraviruses are comprised between 1.6 and 1.8 Mbp. The genomes of P. massiliensis, P. pampulha, and P. braziliensis were predicted to harbor 1,414, 2,368, and 2,696 genes, respectively. These genes comprise up to 67% of ORFans. Phylogenomic analyses showed that P. massiliensis and P. braziliensis were more closely related to each other than to the other pandoraviruses. The core genome of pandoraviruses comprises 352 clusters of genes, and the ratio core genome/pangenome is less than 0.05. The extinction curve shows clearly that the pangenome is still open. A quarter of the gene content of P. massiliensis was detected by transcriptomics. In addition, a product for a total of 162 open reading frames were found by proteomic analysis of P. massiliensis virions, including notably the products of 28 ORFans, 99 hypothetical proteins, and 90 core genes. 
Further analyses should allow to gain a better knowledge and understanding of the evolution and origin of these giant pandoraviruses, and of their relationships with viruses and cellular microorganisms.

@article {pmid30042742,
year = {2018},
author = {Aherfi, S and Andreani, J and Baptiste, E and Oumessoum, A and Dornas, FP and Andrade, ACDSP and Chabriere, E and Abrahao, J and Levasseur, A and Raoult, D and La Scola, B and Colson, P},
title = {A Large Open Pangenome and a Small Core Genome for Giant Pandoraviruses.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1486},
doi = {10.3389/fmicb.2018.01486},
pmid = {30042742},
issn = {1664-302X},
abstract = {Giant viruses of amoebae are distinct from classical viruses by the giant size of their virions and genomes. Pandoraviruses are the record holders in size of genomes and number of predicted genes. Three strains, P. salinus, P. dulcis, and P. inopinatum, have been described to date. We isolated three new ones, namely P. massiliensis, P. braziliensis, and P. pampulha, from environmental samples collected in Brazil. We describe here their genomes, the transcriptome and proteome of P. massiliensis, and the pangenome of the group encompassing the six pandoravirus isolates. Genome sequencing was performed with an Illumina MiSeq instrument. Genome annotation was performed using GeneMarkS and Prodigal softwares and comparative genomic analyses. The core genome and pangenome were determined using notably ProteinOrtho and CD-HIT programs. Transcriptomics was performed for P. massiliensis with the Illumina MiSeq instrument; proteomics was also performed for this virus using 1D/2D gel electrophoresis and mass spectrometry on a Synapt G2Si Q-TOF traveling wave mobility spectrometer. The genomes of the three new pandoraviruses are comprised between 1.6 and 1.8 Mbp. The genomes of P. massiliensis, P. pampulha, and P. braziliensis were predicted to harbor 1,414, 2,368, and 2,696 genes, respectively. These genes comprise up to 67% of ORFans. Phylogenomic analyses showed that P. massiliensis and P. braziliensis were more closely related to each other than to the other pandoraviruses. The core genome of pandoraviruses comprises 352 clusters of genes, and the ratio core genome/pangenome is less than 0.05. The extinction curve shows clearly that the pangenome is still open. A quarter of the gene content of P. massiliensis was detected by transcriptomics. In addition, a product for a total of 162 open reading frames were found by proteomic analysis of P. massiliensis virions, including notably the products of 28 ORFans, 99 hypothetical proteins, and 90 core genes. 
Further analyses should allow to gain a better knowledge and understanding of the evolution and origin of these giant pandoraviruses, and of their relationships with viruses and cellular microorganisms.},
}

Orientia tsutsugamushi, formerly Rickettsia tsutsugamushi, is an obligate intracellular pathogen that causes scrub typhus, an underdiagnosed acute febrile disease with high morbidity. Scrub typhus is transmitted by the larval stage (chigger) of Leptotrombidium mites and is irregularly distributed across endemic regions of Asia, Australia and islands of the western Pacific Ocean. Previous work to understand population genetics in O. tsutsugamushi has been based on sub-genomic sampling methods and whole-genome characterization of two genomes. In this study, we compared 40 genomes from geographically dispersed areas and confirmed patterns of extensive homologous recombination likely driven by transposons, conjugative elements and repetitive sequences. High rates of lateral gene transfer (LGT) among O. tsutsugamushi genomes appear to have effectively eliminated a detectable clonal frame, but not our ability to infer evolutionary relationships and phylogeographical clustering. Pan-genomic comparisons using 31 082 high-quality bacterial genomes from 253 species suggests that genomic duplication in O. tsutsugamushi is almost unparalleled. Unlike other highly recombinant species where the uptake of exogenous DNA largely drives genomic diversity, the pan-genome of O. tsutsugamushi is driven by duplication and divergence. Extensive gene innovation by duplication is most commonly attributed to plants and animals and, in contrast with LGT, is thought to be only a minor evolutionary mechanism for bacteria. The near unprecedented evolutionary characteristics of O. tsutsugamushi, coupled with extensive intra-specific LGT, expand our present understanding of rapid bacterial evolutionary adaptive mechanisms.

@article {pmid30035711,
year = {2018},
author = {Fleshman, A and Mullins, K and Sahl, J and Hepp, C and Nieto, N and Wiggins, K and Hornstra, H and Kelly, D and Chan, TC and Phetsouvanh, R and Dittrich, S and Panyanivong, P and Paris, D and Newton, P and Richards, A and Pearson, T},
title = {Comparative pan-genomic analyses of Orientia tsutsugamushi reveal an exceptional model of bacterial evolution driving genomic diversity.},
journal = {Microbial genomics},
volume = {4},
number = {9},
pages = {},
doi = {10.1099/mgen.0.000199},
pmid = {30035711},
issn = {2057-5858},
support = {//Wellcome Trust/United Kingdom ; },
abstract = {Orientia tsutsugamushi, formerly Rickettsia tsutsugamushi, is an obligate intracellular pathogen that causes scrub typhus, an underdiagnosed acute febrile disease with high morbidity. Scrub typhus is transmitted by the larval stage (chigger) of Leptotrombidium mites and is irregularly distributed across endemic regions of Asia, Australia and islands of the western Pacific Ocean. Previous work to understand population genetics in O. tsutsugamushi has been based on sub-genomic sampling methods and whole-genome characterization of two genomes. In this study, we compared 40 genomes from geographically dispersed areas and confirmed patterns of extensive homologous recombination likely driven by transposons, conjugative elements and repetitive sequences. High rates of lateral gene transfer (LGT) among O. tsutsugamushi genomes appear to have effectively eliminated a detectable clonal frame, but not our ability to infer evolutionary relationships and phylogeographical clustering. Pan-genomic comparisons using 31 082 high-quality bacterial genomes from 253 species suggests that genomic duplication in O. tsutsugamushi is almost unparalleled. Unlike other highly recombinant species where the uptake of exogenous DNA largely drives genomic diversity, the pan-genome of O. tsutsugamushi is driven by duplication and divergence. Extensive gene innovation by duplication is most commonly attributed to plants and animals and, in contrast with LGT, is thought to be only a minor evolutionary mechanism for bacteria. The near unprecedented evolutionary characteristics of O. tsutsugamushi, coupled with extensive intra-specific LGT, expand our present understanding of rapid bacterial evolutionary adaptive mechanisms.},
}

RevDate: 2018-11-14

Zhou Z, Lundstrøm I, Tran-Dien A, et al (2018)

Pan-genome Analysis of Ancient and Modern Salmonella enterica Demonstrates Genomic Stability of the Invasive Para C Lineage for Millennia.

Current biology : CB, 28(15):2420-2428.e10.

Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the blood stream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except for occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3]. However, early 20th-century observations in Eastern Europe [3, 4] suggest that Paratyphi C enteric fever may once have had a wide-ranging impact on human societies. Here, we describe a draft Paratyphi C genome (Ragna) recovered from the 800-year-old skeleton (SK152) of a young woman in Trondheim, Norway. Paratyphi C sequences were recovered from her teeth and bones, suggesting that she died of enteric fever and demonstrating that these bacteria have long caused invasive salmonellosis in Europeans. Comparative analyses against modern Salmonella genome sequences revealed that Paratyphi C is a clade within the Para C lineage, which also includes serovars Choleraesuis, Typhisuis, and Lomita. Although Paratyphi C only infects humans, Choleraesuis causes septicemia in pigs and boar [5] (and occasionally humans), and Typhisuis causes epidemic swine salmonellosis (chronic paratyphoid) in domestic pigs [2, 3]. These different host specificities likely evolved in Europe over the last ∼4,000 years since the time of their most recent common ancestor (tMRCA) and are possibly associated with the differential acquisitions of two genomic islands, SPI-6 and SPI-7. The tMRCAs of these bacterial clades coincide with the timing of pig domestication in Europe [6].

@article {pmid30033331,
year = {2018},
author = {Zhou, Z and Lundstrøm, I and Tran-Dien, A and Duchêne, S and Alikhan, NF and Sergeant, MJ and Langridge, G and Fotakis, AK and Nair, S and Stenøien, HK and Hamre, SS and Casjens, S and Christophersen, A and Quince, C and Thomson, NR and Weill, FX and Ho, SYW and Gilbert, MTP and Achtman, M},
title = {Pan-genome Analysis of Ancient and Modern Salmonella enterica Demonstrates Genomic Stability of the Invasive Para C Lineage for Millennia.},
journal = {Current biology : CB},
volume = {28},
number = {15},
pages = {2420-2428.e10},
doi = {10.1016/j.cub.2018.05.058},
pmid = {30033331},
issn = {1879-0445},
support = {//Wellcome Trust/United Kingdom ; MR/M50161X/1//Medical Research Council/United Kingdom ; },
abstract = {Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the blood stream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except for occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3]. However, early 20th-century observations in Eastern Europe [3, 4] suggest that Paratyphi C enteric fever may once have had a wide-ranging impact on human societies. Here, we describe a draft Paratyphi C genome (Ragna) recovered from the 800-year-old skeleton (SK152) of a young woman in Trondheim, Norway. Paratyphi C sequences were recovered from her teeth and bones, suggesting that she died of enteric fever and demonstrating that these bacteria have long caused invasive salmonellosis in Europeans. Comparative analyses against modern Salmonella genome sequences revealed that Paratyphi C is a clade within the Para C lineage, which also includes serovars Choleraesuis, Typhisuis, and Lomita. Although Paratyphi C only infects humans, Choleraesuis causes septicemia in pigs and boar [5] (and occasionally humans), and Typhisuis causes epidemic swine salmonellosis (chronic paratyphoid) in domestic pigs [2, 3]. These different host specificities likely evolved in Europe over the last ∼4,000 years since the time of their most recent common ancestor (tMRCA) and are possibly associated with the differential acquisitions of two genomic islands, SPI-6 and SPI-7. The tMRCAs of these bacterial clades coincide with the timing of pig domestication in Europe [6].},
}

Background: Shewanella strains are important dissimilatory metal-reducing bacteria which are widely distributed in diverse habitats. Despite efforts to genomically characterize Shewanella, knowledge of the molecular components, functional information and evolutionary patterns remain lacking, especially for their compatibility in the metal-reducing pathway. The increasing number of genome sequences of Shewanella strains offers a basis for pan-genome studies.

Results: A comparative pan-genome analysis was conducted to study genomic diversity and evolutionary relationships among 24 Shewanella strains. Results revealed an open pan-genome of 13,406 non-redundant genes and a core-genome of 1878 non-redundant genes. Selective pressure acted on the invariant members of core genome, in which purifying selection drove evolution in the housekeeping mechanisms. Shewanella strains exhibited extensive genome variability, with high levels of gene gain and loss during the evolution, which affected variable gene sets and facilitated the rapid evolution. Additionally, genes related to metal reduction were diversely distributed in Shewanella strains and evolved under purifying selection, which highlighted the basic conserved functionality and specificity of respiratory systems.

Conclusions: The diversity of genes present in the accessory and specific genomes of Shewanella strains indicates that each strain uses different strategies to adapt to diverse environments. Horizontal gene transfer is an important evolutionary force in shaping Shewanella genomes. Purifying selection plays an important role in the stability of the core-genome and also drives evolution in mtr-omc cluster of different Shewanella strains.


RevDate: 2018-11-14

Collins FWJ, Mesa-Pereira B, O'Connor PM, et al (2018)

Reincarnation of Bacteriocins From the Lactobacillus Pangenomic Graveyard.

Frontiers in microbiology, 9:1298.

Bacteria commonly produce narrow spectrum bacteriocins as a means of inhibiting closely related species competing for similar resources in an environment. The increasing availability of genomic data means that it is becoming easier to identify bacteriocins encoded within genomes. Often, however, the presence of bacteriocin genes in a strain does not always translate into biological antimicrobial activity. For example, when analysing the Lactobacillus pangenome we identified strains encoding ten pediocin-like bacteriocin structural genes which failed to display inhibitory activity. Nine of these bacteriocins were novel whilst one was identified as the previously characterized bacteriocin "penocin A." The composition of these bacteriocin operons varied between strains, often with key components missing which are required for bacteriocin production, such as dedicated bacteriocin transporters and accessory proteins. In an effort to functionally express these bacteriocins, the structural genes for the ten pediocin homologs were cloned alongside the dedicated pediocin PA-1 transporter in both Escherichia coli and Lactobacillus paracasei heterologous hosts. Each bacteriocin was cloned with its native leader sequence and as a fusion protein with the pediocin PA-1 leader sequence. Several of these bacteriocins displayed a broader spectrum of inhibition than the original pediocin PA-1. We show how potentially valuable bacteriocins can easily be "reincarnated" from in silico data and produced in vitro despite often lacking the necessary accompanying machinery. Moreover, the study demonstrates how genomic datasets such as the Lactobacillus pangenome harbor a potential "arsenal" of antimicrobial activity with the possibility of being activated when expressed in more genetically amenable hosts.

@article {pmid30013519,
year = {2018},
author = {Collins, FWJ and Mesa-Pereira, B and O'Connor, PM and Rea, MC and Hill, C and Ross, RP},
title = {Reincarnation of Bacteriocins From the Lactobacillus Pangenomic Graveyard.},
journal = {Frontiers in microbiology},
volume = {9},
number = {},
pages = {1298},
doi = {10.3389/fmicb.2018.01298},
pmid = {30013519},
issn = {1664-302X},
abstract = {Bacteria commonly produce narrow spectrum bacteriocins as a means of inhibiting closely related species competing for similar resources in an environment. The increasing availability of genomic data means that it is becoming easier to identify bacteriocins encoded within genomes. Often, however, the presence of bacteriocin genes in a strain does not always translate into biological antimicrobial activity. For example, when analysing the Lactobacillus pangenome we identified strains encoding ten pediocin-like bacteriocin structural genes which failed to display inhibitory activity. Nine of these bacteriocins were novel whilst one was identified as the previously characterized bacteriocin "penocin A." The composition of these bacteriocin operons varied between strains, often with key components missing which are required for bacteriocin production, such as dedicated bacteriocin transporters and accessory proteins. In an effort to functionally express these bacteriocins, the structural genes for the ten pediocin homologs were cloned alongside the dedicated pediocin PA-1 transporter in both Escherichia coli and Lactobacillus paracasei heterologous hosts. Each bacteriocin was cloned with its native leader sequence and as a fusion protein with the pediocin PA-1 leader sequence. Several of these bacteriocins displayed a broader spectrum of inhibition than the original pediocin PA-1. We show how potentially valuable bacteriocins can easily be "reincarnated" from in silico data and produced in vitro despite often lacking the necessary accompanying machinery. Moreover, the study demonstrates how genomic datasets such as the Lactobacillus pangenome harbor a potential "arsenal" of antimicrobial activity with the possibility of being activated when expressed in more genetically amenable hosts.},
}

The advent of high throughput sequencing (HTS) technologies raises a major concern about storage and transmission of data produced by these technologies. In particular, large-scale sequencing projects generate an unprecedented volume of genomic sequences ranging from tens to several thousands of genomes per species. These collections contain highly similar and redundant sequences, also known as pangenomes. The ideal way to represent and transfer pangenomes is through compression. A number of HTS-specific compression tools have been developed to reduce the storage and communication costs of HTS data, yet none of them is designed to process a pangenome. In this article, we present dynamic alignment-free and reference-free read compression (DARRC), a new alignment-free and reference-free compression method. It addresses the problem of pangenome compression by encoding the sequences of a pangenome as a guided de Bruijn graph. The novelty of this method is its ability to incrementally update DARRC archives with new genome sequences without full decompression of the archive. DARRC can compress both single-end and paired-end read sequences of any length using all symbols of the IUPAC nucleotide code. On a large Pseudomonas aeruginosa data set, our method outperforms all other tested tools. It provides a 30% compression ratio improvement in single-end mode compared with the best performing state-of-the-art HTS-specific compression method in our experiments.

@article {pmid30011247,
year = {2018},
author = {Holley, G and Wittler, R and Stoye, J and Hach, F},
title = {Dynamic Alignment-Free and Reference-Free Read Compression.},
journal = {Journal of computational biology : a journal of computational molecular cell biology},
volume = {25},
number = {7},
pages = {825-836},
doi = {10.1089/cmb.2018.0068},
pmid = {30011247},
issn = {1557-8666},
abstract = {The advent of high throughput sequencing (HTS) technologies raises a major concern about storage and transmission of data produced by these technologies. In particular, large-scale sequencing projects generate an unprecedented volume of genomic sequences ranging from tens to several thousands of genomes per species. These collections contain highly similar and redundant sequences, also known as pangenomes. The ideal way to represent and transfer pangenomes is through compression. A number of HTS-specific compression tools have been developed to reduce the storage and communication costs of HTS data, yet none of them is designed to process a pangenome. In this article, we present dynamic alignment-free and reference-free read compression (DARRC), a new alignment-free and reference-free compression method. It addresses the problem of pangenome compression by encoding the sequences of a pangenome as a guided de Bruijn graph. The novelty of this method is its ability to incrementally update DARRC archives with new genome sequences without full decompression of the archive. DARRC can compress both single-end and paired-end read sequences of any length using all symbols of the IUPAC nucleotide code. On a large Pseudomonas aeruginosa data set, our method outperforms all other tested tools. It provides a 30% compression ratio improvement in single-end mode compared with the best performing state-of-the-art HTS-specific compression method in our experiments.},
}
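
The abstract above describes encoding the sequences of a pangenome as a guided de Bruijn graph. As a rough illustration of the underlying data structure only, the following Python sketch builds a plain de Bruijn graph from a few reads. It is not DARRC's guided graph or its compression scheme; the function name `de_bruijn_graph`, the toy reads, and the choice of k = 3 are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph (hypothetical helper, not DARRC's API).

    Nodes are (k-1)-mers; an edge joins the prefix and suffix of every
    k-mer that occurs in at least one read.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping toy reads share all of their k-mers, so the second
# read adds no new edges: the redundancy a pangenome representation exploits.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn_graph(reads, 3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because highly similar genomes contribute mostly repeated k-mers, the graph grows far more slowly than the raw read set, which is the intuition behind graph-based pangenome storage.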

RevDate: 2018-07-14

Driscoll CB, Meyer KA, Šulčius S, et al (2018)

A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales.

Harmful algae, 77:93-107.

In order to better understand the relationships among current Nostocales cyanobacterial blooms, eight genomes were sequenced from cultured isolates or from environmental metagenomes of recent planktonic Nostocales blooms. Phylogenomic analysis of publicly available sequences placed the new genomes among a group of 15 genomes from four continents in a distinct ADA clade (Anabaena/Dolichospermum/Aphanizomenon) within the Nostocales. This clade contains four species-level groups, two of which include members with both Anabaena-like and Aphanizomenon flos-aquae-like morphology. The genomes contain many repetitive genetic elements and a sizable pangenome, in which ABC-type transporters are highly represented. Alongside common core genes for photosynthesis, the differentiation of N2-fixing heterocysts, and the uptake and incorporation of the major nutrients P, N and S, we identified several gene pathways in the pangenome that may contribute to niche partitioning. Genes for problematic secondary metabolites-cyanotoxins and taste-and-odor compounds-were sporadically present, as were other polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) gene clusters. By contrast, genes predicted to encode the ribosomally generated bacteriocin peptides were found in all genomes.

@article {pmid30005805,
year = {2018},
author = {Driscoll, CB and Meyer, KA and Šulčius, S and Brown, NM and Dick, GJ and Cao, H and Gasiūnas, G and Timinskas, A and Yin, Y and Landry, ZC and Otten, TG and Davis, TW and Watson, SB and Dreher, TW},
title = {A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales.},
journal = {Harmful algae},
volume = {77},
number = {},
pages = {93-107},
doi = {10.1016/j.hal.2018.05.009},
pmid = {30005805},
issn = {1878-1470},
abstract = {In order to better understand the relationships among current Nostocales cyanobacterial blooms, eight genomes were sequenced from cultured isolates or from environmental metagenomes of recent planktonic Nostocales blooms. Phylogenomic analysis of publicly available sequences placed the new genomes among a group of 15 genomes from four continents in a distinct ADA clade (Anabaena/Dolichospermum/Aphanizomenon) within the Nostocales. This clade contains four species-level groups, two of which include members with both Anabaena-like and Aphanizomenon flos-aquae-like morphology. The genomes contain many repetitive genetic elements and a sizable pangenome, in which ABC-type transporters are highly represented. Alongside common core genes for photosynthesis, the differentiation of N2-fixing heterocysts, and the uptake and incorporation of the major nutrients P, N and S, we identified several gene pathways in the pangenome that may contribute to niche partitioning. Genes for problematic secondary metabolites-cyanotoxins and taste-and-odor compounds-were sporadically present, as were other polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) gene clusters. By contrast, genes predicted to encode the ribosomally generated bacteriocin peptides were found in all genomes.},
}

Covering alignment problems arise from recent developments in genomics; so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable full content of diploid genomes to be used as basis of sequence analysis. In this paper, we show that the computational complexity will change for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a covering alignment of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores: $\mathrm{as}(\mathrm{sp}(R_1), \mathrm{sp}(R_2)) + \mathrm{as}(\mathrm{sp}(G_1), \mathrm{sp}(G_2))$, where $\mathrm{sp}(P)$ is the concatenation of labels on the path $P$. Pair-wise alignment of haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and then the covering alignment models the similarity of two diploids over arbitrary recombinations. Reduction to the other direction shows that the problem is NP-hard on alphabets of size 3.

@article {pmid29994032,
year = {2018},
author = {Rizzi, R and Cairo, M and Makinen, V and Tomescu, AI and Valenzuela, D},
title = {Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics.},
journal = {IEEE/ACM transactions on computational biology and bioinformatics},
volume = {},
number = {},
pages = {},
doi = {10.1109/TCBB.2018.2831691},
pmid = {29994032},
issn = {1557-9964},
abstract = {Covering alignment problems arise from recent developments in genomics; so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable full content of diploid genomes to be used as basis of sequence analysis. In this paper, we show that the computational complexity will change for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a covering alignment of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores: $\mathrm{as}(\mathrm{sp}(R_1), \mathrm{sp}(R_2)) + \mathrm{as}(\mathrm{sp}(G_1), \mathrm{sp}(G_2))$, where $\mathrm{sp}(P)$ is the concatenation of labels on the path $P$. Pair-wise alignment of haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and then the covering alignment models the similarity of two diploids over arbitrary recombinations. Reduction to the other direction shows that the problem is NP-hard on alphabets of size 3.},
}
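
The objective in the abstract above sums two global alignment scores, one for the red pair of paths and one for the green pair. The Python sketch below only evaluates that objective for path labels that are already given; it does not search for the best covering paths, which is the part the paper shows to be NP-hard. The scoring values (match +1, mismatch -1, gap -1) and the names `alignment_score` and `covering_objective` are assumptions chosen for illustration, not taken from the paper.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of strings a and b.

    Scoring values are illustrative assumptions. Only two rows of the
    dynamic-programming table are kept, since just the score is needed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def covering_objective(red1, green1, red2, green2):
    """Sum of the red-red and green-green global alignment scores.

    Each argument is the concatenated label string of one fixed path.
    """
    return alignment_score(red1, red2) + alignment_score(green1, green2)


print(covering_objective("ACGT", "AAAA", "ACGT", "AAAA"))
```

Evaluating the objective for fixed paths takes quadratic time per pair; the hardness comes from choosing the four paths so that together they cover every node of both graphs.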

Here, we present new theory and law of longevity intended to evaluate fundamental factors that control lifespan. This theory is based on the fact that genes affecting host organism longevity are represented by subpopulations: genes of host eukaryotic cells, commensal microbiota, and non-living genetic elements. Based on Tetz's theory of longevity, we propose that lifespan and aging are defined by the accumulation of alterations over all genes of macroorganism and microbiome and the non-living genetic elements associated with them. Tetz's law of longevity states that longevity is limited by the accumulation of alterations to the limiting value that is not compatible with life. Based on theory and law, we also propose a novel model to calculate several parameters, including the rate of aging and the remaining lifespan of individuals. We suggest that this theory and model have explanatory and predictive potential to eukaryotic organisms, allowing the influence of diseases, medication, and medical procedures to be re-examined in relation to longevity. Such estimates also provide a framework to evaluate new fundamental aspects that control aging and lifespan.

@article {pmid29978435,
year = {2018},
author = {Tetz, G and Tetz, V},
title = {Tetz's theory and law of longevity.},
journal = {Theory in biosciences = Theorie in den Biowissenschaften},
volume = {137},
number = {2},
pages = {145-154},
doi = {10.1007/s12064-018-0267-4},
pmid = {29978435},
issn = {1611-7530},
abstract = {Here, we present new theory and law of longevity intended to evaluate fundamental factors that control lifespan. This theory is based on the fact that genes affecting host organism longevity are represented by subpopulations: genes of host eukaryotic cells, commensal microbiota, and non-living genetic elements. Based on Tetz's theory of longevity, we propose that lifespan and aging are defined by the accumulation of alterations over all genes of macroorganism and microbiome and the non-living genetic elements associated with them. Tetz's law of longevity states that longevity is limited by the accumulation of alterations to the limiting value that is not compatible with life. Based on theory and law, we also propose a novel model to calculate several parameters, including the rate of aging and the remaining lifespan of individuals. We suggest that this theory and model have explanatory and predictive potential to eukaryotic organisms, allowing the influence of diseases, medication, and medical procedures to be re-examined in relation to longevity. Such estimates also provide a framework to evaluate new fundamental aspects that control aging and lifespan.},
}

Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.

@article {pmid29975997,
year = {2018},
author = {Choi, S and Jin, GD and Park, J and You, I and Kim, EB},
title = {Pan-Genomics of Lactobacillus plantarum Revealed Group-Specific Genomic Profiles without Habitat Association.},
journal = {Journal of microbiology and biotechnology},
volume = {28},
number = {8},
pages = {1352-1359},
doi = {10.4014/jmb.1803.03029},
pmid = {29975997},
issn = {1738-8872},
mesh = {Animals ; Databases, Genetic ; Ecosystem ; Genes, Bacterial/genetics ; Genome, Bacterial/*genetics ; *Genomics ; Lactobacillus plantarum/*classification/*genetics ; Molecular Sequence Annotation ; *Phylogeny ; Polymorphism, Single Nucleotide/genetics ; },
abstract = {Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.},
}
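
The abstract above separates core genes, shared by all strains analyzed, from the rest of the ortholog set. A minimal Python sketch of that kind of partition follows, assuming the ortholog clustering has already been done and is available as a mapping from gene family to the set of strains that carry it. The function name `partition_pangenome`, the input layout, and the toy data are hypothetical and do not reproduce the pipeline used in the study.

```python
def partition_pangenome(presence):
    """Split gene families into core, accessory and unique sets.

    presence: dict mapping a gene family name to the set of strains
    carrying it (hypothetical input format).
    core      = present in every strain
    unique    = present in exactly one strain
    accessory = everything in between
    """
    strains = set().union(*presence.values())
    core, accessory, unique = set(), set(), set()
    for gene, carriers in presence.items():
        if carriers == strains:
            core.add(gene)
        elif len(carriers) == 1:
            unique.add(gene)
        else:
            accessory.add(gene)
    return core, accessory, unique


# Toy example with three strains and three gene families.
presence = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s1", "s2"},
    "geneC": {"s3"},
}
core, accessory, unique = partition_pangenome(presence)
print(sorted(core), sorted(accessory), sorted(unique))
```

With real data the core set typically shrinks and the total family count grows as strains are added, which is how an open pan-genome is recognized.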

Whole genome comparison of two Starmerella bacillaris strains with other wine yeasts uncovers genes involved in modulating important winemaking traits.

FEMS yeast research, 18(7).

Starmerella bacillaris is an osmotolerant yeast with interesting winemaking traits such as low-ethanol and high-glycerol production, previously considered as wine spoilage and recently proposed to improve the sensory quality of wine. This is the first work performing a whole-genome analysis of the variants identified by comparing two S. bacillaris strains (PAS13 and FRI751). Additionally, an extensive search for orthologous genes against Saccharomyces and non-Saccharomyces yeasts produced a detailed reconstruction of the pan-genome for yeast species used in winemaking. Starmerella bacillaris PAS13 was able to produce 36% more glycerol than S. bacillaris FRI751 without increasing ethanol level over 5% (v/v). Orthologous genes revealed new insights in the response to osmotic stress determined by the mitogen-activated protein kinase (MAPK) from S. bacillaris strains. The comparison between the two S. bacillaris genomes revealed 33 771 high-quality variants that were ranked considering their predicted impact on gene functions. Furthermore, analysis of structural variations in the genome revealed five translocations. The absence of some transcriptional factors involved in the regulation of GPD (glycerol-3-phosphate dehydrogenase), like the protein kinases YpK1p and YpK2p, and the identification of a tandem duplication increasing the GPP1 (glycerol-3-phosphate phosphatase) gene copy number suggest a remarkably different regulation of the glycerol pathway for S. bacillaris in comparison to S. cerevisiae.

@article {pmid29961804,
year = {2018},
author = {Lemos Junior, WJF and da Silva Duarte, V and Treu, L and Campanaro, S and Nadai, C and Giacomini, A and Corich, V},
title = {Whole genome comparison of two Starmerella bacillaris strains with other wine yeasts uncovers genes involved in modulating important winemaking traits.},
journal = {FEMS yeast research},
volume = {18},
number = {7},
pages = {},
doi = {10.1093/femsyr/foy069},
pmid = {29961804},
issn = {1567-1364},
abstract = {Starmerella bacillaris is an osmotolerant yeast with interesting winemaking traits such as low-ethanol and high-glycerol production, previously considered as wine spoilage and recently proposed to improve the sensory quality of wine. This is the first work performing a whole-genome analysis of the variants identified by comparing two S. bacillaris strains (PAS13 and FRI751). Additionally, an extensive search for orthologous genes against Saccharomyces and non-Saccharomyces yeasts produced a detailed reconstruction of the pan-genome for yeast species used in winemaking. Starmerella bacillaris PAS13 was able to produce 36% more glycerol than S. bacillaris FRI751 without increasing ethanol level over 5% (v/v). Orthologous genes revealed new insights in the response to osmotic stress determined by the mitogen-activated protein kinase (MAPK) from S. bacillaris strains. The comparison between the two S. bacillaris genomes revealed 33 771 high-quality variants that were ranked considering their predicted impact on gene functions. Furthermore, analysis of structural variations in the genome revealed five translocations. The absence of some transcriptional factors involved in the regulation of GPD (glycerol-3-phosphate dehydrogenase), like the protein kinases YpK1p and YpK2p, and the identification of a tandem duplication increasing the GPP1 (glycerol-3-phosphate phosphatase) gene copy number suggest a remarkably different regulation of the glycerol pathway for S. bacillaris in comparison to S. cerevisiae.},
}

Motivation: Antimicrobial resistance (AMR) is becoming a huge problem in both developed and developing countries, and identifying strains resistant or susceptible to certain antibiotics is essential in fighting against antibiotic-resistant pathogens. Whole-genome sequences have been collected for different microbial strains in order to identify crucial characteristics that allow certain strains to become resistant to antibiotics; however, a global inspection of the gene content responsible for AMR activities remains to be done.

Results: We propose a pan-genome-based approach to characterize antibiotic-resistant microbial strains and test this approach on the bacterial model organism Escherichia coli. By identifying core and accessory gene clusters and predicting AMR genes for the E. coli pan-genome, we not only showed that certain classes of genes are unevenly distributed between the core and accessory parts of the pan-genome but also demonstrated that only a portion of the identified AMR genes belong to the accessory genome. Application of machine learning algorithms to predict whether specific strains were resistant to antibiotic drugs yielded the best prediction accuracy for the set of AMR genes within the accessory part of the pan-genome, suggesting that these gene clusters were most crucial to AMR activities in E. coli. Selecting subsets of AMR genes for different antibiotic drugs based on a genetic algorithm (GA) achieved better prediction performances than the gene sets established in the literature, hinting that the gene sets selected by the GA may warrant further analysis in investigating more details about how E. coli fight against antibiotics.

Supplementary information: Supplementary data are available at Bioinformatics online.
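
As a rough illustration of the core/accessory split that the abstract above relies on, the following minimal Python sketch partitions gene clusters by how many strains carry them. It is not the authors' pipeline: the function name `partition_pangenome`, the strain labels, and the cluster IDs are all hypothetical toy values, and a real analysis would first need an ortholog-clustering step to produce the presence/absence data. The sketch separates strain-specific ("unique") clusters from the rest of the accessory genome, as some studies do; in the two-way core/accessory sense used in the abstract, the accessory genome would be the union of the last two sets.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    presence: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene clusters into (core, accessory, unique) sets.

    `presence` maps a strain name to the set of gene-cluster IDs found in it.
    core      : clusters present in every strain
    unique    : clusters present in exactly one strain (when >1 strain given)
    accessory : everything else in the pan-genome
    """
    if not presence:
        return set(), set(), set()

    gene_sets = [set(genes) for genes in presence.values()]
    pan = set().union(*gene_sets)           # pan-genome: union of all strains
    core = set.intersection(*gene_sets)     # core genome: shared by all strains

    # Count how many strains carry each cluster.
    counts = {g: sum(g in genes for genes in gene_sets) for g in pan}
    unique = {g for g, n in counts.items() if n == 1 and len(gene_sets) > 1}
    accessory = pan - core - unique
    return core, accessory, unique


# Toy presence/absence data (hypothetical strains and cluster IDs).
toy = {
    "strain_A": {"c1", "c2", "c3"},
    "strain_B": {"c1", "c2", "c4", "c5"},
    "strain_C": {"c1", "c2", "c4"},
}
core, accessory, unique = partition_pangenome(toy)
```

With this toy input, `c1` and `c2` fall in the core, `c4` is accessory, and `c3` and `c5` are strain-specific.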


In recent years, an increasing number of Campylobacter species have been associated with human gastrointestinal (GI) diseases including gastroenteritis, inflammatory bowel disease, and colorectal cancer. Campylobacter concisus, an oral commensal historically linked to gingivitis and periodontitis, has been increasingly detected in the lower GI tract. In the present study, we generated robust genome sequence data from C. concisus strains and undertook a comprehensive pangenome assessment to identify C. concisus virulence properties and to explain potential adaptations acquired while residing in specific ecological niche(s) of the GI tract. Genomes of 53 new C. concisus strains were sequenced, assembled, and annotated, including 36 strains from gastroenteritis patients, 13 strains from Crohn's disease patients, and four strains from colitis patients (three collagenous colitis and one lymphocytic colitis). When compared with previously published sequences, strains clustered into two main groups/genomospecies (GS), with phylogenetic clustering explained neither by disease phenotype nor sample location. Paired oral/faecal isolates from the same patient indicated that there are few genetic differences between oral and gut isolates, which suggests that gut isolates most likely reflect oral strain relocation. Type IV and VI secretion system genes, known to be important for pathogenicity in the Campylobacter genus, were present in the genome assemblies, with 82% containing Type VI secretion system genes. Our findings indicate that C. concisus strains are genetically diverse, and the variability in bacterial secretion system content may play an important role in their virulence potential.

