Abstract:

Chlamydia trachomatis (Ctr) is a bacterial pathogen that causes ocular, urogenital and lymph system infections in humans. It is highly abundant, and among its serovars, E, F and D are the most prevalent in sexually transmitted disease. However, the number of publicly available genome sequences of the serovars E and F, and thereby our knowledge about the molecular architecture of these serovars, is low. Here we sequenced the genomes of six E and F clinical isolates and one E lab strain, in order to study the genetic variance in these serovars. As observed before, the genomic variation inside the Ctr genomes is very low and the phylogenetic placement in comparison to publicly available genomes is as expected from ompA gene serotyping. However, we observed a large InDel carrying four to five open reading frames in one clinical E sample and in the E lab strain. We have also observed substantial variation at the nucleotide and amino acid levels, especially in membrane proteins and secreted proteins. Furthermore, these two groups of proteins are also targets of recombination events. One clinical F isolate was genetically heterogeneous and showed the greatest nucleotide-level differences in the pmpE gene.

Coral-associated viral communities show high levels of diversity and host auxiliary functions.

Abstract:

Stony corals (Scleractinia) are marine invertebrates that form the foundation and framework upon which tropical reefs are built. The coral animal associates with a diverse microbiome comprised of dinoflagellate algae and other protists, bacteria, archaea, fungi and viruses. Using a metagenomics approach, we analysed the DNA and RNA viral assemblages of seven coral species from the central Great Barrier Reef (GBR), demonstrating that tailed bacteriophages of the Caudovirales dominate across all species examined, and ssDNA viruses, notably the Microviridae, are also prevalent. Most sequences with matches to eukaryotic viruses were assigned to six viral families, including four Nucleocytoplasmic Large DNA Viruses (NCLDVs) families: Iridoviridae, Phycodnaviridae, Mimiviridae, and Poxviridae, as well as Retroviridae and Polydnaviridae. Contrary to previous findings, Herpesvirales were rare in these GBR corals. Sequences of a ssRNA virus with similarities to the dinornavirus, Heterocapsa circularisquama ssRNA virus of the Alvernaviridae that infects free-living dinoflagellates, were observed in three coral species. We also detected viruses previously undescribed from the coral holobiont, including a virus that targets fungi associated with the coral species Acropora tenuis. Functional analysis of the assembled contigs indicated a high prevalence of latency-associated genes in the coral-associated viral assemblages, several host-derived auxiliary metabolic genes (AMGs) for photosynthesis (psbA, psbD genes encoding the photosystem II D1 and D2 proteins respectively), as well as potential nematocyst toxins and antioxidants (genes encoding green fluorescent-like chromoprotein). This study expands the currently limited knowledge on coral-associated viruses by characterising viral composition and function across seven GBR coral species.

Abstract:

Neisseria meningitidis is the causative agent of cerebrospinal meningitis and of a rapidly progressing fatal septic shock known as purpura fulminans. Meningococcemia is characterized by bacterial adhesion to human endothelial cells of the microvessels. Host specificity has hampered studies on the role of blood vessel colonization in N. meningitidis-associated pathogenesis. In this work, using a humanized model of SCID mice that allows the study of bacterial adhesion to human cells in an in vivo context, we demonstrate that meningococcal colonization of human blood vessels is a prerequisite to the establishment of sepsis and lethality. To identify the molecular pathways involved in bacterial virulence, we performed transposon insertion site sequencing (Tn-seq) in vivo. Our results demonstrate that 36% of the genes that are important for growth in the blood of mice are dispensable when bacteria colonize human blood vessels, suggesting that human endothelial cells lining the blood vessels are feeding niches for N. meningitidis in vivo. Altogether, our work proposes a new paradigm for meningococcal virulence in which colonization of blood vessels is associated with metabolic adaptation and sustained bacteremia responsible for sepsis and subsequent lethality.

Abstract:

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

Sulfonolipids as novel metabolite markers of Alistipes and Odoribacter affected by high-fat diets.

Abstract:

The gut microbiota generates a huge pool of unknown metabolites, and their identification and characterization is a key challenge in metabolomics. However, gaps remain in studies of the gut microbiota and the chemical structures of its metabolites. In this investigation, an unusual class of bacterial sulfonolipids (SLs), originally found in environmental microbes, is detected in mouse cecum. We have performed a detailed molecular level characterization of this class of lipids by combining high-resolution mass spectrometry and liquid chromatography analysis. Eighteen SLs that differ in their capnoid and fatty acid chain compositions were identified. The SL called "sulfobacin B" was isolated and characterized, and it was significantly increased in mice fed high-fat diets. To reveal bacterial producers of SLs, metagenome analysis was performed, and only two bacterial genera, Alistipes and Odoribacter, were found to be responsible for their production. This knowledge helps explain part of the molecular complexity introduced by microbes to the mammalian gastrointestinal tract and can be used as chemotaxonomic evidence in gut microbiota.

Abstract:

We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

Variant profiling of evolving prokaryotic populations.

Abstract:

Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, however at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for being routinely used within experimental evolution experiments or for clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap. 
In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
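The consensus idea described in the VarCap abstract (keep a variant only if at least two tools report it, and only above a minimum relative abundance of 2%) can be illustrated with a minimal Python sketch. This is not VarCap's code: the `Call` record, the tool labels, the function name `consensus_variants`, and the choice to average the per-tool frequencies are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One variant call from one tool (hypothetical record layout)."""
    tool: str    # label of the variant caller that made the call
    chrom: str   # reference sequence name
    pos: int     # 1-based position on the reference
    ref: str     # reference allele
    alt: str     # alternative allele
    freq: float  # relative abundance of the alternative allele, 0..1


def consensus_variants(calls, min_tools=2, min_freq=0.02):
    """Return {variant: mean frequency} for variants supported by at least
    `min_tools` distinct tools and a mean frequency of at least `min_freq`.

    Averaging the frequencies across tools is an assumption of this sketch.
    """
    by_variant = defaultdict(dict)
    for call in calls:
        key = (call.chrom, call.pos, call.ref, call.alt)
        # One entry per tool, so repeated calls by the same tool count once.
        by_variant[key][call.tool] = call.freq

    kept = {}
    for key, per_tool in by_variant.items():
        if len(per_tool) < min_tools:
            continue  # reported by too few tools: treated as unreliable
        mean_freq = sum(per_tool.values()) / len(per_tool)
        if mean_freq >= min_freq:
            kept[key] = mean_freq
    return kept
```

Requiring agreement between tools trades sensitivity for precision, which matches the trade-off the abstract reports between the union and the intersection of tool predictions.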

Lifestyle and Horizontal Gene Transfer-Mediated Evolution of Mucispirillum schaedleri, a Core Member of the Murine Gut Microbiota.

Abstract:

Mucispirillum schaedleri is an abundant inhabitant of the intestinal mucus layer of rodents and other animals and has been suggested to be a pathobiont, a commensal that plays a role in disease. In order to gain insights into its lifestyle, we analyzed the genome and transcriptome of M. schaedleri ASF 457 and performed physiological experiments to test traits predicted by its genome. Although described as a mucus inhabitant, M. schaedleri has limited capacity for degrading host-derived mucosal glycans and other complex polysaccharides. Additionally, M. schaedleri reduces nitrate and expresses systems for scavenging oxygen and reactive oxygen species in vivo, which may account for its localization close to the mucosal tissue and expansion during inflammation. Also of note, M. schaedleri harbors a type VI secretion system and putative effector proteins and can modify gene expression in mucosal tissue, suggesting intimate interactions with its host and a possible role in inflammation. The M. schaedleri genome has been shaped by extensive horizontal gene transfer, primarily from intestinal Epsilon- and Deltaproteobacteria, indicating that horizontal gene transfer has played a key role in defining its niche in the gut ecosystem. IMPORTANCE Shifts in gut microbiota composition have been associated with intestinal inflammation, but it remains unclear whether inflammation-associated bacteria are commensal or detrimental to their host. Here, we studied the lifestyle of the gut bacterium Mucispirillum schaedleri, which is associated with inflammation in widely used mouse models. We found that M. schaedleri has specialized systems to handle oxidative stress during inflammation. Additionally, it expresses secretion systems and effector proteins and can modify the mucosal gene expression of its host. This suggests that M. schaedleri undergoes intimate interactions with its host and may play a role in inflammation. 
The insights presented here aid our understanding of how commensal gut bacteria may be involved in altering susceptibility to disease.

Development of a human vasopressin V1a-receptor antagonist from an evolutionary-related insect neuropeptide.

Abstract:

Characterisation of G protein-coupled receptors (GPCR) relies on the availability of a toolbox of ligands that selectively modulate different functional states of the receptors. To uncover such molecules, we explored a unique strategy for ligand discovery that takes advantage of the evolutionary conservation of the 600-million-year-old oxytocin/vasopressin signalling system. We isolated the insect oxytocin/vasopressin orthologue inotocin from the black garden ant (Lasius niger), identified and cloned its cognate receptor and determined its pharmacological properties on the insect and human oxytocin/vasopressin receptors. Subsequently, we identified a functional dichotomy: inotocin activated the insect inotocin and the human vasopressin V1b receptors, but inhibited the human V1aR. Replacement of Arg8 of inotocin by D-Arg8 led to a potent, stable and competitive V1aR-antagonist ([D-Arg8]-inotocin) with a 3,000-fold binding selectivity for the human V1aR over the other three subtypes, OTR, V1bR and V2R. The Arg8/D-Arg8 ligand-pair was further investigated to gain novel insights into the oxytocin/vasopressin peptide-receptor interaction, which led to the identification of key residues of the receptors that are important for ligand functionality and selectivity. These observations could play an important role for development of oxytocin/vasopressin receptor modulators that would enable clear distinction of the physiological and pathological responses of the individual receptor subtypes.

Unraveling the microbial processes of black band disease in corals through integrated genomics.

Abstract:

Coral disease outbreaks contribute to the ongoing degradation of reef ecosystems; however, the microbial mechanisms underlying the onset and progression of most coral diseases are poorly understood. Black band disease (BBD) manifests as a cyanobacterial-dominated microbial mat that destroys coral tissues as it rapidly spreads over coral colonies. To elucidate BBD pathogenesis, we apply a comparative metagenomic and metatranscriptomic approach to identify taxonomic and functional changes within microbial lesions during in-situ development of BBD from a comparatively benign stage termed cyanobacterial patches. Results suggest that photosynthetic CO2-fixation in Cyanobacteria substantially enhances productivity of organic matter within the lesion during disease development. Photosynthates appear to subsequently promote sulfide-production by Deltaproteobacteria, facilitating the major virulence factor of BBD. Interestingly, our metagenome-enabled transcriptomic analysis reveals that BBD-associated cyanobacteria have a putative mechanism that enables them to adapt to higher levels of hydrogen sulfide within lesions, underpinning the pivotal roles of the dominant cyanobacterium within the polymicrobial lesions during the onset of BBD. The current study presents sequence-based evidence derived from whole microbial communities that unravels the mechanism of development and progression of BBD.

Abstract:

In our mouse model, gastric acid-suppression is associated with antigen-specific IgE and anaphylaxis development. We repeatedly observed non-responder animals protected from food allergy. Here, we aimed to analyse reasons for this protection. Ten out of 64 mice, subjected to oral ovalbumin (OVA) immunizations under gastric acid-suppression, were non-responders without OVA-specific IgE or IgG1 elevation, indicating protection from allergy. In these non-responders, allergen challenges confirmed reduced antigen uptake and lack of anaphylactic symptoms, while in allergic mice high levels of mouse mast-cell protease-1 and a body temperature reduction, indicative of anaphylaxis, were determined. Upon OVA stimulation, significantly lower IL-4, IL-5, IL-10 and IL-13 levels were detected in non-responders, while IL-22 was significantly higher. Comparison of fecal microbiota revealed differences in bacterial communities at the single bacterial Operational-Taxonomic-Unit level between the groups, indicating that protection from food allergy is associated with a distinct microbiota composition in a non-responding phenotype in this mouse model.

NVT: a fast and simple tool for the assessment of RNA-seq normalization strategies.

Abstract:

Measuring differential gene expression is a common task in the analysis of RNA-Seq data. To identify differentially expressed genes between two samples, it is crucial to normalize the datasets. While multiple normalization methods are available, all of them are based on certain assumptions that may or may not be suitable for the type of data they are applied on. Researchers therefore need to select an adequate normalization strategy for each RNA-Seq experiment. This selection includes exploration of different normalization methods as well as their comparison. Methods that agree with each other most likely represent realistic assumptions under the particular experimental conditions.
We developed the NVT package, which provides a fast and simple way to analyze and evaluate multiple normalization methods via visualization and representation of correlation values, based on a user-defined set of uniformly expressed genes.
The R package is freely available under https://github.com/Edert/NVT. Contact: thomas.rattei@univie.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online.

Abstract:

Neisseria meningitidis is a leading cause of bacterial meningitis and septicemia, affecting infants and adults worldwide. N. meningitidis is also a common inhabitant of the human nasopharynx and, as such, is highly adapted to its niche. During bacteremia, N. meningitidis gains access to the blood compartment, where it adheres to endothelial cells of blood vessels and causes dramatic vascular damage. Colonization of the nasopharyngeal niche and communication with the different human cell types is a major issue of the N. meningitidis life cycle that is poorly understood. Here, highly saturated random transposon insertion libraries of N. meningitidis were engineered, and the fitness of mutations during routine growth and that of colonization of endothelial and epithelial cells in a flow device were assessed in a transposon insertion site sequencing (Tn-seq) analysis. This allowed the identification of genes essential for bacterial growth and genes specifically required for host cell colonization. In addition, after having identified the small noncoding RNAs (sRNAs) located in intergenic regions, the phenotypes associated with mutations in those sRNAs were defined. A total of 383 genes and 8 intergenic regions containing sRNA candidates were identified to be essential for growth, while 288 genes and 33 intergenic regions containing sRNA candidates were found to be specifically required for host cell colonization.
Meningococcal meningitis is a common cause of meningitis in infants and adults. Neisseria meningitidis (meningococcus) is also a commensal bacterium of the nasopharynx and is carried by 3 to 30% of healthy humans. Under some unknown circumstances, N. meningitidis is able to invade the bloodstream and cause either meningitis or a fatal septicemia known as purpura fulminans. The onset of symptoms is sudden, and death can follow within hours. Although many meningococcal virulence factors have been identified, the mechanisms that allow the bacterium to switch from the commensal to pathogen state remain unknown. Therefore, we used a Tn-seq strategy coupled to high-throughput DNA sequencing technologies to find genes for proteins used by N. meningitidis to specifically colonize epithelial cells and primary brain endothelial cells. We identified 383 genes and 8 intergenic regions containing sRNAs essential for growth and 288 genes and 33 intergenic regions containing sRNAs required specifically for host cell colonization.

Abstract:

The rapidly growing number of available prokaryotic genome sequences requires fully automated and high-quality software solutions for their initial and re-annotation. Here we present ConsPred, a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, predictions of non-coding genes as well as CRISPR repeats and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation and avoids conflicting predictions. Parameters controlling the annotation process are configurable by the user. ConsPred has been used in the institutions of the authors for longer than 5 years and can easily be extended and adapted to specific needs.
The ConsPred algorithm for producing a consensus from the varying scores of multiple gene prediction programs approaches manual curation in accuracy. Its rule-based approach for choosing final predictions avoids overriding previous manual curations.
ConsPred is implemented in Java, Perl and Shell and is freely available under the Creative Commons license as a stand-alone in-house pipeline or as an Amazon Machine Image for cloud computing, see https://sourceforge.net/projects/conspred/. Contact: thomas.rattei@univie.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online.

HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts.

Abstract:

Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments.

Abstract:

The diverse microbial communities in agricultural biogas fermenters are assumed to be well adapted for the anaerobic transformation of plant biomass to methane. Compared to natural systems, biogas reactors are limited in their hydrolytic potential. The reasons for this are not understood.
In this paper, we show that a typical industrial biogas reactor fed with maize silage, cow manure, and chicken manure has relatively lower hydrolysis rates compared to feces samples from herbivores. We provide evidence that on average, 2.5 genes encoding cellulolytic GHs/Mbp were identified in the biogas fermenter compared to 3.8 in the elephant feces and 3.2 in the cow rumen data sets. The ratio of genes coding for cellulolytic GH enzymes affiliated with the Firmicutes versus the Bacteroidetes was 2.8:1 in the biogas fermenter compared to 1:1 in the elephant feces and 1.4:1 in the cow rumen sample. Furthermore, RNA-Seq data indicated that highly transcribed cellulases in the biogas fermenter were four times more often affiliated with the Firmicutes compared to the Bacteroidetes, while an equal distribution of these enzymes was observed in the elephant feces sample.
Our data indicate that a relatively lower abundance of bacteria affiliated with the phylum of Bacteroidetes and, to some extent, Fibrobacteres is associated with a decreased richness of predicted lignocellulolytic enzymes in biogas fermenters. This difference can be attributed to a partial lack of genes coding for cellulolytic GH enzymes derived from bacteria which are affiliated with the Fibrobacteres and, especially, the Bacteroidetes. The partial deficiency of these genes implies a potentially important limitation in the biogas fermenter with regard to the initial hydrolysis of biomass. Based on these findings, we speculate that increasing the members of Bacteroidetes and Fibrobacteres in biogas fermenters will most likely result in an increased hydrolytic performance.
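The two quantities compared in this abstract, cellulolytic GH genes per Mbp and the Firmicutes-to-Bacteroidetes ratio of those genes, are simple normalisations. The Python sketch below shows the arithmetic only; the function names are hypothetical, and the abstract gives no raw gene counts or assembly sizes, so any inputs used with it are the reader's own.

```python
def genes_per_mbp(gene_count: int, assembled_bp: int) -> float:
    """Gene density normalised to data set size, in genes per megabase pair."""
    if assembled_bp <= 0:
        raise ValueError("assembled_bp must be positive")
    return gene_count / (assembled_bp / 1_000_000)


def phylum_ratio(firmicutes_genes: int, bacteroidetes_genes: int) -> float:
    """Ratio of genes affiliated with the Firmicutes to those affiliated
    with the Bacteroidetes (a value of 2.8 corresponds to 2.8:1)."""
    if bacteroidetes_genes <= 0:
        raise ValueError("bacteroidetes_genes must be positive")
    return firmicutes_genes / bacteroidetes_genes
```

Normalising by megabase pairs makes data sets of different sequencing depth comparable, which is what allows the fermenter, elephant feces and cow rumen values to be set side by side.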

High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved.

Abstract:

The systems biology of microbial communities, organismal communities inhabiting all ecological niches on earth, has in recent years been strongly facilitated by the rapid development of experimental, sequencing and data analysis methods. Novel experimental approaches and binning methods in metagenomics render the semi-automatic reconstructions of near-complete genomes of uncultivable bacteria possible, while advances in high-resolution amplicon analysis allow for efficient and less biased taxonomic community characterization. This will also facilitate predictive modeling approaches, hitherto limited by the low resolution of metagenomic data. In this review, we pinpoint the most promising current developments in metagenomics. They facilitate microbial systems biology towards a systemic understanding of mechanisms in microbial communities with scopes of application in many areas of our daily life.

Abstract:

The Spanish slug, Arion vulgaris, is considered one of the hundred most invasive species in Central Europe. The immense and very successful adaptation and spreading of A. vulgaris suggest that it developed highly effective mechanisms to deal with infections and natural predators. Current transcriptomic and proteomic studies on gastropods have been restricted mainly to marine and freshwater gastropods. No transcriptomic or proteomic study on A. vulgaris has been carried out so far, and in the current study, the first transcriptomic database from adult specimen of A. vulgaris is reported. To facilitate and enable proteomics in this non-model organism, an mRNA-derived protein database was constructed for protein identification. A gel-based proteomic approach was used to obtain the first generation of a comprehensive slug mantle proteome. A total of 2128 proteins were unambiguously identified; 48 proteins represent novel proteins with no significant homology in the NCBI non-redundant database. Combined transcriptomic and proteomic analysis revealed an extensive repertoire of novel proteins with a role in innate immunity including many associated pattern recognition, effector proteins and cytokine-like proteins. The number and diversity in gene families encoding lectins point to a complex defense system, probably as a result of adaptation to a pathogen-rich environment. These results provide a fundamental and important resource for subsequent studies on molluscs as well as for putative antimicrobial compounds for drug discovery and biomedical applications.

Abstract:

The stomach bacterium Helicobacter pylori is one of the most prevalent human pathogens. It has dispersed globally with its human host, resulting in a distinct phylogeographic pattern that can be used to reconstruct both recent and ancient human migrations. The extant European population of H. pylori is known to be a hybrid between Asian and African bacteria, but there exist different hypotheses about when and where the hybridization took place, reflecting the complex demographic history of Europeans. Here, we present a 5300-year-old H. pylori genome from a European Copper Age glacier mummy. The "Iceman" H. pylori is a nearly pure representative of the bacterial population of Asian origin that existed in Europe before hybridization, suggesting that the African population arrived in Europe within the past few thousand years.

EffectiveDB-updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems.

Abstract:

Protein secretion systems play a key role in the interaction of bacteria and hosts. EffectiveDB (http://effectivedb.org) contains pre-calculated predictions of bacterial secreted proteins and of intact secretion systems. Here we describe a major update of the database, which was previously featured in the NAR Database Issue. EffectiveDB bundles various tools to recognize Type III secretion signals, conserved binding sites of Type III chaperones, Type IV secretion peptides, eukaryotic-like domains and subcellular targeting signals in the host. Beyond the analysis of arbitrary protein sequence collections, the new release of EffectiveDB also provides a 'genome-mode', in which protein sequences from nearly complete genomes or metagenomic bins can be screened for the presence of three important secretion systems (Type III, IV, VI). EffectiveDB contains pre-calculated predictions for currently 1677 bacterial genomes from the EggNOG 4.0 database and for additional bacterial genomes from NCBI RefSeq. The new, user-friendly and informative web portal offers a submission tool for running the EffectiveDB prediction tools on user-provided data.

probeBase-an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016.

Abstract:

probeBase (http://www.probebase.net) is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Here we present a major update of probeBase, which was last featured in the NAR Database Issue 2007. This update describes a complete remodeling of the database architecture and environment to accommodate computationally efficient access. Improved search functions, sequence match tools and data output now extend the opportunities for finding suitable hierarchical probe sets that target an organism or taxon at different taxonomic levels. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase.

Abstract:

eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.

Abstract:

Recent studies posit a reciprocal dependency between the microbiomes associated with humans and indoor environments. However, none of these metagenome surveys has considered the viability of constituent microorganisms when inferring impact on human health.
Reported here are the results of a viability-linked metagenomics assay, which (1) unveil a remarkably complex community profile for bacteria, fungi, and viruses and (2) bolster the detection of underrepresented taxa by eliminating biases resulting from extraneous DNA. This approach enabled, for the first time ever, the elucidation of viral genomes from a cleanroom environment. Upon comparing the viable biomes and distribution of phylotypes within a cleanroom and adjoining (uncontrolled) gowning enclosure, the rigorous cleaning and stringent control countermeasures of the former were observed to select for a greater presence of anaerobes and spore-forming microflora. Sequence abundance and correlation analyses suggest that the viable indoor microbiome is influenced by both the human microbiome and the surrounding ecosystem(s).
The findings of this investigation constitute the literature's first ever account of the indoor metagenome derived from DNA originating solely from the potential viable microbial population. Results presented in this study should prove valuable to the conceptualization and experimental design of future studies on indoor microbiomes aimed at inferring impact on human health.

Abstract:

Nitrification, the oxidation of ammonia via nitrite to nitrate, has always been considered to be a two-step process catalysed by chemolithoautotrophic microorganisms oxidizing either ammonia or nitrite. No known nitrifier carries out both steps, although complete nitrification should be energetically advantageous. This functional separation has puzzled microbiologists for a century. Here we report on the discovery and cultivation of a completely nitrifying bacterium from the genus Nitrospira, a globally distributed group of nitrite oxidizers. The genome of this chemolithoautotrophic organism encodes the pathways both for ammonia and nitrite oxidation, which are concomitantly activated during growth by ammonia oxidation to nitrate. Genes affiliated with the phylogenetically distinct ammonia monooxygenase and hydroxylamine dehydrogenase genes of Nitrospira are present in many environments and were retrieved on Nitrospira-contigs in new metagenomes from engineered systems. These findings fundamentally change our picture of nitrification and point to completely nitrifying Nitrospira as key components of nitrogen-cycling microbial communities.

Prediction of microbial phenotypes based on comparative genomics.

Abstract:

The accessibility of almost complete genome sequences of uncultivable microbial species from metagenomes necessitates computational methods predicting microbial phenotypes solely based on genomic data. Here we investigate how comparative genomics can be utilized for the prediction of microbial phenotypes. The PICA framework facilitates application and comparison of different machine learning techniques for phenotypic trait prediction. We have improved and extended PICA's support vector machine plug-in and suggest its applicability to large-scale genome databases and incomplete genome sequences.

Abstract:

It is widely accepted that bacterial endophytes actively colonize plants, interact with their host, and frequently show beneficial effects on plant growth and health. However, the mechanisms of plant-endophyte communication and bacterial adaptation to the plant environment are still poorly understood. Here, whole-transcriptome sequencing of B. phytofirmans PsJN colonizing potato (Solanum tuberosum L.) plants was used to analyze in planta gene activity and the response of strain PsJN to plant stress. The transcriptome of PsJN colonizing in vitro potato plants showed a broad array of functionalities encoded in the genome of strain PsJN. Transcripts upregulated in response to plant drought stress were mainly involved in transcriptional regulation, cellular homeostasis, and the detoxification of reactive oxygen species, indicating an oxidative stress response in PsJN. Genes with modulated expression included genes for extracytoplasmic function (ECF) group IV sigma factors. These cell surface signaling elements allow bacteria to sense changing environmental conditions and to adjust their metabolism accordingly. TaqMan quantitative PCR (TaqMan-qPCR) was performed to identify ECF sigma factors in PsJN that were activated in response to plant stress. Six ECF sigma factor genes were expressed in PsJN colonizing potato plants. The expression of one ECF sigma factor was upregulated whereas that of another one was downregulated in a plant genotype-specific manner when the plants were stressed. Collectively, our study results indicate that endophytic B. phytofirmans PsJN cells are active inside plants. Moreover, the activity of strain PsJN is affected by plant drought stress; it senses plant stress signals and adjusts its gene expression accordingly.
In recent years, plant growth-promoting endophytes have received steadily growing interest as an inexpensive alternative to resource-consuming agrochemicals in sustainable agriculture. Even though promising effects are recurrently observed under controlled conditions, these are rarely reproducible in the field or show undesirably strong variations. Obviously, a better understanding of endophyte activities in plants and the influence of plant physiology on these activities is needed to develop more-successful application strategies. So far, research has focused mainly on analyzing the plant response to bacterial inoculants. This prompted us to study the gene expression of the endophyte Burkholderia phytofirmans PsJN in potato plants. We found that endophytic PsJN cells express a wide array of genes and pathways, pointing to high metabolic activity inside plants. Moreover, the strain senses changes in the plant physiology due to plant stress and adjusts its gene expression pattern to cope with and adapt to the altered conditions.

Internalization of Pseudomonas aeruginosa Strain PAO1 into Epithelial Cells Is Promoted by Interaction of a T6SS Effector with the Microtubule Network.

Abstract:

Invasion of nonphagocytic cells through rearrangement of the actin cytoskeleton is a common immune evasion mechanism used by most intracellular bacteria. However, some pathogens modulate host microtubules as well by a still poorly understood mechanism. In this study, we aim at deciphering the mechanisms by which the opportunistic bacterial pathogen Pseudomonas aeruginosa invades nonphagocytic cells, although it is considered mainly an extracellular bacterium. Using confocal microscopy and immunofluorescence, we show that the evolved VgrG2b effector of P. aeruginosa strain PAO1 is delivered into epithelial cells by a type VI secretion system, called H2-T6SS, involving the VgrG2a component. An in vivo interactome of VgrG2b in host cells allows the identification of microtubule components, including the γ-tubulin ring complex (γTuRC), a multiprotein complex catalyzing microtubule nucleation, as the major host target of VgrG2b. This interaction promotes a microtubule-dependent internalization of the bacterium since colchicine and nocodazole, two microtubule-destabilizing drugs, prevent VgrG2b-mediated P. aeruginosa entry even if the invasion still requires actin. We further validate our findings by demonstrating that the type VI injection step can be bypassed by ectopic production of VgrG2b inside target cells prior to infection. Moreover, such uncoupling between VgrG2b injection and bacterial internalization also reveals that they constitute two independent steps. With VgrG2b, we provide the first example of a bacterial protein interacting with the γTuRC. Our study offers key insight into the mechanism of self-promoting invasion of P. aeruginosa into human cells via a directed and specific effector-host protein interaction.
Innate immunity and specifically professional phagocytic cells are key determinants in the ability of the host to control P. aeruginosa infection. However, among various virulence strategies, including attack, this opportunistic bacterial pathogen is able to avoid host clearance by triggering its own internalization in nonphagocytic cells. We previously showed that a protein secretion/injection machinery, called the H2 type VI secretion system (H2-T6SS), promotes P. aeruginosa uptake by epithelial cells. Here we investigate which H2-T6SS effector enables P. aeruginosa to enter nonphagocytic cells. We show that VgrG2b is delivered by the H2-T6SS machinery into epithelial cells, where it interacts with microtubules and, more particularly, with the γ-tubulin ring complex (γTuRC) known as the microtubule-nucleating center. This interaction precedes a microtubule- and actin-dependent internalization of P. aeruginosa. We thus discovered an unprecedented target for a bacterial virulence factor since VgrG2b constitutes, to our knowledge, the first example of a bacterial protein interacting with the γTuRC.

The genomes of closely related Pantoea ananatis maize seed endophytes having different effects on the host plant differ in secretion system genes and mobile genetic elements.

Abstract:

The seed as a habitat for microorganisms is as yet under-explored and has quite distinct characteristics as compared to other vegetative plant tissues. In this study, we investigated three closely related P. ananatis strains (named S6, S7, and S8), which were isolated from maize seeds of healthy plants. Plant inoculation experiments revealed that each of these strains exhibited a different phenotype ranging from weak pathogenic (S7), commensal (S8), to a beneficial, growth-promoting effect (S6) in maize. We performed a comparative genomics analysis in order to find genetic determinants responsible for the differences observed. Recent studies provided exciting insight into the genetic drivers of niche adaption and functional diversification of the genus Pantoea. However, we report here for the first time on the analysis of P. ananatis strains colonizing the same ecological niche but showing distinct interaction strategies with the host plant. Our comparative analysis revealed that genomes of these three strains are highly similar. However, genomic differences in genes encoding protein secretion systems and putative effectors, and transposase/integrases/phage related genes could be observed.

The Intraperitoneal Transcriptome of the Opportunistic Pathogen Enterococcus faecalis in Mice.

Abstract:

Enterococcus faecalis is a Gram-positive, opportunistic intestinal lactic acid bacterium with virulence potential. For a better understanding of the adaptation of this bacterium to the host conditions, we performed a transcriptome analysis of bacteria isolated from an infection site (mouse peritonitis) by RNA-sequencing. We identified a total of 211 genes with significantly higher transcript levels and 157 repressed genes. Our in vivo gene expression database reflects well the infection process since genes encoding important virulence factors like cytolysin, gelatinase or aggregation substance as well as stress response proteins, are significantly induced. Genes encoding metabolic activities are the second most abundant in vivo induced genes demonstrating that the bacteria are metabolically active and adapt to the special nutrient conditions of the host. α- and β-glucosides seem to be important substrates for E. faecalis inside the host. Compared to laboratory conditions, the flux through the upper part of glycolysis seems to be reduced and more carbon may enter the pentose phosphate pathway. This may reflect the need of the bacteria under infection conditions to produce more reducing power for biosynthesis. Another important substrate is certainly glycerol since both pathways of glycerol catabolism are strongly induced. Strongly in vivo induced genes should be important for the infection process. This assumption has been verified in a virulence test using well characterized mutants affected in glycerol metabolism. This showed indeed that mutants unable to metabolize this sugar alcohol are affected in organ colonisation in a mouse model.

Abstract:

Chlamydia pneumoniae (Cpn) are obligate intracellular bacteria that cause acute infections of the upper and lower respiratory tract and have been implicated in chronic inflammatory diseases. Although of significant clinical relevance, complete genome sequences of only four clinical Cpn strains have been obtained. All of them were isolated from the respiratory tract and shared more than 99% sequence identity. Here we investigate genetic differences on the whole-genome level that are related to Cpn tissue tropism and pathogenicity.
We have sequenced the genomes of 18 clinical isolates from different anatomical sites (e.g. lung, blood, coronary arteries) of diseased patients, and one animal isolate. In total 1,363 SNP loci and 184 InDels have been identified in the genomes of all clinical Cpn isolates. These are distributed throughout the whole chlamydial genome and enriched in highly variable regions. The genomes show clear evidence of recombination in at least one potential region but no phage insertions. The tyrP gene was always encoded as single copy in all vascular isolates. Phylogenetic reconstruction revealed distinct evolutionary lineages containing primarily non-respiratory Cpn isolates. In one of these, clinical isolates from coronary arteries and blood monocytes were closely grouped together. They could be distinguished from all other isolates by characteristic nsSNPs in genes involved in RB to EB transition, inclusion membrane formation, bacterial stress response and metabolism.
This study substantially expands the genomic data of Cpn and elucidates its evolutionary history. The translation of the observed Cpn genetic differences into biological functions and the prediction of novel pathogen-oriented diagnostic strategies have to be further explored.

Abstract:

In this study, we investigated the impact of soil pH on the diversity and abundance of archaeal ammonia oxidizers in 27 different forest soils across Germany. DNA was extracted from topsoil samples; the amoA gene, encoding ammonia monooxygenase, was amplified; and the amplicons were sequenced using a 454-based pyrosequencing approach. As expected, the ratio of archaeal (AOA) to bacterial (AOB) ammonia oxidizers' amoA genes increased sharply with decreasing soil pH. The diversity of AOA differed significantly between sites with ultra-acidic soil pH (<3.5) and sites with higher pH values. The major OTUs from soil samples with low pH could be detected at each site with a soil pH <3.5 but not at sites with pH >4.5, regardless of geographic position and vegetation. These OTUs could be related to the Nitrosotalea group 1.1 and the Nitrososphaera subcluster 7.2, respectively, and showed significant similarities to OTUs described from other acidic environments. Conversely, none of the major OTUs typical of sites with a soil pH >4.6 could be found in the ultra- and extreme acidic soils. Based on a comparison with the amoA gene sequence data from a previous study performed on agricultural soils, we could clearly show that the development of AOA communities in soils with ultra-acidic pH (<3.5) is mainly triggered by soil pH and is not influenced significantly by the type of land use, the soil type, or the geographic position of the site, which was observed for sites with acido-neutral soil pH.

Abstract:

The energy metabolism of essential microbial guilds in the biogeochemical sulfur cycle is based on a DsrAB-type dissimilatory (bi)sulfite reductase that either catalyzes the reduction of sulfite to sulfide during anaerobic respiration of sulfate, sulfite and organosulfonates, or acts in reverse during sulfur oxidation. Common use of dsrAB as a functional marker showed that dsrAB richness in many environments is dominated by novel sequence variants and collectively represents an extensive, largely uncharted sequence assemblage. Here, we established a comprehensive, manually curated dsrAB/DsrAB database and used it to categorize the known dsrAB diversity, reanalyze the evolutionary history of dsrAB and evaluate the coverage of published dsrAB-targeted primers. Based on a DsrAB consensus phylogeny, we introduce an operational classification system for environmental dsrAB sequences that integrates established taxonomic groups with operational taxonomic units (OTUs) at multiple phylogenetic levels, ranging from DsrAB enzyme families that reflect reductive or oxidative DsrAB types of bacterial or archaeal origin, superclusters, uncultured family-level lineages to species-level OTUs. Environmental dsrAB sequences constituted at least 13 stable family-level lineages without any cultivated representatives, suggesting that major taxa of sulfite/sulfate-reducing microorganisms have not yet been identified. Three of these uncultured lineages occur mainly in marine environments, while specific habitat preferences are not evident for members of the other 10 uncultured lineages. In summary, our publicly available dsrAB/DsrAB database, the phylogenetic framework, the multilevel classification system and a set of recommended primers provide a necessary foundation for large-scale dsrAB ecology studies with next-generation sequencing methods.

Abstract:

Nitrospira are chemolithoautotrophic nitrite-oxidizing bacteria that catalyze the second step of nitrification in most oxic habitats and are important for excess nitrogen removal from sewage in wastewater treatment plants (WWTPs). To date, little is known about their diversity and ecological niche partitioning within complex communities. In this study, the fine-scale community structure and function of Nitrospira was analyzed in two full-scale WWTPs as model ecosystems. In Nitrospira-specific 16S rRNA clone libraries retrieved from each plant, closely related phylogenetic clusters (16S rRNA identities between clusters ranged from 95.8% to 99.6%) within Nitrospira lineages I and II were found. Newly designed probes for fluorescence in situ hybridization (FISH) allowed the specific detection of several of these clusters, whose coexistence in the WWTPs was shown for prolonged periods of several years. In situ ecophysiological analyses based on FISH, relative abundance and spatial arrangement quantification, as well as microautoradiography revealed functional differences of these Nitrospira clusters regarding the preferred nitrite concentration, the utilization of formate as substrate and the spatial coaggregation with ammonia-oxidizing bacteria as symbiotic partners. Amplicon pyrosequencing of the nxrB gene, which encodes subunit beta of nitrite oxidoreductase of Nitrospira, revealed in one of the WWTPs as many as 121 species-level nxrB operational taxonomic units with highly uneven relative abundances in the amplicon library. These results show a previously unrecognized high diversity of Nitrospira in engineered systems, which is at least partially linked to niche differentiation and may have important implications for process stability.

Abstract:

Subsurface microbial life contributes significantly to biogeochemical cycling, yet it remains largely uncharacterized, especially its archaeal members. This 'microbial dark matter' has been explored by recent studies that were, however, mostly based on DNA sequence information only. Here, we use diverse techniques including ultrastructural analyses to link genomics to biology for the SM1 Euryarchaeon lineage, an uncultivated group of subsurface archaea. Phylogenomic analyses reveal this lineage to belong to a widespread group of archaea that we propose to classify as a new euryarchaeal order ('Candidatus Altiarchaeales'). The representative, double-membraned species 'Candidatus Altiarchaeum hamiconexum' has an autotrophic metabolism that uses a not-yet-reported Factor420-free reductive acetyl-CoA pathway, confirmed by stable carbon isotopic measurements of archaeal lipids. Our results indicate that this lineage has evolved specific metabolic and structural features like nano-grappling hooks empowering this widely distributed archaeon to predominate anaerobic groundwater, where it may represent an important carbon dioxide sink.

Abstract:

PCR-ribotyping, a typing method based on size variation in 16S-23S rRNA intergenic spacer region (ISR), has been used widely for molecular epidemiological investigations of C. difficile infections. In the present study, we describe the sequence diversity of ISRs from 43 C. difficile strains, representing different PCR-ribotypes and suggest homologous recombination as a possible mechanism driving the evolution of 16S-23S rRNA ISRs. ISRs of 45 different lengths (ranging from 185 bp to 564 bp) were found among 458 ISRs. All ISRs could be described with one of the 22 different structural groups defined by the presence or absence of different sequence modules; tRNAAla genes and different combinations of spacers of different lengths (33 bp, 53 bp or 20 bp) and 9 bp direct repeats separating the spacers. The ISR structural group, in most cases, coincided with the sequence length. ISRs that were of the same lengths had also very similar nucleotide sequence, suggesting that ISRs were not suitable for discriminating between different strains based only on the ISR sequence. Despite large variations in the length, the alignment of ISR sequences, based on the primary sequence and secondary structure information, revealed many conserved regions which were mainly involved in maturation of pre-rRNA. Phylogenetic analysis of the ISR alignment yielded strong evidence for intra- and inter-homologous recombination which could be one of the mechanisms driving the evolution of C. difficile 16S-23S ISRs. The modular structure of the ISR, the high sequence similarities of ISRs of the same sizes and the presence of homologous recombination also suggest that different copies of C. difficile 16S-23S rRNA ISR are evolving in concert.

Abstract:

A phylogenetic and metagenomic study of elephant feces samples (derived from a three-week-old and a six-year-old Asian elephant) was conducted in order to describe the microbiota inhabiting this large land-living animal. The microbial diversity was examined via 16S rRNA gene analysis. We generated more than 44,000 GS-FLX+454 reads for each animal. For the baby elephant, 380 operational taxonomic units (OTUs) were identified at 97% sequence identity level; in the six-year-old animal, close to 3,000 OTUs were identified, suggesting high microbial diversity in the older animal. In both animals most OTUs belonged to Bacteroidetes and Firmicutes. Additionally, for the baby elephant a high number of Proteobacteria was detected. A metagenomic sequencing approach using Illumina technology resulted in the generation of 1.1 Gbp assembled DNA in contigs with a maximum size of 0.6 Mbp. A KEGG pathway analysis suggested high metabolic diversity regarding the use of polymers and aromatic and non-aromatic compounds. In line with the high phylogenetic diversity, a surprising and not previously described biodiversity of glycoside hydrolase (GH) genes was found. Enzymes of 84 GH families were detected. Polysaccharide utilization loci (PULs), which are found in Bacteroidetes, were highly abundant in the dataset; some of these comprised cellulase genes. Furthermore, the highest coverage for GH5 and GH9 family enzymes was detected for Bacteroidetes, suggesting that bacteria of this phylum are mainly responsible for the degradation of cellulose in the Asian elephant. Altogether, this study delivers insight into the biomass conversion by one of the largest plant-fed and land-living animals.
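Grouping reads into OTUs at a 97% sequence identity level can be sketched as a greedy centroid clustering. The abstract does not state which clustering tool was used, so the following is only an illustration under simplifying assumptions: sequences are taken to be pre-aligned and of equal length, identity is the plain fraction of matching positions, and all function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold: float = 0.97):
    """Assign each sequence to the first OTU whose centroid it matches.

    A sequence that matches no existing centroid at >= threshold identity
    becomes the centroid of a new OTU. The result depends on input order.
    """
    centroids = []
    otus = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[k].append(s)
                break
        else:  # no centroid was close enough
            centroids.append(s)
            otus.append([s])
    return otus


# Toy data: b is 98% identical to a, c only 90%
a = "A" * 100
b = "A" * 98 + "CC"
c = "A" * 90 + "C" * 10
print(len(greedy_otus([a, b, c])))  # number of OTUs found
```

Because the assignment is order-dependent, clustering tools commonly sort reads (for example by abundance or length) before a pass like this one.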

Characterization of 19 new microsatellite loci for the Omani barb Garra barreimiae from 454 sequences.

Abstract:

Garra barreimiae is a cyprinid fish from the southeastern Arabian Peninsula, which inhabits regularly desiccating wadis and survives in isolated ponds or underground. In 1984 a cave-dwelling population was found in the Al Hoota cave system and previous genetic analyses revealed some differentiation with limited gene flow between the surface populations and the cave population. Since no suitable markers are available for evaluation of gene flow between the cave population and the adjacent surface populations, we focused on designing and establishing novel microsatellite markers from next generation sequencing data.
Nineteen microsatellite markers containing di- and tetranucleotide simple sequence repeats were developed from 454 sequences. Forty-four individuals from two surface populations (Wadi Al Falahi and Misfat Al Abriyeen) of G. barreimiae (sampling permission number 13/2012, export permission number 29/2012) were used for analyses and characterization of the loci. On average, the number of alleles per locus is 7.6 (range: 2-20). Two markers displayed indication of linkage disequilibrium in both populations (DL6X, 9XNC). Significant deviation from Hardy-Weinberg equilibrium was observed at four loci in the Misfat Al Abriyeen population (2PUM, 88CM, 1EHE, 3Z7M) and at two loci in the Wadi Al Falahi population (QLIM, 3N43). Three of the microsatellite loci were significant for null alleles in one of the two populations (Misfat Al Abriyeen: CJHG; Wadi Al Falahi: PH8A, 3ROZ). Expected and observed heterozygosities ranged from 0 to 95.0% and from 0 to 95.8%, respectively (Wadi Al Falahi), and from 0 to 89.1% and from 0 to 95.0%, respectively (Misfat Al Abriyeen). Fourteen of these markers were successfully cross-amplified in G. rufa.
These 19 microsatellite loci provide a useful tool to understand the structure and genetic differences of populations. Moreover, these markers will help to evaluate species delimitation in G. barreimiae and potentially even in related species.
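Expected and observed heterozygosity at a single locus can be computed as in the following minimal sketch. It is not the software used in the study: the function name and the toy genotypes are assumptions, and the expected heterozygosity uses the simple uncorrected form, one minus the sum of squared allele frequencies, without a small-sample correction.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (expected, observed) heterozygosity for one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    Expected heterozygosity is 1 - sum(p_i^2) over allele frequencies p_i;
    observed heterozygosity is the fraction of heterozygous individuals.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    allele_counts = Counter(a for g in genotypes for a in g)
    total = sum(allele_counts.values())  # 2 alleles per individual
    expected = 1.0 - sum((n / total) ** 2 for n in allele_counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return expected, observed


# Toy genotypes (allele sizes in bp are made up)
he, ho = heterozygosities([(180, 180), (180, 184), (184, 184), (180, 184)])
print(he, ho)
```

A marked excess of expected over observed heterozygosity at a locus is one pattern that can accompany deviation from Hardy-Weinberg equilibrium or the presence of null alleles, although a formal test is needed to assess significance.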

Massive expansion of Ubiquitination-related gene families within the Chlamydiae.

Abstract:

Gene loss, gain, and transfer play an important role in shaping the genomes of all organisms; however, the interplay of these processes in isolated populations, such as in obligate intracellular bacteria, is less understood. Despite a general trend towards genome reduction in these microbes, our phylogenomic analysis of the phylum Chlamydiae revealed that within the family Parachlamydiaceae, gene family expansions have had pronounced effects on gene content. We discovered that the largest gene families within the phylum are the result of rapid gene birth-and-death evolution. These large gene families are comprised of members harboring eukaryotic-like ubiquitination-related domains, such as F-box and BTB-box domains, marking the largest reservoir of these proteins found among bacteria. A heterologous type III secretion system assay suggests that these proteins function as effectors manipulating the host cell. The large disparity in copy number of members in these families between closely related organisms suggests that nonadaptive processes might contribute to the evolution of these gene families. Gene birth-and-death evolution in concert with genomic drift might represent a previously undescribed mechanism by which isolated bacterial populations diversify.

Metagenomic analysis reveals presence of Treponema denticola in a tissue biopsy of the Iceman.

Abstract:

Ancient hominoid genome studies can be regarded by definition as metagenomic analyses since they represent a mixture of both hominoid and microbial sequences in an environment. Here, we report the molecular detection of the oral spirochete Treponema denticola in ancient human tissue biopsies of the Iceman, a 5,300-year-old Copper Age natural ice mummy. Initially, the metagenomic data of the Iceman's genomic survey was screened for bacterial ribosomal RNA (rRNA) specific reads. Through ranking the reads by abundance, a relatively high number of rRNA reads most similar to T. denticola was detected. Mapping of the metagenome sequences against the T. denticola genome revealed additional reads most similar to this opportunistic pathogen. The DNA damage pattern of specifically mapped reads suggests an ancient origin of these sequences. The haematogenous spread of bacteria of the oral microbiome often reported in the recent literature could already explain the presence of metagenomic reads specific for T. denticola in the Iceman's bone biopsy. We extended, however, our survey to an Iceman gingival tissue sample and a mouth swab sample and could thereby detect T. denticola and Porphyromonas gingivalis, another important member of the human commensal oral microflora. Taken together, this study clearly underlines the opportunity to detect disease-associated microorganisms when applying metagenomics-enabled approaches on datasets of ancient human remains.

Abstract:

A combinatory approach using metabolomics and gut microbiome analysis techniques was performed to unravel the nature and specificity of metabolic profiles related to gut ecology in obesity. This study focused on gut and liver metabolomics of two different mouse strains, the C57BL/6J (C57J) and the C57BL/6N (C57N), fed with a high-fat diet (HFD) for 3 weeks, causing diet-induced obesity in C57N, but not in C57J mice. Furthermore, a 16S-ribosomal RNA comparative sequence analysis using 454 pyrosequencing detected significant differences between the microbiome of the two strains on phylum level for Firmicutes, Deferribacteres and Proteobacteria, suggesting an essential role of the microbiome in obesity susceptibility. Gut microbial and liver metabolomics were followed by a combinatory approach using Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) and ultra performance liquid chromatography time-of-flight MS/MS with subsequent multivariate statistical analysis, revealing distinctive host and microbial metabolome patterns between the C57J and the C57N strain. Many taurine-conjugated bile acids (TBAs) were significantly elevated in the cecum and decreased in liver samples from the C57J phenotype, likely displaying different energy utilization behavior by the bacterial community and the host. Furthermore, several metabolite groups could specifically be associated with the C57N phenotype involving fatty acids, eicosanoids and urobilinoids. The mass differences based metabolite network approach made it possible to extend the range of known metabolites to important bile acids (BAs) and novel taurine conjugates specific for both strains. In summary, our study showed clear alterations of the metabolome in the gastrointestinal tract and liver within a HFD-induced obesity mouse model in relation to the host-microbial nutritional adaptation.

Challenges in RNA virus bioinformatics.

Abstract:

Computer-assisted studies of structure, function and evolution of viruses remain a neglected area of research. The attention of bioinformaticians to this interesting and challenging field is far from commensurate with its medical and biotechnological importance. It is telling that out of >200 talks held at ISMB 2013, the largest international bioinformatics conference, only one presentation explicitly dealt with viruses. In contrast to many broad, established and well-organized bioinformatics communities (e.g. structural genomics, ontologies, next-generation sequencing, expression analysis), research groups focusing on viruses can probably be counted on the fingers of two hands.
The purpose of this review is to increase awareness among bioinformatics researchers about the pressing needs and unsolved problems of computational virology. We focus primarily on RNA viruses that pose problems to many standard bioinformatics analyses owing to their compact genome organization, fast mutation rate and low evolutionary conservation. We provide an overview of tools and algorithms for handling viral sequencing data, detecting functionally important RNA structures, classifying viral proteins into families and investigating the origin and evolution of viruses.

Abstract:

Listeria monocytogenes, a gram-positive pathogen and causative agent of listeriosis, has become a widely used model organism for intracellular infections. Recent studies have identified small non-coding RNAs (sRNAs) as important factors for regulating gene expression and pathogenicity of L. monocytogenes. Increased speed and reduced costs of high throughput sequencing (HTS) techniques have made RNA sequencing (RNA-Seq) the state-of-the-art method to study bacterial transcriptomes. We created a large transcriptome dataset of L. monocytogenes containing a total of 21 million reads, using the SOLiD sequencing technology. The dataset contained cDNA sequences generated from L. monocytogenes RNA collected under intracellular and extracellular conditions and was additionally size-fractionated into three size ranges: <40 nt, 40-150 nt and >150 nt. We report here the identification of nine new sRNA candidates of L. monocytogenes and a reevaluation of known sRNAs of L. monocytogenes EGD-e. Automatic comparison to known sRNAs revealed a high recovery rate of 55%, which was increased to 90% by manual revision of the data. Moreover, thorough classification of known sRNAs shed further light on their possible biological functions. Interestingly, among the newly identified sRNA candidates are antisense RNAs (asRNAs) associated with the housekeeping genes purA, fumC and pgi and potentially with their regulation, emphasizing the significance of sRNAs for metabolic adaptation in L. monocytogenes.

eggNOG v4.0: nested orthology inference across 3686 organisms.

Abstract:

With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.

Signature protein of the PVC superphylum.

Abstract:

The phyla Planctomycetes, Verrucomicrobia, Chlamydiae, Lentisphaerae, and "Candidatus Omnitrophica (OP3)" comprise bacteria that share an ancestor but show highly diverse biological and ecological features. Together, they constitute the PVC superphylum. Using large-scale comparative genome sequence analysis, we identified a protein uniquely shared among all of the known members of the PVC superphylum. We provide evidence that this signature protein is expressed by representative members of the PVC superphylum. Its predicted structure, physicochemical characteristics, and overexpression in Escherichia coli and gel retardation assays with purified signature protein suggest a housekeeping function with unspecific DNA/RNA binding activity. Phylogenetic analysis demonstrated that the signature protein is a suitable phylogenetic marker for members of the PVC superphylum, and the screening of published metagenome data indicated the existence of additional PVC members. This study provides further evidence of a common evolutionary history of the PVC superphylum and presents a unique case in which a single protein serves as an evolutionary link among otherwise highly diverse members of major bacterial groups.

SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage.

Abstract:

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.

NxrB encoding the beta subunit of nitrite oxidoreductase as functional and phylogenetic marker for nitrite-oxidizing Nitrospira.

Abstract:

Nitrospira are the most widespread and diverse known nitrite-oxidizing bacteria and key nitrifiers in natural and engineered ecosystems. Nevertheless, their ecophysiology and environmental distribution are understudied because of the recalcitrance of Nitrospira to cultivation and the lack of a molecular functional marker, which would allow the detection of Nitrospira in the environment. Here we introduce nxrB, the gene encoding subunit beta of nitrite oxidoreductase, as a functional and phylogenetic marker for Nitrospira. Phylogenetic trees based on nxrB of Nitrospira were largely congruent to 16S ribosomal RNA-based phylogenies. By using new nxrB-selective polymerase chain reaction primers, we obtained almost full-length nxrB sequences from Nitrospira cultures, two activated sludge samples, and several geographically and climatically distinct soils. Amplicon pyrosequencing of nxrB fragments from 16 soils revealed a previously unrecognized diversity of terrestrial Nitrospira with 1801 detected species-level operational taxonomic units (OTUs) (using an inferred species threshold of 95% nxrB identity). Richness estimates ranged from 10 to 946 coexisting Nitrospira species per soil. Comparison with an archaeal amoA dataset obtained from the same soils [Environ. Microbiol. 14: 525-539 (2012)] uncovered that ammonia-oxidizing archaea and Nitrospira communities were highly correlated across the soil samples, possibly indicating shared habitat preferences or specific biological interactions among members of these nitrifier groups.

Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae.

Abstract:

In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach exploiting metagenomic and amplicon data sets from public databases to elucidate phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge about chlamydial diversity is still scarce. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22,000 high quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when applying the most conservative approach, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species, genus and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represents the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae, and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir.

Abstract:

Protein sequence databases are indispensable tools for life science research including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ~99% of the identifications in comparison to common databases despite a 3-48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
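
The two core ideas named above, an in-silico proteolytic digest and a grouping of proteins that cannot be told apart at the peptide level, can be illustrated with a minimal sketch. This is not the MScDB implementation: the protease (trypsin, cleaving after K or R unless followed by P), the absence of missed cleavages, the `min_length` cutoff, the function names and the toy sequences are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """Return the set of fully tryptic peptides of a protein sequence.

    Assumed rule: cleave after K or R unless the next residue is P,
    with no missed cleavages. Peptides shorter than min_length are
    dropped because they are assumed to be uninformative in MS.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def cluster_indistinguishable(proteins, min_length=6):
    """Group protein names whose detectable peptide sets are identical."""
    groups = defaultdict(list)
    for name, seq in proteins.items():
        key = frozenset(tryptic_digest(seq, min_length))
        groups[key].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical toy proteins: P2 differs from P1 only by a short
# C-terminal tail that yields no peptide above the length cutoff.
proteins = {
    "P1": "MAGTLLKSSEDVVRPLLAAK",
    "P2": "MAGTLLKSSEDVVRPLLAAKGG",
    "P3": "MAGTLLKSSEDVLRPLLAAK",
}
clusters = cluster_indistinguishable(proteins)
print(clusters)
```

In this toy case P1 and P2 fall into one group because their digests are identical, while P3 stays separate because a single substitution changes one peptide.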

The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks.

Abstract:

Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are the key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms were investigated. A number of theoretical models have been developed to explain both the network formation and the current structure. Favored are models based on duplication and divergence of genes, as they most closely represent the biological foundation of network evolution. However, studies are often based on simulated instead of empirical data or they cover only single organisms. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions on network evolution. We utilized present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which the ancestral proteomes, as well as the ancestral interactions, were inferred. Ancestral proteins were reconstructed using orthologous groups on different evolutionary levels. A stochastic approach, using the duplication-divergence model, was developed for estimating the probabilities of ancient interactions from today's PPI networks. The growth rates for nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results from independent static analysis.
Our results support the duplication-divergence model of evolution and indicate fractality and multiplicative growth as general properties of the PPI network structure and dynamics.
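
For readers unfamiliar with the duplication-divergence model mentioned above, the following is a minimal forward simulation of its generic textbook form. It is not the stochastic reconstruction approach of the study: the seed network, the single `retain_prob` parameter and the function name are assumptions chosen for illustration.

```python
import random


def duplication_divergence(n_nodes, retain_prob=0.4, seed=0):
    """Grow an undirected network by repeated duplication and divergence.

    At each step a random existing node (the "parent") is duplicated.
    The copy inherits each of the parent's interactions independently
    with probability retain_prob; the lost interactions model divergence.
    """
    rng = random.Random(seed)
    adjacency = {0: {1}, 1: {0}}  # assumed seed network: one interaction
    while len(adjacency) < n_nodes:
        parent = rng.choice(sorted(adjacency))
        child = len(adjacency)
        kept = {nb for nb in sorted(adjacency[parent])
                if rng.random() < retain_prob}
        adjacency[child] = kept
        for nb in kept:
            adjacency[nb].add(child)
    return adjacency


network = duplication_divergence(200)
n_edges = sum(len(nbs) for nbs in network.values()) // 2
print(len(network), "nodes,", n_edges, "edges")
```

Because every new node copies interactions from an existing node, well-connected proteins tend to gain further partners, which is one intuition for why such models are used to explain the topology of present-day PPI networks.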

Metagenomics of Kamchatkan hot spring filaments reveal two new major (hyper)thermophilic lineages related to Thaumarchaeota.

Abstract:

Based on phylogenetic analyses and gene distribution patterns of a few complete genomes, a new distinct phylum within the Archaea, the Thaumarchaeota, has recently been proposed. Here we present analyses of six archaeal fosmid sequences derived from a microbial hot spring community in Kamchatka. The phylogenetic analysis of informational components (ribosomal RNAs and proteins) reveals two major (hyper-)thermophilic clades ("Hot Thaumarchaeota-related Clade" 1 and 2, HTC1 and HTC2) related to Thaumarchaeota, representing either deep branches of this phylum or a new archaeal phylum and provides information regarding the ancient evolution of Archaea and their evolutionary links with Eukaryotes.

The Genome of Nitrospina gracilis Illuminates the Metabolism and Evolution of the Major Marine Nitrite Oxidizer.

Abstract:

In marine systems, nitrate is the major reservoir of inorganic fixed nitrogen. The only known biological nitrate-forming reaction is nitrite oxidation, but despite its importance, our knowledge of the organisms catalyzing this key process in the marine N-cycle is very limited. The most frequently encountered marine nitrite-oxidizing bacteria (NOB) are related to Nitrospina gracilis, an aerobic chemolithoautotrophic bacterium isolated from ocean surface waters. To date, limited physiological and genomic data for this organism were available and its phylogenetic affiliation was uncertain. In this study, the draft genome sequence of N. gracilis strain 3/211 was obtained. Unexpectedly for an aerobic organism, N. gracilis lacks classical reactive oxygen defense mechanisms and uses the reductive tricarboxylic acid cycle for carbon fixation. These features indicate microaerophilic ancestry and are consistent with the presence of Nitrospina in marine oxygen minimum zones. Fixed carbon is stored intracellularly as glycogen, but genes for utilizing external organic carbon sources were not identified. N. gracilis also contains a full gene set for oxidative phosphorylation with oxygen as terminal electron acceptor and for reverse electron transport from nitrite to NADH. A novel variation of complex I may catalyze the required reverse electron flow to low-potential ferredoxin. Interestingly, comparative genomics indicated a strong evolutionary link between Nitrospina, the nitrite-oxidizing genus Nitrospira, and anaerobic ammonium oxidizers, apparently including the horizontal transfer of a periplasmically oriented nitrite oxidoreductase and other key genes for nitrite oxidation at an early evolutionary stage. Further, detailed phylogenetic analyses using concatenated marker genes provided evidence that Nitrospina forms a novel bacterial phylum, for which we propose the name Nitrospinae.

Abstract:

The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both biomedical and evolutionary importance, yet its genomic diversity remains largely unsampled. Here we present an analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary free-living amoebozoan.
Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate immune systems of higher organisms.
Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this important model organism and environmental host.

Draft genome sequence of Lactobacillus casei W56.

Abstract:

We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products.

Abstract:

The dynamics of reductive genome evolution for eukaryotes living inside other eukaryotic cells are poorly understood compared to well-studied model systems involving obligate intracellular bacteria. Here we present 8.5 Mb of sequence from the genome of the microsporidian Trachipleistophora hominis, isolated from an HIV/AIDS patient, which is an outgroup to the smaller compacted-genome species that primarily inform ideas of evolutionary mode for these enormously successful obligate intracellular parasites. Our data provide detailed information on the gene content, genome architecture and intergenic regions of a larger microsporidian genome, while comparative analyses allowed us to infer genomic features and metabolism of the common ancestor of the species investigated. Gene length reduction and massive loss of metabolic capacity in the common ancestor was accompanied by the evolution of novel microsporidian-specific protein families, whose conservation among microsporidians, against a background of reductive evolution, suggests they may have important functions in their parasitic lifestyle. The ancestor had already lost many metabolic pathways but retained glycolysis and the pentose phosphate pathway to provide cytosolic ATP and reduced coenzymes, and it had a minimal mitochondrion (mitosome) making Fe-S clusters but not ATP. It possessed bacterial-like nucleotide transport proteins as a key innovation for stealing host-generated ATP, the machinery for RNAi, key elements of the early secretory pathway, canonical eukaryotic as well as microsporidian-specific regulatory elements, a diversity of repetitive and transposable elements, and relatively low average gene density. 
Microsporidian genome evolution thus appears to have proceeded in at least two major steps: an ancestral remodelling of the proteome upon transition to intracellular parasitism that involved reduction but also selective expansion, followed by a secondary compaction of genome architecture in some, but not all, lineages.

Abstract:

Desulfosporosinus species are sulfate-reducing bacteria belonging to the Firmicutes. Their genomes will give insights into the genetic repertoire and evolution of sulfate reducers typically thriving in terrestrial environments and able to degrade toluene (Desulfosporosinus youngiae), to reduce Fe(III) (Desulfosporosinus meridiei, Desulfosporosinus orientis), and to grow under acidic conditions (Desulfosporosinus acidiphilus).

Abstract:

The cohort of the ammonia-oxidizing archaea (AOA) of the phylum Thaumarchaeota is a diverse, widespread and functionally important group of microorganisms in many ecosystems. However, our understanding of their biology is still very rudimentary, in part because all available genome sequences of this phylum are from members of the Nitrosopumilus cluster. Here we report on the complete genome sequence of Candidatus Nitrososphaera gargensis obtained from an enrichment culture, representing a different evolutionary lineage of AOA frequently found in high numbers in many terrestrial environments. At 2.83 Mb, the genome is much larger than that of other AOA. The presence of a high number of (active) IS elements/transposases, genomic islands, gene duplications and a complete CRISPR/Cas defence system testifies to its dynamic evolution, consistent with a low degree of synteny with other thaumarchaeal genomes. As expected, the repertoire of conserved enzymes proposed to be required for archaeal ammonia oxidation is encoded by N. gargensis, but it can also use urea and possibly cyanate as alternative ammonia sources. Furthermore, its carbon metabolism is more flexible at the central pyruvate switch point, encompasses the ability to take up small organic compounds and might even include an oxidative pentose phosphate pathway. In addition, we show that thaumarchaeota produce cofactor F420 as well as polyhydroxyalkanoates. Lateral gene transfer from bacteria and euryarchaeota has contributed to the metabolic versatility of N. gargensis. This organism is well adapted to its niche in a heavy metal-containing thermal spring by encoding a multitude of heavy metal resistance genes, chaperones and mannosylglycerate as compatible solute and has the genetic ability to respond to environmental changes by signal transduction via a large number of two-component systems, by chemotaxis and flagella-mediated motility and possibly even by gas vacuole formation.
These findings extend our understanding of thaumarchaeal evolution and physiology and offer many testable hypotheses for future experimental research on these nitrifiers.

Abstract:

In this study the phenotypic and transcriptomic traits associated with the alternative sigma factor protein Sigma L in Listeria monocytogenes EGD-e were investigated. It was demonstrated that Sigma L is required for efficient growth in presence of stress associated with food preservative measures such as low temperature and organic acids. Furthermore, besides attenuation of swarming motility, the disruption of Sigma L in this bacterium also reduces resistance to a diverse range of toxic compounds, including some of the antibiotics used in listeriosis treatment. Genes under Sigma L-dependent transcriptional regulation were identified based on comparison of transcriptomes between exponentially growing cells of the EGD-e sigL null mutant and its parental strain cultivated under cold stress (3 °C) and optimized (37 °C) temperature conditions. Four hundred and forty genes under positive Sigma L-dependent transcriptional regulation were identified. The Sigma L regulon as revealed under these conditions comprises genes that code for proteins with diverse cellular functions including protein synthesis, nutrient transport, energy metabolism, cell envelope synthesis, and motility. The diverse range of transcriptome alterations induced by a sigL null mutation is thus consistent with the multiple phenotypic defects observed in the EGD-e ΔsigL mutant. These results demonstrate that Sigma L provides important global transcription regulatory functions in L. monocytogenes EGD-e. These promote execution of various cellular processes and stress adaptation responses thereby enabling this bacterium to overcome various food preservation measures as well as antibiotics and other toxic chemicals.

Phage morphology recapitulates phylogeny: the comparative genomics of a new group of myoviruses.

Abstract:

Among dsDNA tailed bacteriophages (Caudovirales), members of the Myoviridae family have the most sophisticated virion design that includes a complex contractile tail structure. The Myoviridae generally have larger genomes than the other phage families. Relatively few "dwarf" myoviruses, those with a genome size of less than 50 kb such as those of the Mu group, have been analyzed in extenso. Here we report on the genome sequencing and morphological characterization of a new group of such phages that infect a diverse range of Proteobacteria, namely Aeromonas salmonicida phage 56, Vibrio cholerae phages 138 and CP-T1, Bdellovibrio phage φ1422, and Pectobacterium carotovorum phage ZF40. This group of dwarf myoviruses shares an identical virion morphology, characterized by usually short contractile tails, and have genome sizes of approximately 45 kb. Although their genome sequences are variable in their lysogeny, replication, and host adaption modules, presumably reflecting differing lifestyles and hosts, their structural and morphogenesis modules have been evolutionarily constrained by their virion morphology. Comparative genomic analysis reveals that these phages, along with related prophage genomes, form a new coherent group within the Myoviridae. The results presented in this communication support the hypothesis that the diversity of phages may be more structured than generally believed and that the innumerable phages in the biosphere all belong to discrete lineages or families.

Effects of season and experimental warming on the bacterial community in a temperate mountain forest soil assessed by 16S rRNA gene pyrosequencing.

Abstract:

Climate warming may induce shifts in soil microbial communities, possibly altering the long-term carbon mineralization potential of soils. We assessed the response of the bacterial community in a forest soil to experimental soil warming (+4 °C) in the context of seasonal fluctuations. Three experimental plots were sampled in the fourth year of warming in summer and winter and compared to control plots by 16S rRNA gene pyrosequencing. We sequenced 17,308 amplicons per sample and analysed operational taxonomic units at genetic distances of 0.03, 0.10 and 0.25, with respective Good's coverages of 0.900, 0.977 and 0.998. Diversity indices did not differ between summer, winter, control or warmed samples. Summer and winter samples differed in community structure at a genetic distance of 0.25, corresponding approximately to phylum level. This was mainly because of an increase of Actinobacteria in winter. Abundance patterns of dominant taxa (> 0.06% of all reads) were analysed individually and revealed that seasonal shifts were coherent among related phylogenetic groups. Seasonal community dynamics were subtle compared to the dynamics of soil respiration. Despite a pronounced respiration response to soil warming, we did not detect warming effects on community structure or composition. Fine-scale shifts may have been concealed by the considerable spatial variation.
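
Good's coverage, reported above for each genetic distance, is commonly estimated as C = 1 - F1/N, where F1 is the number of OTUs observed exactly once and N is the total number of reads. The sketch below computes it from a list of per-read OTU labels; the function name and the toy labels are assumptions for illustration and do not reproduce the study's data.

```python
from collections import Counter


def goods_coverage(otu_labels):
    """Estimate Good's coverage, C = 1 - F1 / N.

    otu_labels holds one OTU label per read. F1 is the number of OTUs
    seen exactly once (singletons); N is the total number of reads.
    """
    counts = Counter(otu_labels)
    n_reads = sum(counts.values())
    if n_reads == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n_reads


# Toy sample: 10 reads, of which OTUs "b" and "d" are singletons.
reads = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "e"]
coverage = goods_coverage(reads)
print(round(coverage, 3))
```

Coverage rises as OTUs are defined at larger genetic distances, because broader OTUs absorb more reads and leave fewer singletons, which matches the trend of the three values reported in the abstract.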

Comparative analysis of benzoxazinoid biosynthesis in monocots and dicots: independent recruitment of stabilization and activation functions.

Abstract:

Benzoxazinoids represent preformed protective and allelopathic compounds that are found in a multitude of species of the family Poaceae (Gramineae) and occur sporadically in single species of phylogenetically unrelated dicots. Stabilization by glucosylation and activation by hydrolysis are essential for the function of these plant defense compounds. We isolated and functionally characterized from the dicot larkspur (Consolida orientalis) the benzoxazinoid-specific UDP-glucosyltransferase and β-glucosidase that catalyze the enzymatic functions required to avoid autotoxicity and allow activation upon challenge by herbivore and pathogen attack. A phylogenetic comparison of these enzymes with their counterparts in the grasses indicates convergent evolution by repeated recruitment from homologous but not orthologous genes. The data reveal a great evolutionary flexibility in recruitment of these essential functions of secondary plant metabolism.

Metatranscriptomics of the marine sponge Geodia barretti: tackling phylogeny and function of its microbial community.

Abstract:

Geodia barretti is a marine cold-water sponge harbouring high numbers of microorganisms. Significant rates of nitrification have been observed in this sponge, indicating a substantial contribution to nitrogen turnover in marine environments with high sponge cover. In order to gain closer insights into the phylogeny and function of the active microbial community and the interaction with its host G. barretti, a metatranscriptomic approach was employed, using the simultaneous analysis of rRNA and mRNA. Of the 262 298 RNA-tags obtained by pyrosequencing, 92% were assigned to ribosomal RNA (ribo-tags). A total of 109 325 SSU rRNA ribo-tags revealed a detailed picture of the community, dominated by group SAR202 of Chloroflexi, candidate phylum Poribacteria and Acidobacteria, which was different in its composition from that obtained in clone libraries prepared from the same samples. Optimized assembly strategies allowed the reconstruction of full-length rRNA sequences from the short ribo-tags for more detailed phylogenetic studies of the dominant taxa. Cells of several phyla were visualized by FISH analyses for confirmation. Of the remaining 21 325 RNA-tags, 10 023 were assigned to mRNA-tags, based on similarities to genes in the databases. A wide range of putative functional gene transcripts from over 10 different phyla were identified among the bacterial mRNA-tags. The most abundant mRNAs were those encoding key metabolic enzymes of nitrification from ammonia-oxidizing archaea as well as candidate genes involved in related processes. Our analysis demonstrates the potential and limits of using a combined rRNA and mRNA approach to explore the microbial community profile, phylogenetic assignments and metabolic activities of a complex, but little explored microbial community.

amoA-based consensus phylogeny of ammonia-oxidizing archaea and deep sequencing of amoA genes from soils of four different geographic regions.

Abstract:

Ammonia-oxidizing archaea (AOA) play an important role in nitrification and many studies exploit their amoA genes as markers for their diversity and abundance. We present an archaeal amoA consensus phylogeny based on all publicly available sequences (status June 2010) and provide evidence for the diversification of AOA into four previously recognized clusters and one newly identified major cluster. These clusters, for which we suggest a new nomenclature, harboured 83 AOA species-level OTUs (using an inferred species threshold of 85% amoA identity). 454 pyrosequencing of amoA amplicons from 16 soils sampled in Austria, Costa Rica, Greenland and Namibia revealed that only 2% of retrieved sequences had no database representative on the species level and represented 30-37 additional species-level OTUs. With the exception of an acidic soil from which mostly amoA amplicons of the Nitrosotalea cluster were retrieved, all soils were dominated by amoA amplicons from the Nitrososphaera cluster (also called group I.1b), indicating that the previously reported AOA from the Nitrosopumilus cluster (also called group I.1a) are absent or represent minor populations in soils. AOA richness estimates on the species level ranged from 8 to 83 co-existing AOAs per soil. Presence/absence of amoA OTUs (97% identity level) correlated with geographic location, indicating that besides contemporary environmental conditions also dispersal limitation across different continents and/or historical environmental conditions might influence AOA biogeography in soils.
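
The abstract above defines OTUs at two identity levels (85% and 97% amoA identity). How the chosen threshold changes the number of OTUs can be shown with a toy greedy clustering. This is only a sketch under strong assumptions: sequences are taken to be pre-aligned and of equal length, identity is the plain fraction of matching positions, the greedy centroid scheme is a stand-in for whatever clustering software the study used, and the four sequences are invented.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold):
    """Assign each sequence to the first centroid it matches at or above
    the identity threshold; otherwise it founds a new OTU."""
    centroids, otus = [], []
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[i].append(seq)
                break
        else:
            centroids.append(seq)
            otus.append([seq])
    return otus


# Invented 20-nt toy sequences: 1, 2 and 16 mismatches to the first one.
seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "TCGTACGTACTTACGTACGT",
    "GGGGGGGGGGCCCCCCCCCC",
]
n_otus_85 = len(greedy_otus(seqs, 0.85))
n_otus_97 = len(greedy_otus(seqs, 0.97))
print(n_otus_85, n_otus_97)
```

The same four sequences collapse into fewer OTUs at the looser threshold, which is why richness estimates should always be read together with the identity level at which OTUs were defined.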

Abstract:

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721,801 orthologous groups, encompassing a total of 4,396,591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101,208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450,904 orthologous groups (62.5%).

Abstract:

Adelgids (Insecta: Hemiptera: Adelgidae) are known as severe pests of various conifers in North America, Canada, Europe and Asia. Here, we present the first molecular identification of bacteriocyte-associated symbionts in these plant sap-sucking insects. Three geographically distant populations of members of the Adelges nordmannianae/piceae complex, identified based on coI and ef1alpha gene sequences, were investigated. Electron and light microscopy revealed two morphologically different endosymbionts, coccoid or polymorphic, which are located in distinct bacteriocytes. Phylogenetic analyses of their 16S and 23S rRNA gene sequences assigned both symbionts to novel lineages within the Gammaproteobacteria sharing <92% 16S rRNA sequence similarity with each other and showing no close relationship with known symbionts of insects. Their identity and intracellular location were confirmed by fluorescence in situ hybridization, and the names 'Candidatus Steffania adelgidicola' and 'Candidatus Ecksteinia adelgidicola' are proposed for tentative classification. Both symbionts were present in all individuals of all investigated populations and in different adelgid life stages including eggs, suggesting vertical transmission from mother to offspring. An 85 kb genome fragment of 'Candidatus S. adelgidicola' was reconstructed based on a metagenomic library created from purified symbionts. Genomic features including the frequency of pseudogenes, the average length of intergenic regions and the presence of several genes which are absent in other long-term obligate symbionts, suggested that 'Candidatus S. adelgidicola' is an evolutionarily young bacteriocyte-associated symbiont, which has been acquired after diversification of adelgids from their aphid sister group.

Unity in variety--the pan-genome of the Chlamydiae.

Abstract:

Chlamydiae are evolutionarily well-separated bacteria that live exclusively within eukaryotic host cells. They include important human pathogens such as Chlamydia trachomatis as well as symbionts of protozoa. As these bacteria are experimentally challenging and genetically intractable, our knowledge about them is still limited. In this study, we obtained the genome sequences of Simkania negevensis Z, Waddlia chondrophila 2032/99, and Parachlamydia acanthamoebae UV-7. This enabled us to perform the first comprehensive comparative and phylogenomic analysis of representative members of four major families of the Chlamydiae, including the Chlamydiaceae. We identified a surprisingly large core gene set present in all genomes and a high number of diverse accessory genes in those Chlamydiae that do not primarily infect humans or animals, including a chemosensory system in P. acanthamoebae and a type IV secretion system. In S. negevensis, the type IV secretion system is encoded on a large conjugative plasmid (pSn, 132 kb). Phylogenetic analyses suggested that a plasmid similar to the S. negevensis plasmid was originally acquired by the last common ancestor of all four families and that it was subsequently reduced, integrated into the chromosome, or lost during diversification, ultimately giving rise to the extant virulence-associated plasmid of pathogenic chlamydiae. Other virulence factors, including a type III secretion system, are conserved among the Chlamydiae to variable degrees and together with differences in the composition of the cell wall reflect adaptation to different host cells including convergent evolution among the four chlamydial families. Phylogenomic analysis focusing on chlamydial proteins with homology to plant proteins provided evidence for the acquisition of 53 chlamydial genes by a plant progenitor, lending further support for the hypothesis of an early interaction between a chlamydial ancestor and the primary photosynthetic eukaryote.

Abstract:

Yersinia enterocolitica strains responsible for mild gastroenteritis in humans are very diverse with respect to their metabolic and virulence properties. Strain W22703 (biotype 2, serotype O:9) was recently identified to possess nematocidal and insecticidal activity. To better understand the relationship between pathogenicity towards insects and humans, we compared the W22703 genome with that of the highly pathogenic strain 8081 (biotype 1B; serotype O:8), the only Y. enterocolitica strain sequenced so far.
We used whole-genome shotgun data to assemble, annotate and analyse the sequence of strain W22703. Numerous factors assumed to contribute to enteric survival and pathogenesis, among them osmoregulated periplasmic glucan, hydrogenases, cobalamin-dependent pathways, iron uptake systems and the Yersinia genome island 1 (YGI-1) involved in tight adherence, were identified as common to the 8081 and W22703 genomes. However, sets of ~550 genes were found to be specific to each strain in comparison to the other. The plasticity zone (PZ) of 142 kb in the W22703 genome carries an ancient flagellar cluster Flg-2 of ~40 kb, but it lacks the pathogenicity island YAPI(Ye), the secretion systems ysa and yts1, and other virulence determinants of the 8081 PZ. Its composition underlines the prominent variability of this genome region and demonstrates its contribution to the higher pathogenicity of biotype 1B strains with respect to W22703. A novel type three secretion system of mosaic structure was found in the genome of W22703 that is absent in the sequenced strains of the human pathogenic Yersinia species, but conserved in the genomes of the apathogenic species. We identified several regions of differences in W22703 that mainly code for transporters, regulators, metabolic pathways, and defence factors.
The W22703 sequence analysis revealed a genome composition distinct from other pathogenic Yersinia enterocolitica strains, thus contributing novel data to the Y. enterocolitica pan-genome. This study also sheds further light on the strategies of this pathogen to cope with its environments.

B2G-FAR, a species-centered GO annotation repository.

Abstract:

Functional genomics research has expanded enormously in the last decade thanks to the cost reduction in high-throughput technologies and the development of computational tools that generate, standardize and share information on gene and protein function such as the Gene Ontology (GO). Nevertheless, many biologists, especially working with non-model organisms, still suffer from non-existing or low-coverage functional annotation, or simply struggle retrieving, summarizing and querying these data.
The Blast2GO Functional Annotation Repository (B2G-FAR) is a bioinformatics resource envisaged to provide functional information for otherwise uncharacterized sequence data and offers data mining tools to analyze a larger repertoire of species than currently available. This new annotation resource has been created by applying the Blast2GO functional annotation engine in a strongly high-throughput manner to the entire space of publicly available sequences. The resulting repository contains GO term predictions for over 13.2 million non-redundant protein sequences based on BLAST search alignments from the SIMAP database. We generated GO annotation for approximately 150 000 different taxa, making available the 2000 species with the highest coverage through B2G-FAR. A second section within B2G-FAR holds functional annotations for 17 non-model organism Affymetrix GeneChips.
B2G-FAR provides easy access to exhaustive functional annotation for 2000 species offering a good balance between quality and quantity, thereby supporting functional genomics research especially in the case of non-model organisms.
The annotation resource is available at http://www.b2gfar.org.

Abstract:

Anaerobic degradation of polycyclic aromatic hydrocarbons (PAHs) is an important process during natural attenuation of aromatic hydrocarbon spills. However, knowledge about metabolic potential and physiology of organisms involved in anaerobic degradation of PAHs is scarce. Therefore, we introduce the first genome of the sulfate-reducing Deltaproteobacterium N47 able to catabolize naphthalene, 2-methylnaphthalene, or 2-naphthoic acid as sole carbon source. Based on proteomics, we analysed metabolic pathways during growth on PAHs to gain physiological insights on anaerobic PAH degradation. The genomic assembly and taxonomic binning resulted in 17 contigs covering most of the sulfate reducer N47 genome according to general cluster of orthologous groups (COGs) analyses. According to the genes present, the Deltaproteobacterium N47 can potentially grow with substrates including d-mannose, d-fructose, d-galactose, α-d-glucose-1P, starch, glycogen and peptidoglycan, and possesses the prerequisites for butanoic acid fermentation. Despite the inability of culture N47 to utilize nitrate (NO3-) as terminal electron acceptor, genes for nitrate ammonification are present. Furthermore, it is the first sequenced genome containing a complete TCA cycle along with the carbon monoxide dehydrogenase pathway. The genome contained a significant percentage of repetitive sequences and transposase-related protein domains enhancing the ability of genome evolution. Likewise, the sulfate reducer N47 genome contained many unique putative genes with unknown function, which are candidates for yet-unknown metabolic pathways.

Functional analysis of the finO distal region of plasmid R1.

Abstract:

The intergenic region linking conjugative transfer and replication copy control modules of IncF plasmids shows conservation of gene homology and organization. Genes distal to finO are coordinately expressed with the upstream transfer operon encoding the majority of conjugation genes in related plasmids. Here we investigate potential functions for these genes in copy number control and in processes related to conjugation: gene transfer, pilus specific phage infection and plasmid-promoted biofilm formation by an Escherichia coli host. We find that insertional inactivation of genes in the finO distal region reduced transcriptional read through into the downstream copB gene of plasmid R1. The mutant plasmid derivatives exhibited a reduced copy number compared to the wild type. Moreover all insertion mutant derivatives of plasmid R1-16 with aberrantly low copy numbers conferred poor biofilm forming ability to their hosts. The general mutagenesis thus identified plasmid stability genes as the only plasmid functions besides conjugation genes linked to plasmid-promoted biofilm production under these laboratory conditions. Our findings imply that a novel component of cis- or trans-regulation on the transcriptional level is important to normal R1 plasmid copy number regulation.

Abstract:

The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

Effective--a database of predicted secreted bacterial proteins.

Abstract:

Protein secretion is a key virulence mechanism of pathogenic and symbiotic bacteria, which makes the investigation of secreted proteins ('effectors') crucial for understanding the molecular bacterium-host interactions. Effective (http://effectors.org) is a database of predicted bacterial secreted proteins, implementing two complementary prediction strategies for protein secretion: the identification of eukaryotic-like protein domains and the recognition of signal peptides in amino acid sequences. The Effective web portal provides user-friendly tools for browsing and retrieving comprehensive precalculated predictions for whole bacterial genomes as well as for the interactive prediction of effectors in user-provided protein sequences.

Abstract:

Here, we report the complete and annotated genome sequence of Cronobacter turicensis, an opportunistic food-borne pathogen, which is known as a rare but important cause of life-threatening neonatal infections. Among all proteins of C. turicensis, 223 have been annotated as virulence- and disease-related proteins.

Abstract:

Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250,000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described 'sponge-specific' clusters that were detected in this study, 48% were found exclusively in adults and larvae - implying vertical transmission of these groups. The remaining taxa, including 'Poribacteria', were also found at very low abundance among the 135,000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought.

Abstract:

In preparation for transfer conjugative type IV secretion systems (T4SS) produce a nucleoprotein adduct containing a relaxase enzyme covalently linked to the 5' end of single-stranded plasmid DNA. The bound relaxase is expected to present features necessary for selective recognition by the type IV coupling protein (T4CP), which controls substrate entry to the envelope spanning secretion machinery. We prove that the IncF plasmid R1 relaxase TraI is translocated to the recipient cells. Using a Cre recombinase assay (CRAfT) we mapped two internally positioned translocation signals (TS) on F-like TraI proteins that independently mediate efficient recognition and secretion. Tertiary structure predictions for the TS matched best helicase RecD2 from Deinococcus radiodurans. The TS is widely conserved in MOB(F) and MOB(Q) families of relaxases. Structure/function relationships within the TS were identified by mutation. A key residue in specific recognition by T4CP TraD was revealed by a fidelity switch phenotype for an F to plasmid R1 exchange L626H mutation. Finally, we show that physical linkage of the relaxase catalytic domain to a TraI TS is necessary for efficient conjugative transfer.

Impact of natural genetic variation on the transcriptome of autotetraploid Arabidopsis thaliana.

Abstract:

Polyploidy, the presence of more than two complete sets of chromosomes in an organism, has significantly shaped the genomes of angiosperms during evolution. Two forms of polyploidy are often considered: allopolyploidy, which originates from interspecies hybrids, and autopolyploidy, which originates from intraspecies genome duplication events. Besides affecting genome organization, polyploidy generates other genetic effects. Synthetic allopolyploid plants exhibit considerable transcriptome alterations, part of which are likely caused by the reunion of previously diverged regulatory hierarchies. In contrast, autopolyploids have relatively uniform genomes, suggesting lower alteration of gene expression. To evaluate the impact of intraspecies genome duplication on the transcriptome, we generated a series of unique Arabidopsis thaliana autotetraploids by using different ecotypes. A. thaliana autotetraploids show transcriptome alterations that strongly depend on their parental genome composition and include changed expression of both new genes and gene groups previously described from allopolyploid Arabidopsis. Alterations in gene expression are stable, nonstochastic, developmentally specific, and associated with changes in DNA methylation. We propose that Arabidopsis possesses an inherent and heritable ability to sense and respond to elevated, yet balanced chromosome numbers. The impact of natural variation on alteration of autotetraploid gene expression stresses its potential importance in the evolution and breeding of plants.

A Nitrospira metagenome illuminates the physiology and evolution of globally important nitrite-oxidizing bacteria.

Abstract:

Nitrospira are barely studied and mostly uncultured nitrite-oxidizing bacteria, which are, according to molecular data, among the most diverse and widespread nitrifiers in natural ecosystems and biological wastewater treatment. Here, environmental genomics was used to reconstruct the complete genome of "Candidatus Nitrospira defluvii" from an activated sludge enrichment culture. On the basis of this first-deciphered Nitrospira genome and of experimental data, we show that Ca. N. defluvii differs dramatically from other known nitrite oxidizers in the key enzyme nitrite oxidoreductase (NXR), in the composition of the respiratory chain, and in the pathway used for autotrophic carbon fixation, suggesting multiple independent evolution of chemolithoautotrophic nitrite oxidation. Adaptations of Ca. N. defluvii to substrate-limited conditions include an unusual periplasmic NXR, which is constitutively expressed, and pathways for the transport, oxidation, and assimilation of simple organic compounds that allow a mixotrophic lifestyle. The reverse tricarboxylic acid cycle as the pathway for CO2 fixation and the lack of most classical defense mechanisms against oxidative stress suggest that Nitrospira evolved from microaerophilic or even anaerobic ancestors. Unexpectedly, comparative genomic analyses indicate functionally significant lateral gene-transfer events between the genus Nitrospira and anaerobic ammonium-oxidizing planctomycetes, which share highly similar forms of NXR and other proteins reflecting that two key processes of the nitrogen cycle are evolutionarily connected.

Distinct gene set in two different lineages of ammonia-oxidizing archaea supports the phylum Thaumarchaeota.

Abstract:

Globally distributed archaea comprising ammonia oxidizers of moderate terrestrial and marine environments are considered the most abundant archaeal organisms on Earth. Based on 16S rRNA phylogeny, initial assignment of these archaea was to the Crenarchaeota. By contrast, features of the first genome sequence from a member of this group suggested that they belong to a novel phylum, the Thaumarchaeota. Here, we re-investigate the Thaumarchaeota hypothesis by including two newly available genomes, that of the marine ammonia oxidizer Nitrosopumilus maritimus and that of Nitrososphaera gargensis, a representative of another evolutionary lineage within this group predominantly detected in terrestrial environments. Phylogenetic studies based on r-proteins and other core genes, as well as comparative genomics, confirm the assignment of these organisms to a separate phylum and reveal a Thaumarchaeota-specific set of core informational processing genes, as well as potentially ancestral features of the archaea.

Abstract:

Anaerobic benzene degradation was studied with a highly enriched iron-reducing culture (BF) composed of mainly Peptococcaceae-related Gram-positive microorganisms. The proteomes of benzene-, phenol- and benzoate-grown cells of culture BF were compared by SDS-PAGE. A specific benzene-expressed protein band of 60 kDa, which could not be observed during growth on phenol or benzoate, was subjected to N-terminal sequence analysis. The first 31 amino acids revealed that the protein was encoded by ORF 138 in the shotgun sequenced metagenome of culture BF. ORF 138 showed 43% sequence identity to phenylphosphate carboxylase subunit PpcA of Aromatoleum aromaticum strain EbN1. A LC/ESI-MS/MS-based shotgun proteomic analysis revealed other specifically benzene-expressed proteins with encoding genes located adjacent to ORF 138 on the metagenome. The protein products of ORF 137, ORF 139 and ORF 140 showed sequence identities of 37% to phenylphosphate carboxylase PpcD of A. aromaticum strain EbN1, 56% to benzoate-CoA ligase (BamY) of Geobacter metallireducens and 67% to 3-octaprenyl-4-hydroxybenzoate carboxy-lyase (UbiD/UbiX) of A. aromaticum strain EbN1 respectively. These genes are proposed as constituents of a putative benzene degradation gene cluster (∼ 17 kb) composed of carboxylase-related genes. The identified gene sequences suggest that the initial activation reaction in anaerobic benzene degradation is probably a direct carboxylation of benzene to benzoate catalysed by putative anaerobic benzene carboxylase (Abc). The putative Abc probably consists of several subunits, two of which are encoded by ORFs 137 and 138, and belongs to a family of carboxylases including phenylphosphate carboxylase (Ppc) and 3-octaprenyl-4-hydroxybenzoate carboxy-lyase (UbiD/UbiX).

Independent evolution of the core domain and its flanking sequences in small heat shock proteins.

Abstract:

Small heat shock proteins (sHsps) are molecular chaperones involved in maintaining protein homeostasis; they have also been implicated in protein folding diseases and in cancer. In this protein family, a conserved core domain, the so-called α-crystallin or Hsp20 domain, is flanked by highly variable, nonconserved sequences that are essential for chaperone function. Analysis of 8714 sHsps revealed a broad variation of primary sequences within the superfamily as well as phyla-dependent differences. Significant variations were found in the number of sHsps per genome, their amino acid composition, and the length distribution of the different sequence parts. Reconstruction of the evolutionary tree for the sHsp superfamily shows that the flanking regions fall into several subgroups, indicating that they were remodeled several times in parallel but independent of the evolution of the α-crystallin domain. The evolutionary history of sHsps is thus set apart from that of other protein families in that two exon boundary-independent strategies are combined: the evolution of the conserved α-crystallin domain and the independent evolution of the N- and C-terminal sequences. This scenario allows for increased variability in specific small parts of the protein and thus promotes functional and structural differentiation of sHsps, which is not reflected in the general evolutionary tree of species.

Abstract:

The freshwater cnidarian Hydra was first described in 1702 and has been the object of study for 300 years. Experimental studies of Hydra between 1736 and 1744 culminated in the discovery of asexual reproduction of an animal by budding, the first description of regeneration in an animal, and successful transplantation of tissue between animals. Today, Hydra is an important model for studies of axial patterning, stem cell biology and regeneration. Here we report the genome of Hydra magnipapillata and compare it to the genomes of the anthozoan Nematostella vectensis and other animals. The Hydra genome has been shaped by bursts of transposable element expansion, horizontal gene transfer, trans-splicing, and simplification of gene structure and gene content that parallel simplification of the Hydra life cycle. We also report the sequence of the genome of a novel bacterium stably associated with H. magnipapillata. Comparisons of the Hydra genome to the genomes of other animals shed light on the evolution of epithelia, contractile tissues, developmentally regulated transcription factors, the Spemann-Mangold organizer, pluripotency genes and the neuromuscular junction.

Abstract:

The Type III secretion system (TTSS) facilitates the export of effector proteins from pathogenic and symbiotic Gram-negative bacteria into the cytosol of eukaryotic host cells. The current functional and evolutionary knowledge on the molecular recognition of TTSS substrates and computational models of the secretion signal are discussed in this review.

Abstract:

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

The genome of the amoeba symbiont "Candidatus Amoebophilus asiaticus" reveals common mechanisms for host cell interaction among amoeba-associated bacteria.

Abstract:

Protozoa play host for many intracellular bacteria and are important for the adaptation of pathogenic bacteria to eukaryotic cells. We analyzed the genome sequence of "Candidatus Amoebophilus asiaticus," an obligate intracellular amoeba symbiont belonging to the Bacteroidetes. The genome has a size of 1.89 Mbp, encodes 1,557 proteins, and shows massive proliferation of IS elements (24% of all genes), although the genome seems to be evolutionarily relatively stable. The genome does not encode pathways for de novo biosynthesis of cofactors, nucleotides, and almost all amino acids. "Ca. Amoebophilus asiaticus" encodes a variety of proteins with predicted importance for host cell interaction; in particular, an arsenal of proteins with eukaryotic domains, including ankyrin-, TPR/SEL1-, and leucine-rich repeats, which is hitherto unmatched among prokaryotes, is remarkable. Unexpectedly, 26 proteins that can interfere with the host ubiquitin system were identified in the genome. These proteins include F- and U-box domain proteins and two ubiquitin-specific proteases of the CA clan C19 family, representing the first prokaryotic members of this protein family. Consequently, interference with the host ubiquitin system is an important host cell interaction mechanism of "Ca. Amoebophilus asiaticus". More generally, we show that the eukaryotic domains identified in "Ca. Amoebophilus asiaticus" are also significantly enriched in the genomes of other amoeba-associated bacteria (including chlamydiae, Legionella pneumophila, Rickettsia bellii, Francisella tularensis, and Mycobacterium avium). This indicates that phylogenetically and ecologically diverse bacteria which thrive inside amoebae exploit common mechanisms for interaction with their hosts, and it provides further evidence for the role of amoebae as training grounds for bacterial pathogens of humans.

The Negatome database: a reference set of non-interacting protein pairs.

Abstract:

The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.

Abstract:

The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computational intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
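
The quadratic growth of the similarity matrix mentioned in this abstract can be made concrete with a short calculation. The sketch below assumes an exhaustive all-against-all comparison of unordered sequence pairs, which is a simplification and not a description of how SIMAP is actually computed; the function name is illustrative, and 23 million is the lower bound given in the text.

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_sequences = 23_000_000  # "more than 23 million non-redundant sequences"
pairs_now = pairwise_comparisons(n_sequences)
pairs_doubled = pairwise_comparisons(2 * n_sequences)

print(f"{n_sequences:,} sequences -> {pairs_now:,} pairs")
print(f"doubling the database multiplies the pairs by {pairs_doubled / pairs_now:.2f}")
```

For 23 million sequences this gives roughly 2.6 × 10^14 pairs, and doubling the number of sequences increases the pair count about fourfold, which is why a pre-calculated matrix is attractive.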

Abstract:

The highly enriched deltaproteobacterial culture N47 anaerobically oxidizes the polycyclic aromatic hydrocarbons naphthalene and 2-methylnaphthalene, with sulfate as the electron acceptor. Combined genome sequencing and liquid chromatography-tandem mass spectrometry-based shotgun proteome analyses were performed to identify genes and proteins involved in anaerobic aromatic catabolism. Proteome analysis of 2-methylnaphthalene-grown N47 cells resulted in the identification of putative enzymes catalyzing the anaerobic conversion of 2-methylnaphthalene to 2-naphthoyl coenzyme A (2-naphthoyl-CoA), as well as the reductive ring cleavage of 2-naphthoyl-CoA, leading to the formation of acetyl-CoA and CO(2). The glycyl radical-catalyzed fumarate addition to the methyl group of 2-methylnaphthalene is catalyzed by naphthyl-2-methyl-succinate synthase (Nms), composed of alpha-, beta-, and gamma-subunits that are encoded by the genes nmsABC. Located upstream of nmsABC is nmsD, encoding the Nms-activating enzyme, which harbors the characteristic [Fe(4)S(4)] cluster sequence motifs of S-adenosylmethionine radical enzymes. The bns gene cluster, coding for enzymes involved in beta-oxidation reactions converting naphthyl-2-methyl-succinate to 2-naphthoyl-CoA, was found four intervening open reading frames further downstream. This cluster consists of eight genes (bnsABCDEFGH) corresponding to 8.1 kb, which are closely related to genes for enzymes involved in anaerobic toluene degradation within the denitrifiers "Aromatoleum aromaticum" EbN1, Azoarcus sp. strain T, and Thauera aromatica. Another contiguous DNA sequence harbors the gene for 2-naphthoyl-CoA reductase (ncr) and 16 additional genes that were found to be expressed in 2-methylnaphthalene-grown cells. These genes code for enzymes that were supposed to catalyze the dearomatization and ring cleavage reactions converting 2-naphthoyl-CoA to acetyl-CoA and CO(2). 
Comparative sequence analysis of the four encoding subunits (ncrABCD) showed the gene product to have the closest similarity to the Azoarcus type of benzoyl-CoA reductase. The present work provides the first insight into the genetic basis of anaerobic 2-methylnaphthalene metabolism and delivers implications for understanding contaminant degradation.