BMC Bioinformatics - Latest Articles
http://www.biomedcentral.com/bmcbioinformatics/
The latest research articles published by BMC Bioinformatics
2015-07-31T12:00:00Z

Simulated unbound structures for benchmarking of protein docking in the Dockground resource
Background:
Proteins play an important role in biological processes in living organisms. Many protein functions are based on interaction with other proteins. The structural information is important for adequate description of these interactions. Sets of protein structures determined in both bound and unbound states are essential for benchmarking of the docking procedures. However, the number of such proteins in PDB is relatively small. A radical expansion of such sets is possible if the unbound structures are computationally simulated.
Results:
The Dockground public resource provides data to improve our understanding of protein–protein interactions and to assist in the development of better tools for structural modeling of protein complexes, such as docking algorithms and scoring functions. A large set of simulated unbound protein structures was generated from the bound structures. The modeling protocol was based on 1 ns Langevin dynamics simulation. The simulated structures were validated on the ensemble of experimentally determined unbound and bound structures. The set is intended for large scale benchmarking of docking algorithms and scoring functions.
Conclusions:
A radical expansion of the unbound protein docking benchmark set was achieved by simulating the unbound structures. The simulated unbound structures were selected according to criteria from systematic comparison of experimentally determined bound and unbound structures. The set is publicly available at http://dockground.compbio.ku.edu.
http://www.biomedcentral.com/1471-2105/16/243
Tatsiana Kirys, Anatoly Ruvinsky, Deepak Singla, Alexander Tuzikov, Petras Kundrotas, Ilya Vakser
BMC Bioinformatics 2015, 16:243
2015-07-31T12:00:00Z
doi:10.1186/s12859-015-0672-3

HaploPOP: a software that improves population assignment by combining markers into haplotypes
Background:
In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment studies handle independent markers, often by pruning markers in Linkage Disequilibrium (LD), ignoring the information contained in the correlation among markers due to LD.
Results:
To improve the accuracy of population assignment, we present an algorithm, implemented in the HaploPOP software, that combines markers into haplotypes, without requiring independence. The algorithm is based on the Gain of Informativeness for Assignment that provides a measure to decide if a pair of markers should be combined into haplotypes, or not, in order to improve assignment. Because complete exploration of all possible solutions for constructing haplotypes is computationally prohibitive, our approach uses a greedy algorithm based on windows of fixed sizes. We evaluate the performance of HaploPOP to assign individuals to populations using a split-validation approach. We investigate both simulated SNPs data and dense genotype data from individuals from Spain and Portugal.
Conclusions:
Our results show that constructing haplotypes with HaploPOP can substantially reduce assignment error. The HaploPOP software is freely available as a command-line software at www.ieg.uu.se/Jakobsson/software/HaploPOP/.
http://www.biomedcentral.com/1471-2105/16/242
Nicolas Duforet-Frebourg, Lucie Gattepaille, Michael Blum, Mattias Jakobsson
BMC Bioinformatics 2015, 16:242
2015-07-31T12:00:00Z
doi:10.1186/s12859-015-0661-6

Erratum to: Improving protein order-disorder classification using charge-hydropathy plots
No description available
http://www.biomedcentral.com/1471-2105/16/241
Fei Huang, Christopher Oldfield, Bin Xue, Wei-Lun Hsu, Jingwei Meng, Xiaowen Liu, Li Shen, Pedro Romero, Vladimir Uversky, A. Dunker
BMC Bioinformatics 2015, 16:241
2015-07-31T00:00:00Z
doi:10.1186/s12859-015-0646-5

Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis
Background:
How protein phosphorylation relates to kingdom/phylum divergence is largely unknown, and the amino acid residues surrounding the phosphorylation site have profound importance for protein kinase–substrate interactions. Standard motif analysis is not adequate for large scale comparative analysis because each phosphopeptide is assigned to a unique motif and the method performs poorly with the unbalanced nature of the input datasets.
Results:
First, the discriminative n-grams of five species from five different kingdoms/phyla were identified. A signature with 5540 discriminative n-grams that could be found in other species from the same kingdoms/phyla was created. Using a test data set, the ability of the signature to classify species into their corresponding kingdom/phylum was confirmed using classification methods. Lastly, ortholog proteins among proteins with n-grams were identified in order to determine to what degree the identity of the detected n-grams was a property of phosphosites rather than a consequence of species-specific or kingdom/phylum-specific protein inventory. The motifs were grouped in clusters of equal physico-chemical nature, and their distribution was similar between species in the same kingdom/phylum, while clear differences were found among species of different kingdoms/phyla. For example, the animal-specific top discriminative n-grams contained many basic amino acids and the plant-specific motifs were mainly acidic. Secondary structure prediction methods show that the discriminative n-grams in the majority of cases lack a regular secondary structure: on average they had 88 % random coil, compared to 66 % in the phosphoproteins they were derived from.
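As a rough illustration of what comparing n-grams between two groups of phosphopeptides can look like, the sketch below counts n-grams in two toy peptide sets and reports the difference in their relative frequencies. The frequency gap is only a stand-in chosen for this example; the abstract does not state the scoring criterion used to call an n-gram discriminative, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def ngram_frequencies(peptides, n):
    """Relative frequency of each n-gram across a list of peptides."""
    counts = Counter(g for pep in peptides for g in ngrams(pep, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}


def frequency_gap(group_a, group_b, n):
    """Frequency in group_a minus frequency in group_b, per n-gram.

    Assumption: a simple frequency difference, used here only as an
    illustrative score, not the criterion from the article.
    """
    fa = ngram_frequencies(group_a, n)
    fb = ngram_frequencies(group_b, n)
    return {g: fa.get(g, 0.0) - fb.get(g, 0.0) for g in set(fa) | set(fb)}


# Toy peptides (hypothetical): one basic-rich, one acid-rich.
gap = frequency_gap(["RRSP"], ["DESP"], n=2)
for gram, score in sorted(gap.items(), key=lambda kv: -kv[1]):
    print(gram, round(score, 3))
```

N-grams with a large positive or negative gap are enriched in one group; n-grams shared equally by both groups score zero.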
Conclusions:
The discriminative n-grams were able to classify organisms into their corresponding kingdom/phylum. They show different patterns among species of different kingdoms/phyla, and these regions can contribute to evolutionary divergence because they lie in disordered regions that can evolve rapidly. The differences found possibly reflect group-specific differences in the kinomes of the different groups of species.
http://www.biomedcentral.com/1471-2105/16/239
Itziar Frades, Svante Resjö, Erik Andreasson
BMC Bioinformatics 2015, 16:239
2015-07-30T12:00:00Z
doi:10.1186/s12859-015-0657-2

SuRankCo: supervised ranking of contigs in de novo assemblies
Background:
Evaluating the quality and reliability of a de novo assembly and of single contigs in particular is challenging since commonly a ground truth is not readily available and numerous factors may influence results. Currently available procedures provide assembly scores but lack a comparative quality ranking of contigs within an assembly.
Results:
We present SuRankCo, which relies on a machine learning approach to predict quality scores for contigs and to enable the ranking of contigs within an assembly. The result is a sorted contig set which allows selective contig usage in downstream analysis. Benchmarking on datasets with known ground truth shows promising sensitivity and specificity and favorable comparison to existing methodology.
Conclusions:
SuRankCo analyzes the reliability of de novo assemblies on the contig level and thereby allows quality control and ranking prior to further downstream and validation experiments.
http://www.biomedcentral.com/1471-2105/16/240
Mathias Kuhring, Piotr Dabrowski, Vitor Piro, Andreas Nitsche, Bernhard Renard
BMC Bioinformatics 2015, 16:240
2015-07-30T00:00:00Z
doi:10.1186/s12859-015-0644-7

Equalizer reduces SNP bias in Affymetrix microarrays
Background:
Gene expression microarrays measure the levels of messenger ribonucleic acid (mRNA) in a sample using probe sequences that hybridize with transcribed regions. These probe sequences are designed using a reference genome for the relevant species. However, most model organisms and all humans have genomes that deviate from their reference. These variations, which include single nucleotide polymorphisms, insertions of additional nucleotides, and nucleotide deletions, can affect the microarray’s performance. Genetic experiments comparing individuals bearing different population-associated single nucleotide polymorphisms that intersect microarray probes are therefore subject to systemic bias, as the reduction in binding efficiency due to a technical artifact is confounded with genetic differences between parental strains. This problem has been recognized for some time, and earlier methods of compensation have attempted to identify probes affected by genome variants using statistical models. These methods may require replicate microarray measurement of gene expression in the relevant tissue in inbred parental samples, which are not always available in model organisms and are never available in humans.
Results:
By using sequence information for the genomes of organisms under investigation, potentially problematic probes can now be identified a priori. However, there is no published software tool that makes it easy to eliminate these probes from an annotation. I present equalizer, a software package that uses genome variant data to modify annotation files for the commonly used Affymetrix IVT and Gene/Exon platforms. These files can be used by any microarray normalization method for subsequent analysis. I demonstrate how use of equalizer on experiments mapping germline influence on gene expression in a genetic cross between two divergent mouse species and in human samples significantly reduces probe hybridization-induced bias, reducing false positive and false negative findings.
Conclusions:
The equalizer package reduces probe hybridization bias from experiments performed on the Affymetrix microarray platform, allowing accurate assessment of germline influence on gene expression.
http://www.biomedcentral.com/1471-2105/16/238
David Quigley
BMC Bioinformatics 2015, 16:238
2015-07-30T00:00:00Z
doi:10.1186/s12859-015-0669-y

Computing all hybridization networks for multiple binary phylogenetic input trees
Background:
The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior is interspecific hybridization events recombining genes of different species. An important approach to analyzing such events is the computation of hybridization networks.
Results:
This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available and runs on all three major operating systems.
Conclusion:
Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
http://www.biomedcentral.com/1471-2105/16/236
Benjamin Albrecht
BMC Bioinformatics 2015, 16:236
2015-07-30T00:00:00Z
doi:10.1186/s12859-015-0660-7

Evaluation of variant detection software for pooled next-generation sequence data
Background:
Despite the tremendous drop in the cost of nucleotide sequencing in recent years, many research projects still utilize sequencing of pools containing multiple samples for the detection of sequence variants as a cost saving measure. Various software tools exist to analyze these pooled sequence data, yet little has been reported on the relative accuracy and ease of use of these different programs.
Results:
In this manuscript we evaluate five different variant detection programs—The Genome Analysis Toolkit (GATK), CRISP, LoFreq, VarScan, and SNVer—with regard to their ability to detect variants in synthetically pooled Illumina sequencing data, by creating simulated pooled binary alignment/map (BAM) files using single-sample sequencing data from varying numbers of previously characterized samples at varying depths of coverage per sample. We report the overall runtimes and memory usage of each program, as well as each program’s sensitivity and specificity to detect known true variants.
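Sensitivity and specificity are commonly combined into a single balanced accuracy figure, defined as their arithmetic mean. The sketch below computes all three from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (TP rate) and specificity (TN rate)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive site and one true negative site")
    sensitivity = tp / (tp + fn)  # fraction of true variants that were called
    specificity = tn / (tn + fp)  # fraction of non-variant sites left uncalled
    return (sensitivity + specificity) / 2


# Hypothetical counts for illustration only.
score = balanced_accuracy(tp=80, fp=10, tn=90, fn=20)
print(f"balanced accuracy: {score:.2f}")
```

Because the two rates are averaged, a caller with very few false positives can still score poorly if it misses many true variants.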
Conclusions:
GATK, CRISP, and LoFreq all gave balanced accuracy of 80 % or greater for datasets with varying per-sample depth of coverage and numbers of samples per pool. VarScan and SNVer generally had balanced accuracy lower than 80 %. CRISP and LoFreq required up to four times less computational time and up to ten times less physical memory than GATK did, and without filtering, gave results with the highest sensitivity. VarScan and SNVer had generally lower false positive rates, but also significantly lower sensitivity than the other three programs.
http://www.biomedcentral.com/1471-2105/16/235
Howard Huang, James Mullikin, Nancy Hansen, NISC Comparative Sequencing Program
BMC Bioinformatics 2015, 16:235
2015-07-29T12:00:00Z
doi:10.1186/s12859-015-0624-y

Inferring 3D chromatin structure using a multiscale approach based on quaternions
Background:
Knowledge of the spatial organisation of the chromatin fibre in cell nuclei helps researchers to understand the nuclear machinery that regulates DNA activity. Recent experimental techniques of the Chromosome Conformation Capture type (3C, or similar) provide high-resolution, high-throughput data consisting of the number of times any possible pair of DNA fragments is found to be in contact in a certain population of cells. As these data carry information on the structure of the chromatin fibre, several attempts have been made to use them to obtain high-resolution 3D reconstructions of entire chromosomes, or even an entire genome. The techniques proposed treat the data in different ways, possibly exploiting physical-geometric chromatin models. One popular strategy is to transform contact data into Euclidean distances between pairs of fragments, and then solve a classical distance-to-geometry problem.
Results:
We developed and tested a reconstruction technique that does not require translating contacts into distances, thus avoiding a number of related drawbacks. Also, we introduce a geometrical chromatin chain model that allows us to include sound biochemical and biological constraints in the problem. This model can be scaled at different genomic resolutions, where the structures of the coarser models are influenced by the reconstructions at finer resolutions. The search in the solution space is then performed by a classical simulated annealing, where the model is evolved efficiently through quaternion operators. The presence of appropriate constraints permits the less reliable data to be overlooked, so the result is a set of plausible chromatin configurations compatible with both the data and the prior knowledge.
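For readers unfamiliar with quaternion operators, the sketch below shows the standard way a unit quaternion q rotates a 3D point p, via the product q p q*. It only illustrates that general operation; the function names and the single-point example are assumptions made here, and the abstract does not describe how the authors' chain model applies such rotations.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), x * s, y * s, z * s)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(point, q):
    """Rotate a 3D point by unit quaternion q using q * p * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    p = (0.0, point[0], point[1], point[2])  # embed the point as a pure quaternion
    _, rx, ry, rz = quat_mul(quat_mul(q, p), conj)
    return (rx, ry, rz)


# Rotate a point on the x axis by 90 degrees about the z axis.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print(rotate((1.0, 0.0, 0.0), q))
```

Composing two rotations is a single quaternion product, which is one reason quaternions are convenient when many small rotations are applied in sequence, as in a stochastic search.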
Conclusions:
To test our method, we obtained a number of 3D chromatin configurations from Hi-C data available in the literature for the long arm of human chromosome 1, and validated their features against known properties of gene density and transcriptional activity. Our results are compatible with biological features not introduced a priori in the problem: structurally different regions in our reconstructions highly correlate with functionally different regions as known from literature and genomic repositories.
http://www.biomedcentral.com/1471-2105/16/234
Claudia Caudai, Emanuele Salerno, Monica Zoppè, Anna Tonazzini
BMC Bioinformatics 2015, 16:234
2015-07-29T12:00:00Z
doi:10.1186/s12859-015-0667-0

SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1
Background:
Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria.
Descriptions:
A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th.
Conclusions:
SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.
http://www.biomedcentral.com/1471-2105/16/233
Jittisak Senachak, Supapon Cheevadhanarak, Apiradee Hongsthong
BMC Bioinformatics 2015, 16:233
2015-07-29T12:00:00Z
doi:10.1186/s12859-015-0676-z