In Silico Biology - Volume 5, issue 3

ISSN 1386-6338 (P)
ISSN 1434-3207 (E)

In Silico Biology is a scientific research journal for the advancement of computational models and simulations applied to complex biological phenomena. We publish peer-reviewed leading-edge biological, biomedical and biotechnological research in which computer-based (i.e., "in silico") modeling and analysis tools are developed and utilized to predict and elucidate dynamics of biological systems, their design and control, and their evolution. Experimental results may also be provided to support the computational analyses.

In Silico Biology aims to advance the knowledge of the principles of organization of living systems. We strive to provide computational frameworks for understanding how observable biological properties arise from complex systems. In particular, we seek integrative formalisms to decipher the cross-talk underlying systems-level properties, the ultimate aim of multi-scale models.

Studies published in In Silico Biology generally use theoretical models and computational analysis to gain quantitative insights into regulatory processes and networks, cell physiology and morphology, tissue dynamics and organ systems. Special areas of interest include signal transduction and information processing, gene expression and gene regulatory networks, metabolism, proliferation, differentiation and morphogenesis, among others, and the use of multi-scale modeling to connect molecular and cellular systems to the level of organisms and populations.

In Silico Biology also publishes foundational research in which novel algorithms are developed to facilitate modeling and simulations. Such research must demonstrate application to a concrete biological problem.

In Silico Biology frequently publishes special issues on seminal topics and trends. Special issues are handled by Special Issue Editors appointed by the Editor-in-Chief. Proposals for special issues should be sent to the Editor-in-Chief.

About In Silico Biology

The term "in silico" is a counterpart to "in vivo" (in the living system) and "in vitro" (in the test tube) biological experiments, and implies the gain of insights by computer-based simulations and model analyses.

In Silico Biology (ISB) was founded in 1998 as a purely online journal. IOS Press became the publisher of the printed journal shortly after. Today, ISB is dedicated exclusively to biological systems modeling and multi-scale simulations and is published solely by IOS Press. The previous online publisher, Bioinformation Systems, maintains a website containing studies published between 1998 and 2010 for archival purposes.

We strongly support open communication and encourage researchers to share results and preliminary data with the community. Therefore, results and preliminary data made public through conference presentations, conference proceedings or posting of unrefereed manuscripts on preprint servers will not preclude publication in ISB. However, upon publication, authors are required to modify a preprint to include the journal reference (including DOI) and a link to the published article on the ISB website.

Abstract: Alternative splicing of mRNA allows many gene products with different functions to be produced from a single coding sequence. Exon skipping is the most commonly known alternative splicing mechanism. A comprehensive database of alternative splicing by exon skipping is made available for the human genome data. 1,229 human genes are identified to exhibit alternative splicing by exon skipping.

Abstract: Estimation of structure predictability for a particular protein is difficult. Many methods estimate it in an a posteriori system evaluating the final, native protein structure. The SPI scale is intended to estimate the structure predictability of a particular amino acid sequence in an a priori system. A sequence-to-structure library was created based on the complete Protein Data Bank. The tetrapeptide was selected as a unit representing a well-defined structural motif. The early-stage folding structure (a model of which was presented elsewhere) was taken as the object for protein structure classification. Seven structural forms were distinguished for structure classification. The degree of determinability was estimated for the sequence-to-structure and structure-to-sequence relations particularly interesting for threading methods. A comparative analysis of the SPI and Q7 scales with the commonly used SOV and Q3 scales is presented. The complete contingency table, supplementary materials and all the programs used are available on request.

Abstract: Recent research on large-scale microarray analysis has explored the use of Relevance Networks to find networks of genes that are associated with each other in gene expression data. In this work, we compare Relevance Networks with other types of clustering methods to test some of the stated advantages of this method. The dataset we used consists of artificial time series of Boolean gene expression values, with the aim of mimicking microarray data, generated from simple artificial genetic networks. By using this dataset, we could not confirm that Relevance Networks based on mutual information perform better than Relevance Networks based on Pearson correlation, partitional clustering or hierarchical clustering, since the results from all methods were very similar. However, all three methods successfully revealed the subsets of co-expressed genes, which is a valuable step in identifying co-regulation.
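The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a relevance network is built by linking every gene pair whose association score reaches a threshold; the function names, the toy Boolean series and the threshold of 0.9 are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(vx * vy)


def mutual_information(x, y):
    """Mutual information (in bits) of two equal-length discrete series."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def relevance_network(series, score, threshold):
    """Link each gene pair whose absolute score reaches the threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(series), 2)
        if abs(score(series[g1], series[g2])) >= threshold
    }


# Toy Boolean expression series (hypothetical data).
series = {
    "A": [0, 1, 0, 1, 1, 0, 1, 0],
    "B": [0, 1, 0, 1, 1, 0, 1, 0],  # identical to A
    "C": [1, 0, 1, 0, 0, 1, 0, 1],  # inverse of A
    "D": [0, 0, 1, 1, 0, 0, 1, 1],  # statistically independent of A
}

pearson_edges = relevance_network(series, pearson, 0.9)
mi_edges = relevance_network(series, mutual_information, 0.9)
```

On this toy input both scores link A, B and C and leave D unconnected. This only shows how the two network variants are computed and says nothing about how they compare on realistic data.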

Abstract: Magmas is a nuclear encoded protein found in the mitochondria of mammalian cells. It participates in granulocyte-macrophage-colony stimulating factor (GM-CSF) signaling in hematopoietic cells and has an essential role in invertebrate development. In order to characterize the protein structural features and gene evolution of Magmas, a dataset containing 61 Magmas homologs from 52 species distributed among animals, plants and fungi was analyzed. All Magmas members were found to possess three novel sequence motifs in addition to a conserved leader peptide. Phylogenetic tree and dN/dS rate ratios showed that Magmas was evolutionarily conserved. Analysis of Magmas gene organization demonstrated incremental intron acquisition in plants and vertebrates. Significant genetic diversity in Magmas was observed from kingdom specific amino acid signatures, the presence of predicted signal peptides that target the protein to other intracellular locations besides the mitochondria, and the detection of multiple isoforms in higher animals. These studies demonstrate that Magmas members constitute an important family of conserved proteins having multifunctional activities, and provide a basis for future experiments.

Abstract: Coding information is the main source of heterogeneity (non-randomness) in the sequences of microbial genomes. The heterogeneity corresponds to a cluster structure in triplet distributions of relatively short genomic fragments (200–400 bp). We found a universal 7-cluster structure in microbial genomic sequences and explained its properties. We show that codon usage of bacterial genomes is a multi-linear function of their genomic G+C-content with high accuracy. Based on the analysis of 143 completely sequenced bacterial genomes available in GenBank in August 2004, we show that there are four "pure" types of the 7-cluster structure observed. All 143 cluster animated 3D-scatters are collected in a database which is made available on our website (http://www.ihes.fr/~zinovyev/7clusters). The findings can be readily introduced into software for gene prediction, sequence alignment or microbial genome classification.
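As a rough illustration of the "triplet distributions of short genomic fragments" mentioned in this abstract, the sketch below cuts a sequence into fixed-length windows and counts non-overlapping triplets in each one. The window length of 300 bp, the non-overlapping windows, the single reading frame and all function names are assumptions made for the example; the abstract does not specify the authors' exact procedure.

```python
from collections import Counter
from itertools import product

# All 64 nucleotide triplets in a fixed order, used as vector coordinates.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment):
    """Return the 64-dimensional frequency vector of non-overlapping
    triplets read from position 0 of the fragment."""
    counts = Counter(
        fragment[i:i + 3] for i in range(0, len(fragment) - 2, 3)
    )
    # Triplets containing ambiguous symbols (e.g. N) are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def fragment_vectors(sequence, window=300, step=300):
    """Slide a window along the sequence and return one triplet
    frequency vector per complete window."""
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy sequence: two 300 bp fragments with very different triplet content.
toy_sequence = "ATG" * 100 + "GGC" * 100
vectors = fragment_vectors(toy_sequence)
```

The resulting vectors are points in a 64-dimensional space and could be passed to any clustering or projection method to look for the kind of cluster structure the abstract describes.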

Abstract: A statistical analysis of the Protein Databank (PDB) structures led us to define a set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one defined by the (ϕ, ψ) dihedral angles of 5 consecutive residues. Here, we analyze the effect of the enlargement of the PDB on the PBs' definition. The results highlight the quality of the 3D approximation ensured by the PBs. The PBs could be of great interest in ab initio modeling.

Abstract: The CBCAnalyzer (CBC = compensatory base change) is a custom-written software toolbox consisting of three parts, CTTransform, CBCDetect, and CBCTree. CTTransform reads several ct-file formats, and generates a so-called "bracket-dot-bracket" format that is typically used as input for other tools such as RNAforester, RNAmovie or MARNA. The latter creates a multiple alignment based on primary sequences and secondary structures that can then be used as input for CBCDetect. CBCDetect counts CBCs in all-against-all comparisons of the aligned sequences. This is important in detecting species that are discriminated by their sexual incompatibility. The count (distance) matrix obtained by CBCDetect is used as input for CBCTree, which reconstructs a phylogram by using the BIONJ algorithm. In this note we describe the features of the toolbox as well as application examples. The toolbox provides a graphical user interface. It is written in C++ and freely available at: http://cbcanalyzer.bioapps.biozentrum.uni-wuerzburg.de.
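The counting step can be sketched as follows. This is not the CBCDetect implementation. It is a minimal illustration that assumes all aligned sequences share a single consensus structure in bracket-dot-bracket notation, and that a CBC is a paired position where both nucleotides differ between two sequences while each sequence still forms a canonical or wobble pair. All names and the toy data are hypothetical.

```python
# Base pairs treated as valid: Watson-Crick pairs plus G-U wobble pairs.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def base_pairs(structure):
    """Parse a bracket-dot-bracket string into (i, j) index pairs."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))  # raises IndexError if unbalanced
    return pairs


def count_cbcs(seq_a, seq_b, structure):
    """Count paired positions where both bases changed and pairing is kept."""
    count = 0
    for i, j in base_pairs(structure):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_pair = pair_a in CANONICAL and pair_b in CANONICAL
        both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_pair and both_changed:
            count += 1
    return count


def cbc_matrix(sequences, structure):
    """All-against-all CBC counts as a symmetric matrix (list of lists)."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            c = count_cbcs(sequences[a], sequences[b], structure)
            matrix[a][b] = matrix[b][a] = c
    return matrix


# Toy aligned RNA sequences sharing one hairpin structure.
structure = "((..))"
sequences = ["GCAAGC", "AUAAAU", "GUAAGC"]
matrix = cbc_matrix(sequences, structure)
```

A matrix of this kind is what a distance-based tree method would take as input. A position where only one side of the pair changes (as between the first and third toy sequences) is deliberately not counted.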

Abstract: The existence of a soluble splice variant for a gene encoding a transmembrane protein suggests that this gene plays a role in intercellular signalling, particularly in immunological processes. Also, the absence of a splice variant of a reported soluble variant suggests exclusive control of the solubilisation by proteolytic cleavage. Soluble splice variants of membrane proteins may also be interesting targets for crystallisation as their structure may be expected to preserve, at least partially, their function as integral membrane proteins, whose structures are most difficult to determine. This paper presents a dataset derived from the literature in an attempt to collect all reported soluble variants of membrane proteins, be they splice variants or shed. A list of soluble variants is derived in silico from Ensembl. These are checked for their presence in multiple organisms and their number of membrane-spanning regions is inspected. The findings are then confirmed by a comparison with identified proteins of a recent global proteomics study of human blood plasma. Finally, a tool to support the identification of novel soluble variants by proteomics is provided.

Abstract: Assigning nomenclature codes to biomedical data is an arduous, expensive and error-prone task. Data records are coded to provide a common representation of contained concepts, allowing facile retrieval of records via a standard terminology. In the medical field, cancer registrars, nurses, pathologists, and private clinicians all understand the importance of annotating medical records with vocabularies that codify the names of diseases, procedures, billing categories, etc. Molecular biologists need codified medical records so that they can discover or validate relationships between experimental data and clinical data. This paper introduces a new approach to retrieving data records without prior coding. The approach achieves the same result as a search over pre-coded records. It retrieves all records that contain any terms that are synonymous with a user's query-term. A recently described fast algorithm (the doublet method) permits quick iterative searches over every synonym for any term from any nomenclature occurring in a dataset of any size. As a demonstration, a 105+ Megabyte corpus of PubMed abstracts was searched for medical terms. Query terms were matched against either of two vocabularies and expanded as an array of equivalent search items. A single search term may have over one hundred nomenclature synonyms, all of which were searched against the full database. Iterative searches of a list of concept-equivalent terms involve many more operations than a single search over pre-annotated concept codes. Nonetheless, the doublet method achieved fast query response times (0.05 seconds using Snomed and 5 seconds using the Developmental Lineage Classification of Neoplasms, on a computer with a 2.89 GHz processor). Pre-annotated datasets lose their value when the chosen vocabulary is replaced by a different vocabulary or by a different version of the same vocabulary. The doublet method can employ any version of any vocabulary with no pre-annotation. In many instances, the enormous effort and expense associated with data annotation can be eliminated by on-the-fly doublet matching. The algorithm for nomenclature-based database searches using the doublet method is described. Perl scripts for implementing the algorithm and testing execution speed are provided as open source documents available from the Association for Pathology Informatics (www.pathologyinformatics.org/informatics_r.htm).
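The idea of expanding a query into all of its nomenclature synonyms before searching uncoded records can be sketched as below. This is not the doublet method and not the authors' Perl code. It is a plain substring search over a made-up toy vocabulary, shown only to illustrate the query-expansion step. The codes, terms, records and function names are hypothetical.

```python
from collections import defaultdict


def build_synonym_index(nomenclature):
    """Build lookups from an iterable of (code, term) pairs:
    term -> code, and code -> set of all terms sharing that code."""
    code_of = {}
    terms_of = defaultdict(set)
    for code, term in nomenclature:
        normalized = term.lower()
        code_of[normalized] = code
        terms_of[code].add(normalized)
    return code_of, terms_of


def expand_query(query, code_of, terms_of):
    """Return every term that shares a concept code with the query.
    A query not found in the vocabulary is searched as-is."""
    normalized = query.lower()
    code = code_of.get(normalized)
    if code is None:
        return {normalized}
    return set(terms_of[code])


def search(records, query, code_of, terms_of):
    """Return the records containing any synonym of the query term."""
    synonyms = expand_query(query, code_of, terms_of)
    return [
        record for record in records
        if any(term in record.lower() for term in synonyms)
    ]


# Toy vocabulary with invented concept codes.
toy_nomenclature = [
    ("X001", "renal cell carcinoma"),
    ("X001", "hypernephroma"),
    ("X002", "melanoma"),
]
toy_records = [
    "Patient with hypernephroma, resected.",
    "Renal cell carcinoma, stage II.",
    "No neoplasm seen.",
]

code_of, terms_of = build_synonym_index(toy_nomenclature)
hits = search(toy_records, "renal cell carcinoma", code_of, terms_of)
```

Because the vocabulary is consulted only at query time, swapping in a different vocabulary or version requires rebuilding the index but no re-annotation of the records, which is the property the abstract emphasizes.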

Abstract: Human vascular endothelial growth factor (VEGF), angiopoietin (ANG) and tyrosine kinase with immunoglobulin and epidermal growth factor homology domains (TIE)-2 constitute a group of proteins that are involved in vascular homeostasis, vascular integrity and angiogenesis. There are nine proteins in the immediate VEGF family: VEGFA, VEGFB, VEGFC, VEGFD, VEGF-3, placental growth factor (PGF), VEGF receptor (VEGFR)-1, VEGFR-2, and VEGFR-1-related. They can be stimulated by cytokines to become involved in immune responses. By using in silico tools, we were able to identify several possible analogues or homologues of VEGF, ANG and TIE-2 in invertebrates. This is the first report to show that these proteins may be conserved through evolution. These proteins may have a role in vascular maintenance and immunity. In addition, since VEGF, ANG and TIE-2 have a role in mammalian immunity that is significantly influenced by cytokines, such as IL-1, this may indicate an interaction of the vascular system and the immune system over evolutionary time.