With the deluge of data streaming from Next Generation Sequencing (NGS) technologies, scientists are scrambling to find the best methods for Copy Number Variation (CNV) analysis. Two main types of NGS data are used for CNV analysis: Whole Genome and Exome sequencing. The paper “Statistical Challenges associated with detecting copy number variations with next-generation sequencing” (Bioinformatics. 2012 Nov 1;28(21):2711-8.) reviews the analysis of germ-line (not tumor) Whole Genome Sequencing data for CNVs and focuses principally on Depth of Coverage (DoC) methods.
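The intuition behind DoC methods is that the number of reads mapping to a region scales with the number of copies of that region. The toy sketch below illustrates that idea only and is not a method taken from the paper: it assumes per-window read depths are already available, uses the median depth as a stand-in for the two-copy baseline, and applies illustrative log2-ratio thresholds. The function name `call_cnv_windows` and the thresholds are assumptions made for this example.

```python
import math
from statistics import median


def call_cnv_windows(depths, gain=0.4, loss=-0.6):
    """Label each window 'gain', 'loss', or 'neutral' from its read depth.

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    baseline = median(depths)  # assumed to reflect the normal two-copy depth
    calls = []
    for depth in depths:
        # A small pseudocount avoids log2(0) for windows with no reads.
        ratio = math.log2((depth + 0.5) / (baseline + 0.5))
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical read depths for nine consecutive genomic windows.
example_depths = [30, 31, 29, 45, 46, 30, 15, 14, 30]
print(call_cnv_windows(example_depths))
```

Real DoC callers also have to deal with GC content, mappability, and noise between neighbouring windows, which is where the statistical difficulty lies.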

An ontology is a computational representation of a domain of knowledge based upon a controlled, standardized vocabulary for describing entities and the semantic relationships between them. The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO was initially developed using information from Online Mendelian Inheritance in Man (OMIM), a hugely important data resource in the field of human genetics and beyond. It is currently being developed using information from OMIM and the medical literature and contains approximately 10,000 terms. Over 50,000 annotations to hereditary diseases are available for download or can be browsed using the PhenExplorer.
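As a rough illustration of what "terms plus semantic relationships" can look like in code, the sketch below stores a tiny ontology as a dictionary in which each term lists its "is_a" parents. The identifiers (`EX:0001` and so on), the hierarchy, and the disease annotations are invented for this example and are not taken from the HPO.

```python
# Toy ontology: each term records its name and its direct "is_a" parents.
# IDs and hierarchy are invented for illustration, not copied from the HPO.
TERMS = {
    "EX:0001": {"name": "Phenotypic abnormality", "is_a": []},
    "EX:0002": {"name": "Abnormality of the cardiovascular system", "is_a": ["EX:0001"]},
    "EX:0003": {"name": "Abnormality of the heart", "is_a": ["EX:0002"]},
    "EX:0004": {"name": "Atrial septal defect", "is_a": ["EX:0003"]},
}


def ancestors(term_id, terms=TERMS):
    """Return every term reachable by following is_a links upward."""
    seen = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(terms[current]["is_a"])
    return seen


def annotated_with(disease_annotations, query_term, terms=TERMS):
    """List diseases annotated to query_term or to any more specific term."""
    return sorted(
        disease
        for disease, term_ids in disease_annotations.items()
        if any(t == query_term or query_term in ancestors(t, terms) for t in term_ids)
    )


# Hypothetical disease annotations.
annotations = {"Disease A": ["EX:0004"], "Disease B": ["EX:0002"]}
print(annotated_with(annotations, "EX:0003"))
```

Because annotations propagate up the "is_a" links, a query for a general term also finds diseases annotated with a more specific one, which is the main practical benefit of a structured vocabulary over free text.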

A single nucleotide polymorphism (SNP) in a coding region of DNA that results in an amino acid change in the corresponding protein is termed a non-synonymous or missense variant. Many of these variants have been implicated in human disease phenotypes but, in the absence of functional assays, the pathogenicity of many remains unclassified. A number of in silico tools have been developed to predict the effect of missense variants. Some of these tools are used routinely by diagnostic labs, in combination with other evidence, to advise clinicians of disease likelihood.
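The definition above can be made concrete with a minimal sketch that translates a codon before and after a single-base change using the standard genetic code. The function name `classify_snp` is an assumption for this example, and separating stop-gain ("nonsense") and stop-loss changes from missense is a design choice of the sketch. It says nothing about pathogenicity, which is what the prediction tools attempt to estimate.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon, position, alt_base):
    """Classify a single-base change within one codon (position is 0, 1, or 2)."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # gains a premature stop codon
    if ref_aa == "*":
        return "stop-loss"  # removes the original stop codon
    return "missense"       # one amino acid replaced by another


print(classify_snp("GAG", 1, "T"))  # Glu -> Val
print(classify_snp("GAA", 2, "G"))  # Glu -> Glu
print(classify_snp("TGG", 2, "A"))  # Trp -> stop
```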

A report analysing popular missense prediction tools and the use of multiple sequence alignments is available here or here.

In silico splice site prediction tools can be used to predict the effect of a genetic variant on splicing. A large number of prediction tools are currently available; however, these have not been formally assessed and may give divergent results. This analysis aims to assess the performance of these algorithms in predicting splicing-related variant pathogenicity. It will also assess the scope of the splice site prediction tools to ensure that they are used in the most appropriate way. The analysis will allow scientists to use splice site prediction tools to predict pathogenicity with more confidence.
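One common way to frame such an assessment is to compare each tool's calls against variants whose effect on splicing is already known and to summarise the agreement as sensitivity and specificity. The sketch below assumes that framing; it is not the protocol of the analysis described here, and the truth labels and tool calls are invented for illustration.

```python
def confusion_metrics(truth, predicted):
    """Compare boolean predictions with known labels (True = disrupts splicing)."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    return {
        # Fraction of truly disruptive variants that the tool flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Fraction of benign variants that the tool leaves unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / len(truth) if truth else None,
    }


# Hypothetical benchmark: eight variants with known splicing outcome.
known_effect = [True, True, True, True, False, False, False, False]
tool_calls = [True, True, True, False, False, False, True, False]
print(confusion_metrics(known_effect, tool_calls))
```

Running the same comparison for every tool on one shared variant set makes divergent results visible, since the tools are then scored against identical evidence.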