DNA sequencing analysis

Understand the effects of genomic variation and mutations with DNA sequencing analysis

We routinely analyze whole genome, whole exome and targeted re-sequencing data from well-characterized organisms, such as humans, with the aim of mapping and identifying small genetic variants (single nucleotide polymorphisms and short insertions and deletions, or indels) using established best-practice methods. To help our clients interpret their variant data, we continuously develop our DNA sequencing analysis and variant annotation pipeline to include more information on each identified variant. For example, pathogenicity predictions and minor allele frequencies from databases such as 1000 Genomes and gnomAD provide an effective way of filtering irrelevant variation out of your results.

Our clients working in oncology are also interested in characterizing somatic mutations that are not limited to small-scale genomic events. We have characterized gene copy number variation in tumor and cancer cell line samples from both microarray and NGS data, and integrated the results with expression data to quantify oncogenic gene dosage effects in different tumor types. We have also developed DNA sequencing analysis pipelines for discovering copy number-neutral genomic rearrangements that lead to novel oncogenic fusion genes.

A third major application of DNA sequencing is genome assembly of non-model organisms. We produce genome assemblies based on WGS data that are then computationally post-processed to achieve the best possible quality. The assembled genomes are then annotated using gene prediction, automated homology searches using genome databases, and gene annotation transfer from closely related organisms. Thorough annotation of novel genomes ensures the best possible starting point for transcriptome studies on these organisms.

See examples of DNA sequencing analyses below:

Our statistical approaches to variant calling follow current best practices to produce a reliable set of variants. Variants, including single nucleotide polymorphisms, can be called against any reference genome in any organism, or even against a genomic ensemble compiled from individual genomes from sequencing projects for better representation. In addition to high-confidence variants, we report regions of low coverage where the variant caller could not determine the sequence of a sample. Whole genome, whole exome and targeted DNA sequencing data are all suitable for variant calling within the regions they cover. The variant lists can be further combined, compared and filtered, for example to find disease-causing de novo germline variants in trio studies.

Deliverables:

Full variant lists for all samples with supporting evidence from the data

Variant lists filtered by any criteria (e.g. against a germline control when looking for mutations)
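As an illustration only, the trio comparison mentioned above can be sketched in a few lines of Python. The function name `find_de_novo`, the variant tuples and the genotype encoding (number of alternate alleles, with `None` for an undetermined site) are assumptions made for this example and do not describe any particular pipeline.

```python
from typing import Dict, List, Optional, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]
# Genotype = number of alternate alleles (0, 1 or 2); None = no confident call.
Genotypes = Dict[Variant, Optional[int]]


def find_de_novo(child: Genotypes, mother: Genotypes, father: Genotypes) -> List[Variant]:
    """Return variants carried by the child where both parents are
    confidently homozygous reference (hypothetical helper)."""
    candidates = []
    for variant, child_gt in child.items():
        if not child_gt:  # skips both 0 (reference) and None (no call)
            continue
        # A missing or None parental genotype (e.g. a low-coverage region)
        # is not treated as evidence of absence, so such sites are excluded.
        if mother.get(variant) == 0 and father.get(variant) == 0:
            candidates.append(variant)
    return sorted(candidates)


# Made-up example data for one trio.
child = {("chr1", 100, "A", "G"): 1, ("chr2", 200, "C", "T"): 1, ("chr3", 300, "G", "A"): 1}
mother = {("chr1", 100, "A", "G"): 0, ("chr2", 200, "C", "T"): 1}
father = {("chr1", 100, "A", "G"): 0, ("chr2", 200, "C", "T"): 0, ("chr3", 300, "G", "A"): 0}
print(find_de_novo(child, mother, father))
```

In this toy data only the chr1 variant is reported: the chr2 variant is inherited from the mother, and the chr3 site has no call in the mother, which is why reporting low-coverage regions alongside the variants matters.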

Genetic variants are annotated with their location in the genome, zygosity (homozygous/heterozygous), evidence from the data (supporting reads), functional classification for exonic variants, amino acid changes in all isoforms, database identifiers for known variants, and observed minor allele frequencies in several genome databases or even in your own data. We also provide pathogenicity predictions for each exonic variant using several prediction tools. Flexible ranking and filtering of the variants based on these annotations makes complex genomic data easier for a geneticist or a physician to interpret.
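The kind of annotation-based filtering and ranking described above could look roughly like the following sketch. The `AnnotatedVariant` fields, the function name and the default cutoffs (minor allele frequency below 1 %, at least two tools predicting a damaging effect) are illustrative assumptions; suitable thresholds depend on the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    variant_id: str
    gene: str
    max_population_maf: Optional[float]  # None = not observed in the databases
    damaging_predictions: int            # number of tools predicting a damaging effect


def rank_rare_damaging(
    variants: List[AnnotatedVariant],
    maf_cutoff: float = 0.01,   # assumed cutoff, adjust per study
    min_damaging: int = 2,      # assumed cutoff, adjust per study
) -> List[AnnotatedVariant]:
    """Keep rare, predicted-damaging variants and rank them
    (most damaging predictions first, then lowest frequency)."""
    kept = [
        v for v in variants
        if (v.max_population_maf is None or v.max_population_maf < maf_cutoff)
        and v.damaging_predictions >= min_damaging
    ]
    return sorted(kept, key=lambda v: (-v.damaging_predictions, v.max_population_maf or 0.0))


# Made-up example records.
variants = [
    AnnotatedVariant("v1", "GENE1", 0.20, 3),     # common: filtered out
    AnnotatedVariant("v2", "GENE2", None, 2),     # novel and predicted damaging
    AnnotatedVariant("v3", "GENE3", 0.001, 3),    # rare and predicted damaging
    AnnotatedVariant("v4", "GENE4", 0.0005, 1),   # rare but weak prediction support
]
print([v.variant_id for v in rank_rare_damaging(variants)])
```

Treating variants that are absent from the databases as rare is a design choice in this sketch; a stricter analysis might handle them separately.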

Gene copy numbers can be deduced from sequencing data using our statistical approaches for analyzing both coverage information and allele frequency information. The analysis yields copy numbers for each chromosome-scale segment, gene, and exon independently. Gene copy numbers can be further integrated with expression data, for example, to find significant gene dosage effects.
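To give a feel for the coverage-based part of such an analysis, here is a deliberately simplified sketch. It assumes a matched normal sample, a pure tumor sample, a diploid baseline, and that most genes are copy-neutral so the median tumor/normal ratio can serve as the normalization factor. Allele frequency information, segmentation and purity correction are left out, and the function names are hypothetical.

```python
import math
import statistics
from typing import Dict


def log2_ratios(tumor_depth: Dict[str, float], normal_depth: Dict[str, float]) -> Dict[str, float]:
    """Per-gene log2(tumor/normal) coverage ratio, median-normalized.

    Assumes most genes are copy-neutral, so the median raw ratio
    corresponds to the baseline copy number.
    """
    raw = {}
    for gene, normal in normal_depth.items():
        tumor = tumor_depth.get(gene, 0.0)
        if normal <= 0 or tumor <= 0:
            continue  # ratio undefined; such genes need separate handling
        raw[gene] = tumor / normal
    if not raw:
        return {}
    baseline = statistics.median(raw.values())
    return {gene: math.log2(ratio / baseline) for gene, ratio in raw.items()}


def copy_number(log2_ratio: float, ploidy: int = 2) -> float:
    """Convert a log2 ratio to an absolute copy number (pure-sample assumption)."""
    return ploidy * 2 ** log2_ratio


# Made-up mean read depths per gene.
tumor = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 400.0}
normal = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0}
for gene, ratio in log2_ratios(tumor, normal).items():
    print(gene, round(ratio, 2), round(copy_number(ratio), 1))
```

With these toy depths, GENE_C shows a log2 ratio of 2, which corresponds to eight copies under the stated assumptions, while the other two genes stay at the diploid baseline.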

Whole genome sequencing data coupled with mate pair information from paired-end sequencing can be used to study copy number-neutral genomic rearrangements such as inversions and translocations. These can result in fusion genes, some of which are critically linked to the formation of cancer. We deliver a report of the altered genome structure with ranked fusion genes that can be validated with RNA-sequencing data.
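The underlying idea of using mate information can be illustrated with a toy classifier for read pairs. It assumes a standard paired-end library in which the two mates of a properly mapped pair align to the same chromosome on opposite strands. Insert size, mapping quality and breakpoint positions are ignored here, and the `ReadPair` type and function names are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class ReadPair(NamedTuple):
    chrom1: str
    strand1: str  # "+" or "-"
    chrom2: str
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair) -> str:
    """Label a read pair by the rearrangement type it may support."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"   # mates map to different chromosomes
    if pair.strand1 == pair.strand2:
        return "inversion"       # mates map to the same strand
    return "concordant"          # simplification: insert size is not checked


def translocation_support(pairs: Iterable[ReadPair], min_pairs: int = 2) -> Dict[Tuple[str, ...], int]:
    """Count supporting pairs per chromosome pair and keep those with enough support."""
    counts = Counter(
        tuple(sorted((p.chrom1, p.chrom2)))
        for p in pairs
        if classify_pair(p) == "translocation"
    )
    return {chroms: n for chroms, n in counts.items() if n >= min_pairs}


# Made-up read pairs.
pairs = [
    ReadPair("chr9", "+", "chr22", "-"),
    ReadPair("chr22", "+", "chr9", "-"),
    ReadPair("chr1", "+", "chr1", "+"),
    ReadPair("chr1", "+", "chr1", "-"),
    ReadPair("chr3", "+", "chr5", "-"),
]
print(translocation_support(pairs))
```

Requiring more than one supporting pair, as the `min_pairs` parameter does, is a simple way to suppress isolated mapping artifacts; the chr3/chr5 pair in the toy data is dropped for that reason.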

For simpler organisms, we offer assembly of their genomes de novo based on DNA-sequencing data. Our approach is based on building a consensus assembly from outputs of several assembly tools, and then running computational post-assembly improvement software. If a draft genome exists, we can refine it computationally by joining contigs and resolving errors using improvement tools or additional DNA-seq or RNA-seq data.

Assembled genomes can be annotated using gene and oriC prediction software and/or RNA-seq data. We predict gene identities for all putative genes by comparing their sequences to several genome databases. For genes with lower sequence similarity, functions can be predicted from identified functional domains. If annotated genomes for close relatives exist, we can improve the annotation by transferring gene information to the unannotated genome using sequence alignment-based approaches. The result is a comprehensive list of genes with their specific coordinates in the genome.

Identification of patient-specific tumor neoantigens (novel protein sequences created by tumor-specific DNA alterations) is one of the cornerstones of cancer immunotherapy. We can search exome sequencing data for non-synonymous somatic mutations in coding regions and translate these in silico into peptides containing the mutation. Additional RNA-sequencing data can be used to focus on highly expressed genes, in order to ensure high epitope abundance, and to look for neoantigens arising from alternative splicing, exon skipping and translocations. The lists of epitopes can be further filtered or ranked algorithmically by analyzing aspects such as the likelihood of proteasomal processing, transport into the endoplasmic reticulum and affinity for the relevant MHC class I alleles.
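The in silico translation step can be pictured as sliding a fixed-length window over the mutated protein and keeping every window that contains the altered residue. The sketch below assumes a single amino acid substitution, a 0-based residue position and a default peptide length of nine residues, a length commonly considered for MHC class I binding. The function name is hypothetical, and downstream steps such as binding affinity prediction are not shown.

```python
from typing import List


def mutant_peptides(protein: str, position: int, new_residue: str, length: int = 9) -> List[str]:
    """Return all peptides of the given length that contain the substituted residue.

    `position` is the 0-based index of the mutated amino acid.
    """
    if not 0 <= position < len(protein):
        raise ValueError("position is outside the protein sequence")
    if protein[position] == new_residue:
        raise ValueError("substitution is synonymous at the protein level")
    mutated = protein[:position] + new_residue + protein[position + 1:]
    # First and last window start positions that still cover the mutation.
    first_start = max(0, position - length + 1)
    last_start = min(position, len(mutated) - length)
    return [mutated[s:s + length] for s in range(first_start, last_start + 1)]


# Made-up 10-residue sequence with a Y-to-F substitution at index 4, 3-mer windows.
print(mutant_peptides("MKTAYIAKQR", position=4, new_residue="F", length=3))
```

Each returned peptide would then be a candidate for the filtering and ranking described above; near the ends of the protein fewer windows fit, so fewer peptides are produced.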

Circulating cell-free DNA has potential as a source of non-invasive genomic biomarkers, in particular for prenatal diagnosis and oncology. The mere presence of certain DNA sequences in plasma can reveal a tumor undetected by other means. Furthermore, mutations detected in circulating DNA can be used as markers for personalizing treatment and prognosis. Our pipeline for cell-free DNA-based biomarker discovery starts with full quality control of the data, followed by a statistical comparison of pathological and control groups to reveal the biomarkers with the best combination of sensitivity and specificity. Taking biological factors and clinical feasibility into account, we summarize the analysis by highlighting the most promising biomarker candidates.

Deliverables:

List of biomarker candidates from cell-free DNA

Sensitivity and specificity estimations for each candidate

Database identifiers for known mutations and pathogenicity predictions
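For reference, sensitivity and specificity estimates of the kind listed above can be computed for a single candidate as in the following sketch. It assumes a numeric biomarker measurement per sample (for example a mutant allele fraction) and a fixed decision threshold at or above which a sample is called positive; the function name and the example values are made up.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    case_values: Sequence[float],
    control_values: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Estimate sensitivity and specificity of one biomarker at one threshold.

    A sample is called positive when its value is >= threshold.
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    true_positives = sum(v >= threshold for v in case_values)
    true_negatives = sum(v < threshold for v in control_values)
    sensitivity = true_positives / len(case_values)
    specificity = true_negatives / len(control_values)
    return sensitivity, specificity


# Made-up measurements for a pathological group and a control group.
cases = [0.8, 0.6, 0.9, 0.3]
controls = [0.1, 0.2, 0.4, 0.05]
print(sensitivity_specificity(cases, controls, threshold=0.5))
```

With these toy values, three of the four cases exceed the threshold and no control does, giving a sensitivity of 0.75 and a specificity of 1.0. Estimates from small groups like this carry wide uncertainty, so confidence intervals would normally accompany them.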

Metagenomics offers an unbiased view into the microbial diversity of ecological niches including samples from host organisms and soil. Using whole-genome or, alternatively, 16S sequencing data, we assemble the sequence reads into contigs and assign them to species or operational taxonomic units (OTUs). We then quantify the abundance of each taxon. In the case of multiple samples, we compare the relative abundances and associate them with host phenotype or environmental factors. For whole-genome studies, we identify and annotate genes using both sequence homology and computational gene prediction.

Deliverables:

Quantitative characterization of microbial diversity

Association of species/OTU with host phenotype or other environmental factors

Identified and predicted genes with custom annotations
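As a small example of what quantifying abundance and diversity can mean in practice, the sketch below converts per-taxon read counts into relative abundances and computes the Shannon index. The choice of the Shannon index is an assumption made for illustration, as it is one of several commonly used diversity measures, and the taxon names and counts are made up.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values() if p > 0)


# Made-up read counts per taxon for two samples.
sample_a = {"Bacteroides": 50, "Prevotella": 50}
sample_b = {"Bacteroides": 98, "Prevotella": 2}
print(relative_abundance(sample_b))
print(round(shannon_index(sample_a), 3), round(shannon_index(sample_b), 3))
```

The evenly split sample has the higher index, since the measure reflects both the number of taxa and how evenly the reads are distributed among them. Comparing samples fairly usually also requires accounting for differences in sequencing depth, which this sketch does not do.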
