at OpenHelix

Tag Archives: copy number variation

The sequence data tsunami begins to crash into the shore, at the feet of clinicians and patients who want answers and treatment directions. But sometimes the tsunami is washing in debris. As the amount of sequence and variation information grows, some of it comes without clear evaluations of the impacts. Some of it comes with conflicting information. And some of it comes in wrong.

Attempting to wrangle the information into useful understanding and treatments with standardized descriptions, the team building the ClinGen resources published a paper last week that details their efforts. The paper describes their history and goals, and how they are moving to get to a point where they have useful information for and from patients, their doctors, testing labs, and researchers. Because of the different needs of different groups, there are several moving parts to the overall ClinGen collection.

In addition to the paper–and several related articles in this NEJM special report–there are videos on their site that tackle different aspects of the ClinGen projects. I’m going to highlight one of them here as the Tip of the Week, but you should also check out the others that are available on their webinars page or their YouTube channel. This video shows the Dosage Sensitivity Map features.

This video provides some of the history and framework for the ClinGen efforts, and then introduces one of the tools that they have made available, a dosage sensitivity map. This piece focuses on “evidence based reviews of dosage sensitivity”: they flag haploinsufficiency for losses (deletions) of a region, and triplosensitivity for gains (duplications) of a region. They describe a scoring system they use to rank structural variations (CNVs, SVs), and their curation of the evidence to support or to refute dosage sensitivity. They also note that their process is conservative, and you should keep that in mind as you consider their team’s review of the evidence. But they are definitely open to and interested in feedback, and they hope you will contact them if you have a different understanding from their posted evaluations.

To follow along with the video, use this site to explore the features of this part of the ClinGen tool set: http://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/. You can also just click their example genes; for instance, the ZEB2 link shows you a typical page with the score information, links to other resources, and a genome viewer right on the page. Or you can choose to look at external browsers at NCBI, Ensembl, or UCSC. I clicked the UCSC Genome Browser one to see how it displayed, and it automatically presents tracks with the relevant ClinGen data loaded.

In other tips I’ll talk about other pieces of the infrastructure that they are building or coordinating with. Some we’ve talked about before–you can see a previous tip that included the ClinVar resource at NCBI that is foundational to the ClinGen suite and is discussed in their paper as well. They also note the importance of the data from OMIM, and how their mutual efforts are providing important feedback loops to be alerted to needed updates. ClinGen also employs the Human Phenotype Ontology that keeps coming up at OpenHelix lately. Another important piece to this is the standards for naming variants that were recently described by the American College of Medical Genetics and Genomics (paper linked below).

ClinGen, and the various component tools within, are worth looking at, and contributing to, as we try to move more and better information to the clinic for patients and doctors to use effectively. Steven Salzberg has a take on the value of ClinGen here: 17% Of Our Genetic Knowledge Is Wrong.

It’s also very possible that some really important things will happen in the database–new submissions, changes to the status of a variant–that will occur before any papers come out about it. Or it is even possible that a paper never will come out about it. Spend some time learning about the features; I think it will be worth the time.

This notice came from DGV (Database of Genomic Variants) while I was on vacation last week, but I wanted to highlight this for a couple of reasons. First–it’s very cool that these groups have now chosen to establish a standard across databases for the representations of the copy-number variation displays. But I also like that they are now also providing support for the red-green colorblind. As someone from a family of the colorblind, that’s something I like to be able to access.

Here’s the note from the mailing list:

As a result of discussions surrounding the representation of structural variants at the recent ISCA meeting, groups at DGV, NCBI and DECIPHER have decided to standardize colour schemes for gains and losses. Moving forward, deletions/losses will be displayed as red, gains/duplications will be displayed as blue. Regions where both gains and losses occur at the same locus will be represented as brown, and we will continue to represent inversions as purple(indigo). In addition to ensuring the colour schemes are consistent across databases, changes have also been implemented to ensure ease of use for individuals with red-green colour blindness.
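If you draw your own CNV tracks and want them to match, the scheme in that note boils down to a small lookup. Here is a minimal Python sketch of it; the dictionary keys and the function name are my own illustrative choices, not anything published by DGV, NCBI or DECIPHER, and the note does not say how other combinations (say, an inversion overlapping a loss) should be drawn, so the sketch simply refuses those.

```python
# Colours taken from the mailing-list note; the key names are hypothetical.
VARIANT_COLOURS = {
    "loss": "red",          # deletions/losses
    "gain": "blue",         # gains/duplications
    "gain+loss": "brown",   # both gains and losses at the same locus
    "inversion": "purple",  # the note says purple (indigo)
}


def colour_for(variant_types):
    """Return the display colour for the variant types seen at one locus."""
    kinds = set(variant_types)
    unknown = kinds - {"loss", "gain", "inversion"}
    if unknown:
        raise ValueError(f"unknown variant type(s): {sorted(unknown)}")
    if kinds == {"gain", "loss"}:
        return VARIANT_COLOURS["gain+loss"]
    if len(kinds) == 1:
        return VARIANT_COLOURS[next(iter(kinds))]
    # Any other mix (or an empty locus) is not covered by the note.
    raise ValueError(f"no colour defined for combination: {sorted(kinds)}")
```

For example, `colour_for(["gain", "loss"])` gives `"brown"`, while `colour_for(["loss"])` gives `"red"`.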

For this week’s Tip of the Week I’ll introduce Varietas, a resource that integrates human variation information such as SNP and CNV data, and offers a handy tabular output with links to additional databases that will enable researchers to quickly explore other sources of information about the variations or regions of interest.

I think this is the first resource I’ve used from Finland. And it’s definitely the first resource I have used that is plaid. But it struck me that plaid is a pretty good conceptualization of the variations that we see in the genomes. Some are a single thread, some are larger sections, and the overlaps between the variations we observed in the genome are important to our understanding of them as well. And the history of computation leads back to textile manufacturing, in fact. So I thought it was a pretty good concept.

But let’s explore the threads of Varietas. You can read the paper which is linked below, but here I’ll just summarize some of the main features. First let me say the focus of this database appears to be human variation, although the site doesn’t make that very clear. As far as I could tell there wasn’t any data for other species. But if you want human variation data, you’ll find a variety of threads available to you. If you check out the About page, you’ll see the source data available includes Ensembl, the NHGRI GWAS catalog, SNPedia, and GAD. These sources also provide OMIM data, HGNC nomenclature, phenotypes, and MeSH terms. And the threads out include dbSNP, PubMed, SNPedia, and WikiGenes as well. This is also summarized nicely in Figure 1 of their paper.

It’s a very straightforward interface. There is a basic search with a text box for quick searching, and you select the type of data you are starting with: SNPs, genes, keywords, or locations. And the output will be a table with the results that correspond to your query.

If you have larger sets of features that you want to interrogate you can use the advanced forms to enter more data.

The tabular output can be viewed on the web with all the handy links. Or you can download the data as a text file to be used in other ways.

I’ll demonstrate the sample search for the movie, but you won’t see the full range of data that’s available there. I wish they had samples for each type of search. But I found one sample that will also show CNV results: choose the Location radio button and enter this location range to see some CNV samples 6:1234-123400
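If you end up handling location strings like that one in your own scripts, for instance to check whether a downloaded CNV overlaps a region you care about, a small parser helps. This is a rough Python sketch of my own, not part of Varietas: the `Region` type and both function names are hypothetical, and I am assuming the range is chromosome:start-end with both ends inclusive, which the site does not spell out.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Accepts "6:1234-123400" and, as a convenience, "chr6:1,234-123,400".
_REGION = re.compile(r"^(?:chr)?([0-9A-Za-z]+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse a 'chrom:start-end' string into a Region."""
    match = _REGION.match(text.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"not a chrom:start-end location: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return Region(chrom, start, end)


def overlaps(a, b):
    """True if two regions share at least one base (inclusive coordinates assumed)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end
```

With this, `parse_region("6:1234-123400")` returns `Region(chrom='6', start=1234, end=123400)`, and `overlaps` can then be used to compare it against other regions on the same chromosome.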

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks to Mary for running a Tip of the Week – “CHOP CNV database” a couple of months back. CHOP CNV database is a high-resolution genome-wide survey of copy number variations of a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by a large number of research groups world-wide. I am now pleased to announce the public release of our software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project by offering the following capabilities for both individual samples and cohort studies:

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.

Not just the genome, but genomeS. As Jan at Saaien Tist has mentioned, human (and other species) genomes are quite variable. Though the linear representation of genome browsers (like the UCSC Genome Browser, Ensembl, GBrowse and MapViewer among others) makes perfect sense for much annotated data of the genome, structural variations are not so well visualized in a linear representation. And, as we are finding that human and other species’ genomes are quite variable, we might need to come up with another way to visualize these genomic data beyond the ‘reference genome’ linear model. Jan suggests de Bruijn graphs, pictured here. I find some difficulty in ‘visualizing’ how these are going to work for the _other_ annotations in the data. Though this representation looks like it might work great for CNV and the like, it seems to make viewing other types of data (expression, SNP, etc.) more complicated. I’m looking forward to seeing how this develops.
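To make the de Bruijn idea a bit more concrete, here is a toy Python sketch of my own (not Jan’s code, and not how any genome browser works). It uses the usual textbook construction, where nodes are (k-1)-mers and each k-mer contributes an edge; the two short sequences are made up and differ by a single base, just to show where the graph branches.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Nodes with more than one successor, i.e. where the sequences diverge."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Made-up toy sequences: a "reference" and a variant differing at one base.
reference = "ACGTTGCA"
variant = "ACGATGCA"
graph = de_bruijn([reference, variant], k=3)
print(branch_nodes(graph))  # ['CG']
```

The two sequences share a path until the node `CG`, split into two branches, and rejoin at `TG`. That kind of bubble is what makes the representation attractive for variation, and it also hints at the difficulty I mention above: an annotation tied to a linear coordinate has no single obvious place to sit once the path forks.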

Or that we are all 99.9% genetically similar to each other? Well I certainly did, and boy was I wrong!

It turns out that CNVs (Copy Number Variations) are causing the “facts” some of us learned in Molecular Biology 101 to be rewritten. If you, like me, thought that what you learned years ago was still true, then there is a great webinar you may want to watch. It is brought to you by Science/AAAS, and it features three prominent experts in genetic variability, Drs. Charles Lee, Lars Feuk and Alexandra Blakemore.

The moderator is Dr. Sean Sanders, who is the Commercial Editor of Science. Even those of you that are up to speed on the current research can find many interesting facts and learn about the new techniques used to study CNVs, or just genetic variability in general. It turns out that CNVs are much more prevalent than was previously thought. You hear so much about SNPs that it seems like they are the source of genetic variability that we should be most concerned about, but CNVs are catching up real fast. This new field is rapidly advancing because of major technology breakthroughs.

All of the panelists present a short talk highlighting the prevalence, importance and experimental limitations of studying CNVs and their role in normal human variability, as well as in disease. They present some of their own data and discuss the future direction of this young field. This is followed by a very interesting question and answer session where they allowed listeners to email their questions. It may even turn out that CNVs are the reason that your personality, IQ, height and weight differ from your colleagues, friends and family. So not only is this an exciting new field, but it is certainly one we can all relate to!