at OpenHelix

Tag Archives: DGV

The Database of Genomic Variants (DGV) has been working on a new site for a while. It has been available as a beta so that users could get used to it and kick the tires, but now it’s ready for prime time. They are retiring the existing site and moving to the new version.

As a public service announcement, I’ll paste the text of their email notice here. We’ll update our tutorial soon–we like to give new sites a bit of time to work out the bugs, but then we’ll rework our materials as soon as we can.

We will now host only one version of the database, and the original
site will be retired. We will continue to provide a track of the
original DGV data in the genome browser (gbrowse) which will be
searchable and include details from the original variant details
page. Any links from third party sites and software which use the
“VariationID” to point to the original DGV genome browser or
variant details page will be automatically redirected to the
corresponding entry in our new database. This will ensure that all
data (new and old) will be fully available to users.

We will work with the various partners and websites that provide
links to the original data to update the content to reflect the
information available in the new site.

With this final update, we have included a total of 53 studies,
representing all of the fully curated and accessioned versions of the
original studies, in addition to 10 new datasets. There are a number
of changes to the content and format, and we have summarized these in
the newsletter, which is available at http://dgv.tcag.ca/dgv/docs/201306-DGV_Newsletter.pdf

In today’s tip I will briefly introduce you to the beta version of the updated DGV resource. The Database of Genomic Variants, or DGV, was created in 2004 at a time early in the understanding of human structural variation, or SV, which is defined by DGV as genomic variation larger than 50bp. DGV has historically provided public access to SV data in humans who are non-diseased. In the past it both accepted direct data submissions on SV and also provided high quality curation and analysis of the data such that it was appropriate for use in biomedical studies.
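To make the size cutoff concrete, here is a minimal sketch of the "larger than 50bp" rule. The function name, the constant name, and the 1-based inclusive coordinate convention are my own assumptions for illustration. None of this is DGV code.

```python
# Size cutoff taken from the DGV definition quoted above: larger than 50 bp.
SV_MIN_LENGTH_BP = 50  # hypothetical constant name


def is_structural_variant(start: int, end: int) -> bool:
    """Return True if the interval is longer than 50 bp.

    Assumes 1-based, inclusive coordinates (an assumption, not a DGV rule),
    so the length of the interval is end - start + 1.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    length = end - start + 1
    return length > SV_MIN_LENGTH_BP
```

Under those assumptions, an interval from 1,000 to 1,049 is exactly 50 bp long and would not count, while 1,000 to 1,050 would.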

We’ve had an introductory tutorial on using DGV for years, and we’ve posted on changes at DGV in the past, so we were quite interested to read in their recent newsletter that there is a newly updated beta version of the DGV resource. The increase in SV data being generated by many large-scale sequencing projects, as well as by individual labs, has made it difficult for the DGV to continue to collect SV data, to provide a stable and comprehensive data archive AND to manually curate it at the level they have in the past. Therefore the DGV team is now partnering with DGVa at EBI and dbVar at NCBI. DGVa and dbVar will accept SV data submissions and will function as public data archives (PDA) and, according to the publication cited below, DGVa and dbVar will:

“...provide stable and traceable identifiers and allow for a single point of access for data collections, facilitating download and meta-analysis across studies.”

DGV will no longer accept data submissions, but will instead use accessioned SV data from the archives and focus on providing the scientific community and public at-large with a subset of the data. Again quoting from the paper referenced below:

“The main role of DGV going forward will be to curate and visualize selected studies to facilitate interpretation of SV data, including implementing the highest-level quality standards required by the clinical and diagnostic communities.”

The original DGV resource is still available while comments are collected on the updated beta site. For more information on the updated DGV, I suggest you check out this documentation from the DGV team: from their FAQ, “What is the data model used for DGV2?”, and from a link in their top navigation area, “DGV Beta User Tutorial”. Be sure to check out the new displays and data that are available and, most importantly, send your comments and suggestions to the group so that they can design a resource best suited to your needs.

Edit, March 5, 2012 – I wanted to add a clarification that we received through our contact link. I am pasting it in full, with permission from Margie:

“Hi Jennifer
We at TCAG think you did a great job on your video blog of the New Database of Genomic Variants.
I wanted to make a correction to one of your statements: “The increase in SV data (…) at the level they have in the past.”
We, the DGV team, have built a system that CAN handle the new volumes and types of SV data now being published, and we are able to curate all of these data. The reason we partnered with DGVa and dbVar was primarily to provide stable, “universal” accessions for SV data. We also work with DGVa and dbVar to define standard terminology, data types, and data exchange formats.
I just wanted to make sure it was clear that we are fully capable to handle the SV data being published now. Our reason for partnership was to foster standardized data and open data sharing across systems.
Thanks again for your blog post!
Margie Manker”

This notice came from DGV (Database of Genomic Variants) while I was on vacation last week, but I wanted to highlight it for a couple of reasons. First–it’s very cool that these groups have now chosen to establish a standard across databases for the copy-number variation displays. I also like that they are now providing support for the red-green colorblind. As someone from a family of the colorblind, that’s something I like to be able to access.

Here’s the note from the mailing list:

As a result of discussions surrounding the representation of structural variants at the recent ISCA meeting, groups at DGV, NCBI and DECIPHER have decided to standardize colour schemes for gains and losses. Moving forward, deletions/losses will be displayed as red, gains/duplications will be displayed as blue. Regions where both gains and losses occur at the same locus will be represented as brown, and we will continue to represent inversions as purple (indigo). In addition to ensuring the colour schemes are consistent across databases, changes have also been implemented to ensure ease of use for individuals with red-green colour blindness.
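If you draw your own tracks and want to match the scheme in the note, it boils down to a small lookup table. This is only a sketch: the category labels, the alias table, and the function name `colour_for` are hypothetical. The colours are plain names because the note does not give exact RGB values.

```python
# Colour scheme as described in the mailing-list note.
# Keys and helper names are illustrative, not taken from DGV, NCBI or DECIPHER.
SV_COLOURS = {
    "loss": "red",          # deletions/losses
    "gain": "blue",         # gains/duplications
    "gain+loss": "brown",   # both gains and losses at the same locus
    "inversion": "purple",  # the note describes this as purple (indigo)
}

# Hypothetical aliases so that either wording from the note can be used.
ALIASES = {"deletion": "loss", "duplication": "gain"}


def colour_for(variant_type: str) -> str:
    """Return the display colour named in the note for a variant type."""
    key = variant_type.strip().lower()
    key = ALIASES.get(key, key)
    if key not in SV_COLOURS:
        raise ValueError(f"no standard colour listed for {variant_type!r}")
    return SV_COLOURS[key]
```

Keeping the mapping in one table means a display only has to change in one place if the groups revise the scheme later.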

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

DGV: Database of Genomic Variants newsletter arrives for March. Nearly 30k CNVs at this time. They are now releasing annotations to the new human assembly (hg19 or GRCh37). You can see the newsletter here (warning, PDF). [Mary]

Want to catalog leaves from plants around the world? There’s an app for that: Leafview (hattip: ABW) [Trey]

Comprehensive tutorials on the publicly available dbGaP, GAD, and DGV databases enable researchers to quickly and effectively use these invaluable resources.

Seattle, WA: July 16, 2009 – OpenHelix today announced the availability of new tutorial suites on dbGaP, Genetic Association Database (GAD) and Database of Genomic Variants (DGV). The dbGaP resource is a database of genotypes and phenotypes with extensive variation data and clinical details; GAD is an annotated resource connecting human genes and polymorphisms to diseases and traits; and DGV, or Database of Genomic Variants, catalogs and displays structural variation in the human genome. These three new tutorials, in conjunction with additional OpenHelix tutorials on dbSNP, VISTA, HapMap, GeneSNPs, SeattleSNPs, Genome Variation Server and many others, give the researcher an excellent set of training resources to assist in their genetic association and variation research.

The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain an online, narrated, multi-media tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. These tutorials will teach users:

dbGaP

to perform basic and advanced searches and navigate the dbGaP site

to understand the displays for the main open access data types: studies, variables, documents, and analyses

to use the analysis browser to identify candidate genomic regions for genotype-phenotype associations and to manipulate and customize the browser displays

GAD

With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others. To find out more about these and over 70 other tutorial suites visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

About OpenHelix:
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web-based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.

I got my newsletter for May from the Database of Genomic Variants, or DGV. They announce the availability of a large data set of variants from HapMap individuals. There are more than 8000 variations available in this set.

It’s not peer-reviewed at this point, so keep that in mind. But if you are eager for new CNVs (copy number variations), you may want to have a look.

This data are released in DGV pre-publication, and we will therefore not incorporate these regions with the rest of the data in DGV (which has all gone through peer-review).
At this stage, the data will be made available through DGV in two ways. The entire data set will be available as a text file for download on the DGV download page, and it will be shown as a separate track in the DGV browser under the heading “Provisional data release from the Genome Structural Variation Consortium”, in a track with the name “NG42M_CNV (CNVE)”.

The data is subject to the “Fort Lauderdale” non-scoop rules: you can use the data, but the data’s owners reserve the right to publish on global aspects of the data set first. You can see more on the details of use here: http://projects.tcag.ca/variation/ng42m_cnv.php