New Data Release 201606

Another new data release was just rolled out. Highlights include support for the GRCh38/hg38 genome assembly, updated and additional data sources, and new data fields. All changes in this data release are backwards-compatible.

Support for Variants on the GRCh38/hg38 Genome Assembly

Previously, all variant annotations were aggregated by their "_id" fields (HGVS names), which are based on the GRCh37/hg19 reference genome assembly. You could query with GRCh38/hg38 positions or intervals, but the returned variant hits were always on hg19. Now you can query hg38 positions or intervals and get back variants on hg38 directly. We aggregated annotations from the data sources that provide hg38 positions; currently there are five: dbSNP, dbNSFP, ClinVar, EVS and UniProt. To retrieve or query variants using hg38 coordinates, specify the assembly=hg38 parameter in the URL. By default, MyVariant.info still queries the hg19 assembly. Here is an example of the same variant on the hg19 and hg38 assemblies, respectively:

ClinVar, dbSNP and dbNSFP annotations are available under the "clinvar", "dbsnp" and "dbnsfp" subfields, respectively, for each annotated variant. MyVariant.info aggregates annotations from ClinVar, dbSNP, dbNSFP and 11 other sources for each variant, so you can access them all in one request.
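The assembly=hg38 parameter described above can be illustrated with a small URL builder. This is a minimal Python sketch, not an official client: the base URL https://myvariant.info/v1, the /variant/ and /query endpoint paths, the function names, and the HGVS ID in the usage note are assumptions chosen for illustration. Because "_id" values are HGVS names on a specific assembly, the same variant will generally have a different ID on hg19 than on hg38.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the MyVariant.info v1 API (illustrative).
BASE_URL = "https://myvariant.info/v1"


def variant_url(hgvs_id: str, assembly: str = "hg19") -> str:
    """Build a URL that retrieves one variant by its HGVS "_id".

    hg19 is the default assembly, so the parameter is only added otherwise.
    """
    # Keep ":" readable; characters such as ">" are percent-encoded.
    url = f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"
    if assembly != "hg19":
        url += "?" + urlencode({"assembly": assembly})
    return url


def interval_query_url(chrom: str, start: int, end: int, assembly: str = "hg19") -> str:
    """Build a query URL for all variants in a genomic interval."""
    params = {"q": f"chr{chrom}:{start}-{end}"}
    if assembly != "hg19":
        params["assembly"] = assembly
    return f"{BASE_URL}/query?" + urlencode(params)
```

For example, variant_url("chr1:g.35367G>A", assembly="hg38") appends ?assembly=hg38 to the variant URL, while omitting the argument falls back to the hg19 default, mirroring the server-side behavior described above.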

The total number of unique variants is now over 340M (340,102,225), up from 334M previously. More details about the variant data MyVariant.info provides are always available in our documentation, and the same information is available programmatically from our metadata endpoint.

New fields for genomic positions:

Previously, the genomic position of a variant was provided only as sub-fields under each data source field (e.g. clinvar.hg19, clinvar.hg38 and clinvar.chrom). These fields ("hg19", "hg38" and "chrom") are now also provided at the root of each variant annotation object, as universal fields for genomic positions.
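Client code that reads positions can now prefer the root-level fields. The sketch below is a hypothetical helper, not part of any MyVariant.info library: it assumes variant objects are parsed JSON dicts, and it falls back to source-specific sub-fields (the "clinvar", "dbsnp" and "dbnsfp" names are an illustrative subset) for documents that lack the new root fields.

```python
from typing import Any, Optional

# Illustrative subset of data sources that carry nested position sub-fields.
SOURCES = ("clinvar", "dbsnp", "dbnsfp")


def genomic_position(variant: dict[str, Any], assembly: str = "hg19") -> Optional[dict[str, Any]]:
    """Return the {"start": ..., "end": ...} object for the given assembly.

    Prefer the universal root-level field; otherwise fall back to a
    source-specific sub-field such as variant["clinvar"]["hg19"].
    Returns None when no position is present for that assembly.
    """
    if assembly in variant:
        return variant[assembly]
    for source in SOURCES:
        nested = variant.get(source)
        if isinstance(nested, dict) and assembly in nested:
            return nested[assembly]
    return None
```

Checking the root field first means the helper picks up the new universal fields whenever they are present, while the fallback keeps it usable on documents that only carry per-source positions.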

hg19 and hg38

When the position is provided, a variant object on the GRCh37/hg19 assembly should contain an "hg19" field. Likewise, a variant object on the GRCh38/hg38 assembly should contain an "hg38" field. Both fields include start and end positions:

Note that the "chrom" field is always a string (without the "chr" prefix), even for chromosomes "1" to "22".
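Because "chrom" is a string without the "chr" prefix, code that builds a region string has to add the prefix itself and should compare chromosomes as strings. The following minimal sketch assumes the object layout described above ("chrom" plus an assembly field with "start" and "end"); the function name and the variant object, including its "_id" and coordinates, are made up for illustration.

```python
from typing import Any


def region_string(variant: dict[str, Any], assembly: str = "hg19") -> str:
    """Format a variant's position as "chr<chrom>:<start>-<end>".

    "chrom" is stored as a string without the "chr" prefix
    (e.g. "1" or "X", never the integer 1), so the prefix is added here.
    """
    chrom = variant["chrom"]
    pos = variant[assembly]  # {"start": int, "end": int}
    return f"chr{chrom}:{pos['start']}-{pos['end']}"


# Hypothetical hg19 variant object; the values are invented for illustration.
example = {
    "_id": "chr1:g.1000A>G",
    "chrom": "1",
    "hg19": {"start": 1000, "end": 1000},
}
```

With this object, region_string(example) gives "chr1:1000-1000". A filter such as variant["chrom"] == "1" matches, whereas variant["chrom"] == 1 never does, which is the practical consequence of "chrom" always being a string.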

Query RCV Accession Numbers and Gene Symbols Directly:

Previously, you could query for matching variants directly by "rsid". ClinVar "RCV accession" numbers and gene symbols are now also special fields that you can query directly, without specifying which field to search on:
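Since rsIDs, RCV accessions and gene symbols no longer need a field prefix, a direct query is just the bare term in the q parameter. This is a minimal sketch assuming the query endpoint lives at https://myvariant.info/v1/query; the function name is hypothetical, and the terms in the usage note (rs58991260, RCV000083620, CDK2) are example strings only, not checked against actual results.

```python
from urllib.parse import urlencode

# Assumed URL of the MyVariant.info v1 query endpoint (illustrative).
QUERY_URL = "https://myvariant.info/v1/query"


def direct_query_url(term: str) -> str:
    """Build a query URL for a term that needs no field prefix.

    An rsID, a ClinVar RCV accession or a gene symbol is passed as-is
    in the q parameter, without naming the field to search on.
    """
    return QUERY_URL + "?" + urlencode({"q": term})
```

For example, direct_query_url("rs58991260"), direct_query_url("RCV000083620") and direct_query_url("CDK2") produce three URLs of the same shape; the server, not the client, decides which field each term matches.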