The gene predictions selected here will be used to determine the effect of each variant on genes, for example, intronic, missense, splice site or intergenic.

Select Regulatory Annotations

The annotations in this section provide predicted regulatory regions based on various experimental data. When a variant overlaps an annotation selected here, the consequence term regulatory_region_variant will be assigned. Follow the links to description pages that explain how each dataset was constructed. Some datasets cover a significant portion of the genome and it may be desirable to filter these annotations by cell type and/or score in order to avoid an overabundance of hits.

Include HGVS genomic (g.) terms in output
Include HGVS coding (c.) terms if applicable, otherwise noncoding (n.) terms, in output
Include HGVS protein (p.) terms (if applicable) in output
When including HGVS protein (p.) terms, add parentheses around changes to emphasize that they are predictions
For variants that involve both a deletion and insertion, including multi-nucleotide variants, include the deleted sequence (e.g. show "delAGinsTT" instead of only "delinsTT")

Select RefSeq Genes or an official GENCODE release ("Basic Gene Annotation Set from GENCODE..." or "Comprehensive Gene Annotation Set...") in the "Select Genes" section above in order to make options appear.
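The difference between the two deletion-insertion styles mentioned above can be illustrated with a minimal sketch. The function name, its parameters and the example coordinates are hypothetical and chosen only for illustration; this is not the VAI's own code, and it covers only the genomic (g.) form.

```python
def hgvs_g_delins(accession, start, end, deleted, inserted, show_deleted=False):
    """Format a genomic (g.) deletion-insertion term.

    Illustrative helper only: start and end are 1-based inclusive positions
    of the deleted reference bases.
    """
    # A single deleted base is written as one position, otherwise as a range.
    position = str(start) if start == end else f"{start}_{end}"
    # Optionally spell out the deleted sequence after "del".
    del_part = f"del{deleted}" if show_deleted else "del"
    return f"{accession}:g.{position}{del_part}ins{inserted}"


# Made-up coordinates on a reference sequence accession.
short_form = hgvs_g_delins("NC_000001.11", 1000, 1001, "AG", "TT")
long_form = hgvs_g_delins("NC_000001.11", 1000, 1001, "AG", "TT", show_deleted=True)
```

Here short_form ends in "delinsTT" and long_form ends in "delAGinsTT", matching the two styles that the option switches between.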

This tool is for research use only. While this tool is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions.

Using the Variant Annotation Integrator

Introduction

The Variant Annotation Integrator (VAI) is a research tool for associating
annotations from the UCSC database with your uploaded set of variant calls.
It uses gene annotations to predict functional effects of variants on transcripts.
For example, a variant might be located in the coding sequence
of one transcript, but in the intron of an alternatively spliced transcript
of the same gene; the VAI will return the predicted functional effect
for each transcript. The VAI can optionally add several other
types of relevant information: the dbSNP identifier if the variant
is found in
dbSNP,
protein damage scores for missense variants from the
Database of Non-synonymous Functional Predictions (dbNSFP),
and conservation scores computed from multi-species alignments.
The VAI can optionally filter results to retain only specific functional
effect categories, variant properties and multi-species conservation status.

NOTE:
The VAI is only a research tool, meant to be used by those who have been
properly trained in the interpretation of genetic data,
and should never be used to make any kind of medical decision.
We urge users seeking information about a personal medical or genetic
condition to consult with a qualified physician for diagnosis and for
answers to personal questions.

Submitting your variant calls

In order to use the VAI, you must provide variant calls in either the
Personal Genome SNP (pgSnp) or
VCF format.
pgSnp-formatted variants may be uploaded as a
Custom Track.
Compressed and indexed VCF files must be on a web server (HTTP, HTTPS or FTP)
and configured as Custom Tracks or, if you have a
Track Hub,
as hub tracks.

Protein-coding gene transcript effect predictions

Any gene prediction track in the UCSC Genome Browser database or in a track hub
can be selected as the VAI's source of transcript annotations for prediction
of functional effects.
Sequence Ontology (SO) terms are used to describe the effect
of each variant on genes in terms of transcript structure as follows:

A variant that causes no change to the transcript sequence and/or
specifies only the reference allele, no alternate allele.
In rare cases when the transcript sequence (e.g. from RefSeq) differs from the
reference genome assembly, a difference from the reference genome may restore
the transcript sequence instead of altering it.

Optional annotations

In addition to protein-coding genes, some genome assemblies offer other sources of
annotations that can be included in the output for each variant.

dbNSFP provides scores and predictions from several tools that use various
machine learning techniques to estimate the likelihood that a single-nucleotide
missense variant would damage a protein's structure and function:

PolyPhen-2 (Polymorphism Phenotyping v2)
applies a naive Bayes classifier using
several sequence-based and structure-based predictive features
including refined multi-species alignments.
PolyPhen-2 was trained on two datasets, and dbNSFP provides
scores for both.
The HumDiv training set is intended for evaluating rare alleles
potentially involved in complex phenotypes, for example in
genome-wide association studies (GWAS).
Predictions are derived from scores, with these ranges for HumDiv:
"probably damaging" ("D") for scores in [0.957, 1];
"possibly damaging" ("P") for scores in [0.453, 0.956];
"benign" ("B") for scores in [0, 0.452].
HumVar is intended for studies of Mendelian diseases, for which
mutations with drastic effects must be sorted out from abundant
mildly deleterious variants.
Predictions are derived from scores, with these ranges for HumVar:
"probably damaging" ("D") for scores in [0.909, 1];
"possibly damaging" ("P") for scores in [0.447, 0.908];
"benign" ("B") for scores in [0, 0.446].
(Adzhubei et al., 2010)
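The PolyPhen-2 score ranges above can be expressed as a small lookup. This is a sketch only: the function and table names are hypothetical, and it assumes scores are reported to three decimal places so that no value falls between adjacent ranges.

```python
# Lower bounds of the "probably damaging" and "possibly damaging" ranges
# quoted in the text, keyed by training set. Names are illustrative.
POLYPHEN2_CUTOFFS = {
    "HumDiv": (0.957, 0.453),
    "HumVar": (0.909, 0.447),
}


def polyphen2_prediction(score, training_set="HumDiv"):
    """Return "D", "P" or "B" for a PolyPhen-2 score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    probably, possibly = POLYPHEN2_CUTOFFS[training_set]
    if score >= probably:
        return "D"  # probably damaging
    if score >= possibly:
        return "P"  # possibly damaging
    return "B"      # benign
```

For example, a score of 0.95 would be "possibly damaging" under HumDiv but "probably damaging" under HumVar, which is why it matters which of the two scores is being read.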

MutationTaster applies a naive Bayes classifier
trained on a large dataset
(>390,000 known disease mutations from HGMD Professional and
>6,800,000 presumably harmless SNP and Indel polymorphisms
from the 1000 Genomes Project).
Variants that cause a premature stop codon resulting in
nonsense-mediated decay (NMD), as well as
variants marked as probable-pathogenic or pathogenic in
ClinVar,
are automatically presumed to be disease-causing ("A").
Variants with all three genotypes present in HapMap or with
at least 4 heterozygous genotypes in 1000 Genomes are automatically
presumed to be harmless polymorphisms ("P").
Variants not automatically determined to be disease-causing or polymorphic
are predicted to be "disease-causing" ("D") or
polymorphisms ("N") by the classifier.
Probability scores close to 1 indicate high "security" of
the prediction; probabilities close to 0 for an automatic prediction
("A" or "P")
can indicate that the classifier predicted a different outcome.
(Schwarz et al., 2010)
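One way to read the four MutationTaster codes together with the probability score is sketched below. The function name and the 0.5 cutoff for "close to 0" are assumptions made for illustration; the text above does not define a numeric threshold.

```python
# Meaning of each code as described in the text: (label, automatic?).
MUTATIONTASTER_CODES = {
    "A": ("disease-causing", True),   # automatically presumed disease-causing
    "D": ("disease-causing", False),  # predicted by the classifier
    "N": ("polymorphism", False),     # predicted by the classifier
    "P": ("polymorphism", True),      # automatically presumed harmless
}


def describe_mutationtaster(code, probability, low_cutoff=0.5):
    """Summarize a MutationTaster call.

    low_cutoff is a hypothetical threshold: an automatic call ("A" or "P")
    with a probability below it is flagged, because the classifier may have
    predicted a different outcome.
    """
    label, automatic = MUTATIONTASTER_CODES[code]
    return {
        "label": label,
        "automatic": automatic,
        "classifier_may_disagree": automatic and probability < low_cutoff,
    }
```

The flag is only a prompt to look more closely at the variant; it does not say which outcome the classifier preferred.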

MutationAssessor
uses sequence homologs grouped into families and sub-families by
combinatorial entropy formalism to compute a Functional Impact Score
(FIS). It is intended for use in cancer studies, in which both gain
of function and loss of function are important; the authors also
identify a third category, "switch of function."
A prediction of "high" or "medium" indicates that the
variant probably has some functional impact, while "low" or
"neutral" indicates that the variant is probably function-neutral.
(Reva et al., 2011)

Transcript status

Some of the gene prediction tracks have additional annotations
to indicate the amount or quality of supporting evidence for each transcript.
When the track selected in the "Select Genes" section has such annotations,
these can be enabled under "Transcript Status". The options depend on which
gene prediction track is selected.

GENCODE tags:
when GENCODE Genes are selected in the "Select Genes" section,
any
GENCODE tags associated with a transcript can be added to output.

RefSeq status:
when RefSeq Genes are selected in the "Select Genes" section,
the transcript's
status can be included in output.

Canonical UCSC transcripts:
when UCSC Genes (labeled GENCODE V22 in hg38/GRCh38)
are selected in the "Select Genes" section,
the flag "CANONICAL=YES" is added when the transcript
has been chosen as "canonical" (see the "Related Data" section of the
UCSC Genes track description).

Known variation

If the selected genome assembly has a SNPs track (derived from
dbSNP)
and a variant has the same start and end coordinates as a variant in
dbSNP,
the VAI includes the reference SNP (rs#) identifier in the output.
Currently, the VAI does not compare alleles due to the frequency of strand
anomalies in dbSNP.
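The coordinate-only matching described above can be sketched as a dictionary lookup keyed on chromosome, start and end. This is an illustration of the idea, not the VAI's implementation; the function names and the tuple layout are assumptions, and alleles are deliberately not compared.

```python
from collections import defaultdict


def index_known_snps(known_snps):
    """Index (chrom, start, end, rs_id) tuples by their coordinates."""
    index = defaultdict(list)
    for chrom, start, end, rs_id in known_snps:
        index[(chrom, start, end)].append(rs_id)
    return index


def rs_ids_for(chrom, start, end, index):
    """Return rs identifiers whose start and end exactly match the variant.

    Alleles are not checked, mirroring the behavior described in the text.
    """
    return list(index.get((chrom, start, end), []))


# Made-up records for illustration only.
snp_index = index_known_snps([
    ("chr1", 100, 101, "rs_example_1"),
    ("chr1", 100, 101, "rs_example_2"),
    ("chr2", 500, 502, "rs_example_3"),
])
matches = rs_ids_for("chr1", 100, 101, snp_index)
```

Because only coordinates are compared, a match means a known variant exists at the same position, not that it carries the same alleles.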

Conservation

If the selected genome assembly has a Conservation track with phyloP scores
and/or phastCons scores and conserved elements,
those can be included in the output.
Both phastCons and phyloP are part of the
PHAST package;
see the Conservation track description in the Genome Browser
for more details.

Filters

The volume of unrestricted output can be quite large,
making it difficult to identify variants of particular interest.
Several filters can be applied to keep only those variants
that have specific properties.

Functional role

By default, all variants are included in the output regardless of
predicted functional effect. If you would like to keep only variants
that have a particular type of effect, you can uncheck the checkboxes
of other effect types.
The detailed functional effect predictions are categorized as follows:

Known variation

(applicable only to assemblies that have "Common SNPs" and
"Mult SNPs" tracks)
By default, all variants appear in output regardless of overlap with known
dbSNP variants that map to multiple locations (a possible red flag),
or that have a global minor allele frequency (MAF) of 1% or higher.
Those categories of known variants can be used to exclude overlapping variants
from output by unchecking the corresponding checkbox.

Conservation

(applicable only to assemblies that have "Conservation" tracks)
If desired, output can be restricted to only those variants that overlap
conserved elements computed by phastCons.
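Restricting variants to those that overlap conserved elements amounts to an interval-overlap test. The sketch below assumes 0-based, half-open coordinates (as in BED files) and uses hypothetical function names; how the VAI itself treats zero-length insertions is not stated here, so the sketch simply tests the insertion point.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def keep_conserved(variants, elements):
    """Keep (chrom, start, end) variants that overlap any conserved element.

    A zero-length variant (start == end) is kept if its insertion point lies
    strictly inside an element; this is an assumption of the sketch.
    Linear scan for clarity; a sorted or indexed structure would suit
    genome-scale input better.
    """
    kept = []
    for chrom, start, end in variants:
        for e_chrom, e_start, e_end in elements:
            if chrom != e_chrom:
                continue
            if start == end:
                hit = e_start < start < e_end
            else:
                hit = overlaps(start, end, e_start, e_end)
            if hit:
                kept.append((chrom, start, end))
                break
    return kept
```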

Output format

Currently, the VAI produces output comparable to Ensembl's
Variant Effect Predictor (VEP), in either tab-separated
text format or HTML.
Columns are described
here.
When text output is selected, entering an output file name causes output to
be saved in a local file instead of appearing in the browser, optionally
compressed by gzip (compression reduces file size and network traffic,
which results in faster downloads).
When HTML is selected, output always appears in the browser window and the
output file name is ignored.
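A saved text output file, gzip-compressed or not, could be read back with a few lines of Python. The sketch below assumes VEP-style conventions: lines beginning with "##" are metadata and a single line beginning with "#" holds the tab-separated column names. Check these assumptions against an actual output file; the function names are hypothetical.

```python
import gzip


def parse_vai_lines(lines):
    """Turn tab-separated output lines into a list of dicts keyed by column."""
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata (assumed VEP-style)
        fields = line.split("\t")
        if line.startswith("#"):
            # Assumed header line; strip the leading "#" from the first name.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            # No header seen: fall back to generic column names.
            header = [f"col{i + 1}" for i in range(len(fields))]
        rows.append(dict(zip(header, fields)))
    return rows


def read_vai_text(path):
    """Read a VAI text output file, transparently handling a .gz suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return parse_vai_lines(handle)
```

Each row can then be filtered on any column of interest, for example by keeping only rows whose consequence column contains a particular Sequence Ontology term.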

Acknowledgments

Anyone familiar with Ensembl's
Variant Effect Predictor (VEP) will doubtless notice
similarities in options and interface. In collaboration with our colleagues
at Ensembl, we have made an effort to limit the differences between the tools
by using Sequence Ontology terms to describe variants' functional effects and
by creating a "VEP" output format.
Any bugs in the VAI, however, are in the VAI only.