Description

This track shows the DNA sequences targetable by CRISPR RNA guides using
the Cas9 enzyme from S. pyogenes (PAM: NGG) over the entire
human (hg38) genome. CRISPR target sites were annotated with
predicted specificity (off-target effects) and predicted efficiency
(on-target cleavage) by various
algorithms through the tool CRISPOR. Sp-Cas9 usually cuts double-stranded DNA three or
four base pairs 5' of the PAM site.
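
The cut position described above can be worked out from the coordinates of a target site. The following is a minimal sketch, not the track's own code: it assumes 0-based, half-open forward-strand coordinates, a 23 bp site (20mer plus 3 bp PAM), and a default offset of three base pairs. The function name and parameters are hypothetical.

```python
GUIDE_LEN = 20  # length of the target 20mer
PAM_LEN = 3     # length of the NGG PAM


def cut_boundary(site_start: int, strand: str, offset: int = 3) -> int:
    """Return the forward-strand coordinate of the expected cut.

    site_start is the 0-based start of the 23 bp site (20mer + PAM) on the
    forward strand. The returned value is the boundary between two bases:
    the cut falls between base (value - 1) and base (value).
    """
    if strand == "+":
        # PAM lies at the right end of the site; cut `offset` bp left of it.
        return site_start + GUIDE_LEN - offset
    if strand == "-":
        # PAM (seen as CCN) lies at the left end; cut `offset` bp right of it.
        return site_start + PAM_LEN + offset
    raise ValueError("strand must be '+' or '-'")
```

Passing `offset=4` gives the alternative cut position mentioned in the text.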

Display Conventions and Configuration

The track "CRISPR Targets" shows all potential -NGG target sites across the genome.
The target sequence of the guide is shown with a thick (exon) bar. The PAM
motif match (NGG) is shown with a thinner bar. Guides
are colored to reflect both predicted specificity and efficiency. Specificity
reflects the "uniqueness" of a 20mer sequence in the genome; the less unique a
sequence is, the more likely it is to cleave other locations of the genome
(off-target effects). Efficiency is the frequency of cleavage at the target
site (on-target efficiency).

Shades of gray stand for sites that are hard to target specifically, as the
20mer is not very unique in the genome:

impossible to target: target site has at least one identical copy in the genome and was not scored

hard to target: so many similar sequences in the genome that alignment was stopped; the site is possibly in a repeat

hard to target: target site was aligned but has a low specificity score (<= 50, see below)
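
The three gray categories above, together with the MIT specificity threshold of 50, amount to a simple decision rule. The sketch below only illustrates that rule. The function and its three inputs are hypothetical and do not correspond to actual field names in the track files.

```python
from typing import Optional


def specificity_class(has_identical_copy: bool,
                      alignment_stopped: bool,
                      mit_specificity: Optional[float]) -> str:
    """Assign a guide to one of the specificity categories listed above."""
    if has_identical_copy:
        # At least one identical copy in the genome: the guide was not scored.
        return "impossible to target"
    if alignment_stopped:
        # Too many similar sequences for the alignment to finish.
        return "hard to target (too many similar sequences)"
    if mit_specificity is None:
        raise ValueError("an aligned, unique guide needs a specificity score")
    if mit_specificity <= 50:
        return "hard to target (low specificity)"
    # MIT specificity > 50: shown in color, according to predicted efficiency.
    return "specific"
```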

Colors highlight targets that are specific in the genome (MIT specificity > 50) but have different predicted efficiencies:

Mouse-over a target site to show predicted specificity and efficiency scores:

The MIT Specificity score summarizes all off-targets into a single number from
0-100. The higher the number, the fewer off-target effects are expected. We
recommend guides with an MIT specificity > 50.

The efficiency score tries to predict whether a guide leads to strong or
weak cleavage. According to Haeussler et al. 2016, the Doench
2016 Efficiency score should be used to select the guide with the highest
cleavage efficiency when expressing guides from RNA Pol III promoters such as
U6. Scores are given as percentiles, e.g. "70%" means that 70% of mammalian
guides have a score equal to or lower than this guide's score. The raw score is
also shown in parentheses after the percentile.
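
To make the percentile definition concrete, here is a minimal sketch that converts a raw score into "the share of reference guides with an equal or lower score". The reference distribution used by CRISPOR is not given on this page, so `reference_scores` is a hypothetical stand-in and the function name is an assumption.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_rank(raw_score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores that are equal to or lower than raw_score."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    # bisect_right counts how many sorted values are <= raw_score.
    n_equal_or_lower = bisect_right(ordered, raw_score)
    return 100.0 * n_equal_or_lower / len(ordered)
```

With a toy reference of `[0.1, 0.2, 0.5, 0.9]`, a raw score of 0.5 is at the 75th percentile because three of the four reference values are equal or lower.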

The Moreno-Mateos 2015 Efficiency
score should be used instead of the Doench 2016 score when transcribing the
guide in vitro with a T7 promoter, e.g. for injections in mouse, zebrafish or
Xenopus embryos. Like the Doench 2016 score, the Moreno-Mateos score is given as a percentile, with the raw value in parentheses.

Click on a feature to show all scores and predicted off-targets with up to
four mismatches. The Out-of-Frame score by Bae et al. 2014 is correlated with
the probability that mutations induced by the guide RNA will disrupt the open
reading frame. The authors recommend out-of-frame scores > 66 to create
knock-outs with a single guide efficiently.
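
The recommendations above can be combined into a simple selection routine: keep guides with MIT specificity > 50, optionally require an out-of-frame score > 66 for knock-outs, and rank by the efficiency percentile that matches the expression system. This is a sketch under stated assumptions. The dictionary keys and the function name are hypothetical and are not the column names of the track files.

```python
from typing import Dict, List


def rank_guides(guides: List[Dict], promoter: str = "U6",
                knockout: bool = False) -> List[Dict]:
    """Filter guides by the recommended thresholds and rank them by efficiency."""
    if promoter == "U6":
        eff_key = "doench2016_pct"      # guides expressed from Pol III promoters
    elif promoter == "T7":
        eff_key = "moreno_mateos_pct"   # guides transcribed in vitro
    else:
        raise ValueError("promoter must be 'U6' or 'T7'")

    kept = [
        g for g in guides
        if g["mit_specificity"] > 50
        and (not knockout or g["out_of_frame"] > 66)
    ]
    # Highest efficiency percentile first.
    return sorted(kept, key=lambda g: g[eff_key], reverse=True)
```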

Off-target sites are sorted by the CFD score (Doench et al. 2016).
The higher the CFD score, the more likely there is off-target cleavage at that site.
Off-targets with a CFD score < 0.023 are not shown on this page, but are available when
following the link to the external CRISPOR tool.
When compared against experimentally validated off-targets by
Haeussler et al. 2016, the large majority of predicted
off-targets with CFD scores < 0.023 were false-positives. For storage and performance
reasons, on the level of individual off-targets, only CFD scores are available.
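
The display rule described above (drop off-targets with a CFD score below 0.023, then sort by CFD score) can be sketched as follows. The record layout, a dictionary with a `cfd` key, is a hypothetical choice for illustration and not the format of the track's files.

```python
from typing import Dict, List

CFD_DISPLAY_CUTOFF = 0.023  # off-targets scoring below this are not shown


def displayed_off_targets(off_targets: List[Dict],
                          cutoff: float = CFD_DISPLAY_CUTOFF) -> List[Dict]:
    """Keep off-targets at or above the CFD cutoff, most likely cleavage first."""
    kept = [ot for ot in off_targets if ot["cfd"] >= cutoff]
    return sorted(kept, key=lambda ot: ot["cfd"], reverse=True)
```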

Methods

Relationship between predictions and experimental data

Like most algorithms, the MIT specificity score is not always a perfect
predictor of off-target effects. Despite low scores, many tested guides
caused little and/or only weak off-target cleavage when tested with whole-genome assays
(Figure 2 from Haeussler
et al. 2016), as shown below, and the published data contain few data points
with high specificity scores. Overall though, the assays showed that the higher
the specificity score, the lower the off-target effects.

Similarly, efficiency scoring is not very accurate: guides with low
scores can be efficient and vice versa. As a general rule, however, the higher
the score, the less likely that a guide is very inefficient. The
following histograms illustrate, for each type of score, how the share of
inefficient guides drops with increasing efficiency scores:

When reading this plot, keep in mind that both scores were evaluated on
their own training data. Especially for the Moreno-Mateos score, the
results are too optimistic, due to overfitting. When evaluated on independent
datasets, the correlation of the prediction with other assays was around 25%
lower (see Haeussler et al. 2016). At the time of
writing, there is no independent dataset available yet to determine the
Moreno-Mateos accuracy for each score percentile range.

Track methods

The entire human (hg38) genome was scanned for the -NGG motif. Flanking 20mer
guide sequences were aligned to the genome with BWA and scored with MIT
Specificity scores using the command-line version of CRISPOR (crispor.org).
Non-unique guide sequences were skipped. Flanking sequences were extracted from
the genome and used as input for the CRISPOR efficiency scoring, available from
the CRISPOR downloads page, which includes the Doench 2016, Moreno-Mateos 2015
and Bae 2014 algorithms, among others.
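
The first step, scanning for the -NGG motif and taking the flanking 20mer, can be illustrated on a short sequence. This is a minimal sketch of the general idea, not the pipeline used to build the track. The function names are hypothetical, and alignment and scoring are left out.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str) -> List[Tuple[int, str, str, str]]:
    """Return (site_start, strand, guide_20mer, pam) for every NGG site.

    site_start is the 0-based forward-strand start of the 23 bp site.
    Guide and PAM are always reported 5'->3' on the strand of the guide.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 22):
        window = seq[i:i + 23]
        # Forward strand: 20mer followed by NGG.
        if window[21:23] == "GG":
            sites.append((i, "+", window[:20], window[20:23]))
        # Reverse strand: CCN followed by the 20mer, seen on the forward strand.
        if window[0:2] == "CC":
            sites.append((i, "-", revcomp(window[3:23]), revcomp(window[0:3])))
    return sites
```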

Note that the Doench 2016 scores were updated by
the Broad Institute in 2017 ("Azimuth" update). As a result, earlier versions of
the track show the old Doench 2016 scores and this version of the track shows the new
Doench 2016 scores. Old and new scores are almost identical: their
correlation is 0.99, and for more than 80% of the guides the difference is below 0.02.
For a small number of guides, however, the difference can be larger. In case of doubt, we recommend
the new scores. CRISPOR (crispor.org) can display both scores, and many more, with the
"Show all scores" link.

Data Access

Positional data can be explored interactively with the
Table Browser.
For small programmatic positional queries, the track can be accessed using our
REST API. For genome-wide data or
automated analysis, CRISPR genome annotations can be downloaded from
our download server
as a bigBed file.
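
For a small positional query through the REST API, the request URL can be assembled as in the sketch below. The endpoint `https://api.genome.ucsc.edu/getData/track` and its `genome`, `track`, `chrom`, `start` and `end` parameters belong to the UCSC REST API. The track name `crisprAllTargets` and the use of semicolons as separators are assumptions that should be checked against the API documentation before use. The helper name is hypothetical, and the function only builds the URL without sending a request.

```python
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(chrom: str, start: int, end: int,
                    genome: str = "hg38",
                    track: str = "crisprAllTargets") -> str:
    """Build a REST API URL for one region of the track.

    The default track name is an assumption, not taken from this page.
    """
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    params = [("genome", genome), ("track", track),
              ("chrom", chrom), ("start", start), ("end", end)]
    # Parameters are joined with semicolons (assumed separator).
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)
```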

The files for this track are called crispr.bb, which lists positions and
scores, and crisprDetails.tab, which has information about off-target matches. Individual
regions or whole genome annotations can be obtained using our tool bigBedToBed,
which can be compiled from the source code or downloaded as a pre-compiled
binary for your system. Instructions for downloading source code and binaries can be found
here. The tool
can also be used to obtain only features within a given range, e.g.