About Kraken 2

Kraken 2 is the newest version of Kraken, a taxonomic classification system
using exact k-mer matches to achieve high accuracy and fast classification speeds.
This classifier matches each k-mer within a query sequence to the lowest
common ancestor (LCA) of all genomes containing the given k-mer.
The k-mer assignments inform the classification algorithm
(see Kraken 1's webpage for more details).
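
As an illustration of the LCA mapping described above, here is a minimal sketch in Python. The toy taxonomy, the taxon names chosen for it, and the helpers `ancestors`, `lca`, and `kmer_lca` are all hypothetical and exist only for this example; this is not Kraken's implementation.

```python
from functools import reduce

# Toy taxonomy stored as child -> parent (hypothetical, for illustration only).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia coli": "Enterobacteriaceae",
    "Salmonella enterica": "Enterobacteriaceae",
    "Bacillus subtilis": "Bacteria",
}

def ancestors(node):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("taxa share no ancestor")

def kmer_lca(genome_taxa):
    """LCA of every taxon whose genome contains a given k-mer."""
    return reduce(lca, genome_taxa)
```

In this toy tree, a k-mer found only in the two Enterobacteriaceae genomes maps to "Enterobacteriaceae", while a k-mer also found in "Bacillus subtilis" maps to "Bacteria".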

Kraken 2 improves significantly on Kraken 1, with faster database build times, smaller database sizes, and faster classification speeds. These improvements were achieved by the following updates to the Kraken classification program:

Storage of Minimizers: Instead of storing and querying entire k-mers,
Kraken 2 stores minimizers (l-mers) of each k-mer. The length of each
l-mer must be ≤ the k-mer length. Each k-mer is treated by Kraken 2 as
if its LCA is the same as its minimizer's LCA.
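
The minimizer idea can be sketched as follows. This example assumes the minimizer is simply the lexicographically smallest l-mer of the k-mer; the ordering Kraken 2 actually uses is not described here, and the function names and the taxon label are hypothetical.

```python
def minimizer(kmer: str, l: int) -> str:
    """Lexicographically smallest l-mer of a k-mer (a simplified ordering)."""
    if not 0 < l <= len(kmer):
        raise ValueError("l must satisfy 0 < l <= k")
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))

def lookup(table: dict, kmer: str, l: int):
    """Look a k-mer up through its minimizer, so it inherits the minimizer's LCA."""
    return table.get(minimizer(kmer, l))

# Hypothetical table holding one minimizer and a made-up taxon label.
table = {"ACG": "taxon_A"}
```

Under this ordering the k-mers ACGTTGCA and TTGACGT both have the minimizer ACG, so `lookup(table, "ACGTTGCA", 3)` and `lookup(table, "TTGACGT", 3)` return the same entry. This is why storing minimizers needs fewer entries than storing every k-mer.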

Database Structure: While Kraken 1 saved an indexed and sorted list of
k-mer/LCA pairs, Kraken 2 uses a compact hash table. This hash table is
a probabilistic data structure that allows for faster queries and lower
memory requirements. However, this data structure does have a <1% chance
of returning the incorrect LCA or returning an LCA for a non-inserted minimizer.
Users can compensate for this possibility by using Kraken's confidence
scoring thresholds.
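
To show where such errors can come from, here is a minimal sketch of one kind of probabilistic hash table. It assumes each cell keeps only a short fingerprint of the key's hash next to the value and resolves collisions by linear probing; the class `CompactTable`, its parameters, and the choice of BLAKE2b are assumptions made for this example, not a description of Kraken 2's actual layout.

```python
import hashlib

def default_hash(key: str) -> int:
    """64-bit hash of a string key (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

class CompactTable:
    """Hash table whose cells hold (fingerprint, value) instead of the full key."""

    def __init__(self, capacity, fingerprint_bits=16, hash_fn=default_hash):
        self.capacity = capacity
        self.bits = fingerprint_bits
        self.hash_fn = hash_fn
        self.cells = [None] * capacity

    def _probe(self, key):
        """Find the first cell that is empty or carries the key's fingerprint."""
        h = self.hash_fn(key)
        fingerprint = h & ((1 << self.bits) - 1)
        start = (h >> self.bits) % self.capacity
        for step in range(self.capacity):
            idx = (start + step) % self.capacity
            cell = self.cells[idx]
            if cell is None or cell[0] == fingerprint:
                return idx, fingerprint
        return None, fingerprint  # every cell is taken by other fingerprints

    def insert(self, key, value):
        idx, fingerprint = self._probe(key)
        if idx is None:
            raise RuntimeError("table is full")
        self.cells[idx] = (fingerprint, value)

    def get(self, key):
        idx, _ = self._probe(key)
        if idx is None or self.cells[idx] is None:
            return None
        return self.cells[idx][1]

# A deliberately weak hash (string length) forces a collision for demonstration.
toy = CompactTable(capacity=8, fingerprint_bits=4, hash_fn=len)
toy.insert("AAA", 7)
false_positive = toy.get("CCC")  # "CCC" was never inserted
```

Because the full key is not stored, a key that was never inserted but shares a probe position and fingerprint with a stored key returns that key's value. Here `toy.get("CCC")` returns 7. Longer fingerprints make this less likely at the cost of more memory per cell.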

Protein Databases: Kraken 2 allows for databases built from amino
acid sequences. When queried, Kraken 2 performs a six-frame translated
search of the query sequences against the database.
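
A six-frame translation can be sketched as below. The sketch assumes the standard genetic code, marks stop codons with `*` and codons containing other characters with `X`, and drops trailing bases that do not fill a codon; how Kraken 2 handles these cases is not described here, and the function names are hypothetical.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frames(seq: str) -> list:
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames
```

For the read ATGGCCTAA, `six_frames` yields the forward frames MA*, WP, and GL and the reverse-complement frames LGH, *A, and RP. Each of these would then be searched against the protein database.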

16S Databases: Kraken 2 also provides support for databases not
based on NCBI's taxonomy. Currently, these include the 16S databases:
Greengenes, SILVA, and RDP.

News

11/01/2018 - MiniKraken Released for Kraken 2: MiniKraken databases released to the public. See Downloads and Links for more details.

08/11/2018 - v2.0.7-beta release: Support for MiniKraken building based on database size added. Other updates/fixes. See CHANGELOG.md for details.

06/26/2018 - v2.0.6-beta release: Initial public release of Kraken 2.

Publications

The Kraken 2 paper is currently under preparation.
Until it is released, please cite the original Kraken paper
when using Kraken 2 in your research:

Users with low-memory computing environments may be unable to load a full Kraken standard library (~30GB as of 09/2018) into RAM. Therefore, we provide here two MiniKraken2 databases that require only 8GB of RAM for classification. The databases were built by using the --max-db-size option, which downsamples the minimizers in the standard Kraken 2 database using a hash function.
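
Hash-based downsampling can be sketched as follows. This example assumes a minimizer is kept only when its hash falls below a threshold derived from the target size; the actual hash function and selection rule behind --max-db-size are not described here, and `hash64`, `keep`, and `downsample` are hypothetical names.

```python
import hashlib

def hash64(minimizer: str) -> int:
    """64-bit hash of a minimizer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(minimizer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def keep(minimizer: str, fraction: float) -> bool:
    """Keep a minimizer if its hash lies in the lowest `fraction` of hash space."""
    return hash64(minimizer) < int(fraction * 2**64)

def downsample(minimizers: list, max_entries: int) -> list:
    """Retain roughly max_entries minimizers, chosen by hash value."""
    if not minimizers:
        return []
    fraction = min(1.0, max_entries / len(minimizers))
    return [m for m in minimizers if keep(m, fraction)]
```

Because the decision depends only on the minimizer's hash, the same minimizer is kept or dropped consistently, so the same test can be applied when building the database and when querying it. A smaller target size also selects a subset of what a larger one would keep.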

The challenge with downsampling to create the minimized databases is maintaining sensitivity. To address this concern, we measured accuracy with two datasets (HiSeq and MiSeq) from the original Kraken publication that consist of single-ended microbial isolate reads mixed in equal proportions. Below are the results for Kraken 1 and Kraken 2 using the original standard databases and the 8GB MiniKraken databases:

Kraken 2 and Other Tools

The following two tools are compatible with both Kraken 1 and Kraken 2 and are designed to help users analyze and visualize Kraken results.