Silva reference files

Release 132

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. In our published opinion, this is the best reference alignment available and is far superior to the greengenes alignment. In a shift from our previous version of the SILVA references, we are providing the SEED database, the full-length sequences available from the NR SILVA database, and a SILVA-aligned version of the gold database that is used for reference-based chimera detection. We have prepared a README document that describes the process we used to generate these references.
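Because every sequence in the reference is expected to occupy the same 50,000 columns, a quick sanity check after downloading is to confirm that each aligned record has that length. The following is a minimal sketch, not part of the reference distribution: the function names are hypothetical, and it assumes a plain FASTA file in which gap characters count toward the aligned length.

```python
import io

EXPECTED_COLUMNS = 50_000  # alignment width stated in the text above


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier (text before the first whitespace).
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def wrong_width(handle, expected=EXPECTED_COLUMNS):
    """Return names of sequences whose aligned length differs from `expected`."""
    return [name for name, seq in read_fasta(handle) if len(seq) != expected]


if __name__ == "__main__":
    # Tiny in-memory example with a 5-column "alignment" instead of 50,000.
    demo = io.StringIO(">seqA\nAC-G.\n>seqB\nACG\n")
    print(wrong_width(demo, expected=5))
```

For a real file, one would open the downloaded alignment with `open(path)` and pass the handle to `wrong_width`; an empty list means every record has the expected width.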

Full-length sequences and taxonomy references (188247 bacteria, 4626 archaea, and 20246 eukarya sequences). This reference can be customized for alignments and can also be used for classification. The uncompressed version is ~9.9 GB and the compressed version is 348 MB.
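The per-domain counts quoted above can be checked against a taxonomy file with a few lines of code. This is a hedged sketch: it assumes a two-column, whitespace-delimited taxonomy file in which the second column is a semicolon-separated lineage beginning with the domain, and the function name is hypothetical.

```python
import io
from collections import Counter


def count_domains(handle):
    """Count sequences per top-level taxon (the first lineage field)."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: identifier, whitespace, then "Domain;Phylum;...;".
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        domain = parts[1].split(";")[0].strip()
        counts[domain] += 1
    return counts


if __name__ == "__main__":
    demo = io.StringIO(
        "s1\tBacteria;Firmicutes;\n"
        "s2\tArchaea;Euryarchaeota;\n"
        "s3\tBacteria;Proteobacteria;\n"
    )
    for domain, n in sorted(count_domains(demo).items()):
        print(domain, n)
```

If the counts from a downloaded file differ from those listed for the release, the file may be incomplete or may come from a different release.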

Recreated SEED database (8517 bacteria, 147 archaea, and 2516 eukarya sequences). The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment, and its exact composition is not known to us. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v123) that have a 100% quality score to the SEED alignment (field 'align_ident_slv' in the ARB database) and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing a composite dataset for bacterial, archaeal, and eukaryotic sequences. The uncompressed version is 534 MB and the compressed version is 19 MB.
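The selection criteria described above (identity of 100 to the SEED alignment, full coverage of the region between the two primers, and uniqueness) can be sketched as a simple filter. This is only an illustration of the logic, not the procedure in the README: the `Record` type, the `recreate_seed` function, and the `region_start` and `region_end` column coordinates are all hypothetical, and the real work was done on fields exported from the ARB database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    align_ident_slv: float  # identity to the SEED alignment, as a percentage
    start: int              # first alignment column that holds a base
    end: int                # last alignment column that holds a base
    sequence: str           # aligned sequence


def recreate_seed(records, region_start, region_end):
    """Keep unique records with 100% identity that span the whole region.

    region_start / region_end are hypothetical alignment columns standing in
    for the end of the 8f/27f primer and the start of the 1492r primer.
    """
    seen = set()
    kept = []
    for rec in records:
        if rec.align_ident_slv < 100.0:
            continue  # not a perfect match to the SEED alignment
        if rec.start > region_start or rec.end < region_end:
            continue  # does not cover the full primer-to-primer region
        if rec.sequence in seen:
            continue  # duplicate of a sequence already kept
        seen.add(rec.sequence)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    demo = [
        Record("a", 100.0, 1, 10, "ACGT"),
        Record("b", 99.5, 1, 10, "ACGA"),   # identity too low
        Record("c", 100.0, 3, 10, "ACGG"),  # starts after region_start
        Record("d", 100.0, 1, 10, "ACGT"),  # duplicate of "a"
    ]
    print([r.name for r in recreate_seed(demo, region_start=2, region_end=9)])
```

Deduplication here compares aligned sequences exactly and keeps the first occurrence; a different definition of "unique" would change which records survive.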

Release 128

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. In our published opinion, this is the best reference alignment available and is far superior to the greengenes alignment. In a shift from our previous version of the SILVA references, we are providing the SEED database, the full-length sequences available from the NR SILVA database, and a SILVA-aligned version of the gold database that is used for reference-based chimera detection. We have prepared a README document that describes the process we used to generate these references.

Full-length sequences and taxonomy references (168111 bacteria, 4337 archaea, and 18213 eukarya sequences). This reference can be customized for alignments and can also be used for classification. The uncompressed version is ~8.9 GB and the compressed version is 311 MB.

Recreated SEED database (8512 bacteria, 147 archaea, and 2554 eukarya sequences). The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment, and its exact composition is not known to us. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v123) that have a 100% quality score to the SEED alignment (field 'align_ident_slv' in the ARB database) and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing a composite dataset for bacterial, archaeal, and eukaryotic sequences. The uncompressed version is 536 MB and the compressed version is 19 MB.

Release 123

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. In our published opinion, this is the best reference alignment available and is far superior to the greengenes alignment. In a shift from our previous version of the SILVA references, we are now providing the SEED database, the full-length sequences available from the NR SILVA database, and a SILVA-aligned version of the gold database that is used for reference-based chimera detection. We have prepared a README document that describes the process we used to generate these references.

Full-length sequences and taxonomy references (152308 bacteria, 3901 archaea, and 16209 eukarya sequences). This reference can be customized for alignments and can also be used for classification. The uncompressed version is ~7.2 GB and the compressed version is 249 MB.

Recreated SEED database (12083 bacteria, 294 archaea, and 2537 eukarya sequences). The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment, and its exact composition is not known to us. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v123) that have a 100% quality score to the SEED alignment (field 'align_ident_slv' in the ARB database) and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing a composite dataset for bacterial, archaeal, and eukaryotic sequences. The uncompressed version is ~700 MB and the compressed version is 25 MB.

Release 119

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. In our published opinion, this is the best reference alignment available and is far superior to the greengenes alignment. In a shift from our previous version of the SILVA references, we are now providing the SEED database, the full-length sequences available from the NR SILVA database, and a SILVA-aligned version of the gold database that is used for reference-based chimera detection. We have prepared a README document that describes the process we used to generate these references.

Full-length sequences and taxonomy references (137879 bacteria, 3155 archaea, and 12273 eukarya sequences). This reference can be customized for alignments and can also be used for classification. The uncompressed version is ~7.2 GB and the compressed version is 249 MB.

Recreated SEED database (12244 bacteria, 207 archaea, and 2558 eukarya sequences). The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment, and its exact composition is not known to us. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v119) that have a 100% quality score to the SEED alignment (field 'align_ident_slv' in the ARB database) and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing a composite dataset for bacterial, archaeal, and eukaryotic sequences. The uncompressed version is ~700 MB and the compressed version is 25 MB.

Release 102

The SILVA alignment is 50,000 columns long so that it is compatible with 18S rRNA sequences as well as archaeal 16S rRNA sequences. The actual reference alignment that SILVA uses with their SINA aligner is called the SEED alignment, and its exact composition is not known to us. We have tried to duplicate it by identifying the unique sequences in the SSURef database (v102) that have a 100% quality score to the SEED alignment and that extend from the end of the traditional 8f/27f primer to the beginning of the traditional 1492r primer. We are providing separate datasets for bacterial, archaeal, and eukaryotic sequences. Each reference set contains an aligned sequence file (e.g. silva.bacteria.fasta), an unaligned sequence file (e.g. nogap.bacteria.fasta), and taxonomy outlines (e.g. silva.bacteria.silva.tax):