Camp wildness 2005 On-line sequence data analysis

The purpose of this exercise is to introduce you to a few web tools for the analysis of rRNA and DNA sequence data. I will demonstrate these tools, and you will then have an opportunity to try them out yourselves. This will prepare you to analyze your own 16S rRNA sequence data if we decide to obtain it from your DGGE bands. Using the test sequence, you should obtain the results shown in the examples below. You will each be given your own unknown sequences in class.

Ribosomal Database Project (RDP, http://35.8.164.52/html/). This website contains many thousands of rRNA sequences and provides a number of subroutines for their comparative analysis. It is also a handy reference for determining the phylogenetic relationships among microorganisms, although it is out of date compared with other major sequence repositories. We will use it to:

Identify the closest relative of an unknown 16S rRNA sequence that will be provided to you.

Locate “Sequence Match” for small subunit rRNA in the table and click on “run”.

Copy and paste your sequence data into the box provided. For now, use the test sequence at the end of this document. At the bottom of the page, select “Submit Sequences”.

The result will be a list of the closest relatives in the database, possibly arranged to show their hierarchical phylogeny.

Example: The test sequence was found to be identical to the SSU rRNA sequence of Nostoc muscorum, a member of the Nostoc group of Cyanobacteria, within the kingdom containing cyanobacteria and chloroplasts (prokaryotic oxygenic phototrophs) of Domain Bacteria.

BACTERIA
  CYANOBACTERIA_AND_CHLOROPLASTS
    CYANOBACTERIA
      NOSTOC_GROUP
        Cyls.7417   0.878  1319  Cylindrospermum sp. PCC 7417
        Nost.muscr  1.000  1412  Nostoc muscorum PCC 7120
        AB016520    0.993  1369  "Anabaena variabilis" IAM M-3
        Anbn.cyli2  0.861  1422  "Anabaena cylindrica" str. NIES19 PCC 7122
        Nost.punct  0.837  1322  Nostoc punctiforme PCC 73102
        AF062637    0.837  1378  Nostoc GSV224 str. GSV224
        AF062638    0.844  1391  Nostoc ATCC53789 ATCC 53789
        AF027653    0.842  1322  Nostoc TDI#AR94 str. TDI#AR94

Note that by clicking on the organism name, you can obtain the primary sequence data and also information about the reference that reported the sequence.
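This handout does not say how Sequence Match arrives at its fractional similarity score (1.000 for the identical Nostoc muscorum sequence in the example). The Python sketch below shows one way such a score can be computed, assuming it is the number of distinct 7-base oligomers shared by two sequences divided by the smaller of the two oligomer counts. The function names and the choice of k = 7 are assumptions made for illustration, not the RDP's own code.

```python
def unique_kmers(seq, k=7):
    """Return the set of distinct k-base oligomers found in a sequence."""
    # Drop whitespace/newlines from pasted data and treat DNA (T) like RNA (U).
    seq = "".join(seq.split()).upper().replace("T", "U")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fractional_similarity(seq_a, seq_b, k=7):
    """Shared distinct k-mers divided by the smaller distinct k-mer count.

    Assumed scoring scheme for illustration; returns a value from 0.0 to 1.0.
    """
    kmers_a = unique_kmers(seq_a, k)
    kmers_b = unique_kmers(seq_b, k)
    if not kmers_a or not kmers_b:
        return 0.0  # a sequence shorter than k has no oligomers to compare
    shared = kmers_a & kmers_b
    return len(shared) / min(len(kmers_a), len(kmers_b))
```

Under this assumption, two identical sequences always score 1.000, and a short sequence that is fully contained in a longer one also scores 1.000, because the divisor is the smaller oligomer count.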

Select an unknown SSU rRNA sequence from those available on the class website. In the space below, report its closest relative in the RDP database and the fractional similarity score for that match.

Unknown SSU rRNA sequence number:

Closest RDP relative to unknown sequence:

Fractional similarity score:

Hierarchical phylogeny of unknown sequence:

BLAST search (http://www.ncbi.nlm.nih.gov/BLAST/). This website allows you to compare an unknown sequence to very large and up-to-date gene sequence databases to find the closest relative. There are many options. We will use “blastn”, which rapidly compares a DNA sequence to other DNA sequences.

Open the website.

Under “Nucleotide”, select “Nucleotide-nucleotide BLAST (blastn)”.

Copy and paste the test SSU rRNA sequence into the window and click on “BLAST NOW”.

Click on the top hit and record information about the gene and the organism (and habitat, if possible) from which it came (e.g., 16S rRNA sequence from Anabaena sp.; N.B. Nostoc muscorum is on the list a bit further down)

Enter the genus and species name of the closest relative from the BLAST search.

Click on “Search”

The output is a hierarchical display of the phylogenetic “neighborhood” of the organism. Record the Domain and first Subdomain level of the organism's phylogeny.

Example: If you search for “Sulfolobus acidocaldarius” you should find that it belongs to a subgroup of Domain Archaea, Kingdom Crenarchaeota.

Report the following information for your unknown sequence below:

Closest BLAST relative to unknown sequence:

Percent similarity to closest relative:

Hierarchical phylogeny:

Information you can glean from clicking on the closest relative (e.g., habitat):
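The percent similarity requested above can be understood as the share of aligned positions at which the two sequences carry the same base. The Python sketch below computes that value for two sequences that have already been aligned (equal length, with "-" marking gaps). The function name and the example sequences are made up for illustration; this is not the BLAST algorithm itself, which also has to find the alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences have the same base.

    Both inputs must already be aligned to the same length. A column with a
    gap in only one sequence counts as a mismatch; a column with a gap in
    both sequences is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap and base_b == gap:
            continue  # nothing to compare in this column
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical 9-column alignment: 7 identical columns, 1 gap, 1 mismatch.
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))
```

Counting a one-sided gap as a mismatch keeps the divisor equal to the alignment length, so a match with many gaps is not reported as a better match than it is.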

The Institute for Genomic Research (TIGR, www.tigr.org). This is one of the primary websites for genomic sequences and their analysis. We will do a couple of simple exercises to demonstrate the breadth and depth of this database.