
This project allows participants to access phage gene databases and participate in phage genomics research projects. The inspiration for this project was a recently published article reporting phage genome results obtained by high school and college students.

A few viruses that infect bacteria have been carefully studied as laboratory model systems, but bacterial viruses are very diverse, and as a group they have many proteins with unknown functions. Much remains to be discovered about the diversity, evolution and ecological impact of phage.

Bacteriophages are viruses that infect bacterial hosts and are estimated to be the most numerous biological entities in the biosphere. Insights into the genetic diversity of the bacteriophage population and the evolutionary mechanisms that give rise to it can be obtained using comparative genomic analyses. The genomic analysis of 30 complete mycobacteriophages—viruses that infect mycobacterial hosts—reveals them to be genetically diverse and to contain many previously unidentified genes. The high diversity and relatively small genome sizes of these phages provide an ideal platform for introducing high school and undergraduate students to the research laboratory, isolating and naming novel viruses, and determining their genomic sequences. The thrill of discovering new viruses and previously unidentified genes, coupled with ownership of individual phage projects, provides strong motivations for students to engage in and pursue scientific research.[1]

GenBank - Search the GenBank database using the accession number of one of the phages from the Hatfull article, such as phage L5 (accession number Z18946). Select "Genome" from the drop-down menu and enter Z18946 in the search field (Figure 2).

Click on the NC_001335 link and you should reach a page for "Mycobacterium phage L5, complete genome". Near the bottom of that page there is a map of the L5 phage genome. In the map, if you click on the arrow representing the first predicted protein (L5p01) you will go to a page that shows the amino acid sequence for that protein.
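The same record can also be requested without the web pages. The following is a minimal sketch, not part of the original exercise: it assumes the NCBI E-utilities "efetch" service and its "nuccore" database name, and the helper names efetch_url and fetch_record are hypothetical, chosen only for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gb"):
    """Build a URL that asks for one nucleotide record as plain text."""
    params = {
        "db": "nuccore",      # nucleotide database (assumed name)
        "id": accession,      # e.g. the L5 accession number Z18946
        "rettype": rettype,   # "gb" = GenBank flat file, "fasta" = sequence only
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


def fetch_record(accession, rettype="gb"):
    """Download the record as text (hypothetical helper, needs network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record("Z18946") to download the record.
    print(efetch_url("Z18946"))
```

Pasting the printed URL into a browser should show roughly the same GenBank text that the web pages display, if the service behaves as assumed here.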

The Basic Local Alignment Search Tool (BLAST) allows you to search for similarities between sequences. To search for sequence similarities to the Mycobacterium phage L5 genome, click on the "GenBank: Z18946" link in the GenBank summary page for phage L5 (see Figure 3). You should be taken to a new page with sequence information for the L5 phage. Near the top of that page, find the GenBank identification number, GI:15859. You can use the number (15859) with the BLAST tool to search for other known sequences that are similar to the L5 genomic sequence.

Figure 4. Select nucleotide-nucleotide BLAST.

On the main BLAST page, select "Nucleotide-nucleotide BLAST" from the list of available search tools (Figure 4). On the BLAST search page, type "15859" into the search box and click the "BLAST!" button. You should then see a page that says "Your request has been successfully submitted and put into the Blast Queue." and something like "The results are estimated to be ready in 9 seconds but may be done sooner." Wait for the estimated time and then click the "Format!" button.
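The submit-then-format sequence described above has a counterpart in the NCBI BLAST URL interface. The sketch below only builds the two request URLs and sends nothing. It is an illustration under assumptions: the parameter names (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE) follow the BLAST URL interface as we understand it, the database name "nt" and the use of the accession number as the query are assumptions, and the function names and the request ID "EXAMPLE_RID" are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed address of the NCBI BLAST URL interface.
BLAST_BASE = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_url(query, program="blastn", database="nt"):
    """URL for step 1: put a nucleotide search into the BLAST queue."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,    # blastn = nucleotide-nucleotide search
        "DATABASE": database,  # assumed database name
        "QUERY": query,        # accession number or raw sequence
    }
    return BLAST_BASE + "?" + urlencode(params)


def blast_results_url(request_id, format_type="Text"):
    """URL for step 2: ask for the formatted results of a queued search."""
    params = {"CMD": "Get", "RID": request_id, "FORMAT_TYPE": format_type}
    return BLAST_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(blast_submit_url("Z18946"))
    # "EXAMPLE_RID" is a placeholder; a real request ID comes back from step 1.
    print(blast_results_url("EXAMPLE_RID"))
```

The two functions mirror the two button clicks: the first corresponds to "BLAST!" and the second to "Format!" once the estimated waiting time has passed.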

When the BLAST search is done, you should see a graphical summary of sequence similarity results (Figure 5). For the example shown, the mouse cursor was over the sequence with the greatest similarity to phage L5, another phage called Che12. If you click on part of the graphical representation of the Che12 sequence, you will move down the page to the detailed sequence alignments. There, the text "3150/3562 (88%)" indicates that 3150 of the 3562 aligned nucleotides are identical in L5 and Che12, which is 88% sequence identity.
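The percentage in "3150/3562 (88%)" is simply the number of identical positions divided by the number of aligned positions. A minimal sketch of that arithmetic follows; the function name percent_identity is hypothetical and the parsing assumes the "identical/aligned" layout quoted above.

```python
import re


def percent_identity(identities_text):
    """Return the percent identity from text such as '3150/3562 (88%)'."""
    match = re.search(r"(\d+)\s*/\s*(\d+)", identities_text)
    if match is None:
        raise ValueError("expected text of the form 'identical/aligned'")
    identical = int(match.group(1))  # matching nucleotide positions
    aligned = int(match.group(2))    # total aligned positions
    if aligned == 0:
        raise ValueError("aligned length must be greater than zero")
    return 100.0 * identical / aligned


if __name__ == "__main__":
    print(f"{percent_identity('3150/3562 (88%)'):.1f}%")
```

For the L5 and Che12 example this gives about 88.4%, which BLAST reports rounded to 88%.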

Just below the graphical summary there should be a link called "Distance tree of results". Clicking on that link should show a tree that indicates the types of sequences with similarity to the L5 phage (see Figure 6, below).

Figure 5. BLAST search result.

In the distance tree, some of the sequence matches are to artificial vector sequences. Other significant sequence matches are to other viruses (including Che12, D29, FRAT1 and Bxz2; see Figure 1, above) and to bacteria. For example, the Mycobacterium "MCS" has two short regions of sequence similarity to the L5 phage sequence. These sequence similarities between viruses and bacteria suggest that there has been recent genetic recombination between bacterial genomes and these phage genomes.

Note that most of the viruses in this study were isolated according to their ability to infect Mycobacterium smegmatis mc2155. The only exception was L5, which was isolated after it infected Mycobacterium tuberculosis.