Title

Department

Authors

Document Type

Thesis

Abstract

More than 6,869 mycobacteriophages have been isolated and purified. Of these, 1,367 genomes have been sequenced, and more are added each year through the SEA-PHAGES program. Sequenced mycobacteriophages are grouped into clusters on the basis of 50% or greater nucleotide identity. The number and breadth of these clusters reflect the diversity present in the environment. Each year, as students in the SEA-PHAGES program discover new phages, the question arises: "Which isolates should we sequence?" To sequence phages that represent the greatest possible diversity, and thus broaden underrepresented clusters and identify new singletons, we need a rapid way to determine a phage's cluster membership or singleton status before it is selected for DNA sequencing. One approach is to identify short nucleotide sequences that are shared by every member of a cluster and found in no other cluster. Such unique sequences could then be used as primers or probes to assign an isolate to a cluster or to flag it as a potential singleton. A computer program called PhageUniqueSeq was written in the Go language to identify all oligonucleotides that are common to all members of a cluster but absent from every other cluster. The program generated millions of unique sequences that can be used as probes or as polymerase chain reaction (PCR) primers to determine sub-cluster assignment. These unique sequences will help us target underrepresented phages for sequence analysis.
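The idea of an oligonucleotide that is common within a cluster but absent from all other clusters can be illustrated with a minimal Go sketch. This is not the PhageUniqueSeq source code: the function names (kmers, commonKmers, uniqueKmers), the toy sequences, and the oligonucleotide length of 4 are all assumptions chosen for illustration, and the sketch ignores reverse complements and memory limits that matter for real genomes.

```go
package main

import (
	"fmt"
	"sort"
)

// kmers returns the set of all length-k substrings of seq.
func kmers(seq string, k int) map[string]struct{} {
	set := make(map[string]struct{})
	for i := 0; i+k <= len(seq); i++ {
		set[seq[i:i+k]] = struct{}{}
	}
	return set
}

// commonKmers returns the k-mers present in every genome of one cluster.
func commonKmers(genomes []string, k int) map[string]struct{} {
	if len(genomes) == 0 {
		return map[string]struct{}{}
	}
	common := kmers(genomes[0], k)
	for _, g := range genomes[1:] {
		present := kmers(g, k)
		for km := range common {
			if _, ok := present[km]; !ok {
				delete(common, km) // missing from one member, so not common
			}
		}
	}
	return common
}

// uniqueKmers returns, for each cluster, the sorted k-mers shared by all of
// its members and found in no genome of any other cluster.
// Hypothetical helper for illustration only.
func uniqueKmers(clusters map[string][]string, k int) map[string][]string {
	result := make(map[string][]string)
	for name, genomes := range clusters {
		candidates := commonKmers(genomes, k)
		for other, otherGenomes := range clusters {
			if other == name {
				continue
			}
			for _, g := range otherGenomes {
				for km := range kmers(g, k) {
					delete(candidates, km) // seen elsewhere, so not unique
				}
			}
		}
		list := make([]string, 0, len(candidates))
		for km := range candidates {
			list = append(list, km)
		}
		sort.Strings(list)
		result[name] = list
	}
	return result
}

func main() {
	// Toy sequences (assumed, not real phage genomes).
	clusters := map[string][]string{
		"A": {"ACGTTGCA", "TTACGTTG"},
		"B": {"ACGTCATCGA", "CATCGATT"},
	}
	unique := uniqueKmers(clusters, 4)

	names := make([]string, 0, len(unique))
	for name := range unique {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %v\n", name, unique[name])
	}
}
```

With these toy inputs, ACGT is shared by both members of cluster A but also occurs in one genome of cluster B, so it is discarded; the sketch reports CGTT and GTTG for cluster A and ATCG, CATC, and TCGA for cluster B. Practical primers and probes are considerably longer than four bases, so a real run would use a larger k.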