Pittsburgh Bacteriophage Institute and Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.

Abstract

Five newly isolated mycobacteriophages--Angelica, CrimD, Adephagia, Anaya, and Pixie--have genomic architectures similar to that of mycobacteriophage TM4, a previously characterized phage that is widely used in mycobacterial genetics. The nucleotide sequence similarities warrant grouping these into Cluster K, with subdivision into three subclusters: K1, K2, and K3. Although the overall genome architectures of these phages are similar, TM4 appears to have lost at least two segments of its genome: a central region containing the integration apparatus, and a segment at the right end. This suggests that TM4 is a recent derivative of a temperate parent, resolving a long-standing conundrum about its biology, in that it was reportedly recovered from a lysogenic strain of Mycobacterium avium, but it is not capable of forming lysogens in any mycobacterial host. Like TM4, all of the Cluster K phages infect both fast- and slow-growing mycobacteria, and all of them--with the exception of TM4--form stable lysogens in both Mycobacterium smegmatis and Mycobacterium tuberculosis; immunity assays show that all five of these phages share the same immune specificity. TM4 infects these lysogens, suggesting that it was either derived from a heteroimmune temperate parent or that it has acquired a virulent phenotype. We have also characterized a widely used conditionally replicating derivative of TM4 and identified mutations conferring the temperature-sensitive phenotype. All of the Cluster K phages contain a series of well-conserved 13 bp repeats associated with the translation initiation sites of a subset of the genes; approximately one half of these contain an additional sequence feature composed of imperfectly conserved 17 bp inverted repeats separated by a variable spacer. The K1 phages integrate into the host tmRNA gene, and the Cluster K phages represent potential new tools for the genetics of M. tuberculosis and related species.

A. Nucleotide sequences of Cluster K genomes were concatenated and compared to themselves and each other using the dotplot generator Gepard [75]. Phages Adephagia, Anaya, Angelica, and CrimD show extensive nucleotide identity to each other while TM4 and Pixie are less similar, supporting division into Subclusters K1, K2 and K3 as shown. B. Average nucleotide identities of Cluster K mycobacteriophages.
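The idea behind a dotplot of concatenated genomes can be illustrated with a minimal sketch. This is not Gepard's algorithm; it is a plain exact k-mer match on the forward strand only, and the function name `kmer_dotplot` and the word size `k` are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_dotplot(seq_x, seq_y, k=10):
    """Return (i, j) pairs where seq_x[i:i+k] == seq_y[j:j+k].

    Illustrative only: exact word matches, forward strand only.
    """
    # Index every k-mer of seq_y by its start position.
    index = defaultdict(list)
    for j in range(len(seq_y) - k + 1):
        index[seq_y[j:j + k]].append(j)

    # Look up each k-mer of seq_x and record a dot for every shared word.
    dots = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], ()):
            dots.append((i, j))
    return dots


# Self-comparison: pass the same (concatenated) sequence on both axes.
toy = "ACGTACGTTT"
print(kmer_dotplot(toy, toy, k=4))
```

In a self-comparison the main diagonal is always filled; off-diagonal runs of dots mark segments shared between different genomes in the concatenation, which is the pattern the legend describes for the Subcluster K1 phages.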

Particles of Cluster K phages were placed on 400-mesh coated copper grids and stained with 1% uranyl acetate. Virions were imaged using a Morgagni transmission electron microscope. All the Cluster K phages exhibit a flexible non-contractile tailed morphology with short side tail fibers. Virion capsids are ∼55 nm in diameter and tails average ∼190 nm in length.

A database of 83 sequenced mycobacteriophages (Mycobacteriophage_83) was analyzed using the program Phamerator (S. Cresawn, RHW & GFH, manuscript submitted) and used to compare the genome organizations of the six Cluster K phages. The top four genomes (Anaya, Adephagia, Angelica and CrimD) constitute Subcluster K1 and their overall nucleotide similarities are reflected by the violet shading between the genomes (nucleotide similarities between adjacent genomes are spectrum color-coded, with violet being the most similar and red the least similar). TM4 and Pixie belong to Subclusters K2 and K3, respectively, and their more distant relationships are evident. Each of the genes (boxes above or below each genome) is colored according to its phamily designation, and the shared genome organization of all six phages is consistent with their grouping into Cluster K.

A map of the TM4 genome was revised from that reported previously [40]. The genome is shown with markers spaced at 100 bp intervals, with genes shown as colored boxes, either above (rightwards transcribed) or below (leftwards transcribed) the genome. Gene names are shown within the boxes, and the phamily number of each gene is shown above it, with the number of phamily members in parentheses. Genes are colored according to their phamily, and white genes represent orphams (phams with only a single member). Genes 92, 93, and 94 are newly assigned, transcribed in the reverse direction from the rest of the genes of the genome, and replace gene 41 in the original TM4 annotation (see Fig. 11A). Putative gene functions are indicated. Also shown is a segment that is deleted in construction of shuttle phasmid phAE159 [23], and the locations of mutations (purple asterisks) and PCR amplicons (purple bars) used in their analysis. Vertical arrows with numbers show the positions of Start-Associated Sequences (SAS), either with (ESAS; red arrows) or without (black arrows) extended SAS sequences (see Figs. 14 and 15). SAS and ESAS sites are numbered as in Fig. 14 and are all in one orientation unless indicated otherwise with a minus sign.

A phamily circle of Pham1847 is shown with each of the 83 phages around the circumference of the circle and arcs drawn between phages that contain a member of Pham1847; BLASTP values are shown as blue lines and ClustalW similarities as red lines.

Comparison of the TM4 genome to the other Cluster K genomes reveals two segments that appear to have been lost from TM4, and which may contribute to its non-temperate phenotype. A. The central parts of the CrimD, TM4, and Pixie genomes are aligned, with the colored shading reflecting the presence of genes of shared phamilies (i.e. homologues; note this shading does not reflect nucleotide sequence similarity as in Fig. 5). Although nucleotide sequence similarity is minimal, the alignment of shared genes suggests the loss of about 3.5 kbp from TM4 compared to its relatives. B. Alignment of the right ends of the CrimD, TM4 and Pixie genomes, suggesting loss of ∼3.3 kbp from TM4 compared to its relatives; shading is as described for A.

A. Organization of the attP site of phage CrimD. The attP core region with sequence similarity to the chromosomal attB site is indicated, with predicted integrase-binding arm-type sites (P1, P2, P3, and P4) shown flanking it. The predicted orfs for integrase (41) and gene 40 are shown (see Fig. S2 for genomic context). A putative rightwards stem-loop terminator is located between the core and P3 sites; the attP sites for the other K1 phages are organized similarly. B. Organization of the attP site of phage Pixie, annotated as above. The K3 phage Pixie uses an integrase more distantly related to those encoded by K1 phages, has a different attP core, and integrates into a different attB site. C. Alignment of the putative arm-type sites of Cluster K phages and the consensus sequences derived from them. Consensus positions shown in bold indicate differences between the K1 phages (CrimD, Angelica, Anaya, and Adephagia). D. Alignment of the N-terminal regions of the Subcluster K1 integrases that are predicted to recognize the arm-type sites.

Repeated sequences were identified in Cluster K phages through BLASTN comparisons with other mycobacteriophages, followed by scanning for the presence of the sequence 5′-GGGATAGGAGCCC, allowing for up to two deviations from the scanned sequence (Pixie site #19 has three departures from the consensus but is included in the list because it is associated with an Extended SAS; see Fig. 15). Sites for Angelica, Adephagia, and CrimD are shown in Fig. S3. The sequence is asymmetric and most copies are oriented in one direction as indicated. With the rare exception of those sites in the opposite orientation (e.g. Anaya site #17), all are immediately upstream of gene start sites (Anaya site #7 is a notable exception). The gene immediately downstream is listed along with its phamily (Pham) designation; the putative translation initiation codons are underlined; where the termination codon of the upstream gene lies within the conserved sequence it is italicized. The consensus sequence is shown in bold and the positions of the sites are shown by the colored highlighting. The extreme 3′ end of the 16S rRNA is shown with bases predicted to contribute to pairing with mRNA shown in bold. The genomic locations of these SASs are shown by numbered vertical arrows in Figs. 6, 7, 8, 9, S1, and S2.
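The scanning step described in this legend (a fixed 13 bp sequence, up to two deviations, either orientation) can be sketched as a simple sliding-window mismatch count. The code below is an illustration under stated assumptions, not the procedure actually used by the authors: the function names, the 0-based coordinates, and the choice to count only substitutions (no insertions or deletions) are all assumptions.

```python
SAS_CONSENSUS = "GGGATAGGAGCCC"  # 13 bp sequence quoted in the legend

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome, motif=SAS_CONSENSUS, max_mismatches=2):
    """Return (start, strand, mismatches, window) for each candidate site.

    start is a 0-based coordinate on the given (top) strand. Only
    substitutions are counted; indels are not considered (assumption).
    """
    genome = genome.upper()
    k = len(motif)
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for strand, pattern in patterns:
            d = count_mismatches(window, pattern)
            if d <= max_mismatches:
                hits.append((i, strand, d, window))
    return hits


# Toy sequence: one perfect forward copy and one reverse copy with a single change.
toy = "AAAA" + SAS_CONSENSUS + "TTTT" + reverse_complement("GGGTTAGGAGCCC") + "AA"
for hit in find_sites(toy):
    print(hit)
```

A site with three departures, such as the Pixie site #19 mentioned above, would be missed at the default threshold and would need `max_mismatches=3` or manual inclusion.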

A subset of the SASs shown in Fig. 14 also contain a conserved sequence immediately upstream of the SAS (red arrows in Figs. 6, 7, 8, 9, S1, S2). These sequences contain a 17 bp imperfect inverted repeat separated by a variable spacer. A. Alignment of the extended SAS sequences for Anaya, Pixie, and TM4. The consensus sequence is shown with bases in upper case if there are two or fewer departures from the consensus, and in lower case if there are more than two departures but a greater than 50% agreement. B. Consensus sequences for each of the half sites within the extended SAS sequences. Upper case letters denote no more than four deviations from the consensus. Positions conserved 50% or more are shown in lower case letters. The SASs are indicated with the colored boxes and the putative start codon is underlined. The downstream gene is shown in italic type and its phamily designation is shown in parentheses. ESAS sequences for phages Adephagia, Angelica and CrimD are shown in Fig. S4; their 17 bp consensus sequences are very similar to those of their fellow Subcluster K1 phage Anaya. A comparison of which phage genes are associated with SAS and ESAS sequences is shown in Table 2.
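The upper/lower case convention described for panel A can be written out as a small per-column rule. The sketch below assumes pre-aligned sequences of equal length; the function name `cased_consensus` and the "." placeholder for positions without a majority base are assumptions, since the legend does not say how such positions are displayed.

```python
from collections import Counter


def cased_consensus(aligned, max_departures=2, min_fraction=0.5, placeholder="."):
    """Consensus string using the casing rule described for panel A.

    Upper case: at most max_departures sequences differ from the majority base.
    Lower case: more departures, but the majority base exceeds min_fraction.
    placeholder: no base exceeds min_fraction (hypothetical display choice).
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")

    n = len(aligned)
    out = []
    for column in zip(*aligned):
        # Most frequent character in this alignment column and its count.
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        departures = n - count
        if departures <= max_departures:
            out.append(base.upper())
        elif count / n > min_fraction:
            out.append(base.lower())
        else:
            out.append(placeholder)
    return "".join(out)


# Toy alignment of eight sequences (not real ESAS data).
toy = ["ACT", "ACT", "ACT", "ACT", "ACA", "AGA", "AGG", "AGG"]
print(cased_consensus(toy))
```

With very few sequences every column satisfies the upper case rule, so the distinction is only informative for larger alignments. For the half-site consensus in panel B, the same function could be called with `max_departures=4`; the legend's lower-case threshold there ("50% or more") is inclusive, which this sketch's strict comparison does not reproduce.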

Phage ph101 is a temperature-sensitive conditionally replicating derivative of TM4 [20], and differs by more than 20 base substitutions, ten of which confer amino acid changes in predicted gene products as indicated. Five independent revertants capable of growth at 37°C were isolated, and the alleles at these ten positions were determined by sequencing of PCR amplicons (see Fig. 6). Four (D, F, G, and J) contain identical base changes that revert the mutations in genes 48 and 66 back to wild-type. One mutant (C) also contains the wild-type gene 66 sequence, but has a presumed intragenic suppressor mutation in gene 48. Genes 48 and 66 thus contribute to the conditionally replicating phenotype. All of the revertants plate with reduced efficiency at 42°C, and revertants growing normally at 42°C have an additional mutation reverting to the wild-type sequence in the putative tail gene, 20.