Abstract

Genomes of RNA viruses contain multiple functional RNA elements required for translation or RNA replication. We use unique approaches to identify functional RNA elements in the coding sequence of poliovirus (PV), a plus strand RNA virus. The general method is to recode large segments of the genome using synonymous codons, such that protein sequences, codon use, and codon pair bias are conserved but the nucleic acid sequence is changed. Such recoding does not affect the growth of PV unless it destroys the sequence/structure of a functional RNA element. Using genetic analyses and a method called “signal location search,” we detected two unique functionally redundant RNA elements (α and β), each about 75 nt long and separated by 150 nt, in the 3′-terminal coding sequence of RNA polymerase, 3Dpol. The presence of wild type (WT) α or β was sufficient for the optimal growth of PV, but the alteration of both segments in the same virus yielded very low titers and tiny plaques. The nucleotide sequences and predicted RNA structures of α and β have no apparent resemblance to each other. In α, we narrowed down the functional domain to a 48-nt-long, highly conserved segment. The primary determinant of function in β is a stable and highly conserved hairpin. Reporter constructs showed that the α- and β-segments are required for RNA replication. Recoding offers a unique and effective method to search for unknown functional RNA elements in coding sequences of RNA viruses, particularly if the signals are redundant in function.

Cis-acting RNA elements in the genomes of RNA viruses are important in many key processes of the viral life cycle. Some of these RNA elements depend largely on sequence, whereas others depend largely on structure. Most known RNA elements are located in the 5′- and 3′-nontranslated regions (NTRs) of the viral genomes; however, occasionally, they are also located in protein coding sequences (1). In the past, bioinformatic methods were the primary means of predicting the existence of functionally important sequences or structures, which are phylogenetically conserved and frequently thermodynamically favored. In the current study, a computer-generated design combined with chemical de novo genome synthesis was used for the identification of unique, redundant RNA structures in the genome of poliovirus (PV), a member of the Enterovirus genus of Picornaviridae, not previously reported in other RNA virus genomes.

The plus strand genome of PV (7.5 kb) contains multiple functional RNA elements in the NTRs of the genome that are required for translation or replication. The 5′-NTR contains two functional domains (Fig. 1A). The first domain includes the cloverleaf and the adjacent C-rich spacer, which are required for RNA replication (2–4). The second domain consists of a large internal ribosomal entry site element that promotes translation of the polyprotein (1, 5–7) (Fig. 1A). The two hairpins contained within the 3′-NTR and the adjacent poly(A) tail are involved in RNA replication, but their exact role is not yet known (8, 9). So far, only a single cis-replicating RNA element, cre(2C), has been discovered in the coding sequences of the PV polyprotein. This small hairpin, which was identified by phylogenetic analyses and bioinformatic methods, is located in the 5′-terminal half of the PV 2CATPase coding sequence (10). Cre elements occur in all picornavirus genomes, often in different locations; the first was described in the genome of human rhinovirus 14 (11). Subsequent studies of the PV cre(2C) have shown that it serves as a template for the uridylylation of the 5′-terminal protein VPg by the RNA polymerase 3Dpol (12, 13). The existence of another highly conserved and stable RNA hairpin in the coding sequences of enterovirus 3Dpol was predicted about 10 y ago by Witwer et al. (14). At the time, we carried out a mutational analysis of this structure/sequence in PV RNA, but, surprisingly, the mutations had no detrimental effect on viral growth.

Growth properties of polioviruses containing SD in the P1, P2, and P3 domains of the polyprotein. (A) Genomic structure of PV. The genome consists of a long 5′-NTR, a single ORF (polyprotein), a short 3′-NTR, and a poly(A) tail. The polyprotein contains three domains, one structural (P1) and two nonstructural (P2 and P3). The mature proteins of the P3 domain are indicated. The cre structure in P2 is absolutely required for replication. The location of the Bgl II restriction site used for subcloning is shown. AA, poly(A) tail; IRES, internal ribosomal entry site. (B) SD of the 3CDpro domain. The 3CDpro coding sequence, lacking 163 nt at the 5′-terminus of 3C, was divided into two parts using an Afl II restriction site: Δ3C + 152 nt of 3D ([Δ3C/5′-3D]SD) and the remainder of 3D (Δ3Da). (C) Growth phenotypes of SD polioviruses. Viruses containing chemically synthesized SD segments were characterized by determining the time of CPE, virus titer, and plaque size (Materials and Methods). The nucleotide numbers shown in parentheses in the “constructs” column indicate the SD region. Passage “0” means that full CPE was observed after transfection of transcripts into cells. (D) Growth phenotypes of viruses containing SD segments within the 3CDpro domain.

The degenerate nature of the genetic code allows a very large number of encodings of any particular protein sequence (15). In naturally occurring genes, the extraordinarily large number of possible encodings is somewhat restricted by two encoding biases called the codon bias (16, 17) and the codon pair bias (18). We used an algorithmic design called a “scrambled design” (SD), which introduced the largest possible number of nucleotide changes by shuffling the positions of existing synonymous codons without altering the encoded amino acid sequence or codon bias, or significantly altering the codon pair bias (17). An SD is expected to mutate most RNA signals longer than 3–4 nt. The WT encoding of the PV capsid domain (P1) (Fig. 1A) was replaced with an SD following the general strategy of Mueller et al. (17) and Cello et al. (19). Although this changed 930 nt of 2,628 nt, it did not interfere with viral growth (17). This was not unexpected, because the capsid domain can be deleted or replaced by other coding sequences with no effect on RNA replication (20, 21), showing that it does not contain essential cis-acting RNA signals. The P1SD experiment confirmed, however, that scrambling of P1 had little, if any, effect on PV genome function (17).
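The codon-shuffling idea behind an SD can be illustrated with a minimal Python sketch. It is not the design algorithm used in this study: it permutes synonymous codons at random, so the amino acid sequence and codon usage are preserved exactly, but it neither maximizes the number of nucleotide changes nor controls codon pair bias. The function names and the toy ORF are hypothetical and serve only as an illustration.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code (DNA alphabet), built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame coding sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def shuffle_synonymous_codons(seq, seed=0):
    """Randomly permute codons among positions encoding the same amino acid.

    Assumption: a plain random shuffle stands in for the real SD algorithm.
    """
    rng = random.Random(seed)
    codons = split_codons(seq)
    positions = defaultdict(list)  # amino acid -> codon indices
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def count_changes(a, b):
    """Number of nucleotide positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy ORF (M A A A L L L K K G G G), not a poliovirus sequence.
TOY_ORF = "ATGGCTGCAGCGCTGTTACTCAAAAAGGGAGGTGGC"
scrambled = shuffle_synonymous_codons(TOY_ORF, seed=1)
print(translate(TOY_ORF), translate(scrambled), count_changes(TOY_ORF, scrambled))
```

Because codons are only exchanged between positions that encode the same amino acid, the protein sequence and the codon counts are invariant by construction. A design closer to the SD described here would additionally search over many such permutations for one that changes the most nucleotides while keeping the codon pair bias near the WT value.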

In the current study we have reengineered the P2 and P3 domains of the PV polyprotein using an SD with the aim of discovering previously undescribed functional RNA sequences or structure(s). An SD of P2 should lead to a lethal phenotype because it is expected to destroy the essential cre(2C) element (Fig. 1A); thus, this experiment was a proof of principle for our strategy. Surprisingly, an SD of P3 also inactivated viral replication. Combining genetic analyses with a recently developed signal location search (22, 23), we discovered two unique, functionally redundant RNA elements, α and β, each about 75 nt long and separated by 150 nt, in the 3′-terminal coding sequence of PV RNA polymerase 3Dpol. The presence of either α or β in its WT form was sufficient for the optimal growth of PV, but scrambling both segments yielded very low titers and tiny plaques. In α, we narrowed down the functional domain to a 48-nt segment that is highly conserved only in C-cluster enteroviruses [closely related enteroviruses (e.g., polioviruses, C-cluster coxsackie A viruses) of Picornaviridae]. It is possible, however, that an equivalent α-element exists in all enterovirus genomes at a position different from that in PV. The primary determinant of function in β is a stable hairpin (37 nt) that is highly conserved in all clusters of the genus Enterovirus (14). Using a luciferase reporter, we were able to correlate the defect in SD(α + β) clearly with a striking but as yet unknown deficiency in genome replication.

Results and Discussion

Search for RNA Signals in the P2 and P3 Domains of the PV Polyprotein Using SD.

As expected, an SD in the P1 domain of the PV polyprotein had no effect on viral growth (17, 18) (Fig. 1C, row 2). Domains P2 and P3 were divided into two segments at a convenient Bgl II restriction site (Fig. 1A), yielding segments P2/5′-P3 and ΔP3. The first segment contained P2 and a 490-nt 5′-terminal fragment of P3 (P2/5′-P3), and the second segment contained the remainder of P3 (ΔP3). The corresponding computer-designed SD fragments of [P2/5′-P3]SD and ΔP3SD, containing changes at about every third nucleotide (Fig. 1C), were chemically synthesized and used to replace separately the WT genome sequences. In vitro transcribed RNAs of these variants were transfected into HeLa cells. If cytopathic effect (CPE) evolved, viruses were titered by plaque assay (Fig. 1C). Failure to observe CPE for 2–3 d led to eight blind passages on fresh HeLa cells.

When P2/5′-P3 was analyzed as an SD segment, the resulting genome was dead (Fig. 1C, row 3). Knowing about the existence and sequence of cre(2C), we recovered viable virus on rebuilding WT cre(2C) within the [P2/5′-P3]SD segment (Fig. 1C, row 4). Thus, our strategy successfully identified cre as the only essential RNA element in this large region of the PV genome. A slight reduction in the plaque size of [P2Cre/5′-P3]SD compared with the WT is most likely due to some abnormal interaction between the WT cre and surrounding SD sequences (Fig. S1).

When the ΔP3 segment was scrambled, the genome again was dead (Fig. 1C, row 5), an observation suggesting that this segment also contains a functional RNA sequence/structure. However, unlike the case with the P2 segment, there were no previously known cis-acting elements in P3.

Search for a Functional RNA Element in the ΔP3-Coding Domain of the Polyprotein.

To narrow down the location of a functional RNA element in ΔP3, we divided ΔP3 into two parts using an Afl II restriction site. The first segment [Δ3C/5′-3D] contained most of the 3C coding sequence plus the 5′ part of the 3Dpol coding sequence (152 nt) (Fig. 1 B and D), whereas the second segment consisted of the remainder of the 3Dpol coding sequences (Δ3Da, 1,230 nt). Constructs containing [Δ3C/5′-3D]SD encodings replicated with WT kinetics, whereas those with Δ3DaSD or Δ3DcSD (375 nt) (Fig. 1D) were either dead (Δ3DaSD) or highly defective in growth (Δ3DcSD, low titer and tiny plaques; Fig. 1D and Fig. S1). Indeed, virus could be detected only after expansion (2–3 passages) on HeLa cells. From these results, we concluded that an important RNA element was located within the last 375 nt of the 3Dpol coding sequence (Fig. 1D, row 4). We speculate that the slightly smaller plaque size of Δ3DbSD than that of the WT virus (Fig. 1D, row 3) is likely due to detrimental interactions between the neighboring WT and SD sequences rather than to any damage to an RNA element.

To refine the location of the functional RNA element in the C-terminal domain of the polyprotein, a method we have called signal location search (22, 23) was used. This method was designed for the identification of a single functional element, which we initially assumed would be the case in 3Dpol. In signal location search, multiple different synthetic viruses are designed, each virus with different, complementary patterns of WT and SD segments (Fig. 2). We assumed that the pattern of viability or inviability could be attributed to a particular segment. We designed and synthesized four different versions of the C-terminal 1,230 nt of the 3Dpol coding sequence, slicing each into 2⁴ = 16 segments, where each segment could be either WT (W) or scrambled (S). As shown in Fig. 2, each of the 16 vertical columns contains a unique signature of W or S. In principle, a particular pattern of viability or inviability should be traceable to one of the 16 segments.
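The logic of signal location search can be sketched in a few lines of Python. The sketch assumes that column j simply carries the 4-bit binary code of j (S where a bit is set); the actual W/S assignment shown in Fig. 2 may differ, and the segment indices used in the examples are hypothetical. It also shows why the single-element assumption fails when two elements are functionally redundant.

```python
N_DESIGNS = 4
N_SEGMENTS = 2 ** N_DESIGNS  # 16 segments, each with a unique W/S signature


def build_designs():
    """Return four 16-letter strings; design d is 'S' at segment j if bit d of j is set.

    Assumption: binary encoding of the column index, not the layout of Fig. 2.
    """
    return [
        "".join("S" if (j >> d) & 1 else "W" for j in range(N_SEGMENTS))
        for d in range(N_DESIGNS)
    ]


def column_signature(designs, j):
    """W/S pattern of segment j read down the four designs."""
    return "".join(design[j] for design in designs)


def is_viable(design, redundant_elements):
    """A virus grows if at least one of the redundant elements is still WT."""
    return any(design[j] == "W" for j in redundant_elements)


def locate_single_element(designs, viability):
    """Decode the viability pattern under the single-element assumption.

    A design is inviable exactly when the element's segment is scrambled, so the
    observed pattern equals the signature of the segment holding the element.
    """
    observed = "".join("W" if alive else "S" for alive in viability)
    return [j for j in range(N_SEGMENTS) if column_signature(designs, j) == observed]


designs = build_designs()

# One essential element in (hypothetical) segment 11: decoding recovers it.
single = [is_viable(d, [11]) for d in designs]
print(locate_single_element(designs, single))

# Two redundant elements in (hypothetical) segments 5 and 10: every design keeps
# at least one of them WT, so all four viruses grow and decoding points at the
# all-W column, which contains neither element.
redundant = [is_viable(d, [5, 10]) for d in designs]
print(locate_single_element(designs, redundant))
```

With n designs, 2ⁿ segments can be distinguished, which is why four syntheses suffice for 16 segments. The second example illustrates the general limitation: when the elements are redundant, the viability pattern reflects only the designs in which all of them are scrambled at once.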

Signal location search using four SDs. Most of the 3Dpol-coding sequence (6,140–7,369 nt) was divided into 16 segments with W and S sequences alternating in different combinations. The first 10 segments were 78 nt in length, and the remaining 6 segments were 75 nt in length. Four such designs were made, and the segments were chemically synthesized. The growth phenotypes of the viruses derived from these four constructs were determined as described in Materials and Methods.
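The segment boundaries implied by this legend can be checked with a short calculation. The sketch below assumes 1-based inclusive genome coordinates and that the ten 78-nt segments are the 5′-most ones; the function name is hypothetical.

```python
START, END = 6140, 7369
SEGMENT_LENGTHS = [78] * 10 + [75] * 6  # lengths as stated in the legend


def segment_coordinates(start, lengths):
    """Return (first_nt, last_nt) for consecutive segments, 1-based inclusive."""
    coords = []
    position = start
    for length in lengths:
        coords.append((position, position + length - 1))
        position += length
    return coords


coords = segment_coordinates(START, SEGMENT_LENGTHS)
total = sum(SEGMENT_LENGTHS)
for number, (first, last) in enumerate(coords, start=1):
    print(number, first, last)
```

The lengths sum to 1,230 nt, and the last segment ends at nt 7,369, in agreement with the region stated above. Both segment lengths are multiples of 3.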

The four designs (Fig. 2) were synthesized and used to replace the WT copy of the segment of PV cDNA. Unexpectedly, all four designs yielded virus with WT growth properties. At first glance, the results suggested that the functional element may reside in segment 6 from the right, which is W in all four designs. However, when this W segment alone was changed to S in the background of W, the virus again grew just like WT (Fig. 2). These results suggested that perhaps the destruction of more than one RNA element was required to cause the defects in replication described above.

Search for Functional RNA Elements in the 3′-Terminal 450 nt of the 3Dpol Coding Domain.

We returned to genetic engineering techniques and subdivided the 3′-terminal 450 nt of the 3Dpol domain into two equal segments (Fig. 3, rows 2 and 3). Whereas the 450-nt SD fragment resulted in a highly debilitated variant, either segment alone yielded virus with WT-like growth properties (Fig. 3, rows 1–3), an observation indicating the presence of two separate functionally redundant RNA elements. Further reduction in the size of the SD fragments resulted in the identification of two 75-nt-long segments, which we called α and β, with a 150-nt-long spacer between them (Fig. 3, row 5). To examine the possible emergence of variants, the virus derived from Δ3DhSD(α+β) (Fig. 3, row 5) was passaged five times on HeLa cells. There were no changes in nucleotide sequence over the entire genome.

Identification of two functionally redundant elements, α and β, in the 3′-terminal 450-nt-long segment of the 3Dpol coding sequence. The 3′-terminal 450 nt of the 3Dpol coding sequence (6,920–7,369 nt) was subjected to SD changes as shown in rows 1–6. The growth phenotypes of the viruses derived from these constructs were characterized as described in Materials and Methods. Above the figure, the positions of α and β are shown within the 450-nt fragment.

To determine at which stage of the PV life cycle α and β function, we first translated in HeLa cell-free extracts (24) full-length RNA transcripts containing the corresponding scrambled sequences. The results indicated normal translation and processing of the polyprotein synthesized from SD-containing transcripts (Fig. S2). The effect of scrambling α and β, or ΔP3, on RNA replication was determined with luciferase reporter constructs in which the firefly luciferase gene replaced the P1 domain of the polyprotein (25) (Fig. 4A). Transfections with WT and SD reporter transcript RNAs were carried out both in the absence and presence of guanidine hydrochloride (GnHCl), a potent inhibitor of PV RNA replication (26). Luciferase activity in the presence of GnHCl represents the level of protein translation after transfection, which was the same with both WT and SD RNAs. Luciferase activity with the SD constructs in the absence of the drug, however, was barely detectable (Fig. 4B). We conclude that scrambling of ΔP3 or α + β leads to a defect in RNA replication.

Comparison of RNA replication of Fluc reporter replicons containing WT or SD sequences. (A) Structures of three reporter replicons containing WT or SD sequences. The linker contains an N-terminal 3CDpro protease cleavage site and a C-terminal 2Apro protease cleavage site. (B) To determine the level of RNA replication, RNA transcripts were transfected into HeLa R19 cells both in the absence and presence of 2 mM GnHCl. The luciferase activity in the absence and presence of GnHCl was measured. The luciferase data are the average of two independent experiments.

α and β Lack Nucleotide Sequence and Structural Similarities.

The functional redundancy of α and β suggests that these domains might possess some sequence or structural similarities. This is not the case (Figs. 5 A and B and 6A and Fig. S3). Using the RNA MFold program (27), we predicted that the entire WT α-segment forms an unstable structure (Fig. 5B), whereas element β contains a distinct and stable hairpin surrounded by unstructured sequences (Fig. 6A). Scrambling of either RNA sequence results in altered structures, as expected (Figs. 5C and 6B).

Mutational analysis of the conserved hairpin in β-segment. MFold was used for the determination of predicted RNA structures. (A) Structure of the WT β-segment. (B) Structure of the SD β-segment. (C) Structure of mut1-β, in which the nucleotide sequence of the β-hairpin was altered. (D) Structure of SD-β37 in which both the sequence and structure of the β-hairpin were changed. (E) Growth phenotypes of viruses derived from the different constructs shown in A to D. (F) Growth phenotypes of viruses in which only the 48-nt conserved sequence of α is scrambled along with fully scrambled β (SD[α48 + β75]) or with an SD β-hairpin (SD[α48 + β37]).

Search for Active Domains Within α and β.

Element α contains a 48-nt-long sequence that is highly conserved in C-cluster enteroviruses (Fig. 5A) but not in genomes of B-cluster enteroviruses (Fig. S3). To test the possibility that this 48-nt-long sequence represents the primary functional domain within α, we tested a construct in which β was fully scrambled and only the conserved 48 nt of α were scrambled (construct SD[α48 + β75]; Fig. 6F). The growth phenotype of this virus was essentially the same as that of Δ3DhSD, in which the entire α and β segments are scrambled (compare Fig. 3, row 5, and Fig. 6F and Fig. S1); that is, the replication of this construct was severely suppressed. This result supports our hypothesis that the functional domain of α is primarily contained within the conserved 48-nt segment. On the other hand, our data explain why Witwer et al. (14) failed to predict the presence of α in enterovirus genomes, because its core structure of 48 nt is not generally conserved.

It was previously shown by bioinformatic methods that the stable hairpin within the β element is fully conserved among virus genomes of all clusters of the genus Enterovirus (14). Therefore, we selected this hairpin as the likely functional domain within the 75-nt-long sequence of β for mutational analyses while keeping α fully scrambled. Mutating 10 different nucleotides, a change that altered only the nucleotide sequence but not the structure of the hairpin, resulted in an intermediate growth phenotype (Fig. 6 C and E). However, when both sequence and structure were destroyed by SD (SD-β37; Fig. 6D), the growth phenotype of the construct was essentially the same as when α and β were fully scrambled (Δ3DhSD) (compare Fig. 3, row 5, and Fig. 6E).

Finally, we combined the scrambled versions of the minimal functional segments of α (48 nt) and β (hairpin) (construct SD[α48 + β37]; Fig. 6F). The poor growth phenotype of this variant (Fig. S1) clearly demonstrated the important function of the specific RNA sequences/structures in WT α and WT β for the growth of PV. Whether WT α or WT β can be transplanted to the 5′-NTR, thereby rescuing the replication phenotype of ΔP3SD, is currently under investigation.

Concluding Remarks

In the present study, we have demonstrated the usefulness of SD, a unique strategy that radically recodes the ORF of the PV polyprotein and is combined with de novo gene synthesis, in searches for functional RNA elements, regardless of whether these elements form stable higher order structures or are “linear” nucleotide sequences. Our method can be used for similar analyses of functional RNA sequences/structures in the genomes of other RNA viruses. Importantly, this method also offers an easy and effective way of identifying functionally redundant RNA elements that cannot be located by conventional mutational analyses.

Using the SD strategy combined with a computer-aided search method, we have discovered two separate and functionally redundant domains (α and β), each about 75 nt long, that are separated by 150 nt. These functional elements are located in the C-terminal coding sequence of the PV RNA polymerase 3Dpol, the key enzyme in RNA replication. The signal in the α-segment was narrowed down to a 48-nt-long sequence that is highly conserved in genomes of the C-cluster enteroviruses. In β, a hairpin formed by 37 nt proved to be essential for the function of this element. Witwer et al. (14) have previously predicted by bioinformatic methods the presence of a conserved hairpin in β of enteroviruses, but they failed to predict the existence of α, probably because it does not form a stable and conserved structure in genomes of the majority of enteroviruses. We suspect that α may be important in all enteroviruses; perhaps its location is variable just like that of the cre(2C) element in different picornavirus genomes (1). Whether the β segment itself can exert an essential function in replication in enteroviruses from clusters other than the C-cluster is currently under investigation. It should be noted again that in 2002, we extensively analyzed the hairpin in β by mutational analyses and, in the context of a WT α, failed to find any replication phenotype of the PV genome correlating with the changes. Considering the high degree of conservation (14), these early results were very surprising, but they can now be explained by the redundant function of α and β.

Functional RNA elements in the RNA polymerase coding sequence of plus strand RNA virus genomes have been previously demonstrated (28, 29). For example, two important RNA structures were discovered in the 3′-terminal coding sequence of hepatitis C virus RNA polymerase NS5B, one of which is involved in two different RNA/RNA interactions (30, 31). These structures, however, possess independent functions. To our knowledge, there are no previous reports of such functionally redundant RNA elements in the genomes of plus strand RNA viruses as α and β. It is not clear whether this is because redundant elements are rare or because they are difficult to discover by conventional mutational analyses. What is particularly unusual about α and β is that although they are functionally redundant, they are completely different in both nucleotide sequence and RNA structure. It could be argued that SD generates structures different from those uncovered here but at the same loci, which inhibit PV replication. This is highly unlikely because multinucleotide mutations in a “codon pair-maximized” sequence (18) of the same 3D coding sequence that are radically different from the SD sequence produced the same phenotypes. To explain the redundant function of α and β, we speculate that a viral or cellular protein binds to both but that the binding of either one alone is sufficient for function. This possibility is currently under investigation. Although the exact function of α and β remains to be determined, the identification of these RNA elements provides unique insight into the complex interplay between genome sequence and PV RNA replication.

Cell Culture.

Plasmids.

i) pT7PVM: The pT7PVM contains the cDNA of type 1 PV [PV1(M)].

ii) pPV-[P2/5′-P3]SD and pPV-ΔP3SD: The pPV-[P2/5′-P3]SD (3,386–5,600 nt) and pPV-ΔP3SD (5,605–7,369 nt) fragments, containing SnaB I and Bgl II or Bgl II and EcoR I restriction sites at the 5′ and 3′ ends, respectively, were synthesized by Mr. Gene GmbH or GenScript USA, Inc. The synthetic fragments were inserted into similarly restricted pT7PVM. All subclones containing SD sequences were made by standard genetic engineering techniques from synthetic pPV-[P2/5′-P3]SD and from pPV-ΔP3SD clones.

iii) Plasmids used for signal location search: Four fragments (6,140–7,369 nt) with alternating W and S segments were synthesized by GenScript USA, Inc. and were cloned back into pT7PVM.

iv) Fluc replicons of WT, ΔP3SD, and SD(α + β): The WT replicon plasmid was previously constructed in our laboratory (32). The ΔP3SD and SD(α + β) replicons were the same as the WT replicon except that they contained the partial P3SD region (5,606–7,369 nt) or the scrambled α + β design, respectively. Details of the luciferase activity assay have been described previously (33).

(2007) Replication of poliovirus requires binding of the poly(rC) binding protein to the cloverleaf as well as to the adjacent C-rich spacer sequence between the cloverleaf and the internal ribosomal entry site. J Virol 81:10017–10028.