ABSTRACT

Despite extensive laboratory investigations in patients with respiratory tract infections, no microbiological cause can be identified in a significant proportion of patients. In the past 3 years, several novel respiratory viruses, including human metapneumovirus, severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV), and human coronavirus NL63, were discovered. Here we report the discovery of another novel coronavirus, coronavirus HKU1 (CoV-HKU1), from a 71-year-old man with pneumonia who had just returned from Shenzhen, China. Quantitative reverse transcription-PCR showed that the amount of CoV-HKU1 RNA was 8.5 to 9.6 × 10^6 copies per ml in his nasopharyngeal aspirates (NPAs) during the first week of the illness and dropped progressively to undetectable levels in subsequent weeks. He developed increasing serum levels of specific antibodies against the recombinant nucleocapsid protein of CoV-HKU1, with immunoglobulin M (IgM) titers of 1:20, 1:40, and 1:80 and IgG titers of <1:1,000, 1:2,000, and 1:8,000 in the first, second, and fourth weeks of the illness, respectively. Isolation of the virus by using various cell lines, mixed neuron-glia culture, and intracerebral inoculation of suckling mice was unsuccessful. The complete genome sequence of CoV-HKU1 is a 29,926-nucleotide, polyadenylated RNA, with G+C content of 32%, the lowest among all known coronaviruses with available genome sequence. Phylogenetic analysis reveals that CoV-HKU1 is a new group 2 coronavirus. Screening of 400 NPAs, negative for SARS-CoV, from patients with respiratory illness during the SARS period identified the presence of CoV-HKU1 RNA in an additional specimen, with a viral load of 1.13 × 10^6 copies per ml, from a 35-year-old woman with pneumonia. Our data support the existence of a novel group 2 coronavirus associated with pneumonia in humans.

Since no microbiological cause can be identified for a significant proportion of patients with respiratory tract infections (18, 29), research has been conducted to identify novel agents. Of the three novel agents identified in the past 3 years, namely, human metapneumovirus (36), severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV) (25), and human coronavirus NL63 (HCoV-NL63) (6, 37), two were coronaviruses. Coronaviruses possess the largest genomes of all RNA viruses, consisting of about 30 kb. As a result of their unique mechanism of viral replication, coronaviruses have a high frequency of recombination.

Based on genotypic and serological characterization, coronaviruses were divided into three distinct groups, with human coronavirus 229E (HCoV-229E) being a group 1 coronavirus and human coronavirus OC43 (HCoV-OC43) being a group 2 coronavirus (16). They account for 5 to 30% of human respiratory tract infections. In late 2002 and 2003, the epidemic caused by SARS-CoV affected more than 8,000 people with 750 deaths (23-25, 44, 45, 51). We have also reported the isolation of SARS-CoV-like viruses from Himalayan palm civets, which suggested that animals could be the reservoir for the ancestor of SARS-CoV (9). On the basis of genome analysis, SARS-CoV belonged to a fourth coronavirus group or alternatively was a distant relative of group 2 coronaviruses (4, 20, 28, 31, 48). Recently, a novel group 1 human coronavirus associated with respiratory tract infections, HCoV-NL63, was discovered, and its genome was sequenced (37).

In this study, we report the discovery of a novel group 2 coronavirus in the nasopharyngeal aspirates (NPAs) of patients with pneumonia. The complete genome of the coronavirus was sequenced and analyzed. Based on the findings of this study, we propose that this new virus be designated coronavirus HKU1 (CoV-HKU1).

MATERIALS AND METHODS

Index patient, clinical specimens, and microbiological tests. NPAs were collected from the index patient weekly from the first to the fifth week of illness, stool and urine were collected in the first and second weeks, and sera were collected in the first, second, and fourth weeks.

RNA extraction. Viral RNA was extracted from the NPA, urine, and fecal specimens by using the QIAamp Viral RNA Mini kit (QIAgen, Hilden, Germany). The RNA pellet was resuspended in 10 μl of DNase-free, RNase-free double-distilled water and was used as the template for RT-PCR.

RT-PCR of the pol gene of coronaviruses, using conserved primers and DNA sequencing. A 440-bp fragment of the RNA-dependent RNA polymerase (pol) gene of coronaviruses was amplified by RT-PCR with conserved primers (5′-GGTTGGGACTATCCTAAGTGTGA-3′ and 5′-CCATCATCAGATAGAATCATCATA-3′) designed by multiple alignment of the nucleotide sequences of available pol genes of known coronaviruses. RT was performed by using the SuperScript II kit (Invitrogen, San Diego, Calif.). The PCR mixture (50 μl) contained cDNA, PCR buffer (10 mM Tris-HCl [pH 8.3], 50 mM KCl, 3 mM MgCl2, 0.01% gelatin), 200 μM (each) deoxynucleoside triphosphates, and 1.0 U of Taq polymerase (Boehringer, Mannheim, Germany). The mixtures were amplified in 40 cycles of 94°C for 1 min, 48°C for 1 min, and 72°C for 1 min and a final extension at 72°C for 10 min in an automated thermal cycler (Perkin-Elmer Cetus, Gouda, The Netherlands).

The PCR products were gel purified using the QIAquick gel extraction kit (QIAgen, Hilden, Germany). Both strands of the PCR products were sequenced twice with an ABI Prism 3700 DNA analyzer (Applied Biosystems, Foster City, Calif.), using the two PCR primers. The sequences of the PCR products were compared with known sequences of the pol genes of coronaviruses in the GenBank database.

Complete genome sequencing and genome analysis. The complete genome of CoV-HKU1 was amplified and sequenced by using the RNA extracted from the NPAs as a template. The RNA was converted to cDNA by a combined random-priming and oligo(dT) priming strategy. As the initial results obtained from sequencing the 440-bp fragment revealed that the polymerase (Pol) of CoV-HKU1 is homologous to those of other group 2 coronaviruses, the cDNA was amplified by degenerate primers designed by multiple alignment of the genomes of murine hepatitis virus (MHV) (GenBank accession no. AF201929), HCoV-OC43 (GenBank accession no. NC_005147), bovine coronavirus (BCoV) (GenBank accession no. NC_003045), rat sialodacryoadenitis coronavirus (SDAV) (GenBank accession no. AF207551), equine coronavirus NC99 (ECoV) (GenBank accession no. AY316300), and porcine hemagglutinating encephalomyelitis virus (PHEV) (GenBank accession no. AY078417) and additional primers designed from the results of the first and subsequent rounds of sequencing. These primer sequences are available on request. The 5′ end of the viral genome was confirmed by rapid amplification of cDNA ends using the 5′/3′ rapid amplification of cDNA ends kit (Roche, Mannheim, Germany). Sequences were assembled and manually edited to produce a final sequence of the viral genome. The nucleotide sequence of the genome and the deduced amino acid sequences of the open reading frames (ORFs) were compared to those of other coronaviruses. Phylogenetic tree construction was performed by using the PileUp method with GrowTree (Genetics Computer Group, Inc.). Prediction of signal peptides and their cleavage sites was performed by using SignalP (21). Protein family analysis was performed by using PFAM and InterProScan (1, 2). Prediction of transmembrane domains was performed by using TMpred and TMHMM (11, 32). PHDhtm was also used when there was disagreement between the results obtained by using TMpred and TMHMM (3). Potential N-glycosylation sites were predicted by using ScanProsite (7).

Quantitative RT-PCR. For real-time quantitative PCR assays, cDNA was amplified in SYBR Green I fluorescence reactions (Roche) (23). Briefly, 20 μl of reaction mixtures containing 2 μl of cDNA, 3.5 mM MgCl2, and 0.25 μM (each) forward and reverse specific primers (5′-GGTTGGGATTATCCTAAATGTGA-3′ and 5′-CCATCATCACTCAAAATCATCATA-3′) were subjected to thermal cycling at 95°C for 10 min followed by 50 cycles of 95°C for 10 s, 55°C for 4 s, and 72°C for 18 s, using a LightCycler (Roche). A plasmid with the target sequence was used to generate the standard curve. At the end of the assay, PCR products (440-bp fragment of pol) were subjected to a melting curve analysis (65 to 95°C, 0.1°C/s) to confirm the specificity of the assay.
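The use of a plasmid standard curve to convert a real-time PCR readout into a copy number can be sketched as follows. This is a minimal illustration and not the analysis software used in the study: the plasmid dilutions, the crossing-point (Cp) values, and the function names fit_standard_curve and copies_from_cp are all hypothetical and were chosen only to show the arithmetic.

```python
def fit_standard_curve(log10_copies, cp_values):
    """Least-squares fit of Cp = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cp_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cp_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cp(cp, slope, intercept):
    """Invert the standard curve to estimate template copies per reaction."""
    return 10 ** ((cp - intercept) / slope)


# Hypothetical plasmid standards (10^3 to 10^7 copies per reaction)
# with made-up crossing points; these are NOT data from the study.
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_cp = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standards_log10, standards_cp)

# Amplification efficiency implied by the slope (1.0 means perfect doubling).
efficiency = 10 ** (-1.0 / slope) - 1.0

# Estimated copies per reaction for a hypothetical unknown sample.
unknown_copies = copies_from_cp(23.4, slope, intercept)
```

Converting copies per reaction into copies per ml of NPA additionally requires the extraction, elution, and reverse transcription volumes, which are not fully specified here and are therefore left out of the sketch.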

Cloning and purification of His6-tagged recombinant N protein of CoV-HKU1. To produce a plasmid for protein purification, primers (5′-TTTTCCTTTTGCGGCCGCTTAAGCAACAGAGTCTTCTA-3′ and 5′-CGGAATTCGATGTCTTATACTCCCGGT-3′) were used to amplify the gene encoding the N protein of CoV-HKU1 by RT-PCR. The sequence coding for amino acid residues 1 to 441 of the N protein was amplified and cloned into the EcoRI and NotI sites of expression vector pET-28b(+) (Novagen, Madison, Wis.) in frame and downstream of the series of six histidine residues. The recombinant N protein was expressed and purified by using the Ni2+-loaded HiTrap chelating system (Amersham Pharmacia) according to the manufacturer's instructions.

Western blot analysis. Western blot analysis was performed according to our published protocol (45). Briefly, 600 ng of purified His6-tagged recombinant N protein of CoV-HKU1 was loaded into each well of a sodium dodecyl sulfate-10% polyacrylamide gel and subsequently electroblotted onto a nitrocellulose membrane (Bio-Rad, Hercules, Calif.). The blot was cut into strips, and the strips were incubated separately with a 1:2,000 dilution of serum samples obtained during the first, second, and fourth weeks of the patient's illness. Serum samples from two healthy blood donors were used as controls. Antigen-antibody interaction was detected with an ECL fluorescence system (Amersham Life Science, Buckinghamshire, United Kingdom).

ELISA with recombinant N protein of CoV-HKU1. Sera from 100 healthy blood donors were used to set up a baseline for the N protein ELISA-based IgG and IgM antibody tests. The ELISA-based IgG and IgM antibody tests were modified from our previous publication (45). Briefly, each well of a Nunc (Roskilde, Denmark) immunoplate was coated with purified His6-tagged recombinant N protein (20 ng for IgG and 80 ng for IgM) for 1 h and then blocked in phosphate-buffered saline with 5% skim milk. The serum samples obtained from the patient during the first, second, and fourth weeks of the illness were serially diluted and were added to the wells of the His6-tagged recombinant N protein-coated plates in a total volume of 100 μl and incubated at 37°C for 2 h. After five washes with washing buffer, 100 μl of diluted horseradish peroxidase-conjugated goat antihuman IgG (1:4,000) and mouse antihuman IgM (1:1,000) antibodies (Zymed Laboratories Inc., South San Francisco, Calif.) was added to the wells and incubated at 37°C for 1 h. After washing with washing buffer five times, 100 μl of diluted 3,3′,5,5′-tetramethylbenzidine (Zymed Laboratories, Inc.) was added to each well and incubated at room temperature for 15 min. One hundred microliters of 0.3 M H2SO4 was added, and the absorbance at 450 nm of each well was measured. Each sample was tested in duplicate, and the mean absorbance for each serum was calculated.

Screening of NPAs collected during the SARS period. Four hundred NPAs negative for SARS-CoV by RT-PCR, obtained from patients with respiratory tract infections during the SARS period in 2003 (median age, 35 years; range, 2 to 87 years), were screened for the presence of CoV-HKU1 RNA using the protocol described above.

RESULTS

Index patient and microbiological tests. A 71-year-old Chinese man was admitted to hospital in January 2004 because of fever and productive cough with purulent sputum for 2 days. He had a history of pulmonary tuberculosis more than 40 years ago complicated by cicatrization of the right upper lobe and bronchiectasis with chronic Pseudomonas aeruginosa colonization of airways. He was a chronic smoker and also had chronic obstructive airway disease, hyperlipidemia, and asymptomatic abdominal aortic aneurysm. He had just returned from Shenzhen, China, 3 days before admission. A chest radiograph showed patchy infiltrates over the left lower zone. NPA for direct antigen detection of respiratory viruses, RT-PCR of influenza A virus, human metapneumovirus, and SARS-CoV, and viral cultures were negative. After the virus was determined to be a coronavirus, the NPAs were inoculated into RD (human rhabdomyosarcoma), I13.35 (murine macrophage), L929 (murine fibroblast), HRT-18 (colorectal adenocarcinoma), and B95a (marmoset B-lymphoblastoid) cell lines and mixed neuron-glia culture. No cytopathic effect was observed. Quantitative RT-PCR, using the culture supernatants and cell lysates to monitor the presence of viral replication, also showed negative results. Moreover, intracerebrally inoculated suckling mice remained healthy after 14 days. Sputum was negative for bacterial and mycobacterial pathogens. Paired sera for antibodies against Mycoplasma, Chlamydia, Legionella, and SARS-CoV were negative. His symptoms improved, and he was discharged after 5 days of hospitalization.

RT-PCR of the pol gene of coronaviruses by using conserved primers and DNA sequencing. RT-PCR of the pol gene from the patient's NPA showed a band of about 440 bp. Sequencing of the band showed 91% amino acid and 84% nucleotide identity to the corresponding sequence in MHV (GenBank accession no. AF201929), 89% amino acid and 82% nucleotide identity to HCoV-OC43 (GenBank accession no. NC_005147), and 89% amino acid and 82% nucleotide identity to BCoV (GenBank accession no. NC_003045).

Genome analysis. The genome of CoV-HKU1 is a 29,926-nucleotide, polyadenylated RNA. The G+C content is 32%, the lowest among all known coronaviruses with genome sequence available (Table 1). The genome organization is the same as that of other coronaviruses, with the characteristic gene order 5′-replicase, spike (S), envelope (E), membrane (M), nucleocapsid (N)-3′. Both 5′ and 3′ ends contain short untranslated regions. The 5′ end of the genome consists of a putative 5′ leader sequence (17, 19). A putative transcription regulatory sequence (TRS) motif, 5′-AAUCUAAAC-3′ (as in MHV and BCoV), or alternatively, 5′-UAAAUCUAAAC-3′, was found at the 3′ end of the leader sequence and precedes each translated ORF except ORF5 (Table 2). As in SDAV and MHV, ORF5, which encodes the putative E protein, may share the same TRS with ORF4, suggesting that the translation of the E protein is cap independent, possibly via an internal ribosomal entry site (IRES) (34). A stretch of 13 nucleotides, AUUUAUUGUUUGG (similar to the IRES element, UUUUAUUCUUUUU, in MHV), upstream of the initiation codon of the E protein is present in CoV-HKU1 (12). Further experiments would determine if this sequence acts as an IRES for this ORF and whether 5′-UAAAUCUAAAC-3′ or 5′-AAUCUAAAC-3′ is the real TRS for CoV-HKU1. Of note is that 5′-AAUCUAAAC-3′ and 5′-UAAAUCUAAAC-3′ are also observed at nucleotide positions 19528 and 22518 of the genome, respectively, neither of which precedes an ORF of obvious significance. Analysis of more genomes of CoV-HKU1 would reveal whether this is a consistent feature and its possible role in recombination of the CoV-HKU1 genome. The 3′ untranslated region contains a predicted bulged stem-loop structure 2 to 66 nucleotides downstream of the N gene (nucleotide position 29647 to 29711). This bulged stem-loop structure is conserved in group 2 coronaviruses (8). Downstream of the bulged stem-loop structure, 63 to 115 nucleotides downstream of the N gene (nucleotide position 29708 to 29760), a pseudoknot structure is present. This pseudoknot structure is conserved among coronaviruses and plays a role in coronavirus RNA replication (42).
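The two sequence-level observations in this paragraph, the G+C content and the positions of the putative TRS motifs, rest on simple string operations that can be sketched as follows. The toy RNA string below is invented for illustration and is not part of the CoV-HKU1 genome; the helper names gc_content and find_motif are likewise assumptions of this sketch rather than tools used in the study.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def find_motif(seq, motif):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)
    return positions


# Invented toy sequence containing both candidate TRS motifs.
toy = "GGAUU" + "AAUCUAAAC" + "UUU" + "UAAAUCUAAAC" + "AUGG"

short_trs_hits = find_motif(toy, "AAUCUAAAC")    # short candidate TRS
long_trs_hits = find_motif(toy, "UAAAUCUAAAC")   # longer candidate TRS
toy_gc = gc_content(toy)
```

Because the short motif is contained in the long one, every hit for the long motif also produces a hit for the short motif two bases downstream, which is one reason sequence inspection alone cannot settle which of the two is the functional TRS.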

The coding potential of the CoV-HKU1 genome is shown in Fig. 1 and Table 2, and the phylogenetic analysis of the chymotrypsin-like protease (3CLpro), Pol, helicase, hemagglutinin-esterase (HE), S, E, M, and N is shown in Fig. 2.

The replicase 1a ORF (nucleotide position 206 to 13600) and replicase 1b ORF (nucleotide position 13600 to 21753) occupy 21.5 kb of the CoV-HKU1 genome. Similar to the case with other coronaviruses, a frame shift interrupts the protein-coding regions and separates ORFs 1a and 1b. This ORF encodes a number of putative proteins, including nsp1 (which contains the putative papain-like proteases), nsp2 (the putative 3CLpro), nsp9 (the putative Pol), nsp10 (the putative helicase), and other proteins with unknown functions. These proteins are produced by proteolytic cleavage of the large replicase polyprotein. The arrangement of the resulting putative proteins is the same as that in the MHV genome (Fig. 3). This polyprotein is synthesized by a −1 ribosomal frameshift at a conserved site (UUUAAAC) upstream of a pseudoknot structure at the junction of ORF 1a and ORF 1b. This ribosomal frameshift would result in a polyprotein of 7,182 amino acids, which has 75 to 77% amino acid identities with the polyproteins of other group 2 coronaviruses and 43 to 47% amino acid identities with the polyproteins of non-group 2 coronaviruses. The Pol of CoV-HKU1, with 928 amino acids, has 87 to 90% amino acid identities with the Pol of other group 2 coronaviruses and 54 to 65% amino acid identities with the Pol of non-group 2 coronaviruses (Table 1 and Fig. 2). The catalytic histidine and cysteine amino acid residues, conserved among the 3CLpro in all coronaviruses, are present in the predicted 3CLpro of CoV-HKU1 (amino acids His3375 and Cys3479 of ORF 1a). nsp1, which corresponds to p210 in MHV, contains two papain-like proteases (PLpro), PL1pro and PL2pro. In the N terminus of nsp1 (amino acid residues 945 to 1104 of ORF 1a), there are 14 tandem copies of a 30-base repeat which encodes NDDEDVVTGD, followed by two 30-base regions that encode NNDEEIVTGD and NDDQIVVTGD, located inside the acidic domain upstream of PL1pro (Fig. 3). 
This acidic tandem repeat (ATR) is not observed in other coronaviruses. The presence of this ATR is confirmed by sequencing the corresponding part of the genome from two NPAs collected 1 week apart. The presence of the repeat does not result in a marked change in the isoelectric point of the acidic domain (3.31 in CoV-HKU1 versus 3.92 in MHV) or the predicted secondary structure (random coil in both CoV-HKU1 and MHV). Moreover, the characteristic amino acid residues for proteolytic cleavage by the two PLpro, determined by mutagenesis studies, located at the junctions of p28/p65, p65/nsp1, and nsp1/nsp2 in MHV, are all present in the corresponding positions in CoV-HKU1 (13). Furthermore, the zinc finger domain proposed to possess nonproteolytic activity in other coronaviruses is also present in PL1pro of CoV-HKU1 (10).
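Counting the copies of the acidic tandem repeat in a translated sequence can be sketched as follows. The flanking residues in the toy sequences are hypothetical placeholders, and count_tandem_copies is a name introduced only for this example; the repeat unit and the two variant decapeptides are those quoted in the text.

```python
def count_tandem_copies(seq, unit):
    """Length of the longest run of back-to-back exact copies of `unit`."""
    best = 0
    i = seq.find(unit)
    while i != -1:
        run = 0
        j = i
        while seq.startswith(unit, j):
            run += 1
            j += len(unit)
        best = max(best, run)
        i = seq.find(unit, i + 1)
    return best


UNIT = "NDDEDVVTGD"                      # repeat unit quoted in the text
VARIANTS = "NNDEEIVTGD" + "NDDQIVVTGD"   # two divergent copies that follow

# Hypothetical flanking residues ("MKK", "LLA") stand in for real sequence.
toy_14_copies = "MKK" + UNIT * 14 + VARIANTS + "LLA"
toy_11_copies = "MKK" + UNIT * 11 + VARIANTS + "LLA"

copies_a = count_tandem_copies(toy_14_copies, UNIT)
copies_b = count_tandem_copies(toy_11_copies, UNIT)

# 14 exact copies plus 2 variant copies of 10 residues each span
# 160 residues, matching the stated range of residues 945 to 1104.
repeat_span = len(UNIT * 14 + VARIANTS)
```

The sketch counts exact copies only, so the two divergent decapeptides are deliberately excluded from the tally, in line with how the text reports them separately.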

Arrangements of proteins in replicase polyprotein in HKU1 compared with those in HCoV-OC43, BCoV, and MHV. Alignment of the AC domains of HCoV-OC43, BCoV, and MHV and the AC domains and ATR (underlined) of CoV-HKU1 in the two patients was generated with ClustalX 1.83. AC domain, acidic domain. GenBank accession numbers are as follows: MHV, NC_001846; BCoV, NC_003045; HCoV-OC43, AY585229.

ORF 2 (nucleotide position 21773 to 22933) encodes the predicted HE glycoprotein with 386 amino acids. HE is present in group 2 coronaviruses and influenza C virus. The HE of CoV-HKU1 has 50 to 57% amino acid identities with the HE of other group 2 coronaviruses (Table 1 and Fig. 2). PFAM and InterProScan analysis of the ORF shows that amino acid residues 1 to 349 of the predicted protein constitute a member of the hemagglutinin esterase family (PFAM accession no. PF03996 and INTERPRO accession no. IPR007142). Furthermore, PFAM and InterProScan analysis shows that amino acid residues 122 to 236 of the predicted protein constitute the hemagglutinin domain of the HE fusion glycoprotein family (PFAM accession no. PF02710 and INTERPRO accession no. IPR003860). SignalP analysis reveals a signal peptide probability of 0.738, with a cleavage site between residues 13 and 14. Although TMpred and TMHMM analysis of the ORF shows four and three transmembrane domains, respectively, PHDhtm analysis shows only one transmembrane domain, at positions 354 to 376. This concurs with only one transmembrane region reported in the C terminus of the HE of BCoV and puffinosis virus (14). ScanProsite analysis of the HE protein of CoV-HKU1 reveals eight potential N-linked glycosylation (six NXS and two NXT) sites. These are located at positions 83 (NYT), 110 (NGS), 145 (NVS), 168 (NYS), 193 (NFS), 286 (NSS), 314 (NVS), and 328 (NFT). The putative active site for neuraminate O-acetyl-esterase activity, FGDS, is located at positions 31 to 34 (39). In BCoV, HE was shown to be required for viral replication in one study (38) but not to be essential for viral infection under some specific experimental conditions (26). In MHV, the expression of HE is heterogeneous, depending on the number of copies of UCUAA in the leader sequence and the presence of an initiation codon, an upstream promoter, and a complete ORF with a C-terminal transmembrane anchor (49), and appears to be related to central nervous system tropism (50). In CoV-HKU1, the initiation codon and a complete ORF are present. Since the HE of CoV-HKU1 is quite distantly related to the HE of MHV and BCoV/HCoV-OC43 (Fig. 2), further experiments have to be performed to determine the essentiality and function of HE in CoV-HKU1.
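The search for potential N-linked glycosylation sites reported above amounts to scanning a protein for the N-X-S/T sequon. The following is a minimal sketch under a simplifying assumption: it uses the sequon N-X-[S or T] with X being any residue except proline, whereas the PROSITE pattern applied by ScanProsite may impose further constraints. The toy peptide is invented and the function name find_sequons is hypothetical.

```python
import re


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each N-X-[ST] sequon, X != P.

    A lookahead is used so that overlapping sequons are all reported.
    """
    return [
        (m.start() + 1, protein[m.start():m.start() + 3])
        for m in re.finditer(r"N(?=[^P][ST])", protein)
    ]


# Invented toy peptide: NYT and NGS qualify, NPS is excluded (proline at X).
toy_peptide = "MANYTAAANPSGGNGSK"
sites = find_sequons(toy_peptide)

# Tally by sequon type, in the same style as "six NXS and two NXT".
nxs = sum(1 for _, tri in sites if tri.endswith("S"))
nxt = sum(1 for _, tri in sites if tri.endswith("T"))
```

Such a scan identifies only candidate sites; whether a given asparagine is actually glycosylated depends on protein topology and folding, which sequence matching does not capture.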

ORF 3 (nucleotide position 22942 to 27012) encodes the predicted S glycoprotein (PFAM accession no. PF01601) with 1,356 amino acids. The S protein of CoV-HKU1 has 60 to 61% amino acid identities with the S proteins of other group 2 coronaviruses but less than 35% amino acid identities with the S proteins of non-group 2 coronaviruses (Table 1 and Fig. 2). InterProScan analysis predicts it as a type I membrane glycoprotein. Important features of the S protein of CoV-HKU1 are depicted in Fig. 4. ScanProsite analysis of the S protein of CoV-HKU1 revealed 28 potential N-linked glycosylation (12 NXS and 16 NXT) sites. SignalP analysis revealed a signal peptide probability of 0.909, with a cleavage site between residues 13 and 14. By multiple alignments with the S proteins of other group 2 coronaviruses, a potential cleavage site located after RRKRR, between residues 760 and 761, where S will be cleaved into S1 and S2, was identified. Immediately upstream of RRKRR, there is a series of five serine residues that are not present in any other known coronaviruses (Fig. 4). Most of the S protein (residues 15 to 1300) is exposed on the outside of the virus, with a transmembrane domain at the C terminus (TMHMM analysis of the ORF shows one transmembrane domain at positions 1301 to 1356), followed by a cytoplasmic tail rich in cysteine residues. Two heptad repeats, located at residues 982 to 1083 (HR1) and 1250 to 1297 (HR2), identified by multiple alignments with other coronaviruses, are present. The receptors for S protein binding in MHV and HCoV-OC43 are CEACAM1 and sialic acid, respectively (15, 41, 43). While the three conserved regions (sites I, II, and III) and amino acid residues (Thr62, Thr212, Tyr214, and Tyr216) in the N terminus of the MHV S protein important for receptor-binding activity (33) are present in CoV-HKU1 (Fig. 4), the amino acid residues on the S protein of HCoV-OC43 that are important for receptor binding are not well defined. Further experiments should be performed to delineate the receptor for CoV-HKU1.

ORF 4 (nucleotide position 27051 to 27380) encodes a predicted protein with 109 amino acids. This ORF overlaps with the ORF that encodes the E protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus nonstructural protein NS2 family (PFAM accession no. PF04753). TMpred and TMHMM analysis does not reveal any transmembrane helix. This predicted protein of CoV-HKU1 has 44 to 51% amino acid identities with the corresponding proteins of other group 2 coronaviruses.

ORF 5 (nucleotide position 27373 to 27621) encodes the predicted E protein with 82 amino acids. The E protein of CoV-HKU1 has 54 to 60% amino acid identities with the E proteins of other group 2 coronaviruses but less than 35% amino acid identities with the E proteins of non-group 2 coronaviruses (Table 1 and Fig. 2). PFAM and InterProScan analysis of the ORF shows that the predicted E protein is a member of the nonstructural protein NS3/small envelope protein E family (PFAM accession no. PF02723). SignalP analysis predicts the presence of a transmembrane anchor (probability, 0.995). TMpred analysis of the ORF shows two transmembrane domains at positions 16 to 34 and 39 to 59, and TMHMM analysis of the ORF shows two transmembrane domains at positions 10 to 32 and 39 to 58, consistent with the anticipated association of the E protein with the viral envelope.

ORF 6 (nucleotide position 27633 to 28304) encodes the predicted M protein with 223 amino acids. The M protein of CoV-HKU1 has 76 to 84% amino acid identities with the M proteins of other group 2 coronaviruses but less than 40% amino acid identities with the M proteins of non-group 2 coronaviruses (Table 1 and Fig. 2). PFAM analysis of the ORF shows that the predicted M protein is a member of the coronavirus matrix glycoprotein family (PFAM accession no. PF01635). SignalP analysis predicts the presence of a transmembrane anchor (probability, 0.926). TMpred analysis of the ORF shows three transmembrane domains at positions 21 to 42, 53 to 74, and 77 to 98. TMHMM analysis of the ORF shows three transmembrane domains at positions 20 to 39, 46 to 68, and 78 to 100. The N-terminal 19 to 20 amino acids are located on the outside, and the C-terminal 123- to 125-amino-acid hydrophilic domain is located on the inside of the virus.

ORF 7 (nucleotide position 28320 to 29645) encodes the predicted N protein (PFAM accession no. PF00937) with 441 amino acids. The N protein of CoV-HKU1 has 57 to 68% amino acid identities with the N proteins of other group 2 coronaviruses but less than 40% amino acid identities with the N proteins of non-group 2 coronaviruses (Table 1 and Fig. 2).

ORF 8 (nucleotide position 28342 to 28959) encodes a hypothetical protein (N2) of 205 amino acids within the ORF that encodes the predicted N protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus nucleocapsid I protein family (PFAM accession no. PF03187). This hypothetical N2 protein of CoV-HKU1 has 32 to 39% amino acid identities with the N2 proteins of other group 2 coronaviruses. This protein has been shown to be nonessential for viral replication in MHV (5).
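The nucleotide coordinates and protein lengths quoted for ORFs 2 to 8 can be cross-checked with a short calculation. The sketch below assumes that each reported coordinate range is inclusive and contains the stop codon, so that the protein length equals the number of codons minus one; this assumption is not stated in the text but is consistent with every value given. The dictionary layout and the function name protein_length are illustrative only.

```python
def protein_length(start, end):
    """Amino acids encoded by an ORF spanning start..end (inclusive),
    assuming the range includes the stop codon."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return nucleotides // 3 - 1


# (start, end, reported amino acids), as quoted in the text for each ORF.
orfs = {
    "ORF 2 (HE)": (21773, 22933, 386),
    "ORF 3 (S)": (22942, 27012, 1356),
    "ORF 4": (27051, 27380, 109),
    "ORF 5 (E)": (27373, 27621, 82),
    "ORF 6 (M)": (27633, 28304, 223),
    "ORF 7 (N)": (28320, 29645, 441),
    "ORF 8 (N2)": (28342, 28959, 205),
}

# True for every ORF whose coordinates agree with its reported length.
consistent = {
    name: protein_length(start, end) == reported
    for name, (start, end, reported) in orfs.items()
}

# ORF 4 ends at 27380 and ORF 5 starts at 27373, so they overlap by 8 nt.
orf4_orf5_overlap = orfs["ORF 4"][1] - orfs["ORF 5 (E)"][0] + 1
```

The same coordinates also show why ORF 4 and ORF 5 are described as overlapping and why ORF 8 lies entirely within ORF 7.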

Quantitative RT-PCR. Quantitative RT-PCR showed that the amounts of CoV-HKU1 RNA were 8.5 × 10^5 and 9.6 × 10^6 copies per ml in two NPAs collected in the first week of the illness and 1.5 × 10^5 copies per ml in the NPA collected in the second week of the illness, but CoV-HKU1 RNA was undetectable in the NPAs collected in the third, fourth, and fifth weeks of the illness (Fig. 5). CoV-HKU1 RNA was undetectable in all urine and stool specimens.

Sequential quantitative RT-PCR for CoV-HKU1 in NPAs and serum IgG titers against N protein of CoV-HKU1.

Purification of His6-tagged recombinant N protein and Western blot analysis. The recombinant N protein of CoV-HKU1 was expressed in Escherichia coli and subsequently purified. The purified recombinant N protein was separated on sodium dodecyl sulfate-polyacrylamide gels followed by Western blot analysis with serum samples. Several prominent immunoreactive bands were visible for serum samples collected during the second and fourth weeks of the patient's illness (Fig. 6, lanes 2 and 3). The sizes of the largest bands were about 53 kDa, consistent with the expected size of 52.8 kDa for the full-length His6-tagged recombinant N protein, whereas the other bands were probably its degradation products. Only very faint bands were observed for serum samples obtained from the patient during the first week of the illness (Fig. 6, lane 1) and two healthy blood donors (Fig. 6, lanes 4 and 5).

Western blot analysis of purified recombinant CoV-HKU1 N protein antigen. Prominent immunoreactive protein bands of about 53 kDa were visible on the Western blot that used recombinant N protein as the antigen during the second and fourth weeks of the patient's illness (lanes 2 and 3). Only very faint bands were observed for serum samples obtained from the patient during the first week of the illness (lane 1) and two healthy blood donors (lanes 4 and 5).

ELISA using recombinant N protein of CoV-HKU1. An ELISA-based antibody test was developed with this recombinant N protein for the detection of specific antibodies against this protein. Box titration was carried out with serial dilutions of recombinant N protein coating antigen (in one axis) and serum (in the other axis) obtained from the fourth week of the patient's illness. The results identified 20 and 80 ng of purified recombinant N protein per well as the ideal amounts for plate coating and 1:1,000 and 1:20 as the optimal serum dilutions for IgG and IgM detection, respectively.

To establish the baseline for the ELISA tests, serum samples (diluted at 1:1,000 and 1:20 for IgG and IgM, respectively) from 100 healthy blood donors were tested. The mean ELISA optical densities at 450 nm for IgG and IgM detection were 0.178 and 0.224, with standard deviations of 0.070 and 0.117, respectively. Absorbance values of 0.387 and 0.576 were selected as the cutoff values (means plus three standard deviations) for IgG and IgM, respectively. Using these cutoffs, the titers for IgG of the patient's sera obtained during the first, second, and fourth weeks of the illness were <1:1,000, 1:2,000, and 1:8,000, respectively, and those for IgM were 1:20, 1:40, and 1:80, respectively (Fig. 5).
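The cutoff rule (mean plus three standard deviations of the healthy-donor absorbances) and the reading of an endpoint titer can be sketched as follows. The donor means and standard deviations are those reported above; the serial-dilution absorbances are hypothetical, since the individual readings are not given, and the function names are introduced only for this example. Recomputing from the rounded summary statistics gives 0.388 and 0.575, which differ in the third decimal place from the reported cutoffs of 0.387 and 0.576, presumably because the published cutoffs were derived from unrounded values.

```python
def elisa_cutoff(mean_od, sd_od, k=3):
    """Positivity cutoff defined as mean + k standard deviations."""
    return mean_od + k * sd_od


def endpoint_titer(readings, cutoff):
    """Highest dilution factor whose absorbance exceeds the cutoff.

    `readings` is a list of (dilution_factor, absorbance) pairs;
    returns None if no dilution is positive.
    """
    positive = [dilution for dilution, od in readings if od > cutoff]
    return max(positive) if positive else None


# Summary statistics reported in the text for 100 healthy donors.
igg_cutoff = elisa_cutoff(0.178, 0.070)
igm_cutoff = elisa_cutoff(0.224, 0.117)

# Hypothetical IgG dilution series (NOT data from the study).
hypothetical_igg = [
    (1000, 1.20), (2000, 0.90), (4000, 0.60), (8000, 0.45), (16000, 0.20),
]
titer = endpoint_titer(hypothetical_igg, 0.387)  # using the reported cutoff
```

With these invented readings the endpoint is a 1:8,000 dilution, which illustrates how a titer such as the one reported for the fourth week would be read off a dilution series.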

Screening of NPAs during the SARS period. Among the 400 NPAs that were negative for SARS-CoV by RT-PCR, obtained during the SARS period in 2003, one was positive for RNA of CoV-HKU1. The NPA was obtained from a 35-year-old, previously healthy woman with pneumonia of unknown etiology in March 2003, 10 months earlier than the index case. There was no direct relationship or contact between the two cases. The detection of several unique features upon sequencing confirmed the presence of CoV-HKU1. Sequencing of the 2,784-bp fragment that encodes Pol revealed 87 base (3.1%) and seven (0.8%) amino acid differences between the Pol of this virus and that of the virus from the index patient. Sequencing of the fragment that encodes nsp1 showed that 11 ATR are present, compared to 14 ATR in the fragment from the index patient (Fig. 3). This indicates that the ATR is probably a consistent feature in nsp1 of CoV-HKU1 and may also be a region of frequent insertion and deletion. Sequencing of the replicase polyprotein/HE junction revealed that NS2a, absent from the virus of the index patient, is also absent from this virus. The amount of CoV-HKU1 RNA in the NPA was 1.13 × 10^6 copies per ml. Since the convalescent-phase serum is not available from this patient, antibody response cannot be determined.
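The percentages quoted for the differences in Pol between the two viruses follow directly from the fragment lengths, as the sketch below shows. The counts (87 bases, seven amino acids) and lengths (2,784 bp, hence 928 codons) are from the text; the short aligned strings used to demonstrate the counting function are invented, and count_differences is a hypothetical helper rather than the comparison tool used in the study.

```python
def count_differences(seq_a, seq_b):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


pol_nt_length = 2784                  # fragment length from the text
pol_aa_length = pol_nt_length // 3    # 928 amino acids

nt_difference = percent(87, pol_nt_length)   # about 3.1%
aa_difference = percent(7, pol_aa_length)    # about 0.8%

# Invented toy alignment showing how the raw counts would be obtained.
toy_mismatches = count_differences("ACGUACGU", "ACGAACGC")
```

The much lower amino acid divergence than nucleotide divergence implies that most of the 87 base changes are synonymous.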

DISCUSSION

We report the characterization and complete genome sequence of a novel coronavirus detected in the NPAs of patients with pneumonia. The clinical significance of the virus in the index patient was made evident by the high viral loads in the patient's NPAs during the first week of his illness, which coincided with his acute symptoms. The viral load decreased during the second week of the illness and was undetectable in the third week. In addition, the fall in viral load was accompanied by the recovery from the illness and development of a specific antibody response to the recombinant N protein of the virus. The fact that the present virus could not be recovered from cell cultures could be related to the lack of a susceptible cell line for CoV-HKU1 or the inherently low recovery rate of some coronaviruses. Many decades after the recognition of HCoV-229E and HCoV-OC43, the other non-SARS human respiratory coronaviruses known to cause pneumonia at low frequencies (27, 35, 40), there are still only a few primary virus isolates available, and organ culture is required for primary isolation of HCoV-OC43. In our experience, SARS-CoV can be recovered only from less than 20% of patients with serologically and RT-PCR-documented SARS-CoV pneumonia. After the discovery of CoV-HKU1 in the index patient, we conducted a preliminary study on 400 NPAs that were collected last year during the SARS period. Among these 400 NPAs, CoV-HKU1 was detected in one specimen, with a viral load comparable to that of the index patient. These results suggested that CoV-HKU1 is not only an incidental finding in an isolated patient but a previously unrecognized coronavirus associated with pneumonia.

Genomic analysis reveals that CoV-HKU1 is a group 2 coronavirus. The genome organization of CoV-HKU1 concurs with those of other coronaviruses, with the characteristic gene order 5′-replicase, S, E, M, N-3′, short untranslated regions in both 5′ and 3′ ends, 5′ conserved coronavirus core leader sequence, putative TRS upstream of multiple ORFs, and conserved pseudoknot in the 3′ untranslated region. CoV-HKU1 contains certain features that are characteristic of group 2 coronaviruses, including the presence of HE, ORF 4, and N2. Phylogenetic analysis of the 3CLpro, Pol, helicase, S, E, M, and N proteins showed that these genes of CoV-HKU1 were clustered with the corresponding genes in other group 2 coronaviruses. However, the proteins of CoV-HKU1 formed distinct branches in the phylogenetic trees, indicating that CoV-HKU1 is a distinct member within the group and is not very closely related to any other known members of group 2 coronaviruses (Fig. 2).

CoV-HKU1 exhibits additional features that are distinct from those of other group 2 coronaviruses. Compared to other group 2 coronaviruses, there is a deletion of about 800 bp between the replicase ORF 1b and the HE ORF in CoV-HKU1. In other group 2 coronaviruses, including MHV, SDAV, HCoV-OC43, and BCoV, an ORF of 798 to 837 bp (273 to 278 amino acids) is present between the replicase ORF 1b and the HE ORF. This ORF encodes a protein of the coronavirus nonstructural protein NS2a family (PFAM accession no. PF05213). Further experiments will reveal if this is a nonessential gene in other coronaviruses, as in MHV (30), and if it serves virus-specific functions in different group 2 coronaviruses. In addition to the deletion, upstream of PL1pro in ORF 1a, there are 14 tandem copies of a 30-base repeat that codes for a highly acidic domain. Similar repeats, with different amino acid compositions, have been found in the genomes of humans, rats, and parasites but not in other coronaviruses (22, 47). The function of these repeats is not well understood, although some authors have suggested that they could be important antigens, and their biological role may be related to their special three-dimensional structure. The vitellaria antigenic protein of Clonorchis sinensis contains 23 tandem copies of a 30-bp repeat that codes for DGGAQPPKSG (47). In the case of Plasmodium falciparum, it has been shown that the antigenicity of the circumsporozoite protein is due to its repeating epitope structure (22). It has also been suggested that the tandemly repeated peptide may induce a strong humoral immune response in the infected host and thus may also be useful in serological diagnosis. Further experiments should be performed to delineate the antigenic properties, biological role, and possible clinical usefulness of this tandem repeat in CoV-HKU1.
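
The copy-number comparison for such a tandem repeat can be illustrated with a minimal Python sketch. It assumes exact, uninterrupted copies of a known repeat unit, which is a simplification; real repeats may carry point differences between copies. The function name count_tandem_copies and the 30-base TOY_UNIT are hypothetical and made up for illustration; TOY_UNIT is not the CoV-HKU1 repeat sequence.

```python
def count_tandem_copies(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back exact copies of `unit`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    start = sequence.find(unit)
    while start != -1:
        run = 0
        pos = start
        # Extend the run while another exact copy follows immediately.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
        start = sequence.find(unit, start + 1)
    return best


# Made-up 30-base unit for illustration only; NOT the CoV-HKU1 repeat.
TOY_UNIT = "GATGAAGATGAAGGTGCTCAACCTAAATCT"

# Toy sequences with 14 and 11 copies, mirroring the copy numbers
# reported for the index patient and the second patient.
index_like = "CCC" + TOY_UNIT * 14 + "TTT"
variant_like = "CCC" + TOY_UNIT * 11 + "TTT"

copies_index = count_tandem_copies(index_like, TOY_UNIT)
copies_variant = count_tandem_copies(variant_like, TOY_UNIT)

# Length difference in bases implied by the difference in copy number.
length_gap = (copies_index - copies_variant) * len(TOY_UNIT)
print(copies_index, copies_variant, length_gap, length_gap % 3)
```

Because the repeat unit is 30 bases long, gaining or losing whole copies changes the length by a multiple of 3 and so leaves the downstream reading frame intact. A difference of three copies, as between the two patients' viruses, corresponds to 90 bases, or 30 codons.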

The prevalence of CoV-HKU1 in humans as a cause of respiratory tract infections remains to be determined. HCoV-OC43, HCoV-229E, and probably HCoV-NL63 are endemic in humans. On the other hand, isolation of SARS-CoV-like coronavirus from civet cats and the absence of a resurgent SARS epidemic in 2004 apart from sporadic laboratory-acquired cases imply that SARS-CoV probably originated from animals. For CoV-HKU1, its detection in the NPAs of two patients almost 1 year apart suggests that it may have been endemic in humans, or alternatively, it may originally have been an animal coronavirus that crossed the species barrier in the past few years. In the serological experiments, Western blot analysis revealed that the serum samples of the two healthy blood donors showed some antigen-antibody reaction with the purified N protein of CoV-HKU1 (Fig. 6). It is not known whether this reactivity was due to cross-reaction between the N protein of CoV-HKU1 and that of HCoV-OC43, since these two proteins showed 58% amino acid identity, or to past infections by CoV-HKU1. Further clinical, seroepidemiological, and phylogenetic studies would be required to determine the relative importance of CoV-HKU1 compared to other respiratory tract viruses in causing upper and lower respiratory tract infections, its seroprevalence, and the origin of the virus.

ACKNOWLEDGMENTS

This work was partly supported by the Research Grant Council Grant, Research Fund for the Control of Infectious Diseases, “One Mouth, One Mask” Fund, Suen Chi Sun Charitable Foundation, the Public Health Research Grant AI95357 from the National Institute of Allergy and Infectious Diseases, and the William Benter Infectious Disease Fund.

Klausegger, A., B. Strobl, G. Regl, and A. Kaser. 1999. Identification of a coronavirus hemagglutinin-esterase with a substrate specificity different from those of influenza C virus and bovine coronavirus. J. Virol. 73:3737-3743.