Can a reading frame code for more than one protein? Are short proteins more common than previously thought? The results from a new viral genomics study suggest so.

Similar to the way detectives profile a criminal, scientists have sketched the
complete set of proteins coded by the human cytomegalovirus (HCMV) genome,
finding templates for hundreds of previously unidentified proteins.

Using ribosome profiling and mass spectrometry, scientists profiled HCMV’s proteome when it infects a human fibroblast cell like the one pictured here. Credit: Glyn Nelson

"A starting point for understanding and studying any virus is to identify
the full set of viral gene products," said study author Noam
Stern-Ginossar, a post-doctoral researcher at the University of California, San
Francisco. "Our studies establish a paradigm for mapping and unbiasedly
deciphering complex genomes."

That paradigm is the technique that she and an international team of
scientists used to experimentally decode the proteome of HCMV. In the
experiment, the researchers infected human foreskin fibroblast cells with
the virus, then mapped the positions of ribosomes (the cellular machinery where
proteins are synthesized) on RNA fragments from the cells. The scientists also
performed mass spectrometry to confirm some of the newly discovered
proteins. The method, described in the November 23 issue of Science (1),
could allow scientists to profile other complex viruses and show exactly how
each one hijacks its host cells.

"The novelty of our approach is that it is experimentally based and does
not rely on any assumptions or predictions," said Stern-Ginossar.

Although scientists decoded the 240,000 base-pair genome of HCMV more than 20
years ago, information about the virus’s protein-coding potential came
mainly from sequence-based informatics and computer modeling. While HCMV
infects most humans and is usually harmless, it can cause disease in
newborns and in adults with weakened immune systems.

To understand how it hijacks healthy cells, Stern-Ginossar and colleagues used
deep sequencing of ribosome-protected mRNA fragments. The process identifies
the precise locations of the ribosomes on each mRNA and, for the first time,
allowed the team to determine all protein-coding regions of the HCMV genome
experimentally and systematically.

The technique revealed templates, or open reading frames, for hundreds of
previously unidentified proteins. Overall, the scientists were surprised to
find that the open reading frames could encode more than one protein and
that they generated very short proteins, a few containing fewer than 100
amino acids. The results suggest that short proteins may be more common than
previously thought. These details all factor into HCMV’s infection profile
and give scientists clues to its behavior in the body.
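
As a rough illustration of how one reading frame can hold templates for more than one protein, the Python sketch below scans a DNA string for every stretch that runs from a start codon (ATG) to the next in-frame stop codon. It is not the method used in the study, which located coding regions experimentally from ribosome positions; the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are all made up for this example, and only the forward strand is scanned.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG-to-stop stretch on the forward strand.

    Hypothetical helper for illustration only. `end` is exclusive and
    includes the stop codon; `min_codons` counts codons before the stop.
    """
    orfs = []
    for frame in range(3):
        open_starts = []  # ATG positions still waiting for an in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START:
                open_starts.append(i)
            elif codon in STOPS:
                # Every pending start shares this stop, so nested starts
                # give shorter products from the same reading frame.
                for s in open_starts:
                    if (i - s) // 3 >= min_codons:
                        orfs.append((s, i + 3, frame))
                open_starts = []
    return orfs

# Toy sequence (not from HCMV): two ATG codons in frame 0 share one stop.
print(find_orfs("ATGAAAATGCCCTGA"))  # [(0, 15, 0), (6, 15, 0)]
```

In the toy sequence, the second start codon sits inside the first open reading frame, so the same frame yields a four-residue product and a shorter two-residue one. A purely sequence-based scan like this cannot tell which candidates are actually translated, which is the gap the ribosome-position data are meant to fill.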

The biggest challenge was to "establish the accuracy and robustness of the new
approach," said Stern-Ginossar. The team demonstrated that the coding
regions identified had characteristic features for protein production. "More
importantly we used high-resolution mass spectrometric measurements on the
virally infected cells to independently confirm the accumulation of a
significant fraction of the novel proteins we have identified," she
said.

With the combined information, the group is beginning to understand how HCMV
infects and manipulates its host cells. The team may also use the data to
help develop ways of eliciting an effective immune response against HCMV and
possibly to profile other viruses.