Probabilistic model could provide insights into what makes a human a human

A new computational tool will potentially help geneticists to better understand what makes a human a human, or how to differentiate species in general, by providing more detailed comparative information about genome function.

The research team, which includes scientists from the University of Virginia, Florida State University and the University of Connecticut, developed the Phylogenetic Hidden Markov Gaussian Processes model, or Phylo-HMGP, to analyze functional genomic data. They used the model to analyze a new dataset on DNA replication timing across five primate species, including humans.

A new algorithm, Phylo-HMGP, is used to compare how replication timing — the order in which DNA segments are replicated — differs among five species of primates. The five tracks represent the different values from replication timing experiments for each species and the color bars represent different evolutionary patterns of DNA replication timing.

Genetic differences in protein-coding genes alone cannot account for the dramatic variation between species, so scientists increasingly focus on differences in gene regulation — mechanisms that control how and to what degree genes are activated.

"The differences among primate species may be mostly in the noncoding regions of the genome, the regulatory elements, not the genes themselves," Ma explained. High-throughput technologies produce a large amount of functional genomic data, which should help scientists better understand how genomes evolved.

Ma said Phylo-HMGP addresses what might be called the "Starbucks problem" in these multi-species analyses. Just as coffee vendors tend to sell drinks in small, medium and large sizes, analysis tools typically characterize functional genomic data as low, medium or high.

"With Phylo-HMGP, we can look at each functional genomic value as a continuous signal — showing the actual activity level, rather than just a rough level estimate," said Yang Yang, a Ph.D. student in CMU's Computational Biology Department and first author of the study. "In this way, we're able to fully utilize the data that have been gathered."
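The contrast between rough levels and continuous signals can be illustrated with a small toy sketch. It is not Phylo-HMGP itself and does not use the study's data: the two signal arrays, the cutoffs at one-third and two-thirds, and the helper name `to_levels` are all assumptions made up for illustration.

```python
import numpy as np

# Hypothetical, made-up signal values in [0, 1] for two species
# across six genomic bins (not data from the study).
species_a = np.array([0.05, 0.30, 0.40, 0.65, 0.70, 0.95])
species_b = np.array([0.30, 0.05, 0.65, 0.40, 0.95, 0.70])


def to_levels(values, edges=(1 / 3, 2 / 3)):
    """Bin values into 0 = low, 1 = medium, 2 = high (assumed cutoffs)."""
    return np.digitize(values, edges)


levels_a = to_levels(species_a)
levels_b = to_levels(species_b)

# After binning, the two species look identical in every genomic bin.
discrete_diff = np.abs(levels_a - levels_b)

# The continuous values still differ by about 0.25 in every bin.
continuous_diff = np.abs(species_a - species_b)

print("levels A:       ", levels_a)
print("levels B:       ", levels_b)
print("discrete diff:  ", discrete_diff)
print("continuous diff:", np.round(continuous_diff, 2))
```

In this toy case the low, medium and high labels hide every difference between the two species, while the continuous values keep it. That is the kind of information a continuous-signal model is in a position to use.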

The researchers applied the model to an analysis of DNA replication timing, the chronological order in which segments of DNA are replicated, which can vary from species to species. They did so for a dataset including humans, chimpanzees, orangutans, gibbons and green monkeys that was generated in collaboration with David M. Gilbert of Florida State University and Rachel J. O'Neill of the University of Connecticut.

"We demonstrated that we could use Phylo-HMGP to discover genomic regions with distinct evolutionary patterns of replication timing," Ma said. The research provides a framework for applying the model to reveal genomic regions whose functions are similar across species and those whose functions vary, or are dynamic, between species. Analyses of dynamic regions in functional genomic datasets can not only improve understanding of evolution but may also have implications for certain types of species-specific diseases, he added.

Other research team members include Yang Zhang, a research associate in the Computational Biology Department; Quanquan Gu of the University of Virginia; Takayo Sasaki of Florida State; and Julianna Crivello of the University of Connecticut. The National Institutes of Health and the National Science Foundation supported this research.