The One Percent Difference

By Kevin Davies

Dec. 2006/Jan. 2007 | It didn’t help that the story broke on Thanksgiving Day, or that the international team of researchers featured only a few Americans. By contrast, the news was splashed over the front page of at least one British broadsheet. And appropriately so, given that the research reveals a shocking new layer of human genome variation with profound implications for the future of genomic analysis and personalized medicine.

The report in question, published last month in Nature, would surely rank atop my list of scientific highlights for 2006. A team from the Wellcome Trust Sanger Institute in Cambridge, U.K., and the Hospital for Sick Children in Toronto, Canada, together with colleagues in Spain, Japan, and the U.S., uncovered a stunning genome-wide sea of variation in segments of DNA larger than 1,000 bases. These segments, so-called copy number variations (CNVs), can be deleted, duplicated, or inverted from person to person. They were catalogued following a detailed analysis of the 270 HapMap DNA samples using two methods: comparative intensity analysis using Affymetrix 500K arrays and comparative genomic hybridization using GE Healthcare Codelink arrays.

The results are quite remarkable, especially since we had all assumed that the Human Genome Project was completed three years ago! All told, the researchers identified 1,447 CNVs, which, if laid end-to-end, would encompass 12 percent, or 360 million bases, of the human genome. These CNVs directly involve 2,900 genes, including 15 percent of currently known disease-related genes.

One of the study’s principal authors, Toronto’s Stephen Scherer, says he was so shocked by the sheer quantity of CNVs that his group spent six months double-checking the data before sharing it with colleagues. He is already searching for CNVs at higher resolution to create a second-generation map and a more complete database (see http://projects.tcag.ca/variation).

30 Million Changes

For the past few years, we’ve heard how unrelated humans differ at a mere 3 million bases, rendering them 99.9 percent identical at the genetic level. Many a genetics lecture includes the classic Annie Leibovitz photograph of Willie Shoemaker and Wilt Chamberlain, illustrating the remarkable range of human phenotypic variation. On some level, it seemed hard to attribute such major differences to just 0.1 percent of our DNA. Now we know that we don’t have to.

In a companion paper in Nature Genetics, Scherer’s team presents a detailed comparison of the only two previously published human genomes: the Celera sequence (largely that of former Celera president J. Craig Venter) and the international consortium reference sequence (a composite). “The idea,” Scherer says, “was to come up with a good understanding of what we’re going to get when we do [personalized sequencing].”

Using MegaBLAST to align the genomes and the new Genome Comparison Algorithm to score the variants, Scherer and coworkers found a total of some 30 million base differences between the two sequences. These include roughly 1.5 million single nucleotide polymorphisms (SNPs), 24 million bases of unmatched sequence, 3.5 million of multi-copy sequence, and 1 million bases in inverted sequence. By this calculation, one could argue that humans are actually only 99 percent identical at the DNA level.
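The arithmetic behind that 99 percent figure is easy to check. The short Python sketch below is only a back-of-the-envelope illustration, not the researchers' method. It adds up the four categories quoted above and compares the total with a genome of about 3 billion bases. That genome size is an assumption, inferred from the earlier statement that 360 million bases make up 12 percent of the genome. The names `differences`, `GENOME_SIZE`, and `percent_identical` are invented for the example.

```python
# Differences between the two genome sequences, in bases,
# as quoted in the article.
differences = {
    "single nucleotide polymorphisms": 1_500_000,
    "unmatched sequence": 24_000_000,
    "multi-copy sequence": 3_500_000,
    "inverted sequence": 1_000_000,
}

# Assumption: a genome of roughly 3 billion bases, implied by
# "360 million bases = 12 percent" (360e6 / 0.12 = 3e9).
GENOME_SIZE = 3_000_000_000


def percent_identical(differing_bases, genome_size=GENOME_SIZE):
    """Return the percentage of bases that do not differ."""
    return 100.0 * (1 - differing_bases / genome_size)


total = sum(differences.values())

print(f"Total differing bases: {total:,}")
print(f"Older estimate (3 million bases): {percent_identical(3_000_000):.1f}% identical")
print(f"Including structural variation:   {percent_identical(total):.1f}% identical")
```

On these round numbers, the tenfold jump from 3 million to 30 million differing bases is what moves the headline figure from 99.9 percent to 99 percent.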

The researchers have found hints that CNVs could be implicated in schizophrenia, atherosclerosis, cataracts, and other diseases. Moreover, CNVs could play a huge role in the field of pharmacogenomics, shedding light on drug response variation. The authors note that “CNV assessment should now become standard in the design of all studies of the genetic basis of phenotypic variation, including disease susceptibility.” A simple SNP test won’t capture all of the newly appreciated genome variability. With Illumina having validated the field through its $600 million takeover of Solexa, and GE Healthcare jumping in, the prospects for next-generation sequencing technologies have just taken another major leap.

Meanwhile, Scherer is pondering the implications of his findings: “If you have 1 million fewer nucleotides than your buddy, shouldn’t you get a break on your golf handicap?”