Genome hints humans, Neanderthals rolled in prehistoric hay

Behold: the first draft of the Neanderthal genome! It comes complete with …

This news has been in the works for several years, from the first reports of gene sequences in 2006 to a public overview at last year's AAAS meeting. Today's issue of Science now contains the final product: the first draft sequence of one of our recent human relatives, the Neanderthal.

There's lots of information here for the genomics obsessive, which we'll get to later, but there are also some generally significant findings, including evidence that the ancestors of some human populations interbred with Neanderthals.

The DNA used for the sequencing comes primarily from three bone fragments—described as being "of little morphological value"—from a cave in Croatia. Even so, the DNA was extracted from bone powder produced with a small dental drill. The authors suggest that, as long as minor damage can be tolerated, it's worth trying to get DNA out of bones.

The paper confirms earlier reports that initial sequences were contaminated with DNA from modern humans, and describes the extensive quality controls that were put in place to limit the new data to Neanderthal.

Various measures indicate that the new sequence being released is over 99 percent Neanderthal. In total, it's about 5.3 gigabases' worth of sequence, meaning that every base in the Neanderthal genome was sequenced, on average, 1.3 times. Obviously, the distribution is not that even, and many areas remain unsequenced.
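To get a feel for why 1.3-fold average coverage still leaves holes, here is a rough sketch that assumes reads land uniformly at random, so the depth at any given base follows a Poisson distribution. That is an idealization for illustration, not the paper's analysis, and ancient DNA is likely to be less even than this. The function names are made up for the example.

```python
import math

def fraction_at_depth(mean_coverage: float, depth: int) -> float:
    """Poisson probability that a given base is covered exactly `depth` times."""
    return math.exp(-mean_coverage) * mean_coverage ** depth / math.factorial(depth)

def fraction_unsequenced(mean_coverage: float) -> float:
    """Expected share of bases that no read covers at all (depth 0)."""
    return fraction_at_depth(mean_coverage, 0)

if __name__ == "__main__":
    coverage = 1.3  # average depth quoted in the text
    print(f"never sequenced:        {fraction_unsequenced(coverage):.1%}")
    print(f"sequenced exactly once: {fraction_at_depth(coverage, 1):.1%}")
```

Under that simplifying assumption, a bit over a quarter of the genome would get no read at all, which is consistent with the caveat that many areas remain unsequenced.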

What we do have confirms that the three bones come from distinct individuals, although two may have been related, as their mitochondrial DNA sequences were identical. The primary samples were supplemented with additional sequences from bones found in Spain, the Caucasus, and the initial skeleton from the Neander valley.

The similarity of the sequences across all these samples indicates that the species was about equally distinct from humans across most of its geographic range. For the most part, Neanderthals looked like humans. Ninety percent of the time that there was a difference between human and chimp DNA sequences, the human variant showed up in Neanderthals.

What about the differences? When it comes to protein-coding genes, they're pretty minor. Only 78 differences in the sequences that encode proteins are uniformly present in humans but absent in Neanderthals. Only five of those would change the primary structure of the protein.

The similarity is even more striking when you look at copy number variations, which can alter the dose of genes. Neanderthals have lots of CNVs, but all but three of them show up in other primates, and the three exceptions don't contain any genes.

Much of the action instead seems to be happening in areas that may regulate the expression of genes. There were over 230 changes apparent in the parts of genes that flank the protein-coding section (the 5' and 3' UTRs). In the areas that have been identified as Human Accelerated Regions (HARs) based on the large differences between humans and chimps, the Neanderthals had the human form 90 percent of the time, but that still left 45 HARs in which humans have picked up significant differences since they diverged from Neanderthals.

Sweeping genes and interbreeding

The authors also went looking for cases where there was evidence of what's termed a selective sweep, where a useful mutation occurs and spreads through the population, dragging its area of the chromosome along with it. We can detect these by looking for large chunks of the chromosome that are essentially identical in modern humans, but differ from the Neanderthal versions.

Researchers found over 200 of these. Many of them appear to contain genes involved in neural development, including DYRK1A (implicated in Down syndrome), Neuregulin-3 (schizophrenia), and CADPS2 and AUTS2 (autism). The authors also point out that RUNX2 is part of a selective sweep; mutations in this gene lead to skeletal deformities in the face and shoulders, areas which differ significantly between humans and Neanderthals.

Of the largest 20 regions that have swept through the genomes of modern humans, five don't even contain genes. Given that some of the key differences in the other 15 probably lie outside the coding regions of the genes they contain, the data implicates gene regulation as a significant driver of the evolutionary adaptation in humans.

Those of you who followed the selective sweep link above saw that it was part one of a two-article description, laying out evidence that one gene involved in brain development swept through humans after we picked up the Neanderthal copy through interbreeding. We've now got the Neanderthal sequence of that gene, and it doesn't look like the human version, so that hypothesis just took a major hit.

At the same time, the genome sequence does provide evidence that humans and Neanderthals have interbred. This became apparent when the Neanderthal genome was compared against human genomes from different parts of the globe. The Neanderthal DNA consistently matched European and Asian samples better than it did African ones; the difference was small, but consistent. It suggested that the Neanderthals, who were restricted to Europe and Asia at the time modern humans originated in Africa, had interbred with modern humans once the latter began migrating out of Africa.
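One way to score that kind of comparison is to look at sites where two present-day humans differ and count how often the Neanderthal shares the newer (derived) variant with each of them. The sketch below is an ABBA-BABA-style count written for illustration; the article does not spell out the exact statistic used, treating the chimp base as the ancestral state is an assumption, and the toy sequences are invented.

```python
def d_statistic(african, non_african, neanderthal, chimp):
    """ABBA-BABA-style score over four aligned sequences of equal length.

    The chimp base is treated as ancestral (A) and the other base as derived (B).
    Positive values mean the Neanderthal shares derived variants more often
    with the non-African sequence; negative values favor the African one.
    """
    abba = baba = 0
    for h1, h2, n, c in zip(african, non_african, neanderthal, chimp):
        # Only use sites with exactly two variants where Neanderthal is derived.
        if len({h1, h2, n, c}) != 2 or n == c:
            continue
        if h1 == c and h2 == n:
            abba += 1  # African ancestral, non-African and Neanderthal derived
        elif h1 == n and h2 == c:
            baba += 1  # African and Neanderthal derived, non-African ancestral
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)

if __name__ == "__main__":
    # Made-up toy alignment, one character per site.
    african     = "AAGTCAT"
    non_african = "ATGTGAC"
    neanderthal = "ATGTGCT"
    chimp       = "AAGTCAC"
    print(d_statistic(african, non_african, neanderthal, chimp))
```

With no interbreeding, the two patterns should turn up about equally often and the score should hover near zero; a small but consistent tilt in one direction is the kind of signal described above.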

Because African human populations are older, they tend to have more divergent genomes. But the human-Neanderthal split is older still, so the authors figured that any areas of the genome where variation was larger in populations outside of Africa may have entered the human genome through interbreeding. If they did arise through interbreeding, then the non-African segments should match Neanderthals. Researchers found at least 10 regions that fit these predictions.

Although they can't rule out the possibility that modern humans had already started diverging from Neanderthals before leaving Africa, the research team favors the idea of interbreeding in the Middle East as the first modern humans left Africa. This would ensure that both the Asian and European populations picked up some Neanderthal DNA.

Some technical details

Perhaps the most amazing part of the work is that the authors wanted to have a bit more perspective on the geographic differences among modern human populations, so they just did a draft of five more human genomes in order to get it. That's where the technology is at now—if it's helpful to have another genome, you can just go out and get it. They also took advantage of some of the other genomes that have been done recently, including Craig Venter's. That leads to some rather unusual phrasing, like "present-day African segments with the lowest divergence to Neandertals have a divergence to Venter that is 35 percent."

The other indication of the capacity of modern sequencing machines comes from the fact that there simply wasn't enough Neanderthal DNA to completely fill a machine's capacity. The team had to attach short sequences that acted as barcodes to identify the Neanderthal DNA, and ran it at the same time as other sequencing work in order to avoid wasting time and money by performing a run at less than full capacity.
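The barcode trick is essentially bookkeeping: each library gets a short known tag, and after a shared run the reads are sorted back out by that tag. Here is a minimal sketch, assuming the tag sits at the start of each read and must match exactly. The tag sequences and library names are invented for illustration, and the details of the real protocol are not given here.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Sort reads into libraries by an exact-match barcode prefix.

    `barcodes` maps a tag sequence to a library name. The tag is trimmed off
    each assigned read; reads with no recognized tag go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, library in barcodes.items():
            if read.startswith(tag):
                bins[library].append(read[len(tag):])
                break
        else:
            # No tag matched this read.
            bins["unassigned"].append(read)
    return dict(bins)

if __name__ == "__main__":
    barcodes = {"ACGT": "neanderthal", "TTAG": "other_project"}  # hypothetical tags
    reads = ["ACGTGGCATT", "TTAGCCGTAA", "ACGTTTGACA", "GGGGAACCTT"]
    for library, seqs in demultiplex(reads, barcodes).items():
        print(library, seqs)
```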

The Max Planck Institute has also developed a base-calling algorithm, called Ibis, that uses machine learning to help it deal with the sorts of damage that are typical in ancient DNA samples. That damage can be pretty substantial, too; if a cytosine appears at the end of a fragment, 40 percent of the time damage has converted it into a thymine. That number drops the further from the end you get, while other types of damage show different profiles. Accounting for all of this is anything but simple, so it's no surprise that software plays a major role.
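A damage profile like that can be tabulated by comparing each read with the reference it aligns to and asking, position by position, how often a reference C shows up as a T. The sketch below assumes gap-free alignments supplied as (reference, read) string pairs and counts distance from the start of the read, which is an assumption about which fragment end is meant. It illustrates the bookkeeping only and says nothing about how Ibis works internally.

```python
from collections import Counter

def c_to_t_profile(alignments, max_pos=5):
    """Fraction of reference C positions read as T, by distance from read start.

    `alignments` is an iterable of (reference, read) pairs of equal length,
    with no gaps. Returns {position: rate} for positions where a C was seen.
    """
    c_total = Counter()
    c_to_t = Counter()
    for ref, read in alignments:
        for pos, (ref_base, read_base) in enumerate(zip(ref, read)):
            if pos >= max_pos:
                break
            if ref_base == "C":
                c_total[pos] += 1
                if read_base == "T":
                    c_to_t[pos] += 1
    return {pos: c_to_t[pos] / c_total[pos] for pos in sorted(c_total)}

if __name__ == "__main__":
    # Made-up toy alignments: (reference, read).
    toy = [
        ("CACGT", "TACGT"),
        ("CCGTA", "CCGTA"),
        ("CGCAT", "TGTAT"),
        ("ACCGT", "ACCGT"),
    ]
    for pos, rate in c_to_t_profile(toy).items():
        print(pos, round(rate, 2))
```

On real data, a curve that starts high at position 0 and falls away further into the read would match the pattern described above.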

Contamination is still a problem—most of the sequence that comes out is still bacterial—but the authors did pilot runs, and only worked with samples that produced a mix of sequences that was greater than 1.5 percent hominin. To avoid having most of that be human, they did two checks. The first was to assemble the mitochondrial genomes that came out of the data and ensure that those genomes matched Neanderthal, rather than human, samples. The second was to take advantage of the fact that their samples seemed to be females; any sign of a Y-chromosome was an indication of contamination.
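Two of those screening steps amount to simple filters. Here is a sketch under stated assumptions: reads have already been classified by likely origin, the 1.5 percent cutoff is the one quoted in the text, and the field names, sample labels, and counts are invented for illustration. The mitochondrial check is left out because it depends on an assembly step.

```python
from dataclasses import dataclass

@dataclass
class PilotRun:
    sample: str
    total_reads: int
    hominin_reads: int
    y_chromosome_reads: int  # reads that align to the Y chromosome

    @property
    def hominin_fraction(self) -> float:
        return self.hominin_reads / self.total_reads if self.total_reads else 0.0

def passes_screen(run: PilotRun, min_hominin: float = 0.015) -> bool:
    """Keep a sample only if more than `min_hominin` of its reads are hominin."""
    return run.hominin_fraction > min_hominin

def y_contamination_flag(run: PilotRun) -> bool:
    """For a sample believed to be female, any Y read hints at contamination."""
    return run.y_chromosome_reads > 0

if __name__ == "__main__":
    runs = [
        PilotRun("bone_A", 100_000, 2_000, 0),  # hypothetical counts
        PilotRun("bone_B", 100_000, 1_000, 3),
    ]
    for run in runs:
        print(run.sample, passes_screen(run), y_contamination_flag(run))
```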

A second paper in this edition of Science describes another way of dealing with contamination: affinity purification. We know that most Neanderthal sequences are identical to their human counterparts, so we can just use DNA chips containing human sequences to purify Neanderthal DNA by affinity. Right now, we can't make a chip big enough to host the entire human genome (much less both strands of the double helix for the whole genome), so it's limited to fishing out specific stretches of the genome.

In this particular paper, the authors focused on purifying the specific places in protein-coding DNA that contain differences between humans and chimps. By having the Neanderthal sequence as well, we can determine how many of these are unique to modern humans.