About 18,545 years ago, give or take a few decades, a woolly mammoth died. Succumbing to causes unknown, the creature was buried in Siberian snow. Many other mammoths must have met similar fates but this one, which we now know as M4, is special. Almost 20 millennia later, its beautifully preserved remains were unearthed by scientists who have revealed both its body and its genetic code. For the first time, the genome of an extinct species has been sequenced almost to completion.

Webb Miller from Pennsylvania State University, together with a large team of American and Russian scientists, has just published about 70% of the full mammoth genome. So far, about 3.3 billion base pairs are known and Miller’s group estimate that the full sequence would weigh in at about 4.7 billion base pairs, making it fairly… well… mammoth in size. If the estimate is right, the mammoth genome was about 40% larger than a human’s but about the same size as a modern elephant’s.

So we don’t have a complete picture yet, but it’s a major technical advance nonetheless. Sequencing ancient genomes is no easy task and just a few years ago, it would have been little more than a flight of fancy. The obstacles were numerous – traces of ancient DNA are hard to come by; when they are extracted, they are broken into tiny fragments and swamped by DNA from nearby bacteria and fungi; and the sequencing technology at the time simply wasn’t fast enough.

The first two problems were actually solved by nature thousands of years ago. While fossilisation does little for preserving DNA, the freezing process that many mammoth carcasses were subjected to was much kinder. It safeguarded their hair, a rich source of DNA that is well protected from the damaging elements and the contaminating genes of microbes.

The final technological hurdle was leapt in 2005, with the advent of a new technique called 454 sequencing that was 100 to 1,000 times faster than the favoured method of the time. In the three years since, the method has become five times faster still and can now handle billions of base pairs in a single run, allowing individual laboratories to sequence in months what international collaborations used to take years to accomplish.

Miller’s group leapt onto the new tool and mere months after the technology was available, they had used it to sequence about 13 million base pairs of mammoth DNA, about 0.3% of its full genome. The team went on to sequence the DNA of the mammoth’s mitochondria – small structures inside complex cells that contain their own mini-genome, which accounts for just 13 of the mammoth’s thousands of genes. It was an important step, but small potatoes compared to the much bigger task of sequencing its nuclear genome.

To do that, the team turned to M4 and another specimen called M25 (English readers will chuckle at awarding that name to a permanently frozen, unmoving giant), and sequenced over 4 billion base pairs between the two individuals.

Accuracy was a big issue. After all, 454 sequencing is a “shotgun” technique, meaning that it simultaneously analyses many small chunks of DNA and pieces the data together later. It’s fast but error-prone, and every genome needs to be sequenced many times over to get a reliable draft. But with such a blisteringly fast technique and so many hair samples, that wasn’t a problem. The mammoths provided so much hair that the team could sift for long, well-preserved chunks of DNA that would give better readings than the usual short fragments.
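To see why repeated sequencing matters, here is a rough sketch of shotgun coverage. It assumes reads land uniformly at random on the genome (the idealised Lander-Waterman model, which is an assumption of this sketch and not something the study is described as using), and it plugs in the article's figures of about 3.3 billion known base pairs and an estimated 4.7 billion base pair genome. The function names are illustrative only.

```python
import math


def coverage_depth(sequenced_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is read."""
    return sequenced_bases / genome_size


def expected_fraction_covered(depth: float) -> float:
    """Expected share of positions read at least once, assuming reads
    land uniformly at random (Poisson approximation)."""
    return 1.0 - math.exp(-depth)


# Figures quoted in the article; the uniform-random model is an assumption.
MAMMOTH_BASES_KNOWN = 3.3e9
ESTIMATED_GENOME_SIZE = 4.7e9

depth = coverage_depth(MAMMOTH_BASES_KNOWN, ESTIMATED_GENOME_SIZE)
print(f"average depth: {depth:.2f}x")
print(f"expected fraction covered: {expected_fraction_covered(depth):.0%}")

# How the covered fraction grows as the same genome is read more deeply.
for d in (1, 5, 10):
    print(f"{d}x depth -> {expected_fraction_covered(d):.2%} covered")
```

Under those assumptions, an average depth of about 0.7x leaves roughly half of the positions unread and most of the rest read only once, so single-read errors cannot be caught by comparing overlapping reads. Real reads are not perfectly random, so treat the numbers as a ballpark.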

To get a benchmark for accuracy, Miller also analysed the mitochondrial DNA from these new specimens and compared it to the team’s previous, accurate sequence. He estimates that the new readings have a small error rate of about one in 700. He also compared the mammoth genome with the draft version of the African savannah elephant genome, to distinguish sequences that came from contaminating microbes from those that actually belonged to the mammoth. On average, about 80% of the readouts were definitely mammoth in origin and M4 in particular provided excellent samples.
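As a minimal sketch of that benchmarking step, the snippet below counts mismatches between a new read-out and a trusted reference for the same stretch of DNA. It assumes the two sequences are already aligned and of equal length; the function name and the toy sequences are invented for illustration and are not the team's actual pipeline.

```python
def mismatch_rate(new_sequence: str, trusted_sequence: str) -> float:
    """Fraction of positions where the new sequence disagrees with the
    trusted one. Assumes both are pre-aligned and the same length."""
    if len(new_sequence) != len(trusted_sequence):
        raise ValueError("sequences must be aligned to the same length")
    if not trusted_sequence:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(new_sequence, trusted_sequence))
    return mismatches / len(trusted_sequence)


# Toy data (made up): a 700-base reference and a copy with one wrong base.
trusted = "ACGTTGCA" * 87 + "ACGT"  # 8 * 87 + 4 = 700 bases
wrong_base = "A" if trusted[349] != "A" else "C"
new = trusted[:349] + wrong_base + trusted[350:]

rate = mismatch_rate(new, trusted)
print(f"error rate: about one in {round(1 / rate)}")
```

Real data would first need the reads aligned to the reference, which is the harder part and is left out here.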

To finish the job and publish a complete and accurate genome, Miller needs the final elephant version and about 10-30 times more mammoth sequences. For the moment, the combined nuclear and mitochondrial sequences tell us a bit more about elephant evolution. They suggest that the elephant family had split down three different paths about 7 million years ago, one leading to the small Asian elephant, one that begat two species of African elephants and one that led to the mammoths. And about 1.5 million years ago, the woolly mammoth had split into two genetically distinct sub-populations, as represented by M4 and M25 today.

During this process, the different groups were evolving very slowly at a genomic level. Genetically, the modern elephant is 99.4% identical to its extinct woolly cousin. In comparison, humans and chimps have about twice as many differences in our DNA, even though our lineages diverged from each other at about the same time as the two elephant species did.

The similar DNA would have given rise to similar proteins and Miller estimates that the average mammoth protein would have differed from its elephant counterpart by only one amino acid across its entire length. But these small differences may have had large effects. After all, Miller found many cases where the mammoth had unique and unusual amino acids at positions that are otherwise exactly the same in about 50 other species of back-boned animals. These changes could have contributed to important changes in mammoth evolution, such as their adaptation to extremely cold climates.

The complete genome will hopefully tell us more about these adaptations, and potentially give us clues as to why the mammoth went extinct and just how many species there were. With such data, we can use genetic information, as well as fossils, to answer questions about the life of this majestic species. We could even potentially bring one back to life.

Cynics might argue that the mammoth genome, in its current unfinished state, tells us very little, save that it is possible to eventually sequence the whole lot. And while that’s certainly true, it surely is amazing in itself. Miller’s work provides a stark demonstration that the genomes of creatures long dead are not beyond the reach of our understanding. Sequencing extinct genomes is not only possible, but it can be done in a very short space of time by a few scientists working on comparatively small budgets. The full Neanderthal genome is surely next.

Comments (6)

Wow. I had no idea gene sequencing had come so far. Of course I had Jurassic Park in my head as I read this.
When you said that a Neanderthal’s genome was also in reach, that made me stop and imagine the implications.
I wonder if we will eventually try to bring these to life? In a way, I kinda hope not. Maybe man wasn’t meant to know everything.

Well one line of reasoning for cloning a mammoth (and there’s a great feature in Nature on how we could go about doing this) is that it would be awesome, in the literal rather than the internet sense of the word. It would stimulate a fantastic amount of excitement, particularly among the younger generation, and that’s certainly no bad thing for science. I suspect the same argument would be infinitely harder to make for Neanderthals. For a start, who would you use as a host?

Yes, Webb Miller and his team have obtained 3.4 Gb out of the estimated 4.7 Gb mammoth genome but, as far as I understand, that doesn’t amount to 70% of the genome being sequenced, because in these shotgun sequencing techniques part of the genome is sequenced many times.
The actual percentage has been estimated at 50% when considering, for instance, the fraction of African elephant ultraconserved sequences found in the mammoth genome.
Thank you very much for your science blog. I wonder how you have the time to do so many things!
Françoise Ibarrondo, a biology high school teacher in Paris.