Sunday, March 06, 2016

I went back to school this semester. Thanks to the generosity of the citizens of NM and the misplaced priorities of the State Legislature, it is very cheap for old geezers (like your humble correspondent) to take university classes. One course I am taking is Human Genomics (AKA BIOL 550), which is fundamentally a reading course wherein we read a lot of genomics papers, mostly from the last 2-5 years. The amount of progress in this area recently has been phenomenal, driven mostly by the rapid fall in the cost of sequencing genetic data but also by the development of powerful statistical algorithms that permit analysis of the enormous quantities of data.

The human genome, like those of all* the other life on Earth, is encoded in DNA molecules, enormously long chains of four different nucleotides. In humans the genome consists of about three billion of these, each of which is paired with a complementary nucleotide in a complementary chain. Together, these chains form the famous double helix. The most famous and central property of these chains is that they include long sections of code, with each set of three base pairs coding for one amino acid of a protein product.
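
To make the three-letter code concrete, here is a minimal Python sketch of base pairing and of reading a coding sequence three bases at a time. The codon table is the standard genetic code; the function names (complement, translate) and the toy sequence are my own inventions for illustration, and real genes are far messier than this (introns, reading frames, and so on).

```python
# Toy illustration: complementary pairing and codon-by-codon translation.
# Names and the example sequence are hypothetical, chosen for illustration only.

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Each base pairs with its partner on the other strand: A with T, C with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Return the bases that pair with the given strand, position by position."""
    return dna.upper().translate(PAIRING)


def translate(dna):
    """Read a coding sequence three bases at a time, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(complement("ATGGCCTAA"))
    print(translate("ATGGCCTAA"))
```

The output of translate uses the usual one-letter amino acid abbreviations (M for methionine, A for alanine, and so on).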

Most of the information in a genome is in the form of the sequence of these base pairs, so that if you know the order of all three billion bases on the DNA strands, you know almost everything about the genetics of the organism (there is also some so-called epigenetic stuff, but I won't get into that). So how do you go about determining the sequence of three billion molecules in these tiny chains? It's complicated, but it usually involves several steps, including multiplying the DNA, cutting it up into more manageably sized pieces, sequencing the pieces, and figuring out how to virtually glue all the now-sequenced pieces back together. The details are managed by micro-machines, designed by people, and nano-machines, designed by bacteria and adapted by humans.
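
The "virtual gluing" step can be sketched in a few lines of Python. This is only a toy: it assumes error-free reads that all come from the same strand, and it uses a simple greedy rule (always merge the two pieces with the longest overlap), which is not how production assemblers work. The function names and the made-up reads are hypothetical.

```python
# Toy greedy assembler: repeatedly merge the two fragments that overlap the most.
# Assumes error-free, same-strand reads; names and data are illustrative only.
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # nothing left to glue; return the separate pieces
        reads.remove(best_a)
        reads.remove(best_b)
        # Glue b onto a, keeping the shared bases only once.
        reads.append(best_a + best_b[best_k:])
    return reads


if __name__ == "__main__":
    fragments = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
    print(greedy_assemble(fragments))
```

With real data the hard parts are exactly what this sketch ignores: sequencing errors, reads from both strands, and long repeated stretches that make many different overlaps look equally good.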

Sequencing the first human genome cost a billion dollars or so, but since then the cost has plummeted by roughly six orders of magnitude. We have gradually acquired sequences for several thousand modern humans from all over the world, as well as sequence data for many of our relatives, both close (chimps, gorillas, monkeys) and distant (yeasts and bacteria). For the student of human prehistory, it is even more interesting that DNA from some of the ancient dead has also been sequenced, including some Neandertals and a Denisovan, whose lineages split from that of modern humans perhaps half a million years ago.

All this new data is telling us a dramatic tale of repeated migrations and admixture events. The people who occupy a location now are rarely those who got there first. The story is probably known best for Europeans, who seem to have left Africa about 50 thousand years ago, separated from modern East Asians 10 or 15 thousand years later, and mixed slightly with the pre-existing Neandertal population before replacing them. These Paleolithic hunter-gatherers were mixed with and largely replaced by Neolithic farmers from the Middle East some 7-9 thousand years ago. A couple of thousand years later, the Yamnaya culture with horses, chariots and bronze showed up, sweeping away most of the previous population and probably bringing the Indo-European languages.