October 28, 2010

1000 Genomes Project has arrived

John Hawks covers it (it's open access). But here is an interesting tidbit from the supplement. On to it, Y-chromosome sleuths!

14.4. Y chromosome Haplogroups

A maximum likelihood haplogroup tree under a HKY model of evolution was produced using phyML, and bootstrap values were produced using 100 subsamplings. Trees were produced using both all 2870 filtered sites (Supplementary Figure 7), and the 1971 UYR sites; though there was very little difference between the two trees. The haplogroup tree classifies all the major haplogroups as monomorphic, and recovers the relationships between them, with high bootstrap confidence. It also shows evidence for a deep division between haplogroups DE and CT, previously identified only by a single marker (P143; Karafet, Mendez et al. 2008). New insights into recent human evolution can also be gained from the branch lengths; for example, the short internal branch lengths within the haplogroup R1b relative to the other haplogroups suggest a recent expansion of this European haplogroup (Balaresque, Bowden et al. 2010).

Nature 467, 1061–1073 (28 October 2010) doi:10.1038/nature09534

A map of human genome variation from population-scale sequencing

The 1000 Genomes Project Consortium

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
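
For readers unfamiliar with the bootstrap values mentioned in the supplement excerpt, here is a minimal sketch of the idea. It assumes the "100 subsamplings" are standard nonparametric bootstrap replicates (alignment columns resampled with replacement), which the excerpt does not spell out. The toy sequences, sample names, and function names are all made up for illustration; this is not phyML's code or the consortium's pipeline.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one replicate by resampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return n_replicates resampled alignments (100 matches the excerpt)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def clade_support(clade, replicate_trees):
    """Fraction of replicate trees that contain the given clade.

    Each tree is represented here, hypothetically, as a set of frozensets
    of sample names, one frozenset per clade.
    """
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return hits / len(replicate_trees)


# Toy alignment: hypothetical samples and made-up sequences.
toy = {
    "sample_R1b": "ACGTACGTAC",
    "sample_E": "ACGTTCGTAA",
    "sample_D": "ACTTTCGTAA",
}
replicates = bootstrap_replicates(toy)
```

In a real analysis each replicate would be handed to the tree-building program, and the bootstrap value on a branch (for example the one separating DE from CT) would be the share of replicate trees in which that grouping reappears, which is what clade_support computes.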

It also shows evidence for a deep division between haplogroups DE and CT, previously identified only by a single marker (P143; Karafet, Mendez et al. 2008).

Given that the DE split and the CF split pretty much have to date from sometime around an Out of Africa migration, as we infer from the distribution of Y-DNA C, D, E, and F haplotypes respectively, a deep split between DE and CF would seem suggestive of structure in the pre-Out of Africa African population. Presumably this means that the DE branch and the CT branch of the tree must have split from each other many tens of thousands of years before Out of Africa, and significantly earlier than prior estimates of a split at about 68,500 years ago, which were based on mutation clocks by Underhill et al. in 2008 and made without the 1000 Genomes data.

Were CF and DE different populations at some point? Did they merge and leave Africa together, or were they part of two separate Out of Africa waves, both prior to 50,000 years ago, given the presence of C descendant Y-DNA in Australia and New Guinea, and D descendant Y-DNA in the Andaman Islands and in Jomon Japan?

I agree - D certainly has a weird distribution. It most likely left Africa some time before the DE split, but left no trace anywhere along the way, surviving for a long time in small numbers before diverging in Asia. So, in principle, this could be a candidate for an earlier OOA, if there was one.

I hate basing statistics or deductions on small numbers, but from the fossil record - if we take Zhirendong at face value - it would match the fact that early AMHs died out in the Near East and left no trace until eastern Asia (and perhaps India).

Also, this would fit well with the very low mutation rate found in the 1000 Genomes paper.

Some of the most isolated D carriers need to be included in full-genome studies!

The evidence from the Andamans, the Ainu and the Paleo-layer of Tibetan genetic variation suggests that Y-DNA type D was accompanied in all places by mtDNA haplotype M. Two isolate subhaplotypes of M are found in the Andamans, there is a likely source for the non-haplotype M mtDNA in the Ainu, and there is suggestive evidence that M16 was the modal mtDNA type of the Paleotibetans.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
