Archive for February, 2011

David of the Eurogenes Genetic Ancestry Project has a cautionary post up, When is a genetic map also a geographic map? Always and never. In it, he uses a specific peculiar pattern as a launching point into a broader exploration of the relationship between visualizations of genetic variation and geography. That pattern is that Russians, geographically the easternmost of European peoples, are closer to the Slavs of Central Europe than the Balts when plotted on the two largest dimensions of variation. I’ve highlighted this pattern from a PCA David extracted from a paper on northeast European genetics. This disjunction between geography and genetics has a pretty straightforward possible explanation: the current distribution of Russian-speaking peoples is a function of a massive demographic expansion to the east by Slavic farmers within the last 2,000 years. We already know that the borderlands between the steppe and the forest were long dominated by North Iranian people, from the Scythians to the Sarmatians, while further north the Great Russians absorbed a Finnic substrate (clear because some of the absorption is attested down to the early modern period).

Should you go to an Ivy League School, Part II. I think an Ivy League degree will be more, not less, valuable in the future. It seems possible that we’re nearing the end of the age when the wage gap between unskilled and skilled workers is relatively modest (roughly, the wage gap decreased from 1800 to 1970, and has been increasing over the past 40 years). Credentialing and finding juicy rents and sinecures is probably the way to go in the future. As the past was, the future shall be?

Advanced Degrees Add Up to Lower Blood Pressure. I’m sure that the paper itself is less irritating in terms of conflating correlation and causation. The problem is that it is the least intelligent people who will think that extra years of education = extra years of life in a magical manner. That being said, peer group effects probably matter, so I suspect that that’s part of what’s going on here after you correct for background variables.

Yesterday Michelle decided to put up a post with her own analysis of her ADMIXTURE results. With that in mind, I thought I’d revisit some results from my parents. After many runs of ADMIXTURE, both by myself and Zack, some consistent differences seem to crop up. To review, one of the big surprises from genotyping my parents is that both of them have about the same “East Asian” element of ancestry, which is very distinct from the conventional South Asian mix. Because both of my parents lack any oral history of recent admixture I posited that this element may be a uniform substrate common among eastern Bengalis, and that it was absorbed during the initial period of settlement and demographic expansion on the frontier between 1000 and 1500 A.D. By analogy, low levels of Amerindian admixture persist across Brazilians, and African admixture among Mexicans, but because the admixture dates back several hundred years it does not seem to have percolated down to the present in oral history (though some old stock Brazilians of predominantly Portuguese origin have been able to infer Amerindian ancestry by looking at the church marriage records of their ancestors, and adducing that some women were natives due to common baptismal names given to such converts).

Since ADMIXTURE is sensitive to the genetic variance you throw into it in extracting out patterns, I created two pools with my parents in them. One was predominantly West Eurasian, and another was predominantly East Eurasian. In both samples my parents were Bengali A (father) and Bengali B (mother), and I included the Gujarati_B and Pathan South Asian populations. Gujarati_B because it seems particularly South Asian, and therefore informative. The Pathan sample has less African admixture than the Sindhis or Makranis, and is not so isolated as the Kalash, Brahui, Burusho, and Baloch. For the East Eurasian sample I included Sardinians as the West Eurasian outgroup, while for the West Eurasians I included Japanese as the outgroup. Finally, I pruned the markers down to 65,000 SNPs. Below I report K = 6, as cross-validation determined that to be the optimal value for the number of populations.
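Choosing K by cross-validation comes down to running ADMIXTURE once per candidate K with its `--cv` option and keeping the K with the lowest reported error. Here is a minimal Python sketch of that last step, assuming log lines of the form `CV error (K=6): 0.5461`. The function name is hypothetical and the numbers in the example log are made up for illustration; they are not the results of the runs described here.

```python
import re

# Matches lines such as "CV error (K=6): 0.5461" (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K, error) for the lowest cross-validation error in log_text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors.items(), key=lambda item: item[1])


# Illustrative, made-up log lines (not real results).
example_log = """\
CV error (K=4): 0.5510
CV error (K=5): 0.5472
CV error (K=6): 0.5461
CV error (K=7): 0.5469
"""

print(best_k(example_log))
```

In practice the differences between neighboring K values can be small, so the minimum is a guide rather than a verdict.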

I’ve only become aware of “content farms” in any significant way over the past few days. Yes, I’m aware of Associated Content and eHow. I use Google! But I’ve always ignored them. But with Google’s turn against these websites I’ve become curious. This Wired piece from October 2009 is a gem. Here’s the part that caught my attention:

Plenty of other companies — About.com, Mahalo, Answers.com — have tried to corner the market in arcane online advice. But none has gone about it as aggressively, scientifically, and single-mindedly as Demand. Pieces are not dreamed up by trained editors nor commissioned based on submitted questions. Instead they are assigned by an algorithm, which mines nearly a terabyte of search data, Internet traffic patterns, and keyword rates to determine what users want to know and how much advertisers will pay to appear next to the answers.

In some ways “mainstream” websites also do this a bit; Nick Denton relies on fine-grained metrics for his Gawker Media properties. But obviously the sort of thing that content farms do, responding so specifically to the interests of the audience, takes it to the next level. I started browsing some of the “articles” produced by the contributors, and I think Farhad Manjoo has it right:

Long time reader Dragon Horse has been generating and collecting (top row images are from Dienekes) composite images of various classes of individuals for a while now. It’s really fun to just skim through and make your own assessments (the “global face” resembles darker skinned versions of Amerasians, whose fathers were white Americans and mothers Southeast Asian, to me).

The most well known composites are of nationalities, but he’s also generated and reposted composites of other classes. For example, the average Bollywood actress is Aishwarya Rai. Not literally, but the resemblance is jaw-dropping (compare to the average Indian woman). But most interesting to me were the comparisons of American film actors, male and female, then and now (“Golden Age” vs. contemporary). I’m pretty sure you can pick out which one is which if you’re American. There seem to be two correlated trends here: 1) more feminine features for both males and females, and 2) more youthful features for both males and females. Correlated, because neoteny and masculinization seem to generally push in opposite directions of trait value. Projecting into the future I assume that the Global Human Celebrity will converge upon a 14 year old girl?

Addendum: One difference between the “Golden Age” and modern celebrities is the attention to a rather buff physique. So though the actors of yore had more rugged faces, their physiques were often rather flabby in comparison to today’s leading men. So I might correct and assert that the future global celebrity will be a baby-faced 14 year old girl with abs to die for!

The Pith: Brazil is often portrayed as the second largest black nation in the world, after Nigeria. But it turns out that the majority of the ancestors for non-white Brazilians are European.

One of the more popular sources of search engine traffic to this website has to do with the population genomics of Latin America. For example, my post showing that Argentina is not quite as European a country as it likes to consider itself is regularly cited in online arguments (people of various “persuasions” are invested in the racial status of the Argentine people). But last week in PLoS ONE a paper looking at the patterns of ancestry in the Brazilian population came to a somewhat inverse conclusion as to the self-conception or perception of the preponderant racial identity of that nation. Let me quote from the conclusion of the paper:

Among the actions of the State in the sphere of race relations are initiatives aimed at strengthening racial identity, especially “Black identity” encompassing the sum of those self-categorized as Brown or Black in the censuses and government surveys. The argument that non-Whites constitute more than half of the population of the country has been routinely used in arguing for the introduction of public policies favoring the no-White population, especially in the areas of education (racial quotas for entrance to the universities), the labor market, access to land, and so on [36]. Nevertheless, our data presented here do not support such contention, since they show that, for instance, non-White individuals in the North, Northeast and Southeast have predominantly European ancestry and differing proportions of African and Amerindian ancestry.

The idea that Brazil is majority non-white, that is black, is one I’ve seen elsewhere. Using the American model of hypodescent, where children inherit the racial status of their most stigmatized ancestral component, no matter its magnitude, well over half of Brazilians are “black.” On the other hand, there’s the persistent trend in the recent analyses which show that black Brazilians have a much higher load of European ancestry than black Americans, while white Brazilians have a much higher load of Amerindian and African ancestry than white Americans.

The American Medical Association has written a letter to the US Food and Drug Administration as part of the lead-up to the FDA’s meeting on direct-to-consumer (DTC) genetic testing next month. The tone is predictable: the medical establishment is outraged by the idea of people having access to their own genetic information without the supervision of its members, and they want the FDA to stop it….

Over the past six months I’ve gotten really into analyzing genotypes of friends & family. Sometimes I talk about this excitedly, and people worry about the “risks.” When I ask what risks they’re worried about, usually people offer the vague and content-free fear of “what you could find out.” First, if you have family information, that’s usually much more powerful than the “disease risk” estimates that these firms are giving you. In 99% of the cases, if that’s your primary concern it’s not worth the money. Second, if you’re terrified about what ancestry inference might tell you, probably you should see a shrink. You are what you are, and you’ve always been what you are. As a matter of common sense psychology, on the margin a change in self knowledge can have a big effect, but usually it is just informational icing on the cake.

I wouldn’t bet on any regulatory agency being able to clamp down on direct-to-consumer personal genomics for those who want to get it done at this point, though it is probably still possible if campaigners for F.U.D. get clever. If it’s banned in the United States no doubt the firms will move offshore (or new firms will crop up to fill the demand). Rather, it might have a dampening impact on the pace of innovation since there will be new impediments to profitability. But here’s the important point: I’ve got the markers on several computers and in Gmail. Once the information is out, it’s out. There’s no way that the government can put the genie back in the bottle for those of us who have raced ahead of feared regulation. So run, just in case. Once you cross the threshold they can’t drag you back, no matter how powerful their lobbyists and marketers are.

Note: If you read this blog you know that I’m generally skeptical of the ability of the average person to interpret a mass of information. So in some ways F.U.D. pushers have a point. But, we live in a world of fad diets and all sorts of crazy movements. That’s a much bigger issue, and no one is pushing for regulation of that sort of thing.

One of the major issues in our world today is that we’re a people of specialties. This means that we don’t have basic interpretative frameworks in which to place novel facts. Because of the abstruse and formal nature of the discipline, this is probably starkest in the domain of science, but it is not restricted to only science. Consider geography. In many ways this is “low hanging” cognitive fruit in the shallow part of the learning curve which mostly consists of assembly of facts, but because of the shifts in emphases in American education geography has tended to get short shrift. This means that whenever there’s a foreign policy crisis middle-brow journals of record such as The New York Times have to commission pieces about nations such as Libya which read like a “first book” for six year olds on that nation (and on political weblogs commenters proudly brandish their “first book” level of knowledge).

But a bigger general issue seems to be in relation to climate. “Climate Change” is in the news constantly, but the average person on the street seems to have zero historical perspective on events such as the Medieval Warm Period, the Little Ice Age, let alone more obscure epochs such as the Younger Dryas. Fair enough, it isn’t as if Deep Time is ever going to be broadly interesting. But more disturbing to me is the total lack of perspective when it comes to current spatial patterns.

For example, a friend who has college degrees in history and philosophy, has traveled to Europe and Canada, and is planning a trip to Thailand and the Philippines, thought China was further to the north than Europe. Take a look at this map:

New York City, Madrid, and Beijing are all at roughly the same latitude. The average low in Beijing in January is -8.4 °C. For New York City it is -3.22 °C. And finally, for Madrid it is 2.6 °C. Why the difference? Barcelona, to the north and east of Madrid, on the coast, has a mean low of 4.4 °C. This tells us what’s going on in the most general sense. Continentality. My friend’s ignorance was understandable; Beijing has a much more frigid clime than southern Europe. China as a whole is much further south than climate without context would suggest, while Europe is much further north than most expect. All that has to do with the rough shape of the continents (and possibly the Gulf Stream for Europe, though this might be overdone taking into account the generally mild character of western upper temperate regions of continents). But first, let’s look at another example.

Zack has started to improve on static R plots with Google powered charts. Check it out. Alas, I can’t inject script tags into the body of my posts, so that’s not feasible for me. Notice on Zack’s plot that I’m more East Asian than either of my parents. The tendency first cropped up with 23andMe’s ancestry painting, and I have seen it in my own ADMIXTURE runs, so I don’t dismiss it as V2 vs. V3 chip anymore. Though I’ve ordered an upgrade myself, so we’ll see for sure. Also, though both my parents are about equally East Asian, they exhibit a different balance of East Asian subcomponents. I’ve seen this in my own ADMIXTURE runs, and I’m going to check for more fine-grained matches with the HGDP East Asian populations soon to ascertain whether their eastern ancestral mix is different. Good times.

The Pith: In this post I review some findings of patterns of natural selection within the Drosophila fruit fly genome. I relate them to very similar findings, though in the opposite direction, in human genomics. Different forms of natural selection and their impact on the structure of the genome are also spotlighted in the course of the review. In particular, I explore how specific methods to detect adaptation on the genomic level may be biased by assumptions of classical evolutionary genetic models. Finally, I try to place these details in the broader framework of how best to understand evolutionary process in the “big picture.”

A few days ago I titled a post “The evolution of man is no cartoon”. The reason I titled it such is that as the methods become more refined and our data sets more robust it seems that previously held models of how humans evolved, and evolution’s impact on our genomes, are being refined. Evolutionary genetics at its most elegantly spare can be reduced down to several general parameters. Drift, selection, migration, etc. Exogenous phenomena such as the flux in census size, or environmental variation, have a straightforward relationship to these parameters. But, to some extent the broadest truths are nearly trivial. Getting down to brass tacks, what are these general assertions telling us? We don’t know yet. We’re in a time of transitions, though not troubles.

Going back to cartoons, starting around 1970 there was a series of debates which hinged around the role of deterministic adaptive forces and random neutral ones in the domain of evolutionary process. You have probably heard terms like “adaptationist,” “ultra-Darwinian,” and “evolution by jerks” thrown around. All great fun, and certainly ripe “hooks” to draw the public in, but ultimately that phase in the scientific discourse seems to have been beside the point. A transient between the age of Theory, when there was too little of the empirics, and now the age of Data, when there is too little theory. Biology is a very contingent discipline, and it may be that questions of the power of selection or the relevance of neutral forces will loom large or small dependent upon the particular tip of the tree of life to which the question is being addressed. Evolution may not be a unitary oracle, but rather a cacophony from which we have to construct a harmonious symphony for our own mental sanity. Nature is one, and the joints which we carve out of nature’s wholeness are for our own benefit.

The age of molecular evolution, ushered in by the work on allozymes in the 1960s, was just a preface to the age of genomics. If Stephen Jay Gould and Richard Dawkins were in their prime today I wonder if the complexities of the issues on hand would be too much even for their verbal fluency in terms of formulating a concise quip with which to skewer one’s intellectual antagonists. Complexity does not make fodder for honest quips and barbs. You’re just as liable to inflict a wound upon your own side through clumsiness of rhetoric in the thicket of the data, which fires in all directions.

A few weeks ago I started looking at the 23andMe raw files of some of my friends and integrating them into HGDP and HapMap population data sets. One of the first things I did is remove the African populations from my total data. The reason is that, as you can see to the left, Africans occupy the largest principal component of variation, which sets them apart from Eurasians. Without this dimension of variation the non-Africans are squeezed into one dimension, and groups like Oceanians and Amerindians show up in the strangest places. But that’s because these groups are non-African, and do not differ as much along the primary west-east axis of genetic variance which shakes out of any such analysis. Africans aren’t the only issue though. As I’ve noted before I’ve been running ADMIXTURE, and isolated groups such as the Kalash can “monopolize” one particular color. This may be due to the Kalash being some distilled essence of an ancestral population, but I suspect that it’s more genetic drift due to isolation which has made these sorts of groups distinctive. So I removed these outliers…though do note that other “outliers” often pop out of the data to take their place.

Below is a slide show with the PCAs of the 1st component of variance plotted against the 2nd, 3rd, and 4th components. At the 5th and beyond it seems that the lower eigenvectors achieve a level of stability in magnitude. Remember that the plots are not scaled. The 1st PC is about an order of magnitude bigger than the 2nd. I’ve also attached an ADMIXTURE plot with K = 12, both for populations, and the individuals who have given me their 23andMe files. I’ve placed them upon the PCA. And yes, ID001 and ID002 are my parents.

I was semi-offline for much of last week, so I only randomly heard from someone about the “Science paper” on which Molly Przeworski is an author. Finally having a chance to read it front to back, it seems rather a complement to other papers, addressed to both man and beast. The major “value add” seems to be the extra juice they squeezed out of the data because they looked at the full genomes, instead of just genotypes. As I occasionally note the chips are marvels of technology, but the markers which they are geared to detect are tuned to the polymorphisms of Europeans.

Efforts to identify the genetic basis of human adaptations from polymorphism data have sought footprints of “classic selective sweeps” (in which a beneficial mutation arises and rapidly fixes in the population). Yet it remains unknown whether this form of natural selection was common in our evolution. We examined the evidence for classic sweeps in resequencing data from 179 human genomes. As expected under a recurrent-sweep model, we found that diversity levels decrease near exons and conserved noncoding regions. In contrast to expectation, however, the trough in diversity around human-specific amino acid substitutions is no more pronounced than around synonymous substitutions. Moreover, relative to the genome background, amino acid and putative regulatory sites are not significantly enriched in alleles that are highly differentiated between populations. These findings indicate that classic sweeps were not a dominant mode of human adaptation over the past ~250,000 years.

Over the past few months I’ve been encouraging people to pull down ADMIXTURE, and push the public data sets through it. Additionally, you can also convert your 23andMe raw file into pedigree format pretty easily and integrate it into the public data sets with PLINK. I’ve been following Zack’s Harappa Ancestry Project pretty closely, but I’ve been running the software myself and manipulating its parameters and seeing how things shake out. But the more and more I do it, the more I wonder if it isn’t like regression analysis, a technique which is just waiting to be leveraged by human biases. I began thinking of this more deeply after a conversation with a computational biologist who outlined the structural problems with how ad hoc the utilization of statistics is in the life sciences.

These sorts of qualms are probably why I’m posting my results more on Facebook and passing them around friends, rather than putting them out there in the public domain. It isn’t that I think the results are going to be abused. I just don’t know what they mean a lot of the time. Or, perhaps more honestly I am suspicious of my own propensity to see what I suspect. A case of my priors strongly shaping the inferences which I might generate.

So I decided to do an experiment. Below are 8 runs, displayed as bar plots. Each thin sliver represents an individual. The colors again represent putative ancestral populations of which the modern populations are combinations, generated by the parameter K (so K = 2 means two ancestral populations, each corresponding to a different color). There are two data sets which I analyzed, group A and group B. I’ve also noted the K’s for each plot. But aside from that, I’ll leave you ignorant what these populations are or how many there are. Jot down some ideas as to what you can see. How many populations? How do they relate to each other? Can you perceive any real information in the higher K’s? I’ll put the “answers” below the fold. There’s no point in me saying what I think, I already know which populations these are, so I’m tainted.
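On the practical point above about converting a 23andMe raw file into pedigree format: a minimal Python sketch follows. It assumes the raw file is tab-separated with the columns rsid, chromosome, position, and genotype, with comment lines starting with `#`, and it writes one PLINK-style PED row plus the matching MAP rows. The function names and the sample ID are hypothetical, and treating anything other than A/C/G/T calls (no-calls, indel codes) as missing is a simplifying assumption of the sketch.

```python
VALID_BASES = set("ACGT")


def parse_23andme(lines):
    """Yield (rsid, chrom, pos, genotype) from 23andMe-style raw text lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        yield rsid, chrom, int(pos), genotype


def to_ped_alleles(genotype):
    """Map a genotype string to two PED alleles; '0' marks a missing call."""
    if len(genotype) == 1 and genotype in VALID_BASES:
        return genotype, genotype  # haploid call, written as homozygous
    if len(genotype) == 2 and set(genotype) <= VALID_BASES:
        return genotype[0], genotype[1]
    return "0", "0"  # no-calls ("--") and indel codes treated as missing


def convert(lines, sample_id="ID001"):
    """Return (ped_row, map_rows) for a single individual."""
    map_rows = []
    alleles = []
    for rsid, chrom, pos, genotype in parse_23andme(lines):
        # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), bp
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.extend(to_ped_alleles(genotype))
    # PED columns: family id, individual id, father, mother, sex, phenotype
    ped_row = " ".join([sample_id, sample_id, "0", "0", "0", "-9"] + alleles)
    return ped_row, map_rows
```

Writing `ped_row` to a `.ped` file and `map_rows` to a `.map` file with the same prefix gives a file pair that can then be merged with the public data sets, after restricting to the markers the data sets share.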

Now, we don’t want everyone working in genomics to start using the same blue-on-grey slide to illustrate the impending datapocalypse; so I’d encourage people to download the raw data (warning: Excel file) and make their own pretty pictures.

My straightforward attempts are below. Get the raw data and try your own. I assume that R and gnuplot could produce something prettier….

Manu Sporny reflects on one week of being in the public domain in terms of personal genomics. I already pulled down his data, as has Zack. The whole post is fascinating, but this is really interesting: “I found out that it’s illegal to send any of your genetic material outside of Russia to have it analyzed.” In a related vein, see Dr. Daniel MacArthur’s When “Cautious” Means “Useless.” I know that 23andMe is a for-profit business in it to make money for its backers, but there are certainly huge social spillover effects among my set in its bringing 500,000 to 1 million markers to the masses. It’s a clear concrete case of how innovation can result in positive gains across society. I am not a knee-jerk libertarian, but your genetic data is your genetic data. Own it, analyze it, and claim it!

I’ve been rather busy this week, so few posts. But, I did a Bloggingheads.tv with Milford Wolpoff. We talk Out of Africa, Multiregionalism, and such. Second, The New York Times profiled Secular Right, where I’m a contributor. The quotes were accurate, though I do find it amusing that the reporter refers to me as an apostate, but not John Derbyshire (who until ~5 years ago was a confessing Christian). I suspect that in this day and age the term “apostate” only has strong valence in relation to Islam. For the record, several ex-Muslims have disputed my apostasy, since I barely ever believed in the Islamic religion.

I’m in a hurry right now, and won’t be posting much this week. But, I thought I’d dump some of the ADMIXTURE runs I have. This is one with 80,000 markers, and Eurasian populations, Papuans and Mozabites. I removed the New World and Africa to constrain the variance space. This time I’ve labelled the ancestral components, but do not take them totally literally. I think in the future I might just remove the Kalash to see what happens. This is K = 7. Not too busy, but I think enough K’s to separate out the various West Eurasian groups. Additionally I’ve put the genetic distances, Fst, below, and, visualized them on an MDS. Nothing too surprising.
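For readers unfamiliar with Fst, here is a minimal Python sketch of the textbook definition for two populations at biallelic SNPs, Fst = (H_T - H_S) / H_T, computed from allele frequencies. It weights both populations equally and is not necessarily the estimator behind the values reported here, since analysis software typically applies sample-size corrections. The function name and the example frequencies are hypothetical.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Fst across loci for two populations, from per-locus allele frequencies.

    Uses the ratio of summed (H_T - H_S) to summed H_T over loci, where H_T is
    the expected heterozygosity of the pooled population and H_S is the mean
    expected heterozygosity within the two populations.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # If every locus is fixed for the same allele there is no variation at all.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical frequencies at three SNPs in two populations.
print(fst_two_pops([0.2, 0.5, 0.9], [0.8, 0.5, 0.7]))
```

Identical frequencies give 0 and alternately fixed alleles give 1. A matrix of such pairwise values is the kind of input an MDS plot is built from.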

Gene Expression

This blog is about evolution, genetics, genomics and their interstices. Please beware that comments are aggressively moderated. Uncivil or churlish comments will likely get you banned immediately, so make any contribution count!