February 06, 2010

X-chromosome variation in global populations

The frappe analysis for K=7 using ~16k and ~19k X chromosome (top) and Chromosome 16 (bottom) SNPs is shown. The pattern is almost identical.

This showcases the fallacy of a common objection to the concept of "race", namely that it is "trait-specific" and by looking at one trait (or locus) we will arrive at one racial classification, while looking at another we will arrive at another.

The fact that by looking at two completely independently inherited pieces of DNA we arrive at the same conclusion is strong visual evidence that race is neither (a) a subjective property which depends on which part of the genome we look at, nor (b) a holistic property that can only be inferred by looking at the individual in toto.

Background: The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples.

Results: We found that X chromosomes tend to be more differentiated between human populations than autosomes with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X-linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps.

Conclusions: The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection.

68 comments:

Very interesting, Dienekes. Another common argument I hear against the concept of "race" is that humans are already a race of subspecies, Homo sapiens sapiens, so how can we be any further divided if we are already a subspecies? Not that I am convinced, but this one is getting around...

Dienekes: you are cheating to justify your ideological choice (because using or not the concept of race is ideological, not objective).

Table 1 clearly outlines what variance exists within populations (c. 90%) and what variance exists between populations (c. 10%) or world regions (9-11%). I'm using the autosomal data here because it should best depict the overall variance, but the X-linked variance is not too different.

So talking of race or subspecies is talking about only 10% of human variance, while 90% of the diversity exists within populations.

For you individual ABCDEFGHIJ is racially different from individual ABCDEFGHIK but not different from ZYXWVUTSRJ, where the boldfaced letter represents that 10% of inter-population variance and the other letters represent the 90% of intra-population variance (i.e. globally shared on random grounds).

This is in fact the main argument not against race but against the obsession with and misinterpretation of regional ("racial") differences: that what we share is immensely more than what makes us different on mere regional ("racial") grounds.

Here's the problem with this paper and with the use of the word "race":

From what I can make out, the populations in the study are all geographically stable populations in which there has been little racial admixture in the recent past. The most visibly mixed sample I can see is the Hazara. Their history is well known and documented.

The "frappe" analysis does not indicate variance for each of the different groups. You can certainly eyeball the data and see differences in sample, but you don't have an objective measure of the variability.

So let's take woman X who has ancestors that have lived in North America for a while. They've "gotten around" so to speak. Her dad is of documented Scottish descent. Her mother is of documented Scottish descent. Surprise! If she goes for genetic analysis, they will tell her that she is Scottish. BUT! Her mother's father and father's mother are not of Scottish descent. In fact, her father's mother is an American New Englander whose family have been having kids with ??! for the last three hundred years. And her mother's father is English from southern England.

So who is she? Is she Scottish?

Fast forward to a situation I heard about on the radio recently. It does not bode well for creating public trust in the 23andme business model.

African American woman, with AA features and parents with AA features, goes in to get her DNA "done". Surprise. Hotshot young DNA counselor tells her that since her mtDNA is not African and since her father's YDNA is not African, she is entirely white.

Imagine the confusion that this woman will have. The fact that she will think that her father is not her father is only the beginning.

Race has a very clear meaning when people don't move around. It is easier to characterize a person's genetic race when their ancestors haven't been moving around in the recent past.

Unfortunately for someone who wants an easy classification system, almost anyone who has had ancestors who have lived in North America for an extended time is "mixed race," genetically speaking.

Precision, accuracy and a very careful use of approximation are extremely important in talking about a person's genetic background.

In Eurasia the situation is even more ridiculously non-racial: the difference between regions (3) is 1.32%! This means that the difference between Europeans and East Asians is extremely low. But tell anyone who believes in the concept of race that genetics demonstrate that there is no "racial" difference between Mongoloids and Caucasoids.

Another interesting case is Africa, though for different reasons: most of the inter-population diversity (4.4% on average) is concentrated among hunter-gatherers (7%), while only 0.9% belongs to agriculturalists. However it's worth noticing that all of them are West African or Bantu and that the more interesting and probably more diverse East Africans have not been sampled. I presume that Mozabites were considered part of the Middle East.

"Another interesting case is Africa, though for different reasons: most of the inter-population diversity (4.4% on average) is concentrated among hunter-gatherers (7%), while only 0.9% belongs to agriculturalists. However it's worth noticing that all of them are West African or Bantu and that the more interesting and probably more diverse East Africans have not been sampled. I presume that Mozabites were considered part of the Middle East."

Thanks Maju, for your insightful comments.

That will be interesting, getting a better sampling of Africans. There are some very distinct regions, even within West Africa.

Let's now ponder the focus of the paper, which is the structure at X-DNA level. First thing I notice is that the 19K graph certainly seems to be better defined than the 16K one - not by much, and it's not such a huge leap in resolution either. The most noticeable change I have spotted is among Sardinians and Italians, who go from being 50-50 European/Middle Easterner with 16K to 70-30 with 19K.

Another thing I notice is that sample size and depth do matter. While Africans and Oceanians have more than 10% X-linked variance among populations, the structure graph at K=7 is unable to detect those differences, while instead making a host of groups among Eurasians, who have something like 3% of inter-population variance. Beware of such artifacts!

West-South Eurasian clustering is interesting because it's very poorly defined. We can spot three components, which can be described as European (purple), West Asian (blue) and South Asian (green), but nearly all populations have all three in different proportions. This suggests to me that these three populations never really finished diverging in any clear manner. We can of course spot these components only because of the oversampling of this macro-region; otherwise we should see a single homogeneous cluster, I think, the same as happens with other regions.

That will be interesting, getting a better sampling of Africans. There are some very distinct regions, even within West Africa.

The problem is that all these studies tend to use the same public database (Rosenberg's) because it's cheap and easy to access. For whatever reasons neither East Africa nor India nor Australian Aborigines were sampled and the overall sample is skewed in numbers to West Eurasia (and secondarily East Asia and Pakistan). Someone should bother widening the sample but who will? Meanwhile they could at least run the K-means algorithm with balanced populations - but nope. :(

I just noticed something else. In additional file 1, figure C, they run the algorithm with real chromosomes instead of creating "pseudofemales" (by coupling two male X chromosomes of the same population) and that causes a big difference in the result at K=7: the "European" purple component almost disappears and West Eurasians become more homogeneous. There is still a purple component but it's widely distributed among most Eurasian populations (along with the cyan "Papuan" one at lower values), so the actual number of clusters at K=7 is in fact only 6.

Not sure what to think but I'd say that the formation of "pseudofemales" sounds like "cheating on Nature" and may artificially increase the degree of homogeneity of each population and hence the difference between regions. Dunno.

I would leave it to the experts. They have not mentioned race or racial divisions in humans, just ethnic divisions, which exist, and geographically separated humans. No need to froth at the mouth.

Humans are a recent species. They are not crocodiles or sharks with pedigrees going back many many millions of years. Humans have a lot of genetic similarity to each other, and probably to Neanderthals if they were still around. The lack of major genetic differences does not mean one cannot distinguish individual humans from another or groups of humans from other groups. That is all the study is doing, finding how large groups of humans differ. No need to lose your head over it because East Asians can be distinguished from most ethnic groups in Africa. Or that certain conserved sections of the X chromosome show differences in what is conserved depending on geographical separation of humans.

If I had an identical twin brother, I would still want whatever minor differences there are between us magnified and noted. I am who I am and I want to be distinguishable from every other human, even my twin brother. These studies are just finding the differences, minor as they may be to you, but those people, the descendants of long-resident humans in different geographical zones, want to be different. No Chinese person wants to be mistaken for an Anatolian Turk or even a Japanese on the most minor SNP tests. Humans want to be different just as boys and girls love being their gender and don't wish to change.

Forget the race fixation and just enjoy the study as knowledge about our minor differences. The X chromosome has been begging to be properly studied for quite a while.

marnie, most of what you wrote is not real. No test can distinguish Scots from English or Irish or many other Europeans in general. Our particular arrays of SNPs don't follow ethnic lines, or at least not such minor genetic distinctions as those between Scots and the English. The cultural and conceptual differences may be major but the genetic difference is infinitesimal.

"African American woman" is one of those cultural distinctions Americans use. Genetically, African Americans are a hodge-podge, showing genetic origins from three continents and from geographically separated humans. They are a new people. No one assigns cultural labels such as African American based on silly things like haplogroups, especially for a group of such mixed geographic origins.

I'm particularly intrigued by your comments about undersampling or no sampling of certain populations. You recently made a similar statement about SE Asian populations. I guess it is a huge undertaking to actually do DNA sampling on all these populations. It does create a bias in this research.

I'm occupied with other things for the next few days, but I would like to have a closer look at this paper and what you are saying. (Amongst other things, I'm working on an essay about the Refus Global. Something tells me you would find it amusing: http://www.thecanadianencyclopedia.com/index.cfm?PgNm=TCE&Params=A1ARTA0009671)

I'm particularly intrigued by your comments about undersampling or no sampling of certain populations. You recently made a similar statement about SE Asian populations. I guess it is a huge undertaking to actually do DNA sampling on all these populations. It does create a bias in this research.

Not sure which comment of mine you mean right now, but when you are not just going to enumerate the data but to make statistical inferences through algorithms, proper sampling seems to matter a lot; poor sampling introduces curious and strange distortions. This has been confirmed in some ad hoc papers, but anyhow it seems quite obvious, especially for methods such as principal component analysis and K-means analysis, which are very common in the literature.

The oversampled populations are always going to show up their internal differences easily, while those undersampled will typically cluster with others, even if they are more different. This is the case of the San for instance in this and other similar studies.

I agree that proper sampling may be hard work, even economically costly, but it adds value to research.

"So talking of race or subspecies is talking about only 10% of human variance, while 90% of the diversity exists within populations".

But that's probably true for any group of subspecies we care to look at. Chimpanzee subspecies have far more in common with each other than they have difference.

"We can spot three components, which can be described as European (purple), West Asian (blue) and South Asian (green), but nearly all populations have all three in different proportions. This suggests to me that these three populations never really finished diverging in any clear manner".

Isn't it just as likely to show that the three populations contain varying admixture from the three basically separate populations rather than being the product of a single diverging population?

"The problem comes at the boundaries, where racial labelling becomes less accurate".

Because of mixing at those boundaries. In any species spread over a wide region subspecies form over that region. However it is often very difficult to decide where one subspecies stops and another starts. Taxonomists finish up claiming such subdivisions as superspecies, subgenus etc.

"In any species spread over a wide region subspecies form over that region. However it is often very difficult to decide where one subspecies stops and another starts."

It is fine to make groupings of populations for the purpose of research.

The problem comes in when you are talking to an actual person who is trying to find out what their genetic prehistory is. If there is an obvious difference between a genetic technique, such as mtDNA and YDNA testing alone, which undersamples the genome, and the self-perceived race of a person, it is appropriate to talk to the person about the limitations of the test.

If there is an obvious difference between THE RESULTS FROM a genetic technique, such as mtDNA and YDNA testing alone, which undersamples the genome, and the self perceived race of a person, it is appropriate to talk to the person about the limitations of the test.

Terry: that's what I mean: that subspecies (or races) are an arbitrary concept that has little (if any) scientific validity. I don't say that there is not some clustering (homogenization and/or founder effects) on regional or population grounds: there's always something of that but it's ridiculously insignificant.

Isn't it just as likely to show that the three populations contain varying admixture from the three basically separate populations rather than being the product of a single diverging population?

It could be. But can you explain when South Asians migrated to Europe in any meaningful numbers other than in the depths of the early Upper Paleolithic? So if Europeans show some significant proportion of South Asian alleles, even after running the cluster-making program, this must be a remnant of a still unfinished differentiation. The opposite may hence be true as well: South Asians carry European alleles because (maybe) these alleles existed among them before divergence and have remained as a minority component.

In this last case one could also think of Indo-European invasions, but that is not necessary, considering that the opposite is also true and has no historical explanation after the MP-UP transition.

Table 1 clearly outlines what variance exists within populations (c. 90%) and what variance exists between populations (c. 10%) or world regions (9-11%). I'm using the autosomal data here because it should best depict the overall variance, but the X-linked variance is not too different.

That is an old, tired, and ignorant argument. It makes the dubious assumption that there is some magic "fraction" of variance below which taxonomic differentiation does not make sense.

The 10% (and even much lower, say between Greeks and Austrians) is more than enough to achieve near-perfect accuracy in assigning individuals to populations.

Nor is it a valid argument that because e.g., Europeans and Sub-Saharan Africans (or even more so Greeks and Austrians) share most of their genome then the differences are not of functional significance:

Humans share most of their genome with chimps, and yet there are huge differences between the two species _in the context of primates_. Similarly, Europeans and Africans share most of their genomes, and still there are huge differences between the two races (many of them visible) _in the context of humans_.

As an African-American with over 50% European ancestry, I find these arguments for the biological construct of race to be extraordinarily jejune. Clearly, to describe individuals, many of whom are quite admixed, one needs to use continuum measures such as positions on PC plots, rather than categories. Race is socially constructed and varies in definition across societies. I have relatives who are "phenotypically Caucasian", while I am "phenotypically Black", yet we share the same proportion of European admixture.

The thing is that if you put all of humanity in a PC plot, you will -of course- get a continuum (there will be _some_ individuals in every part of the PC space), but at the same time most individuals will be located in big blobs together with other similar individuals.

A good analogy to think about is "castles in the sand". If you build a few castles in the sand next to each other there is clearly continuity between them, but they are also distinct.

As time passes by, air and water will move sand grains from a castle to the beach and from beach to castle. Eventually the landscape will be flattened.

With respect to humanity we are in a situation where the "castles" are still visible and distinct.

So global inter-regional differences amount, in the most extreme cases, to c. 10% of this variance, which makes 0.01% of the genome; in Eurasia, where inter-regional diversity is very low, they amount to c. 0.0013% of the whole genome.

Instead most of the variation happens within populations: you and some other random Greek can be different in c. 90% of the variance or, what is the same, c. 0.09% of all the genome. You can perfectly be genetically closer to a random Nigerian or Vietnamese, than to a random Greek at your very same street.
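The arithmetic in this comment can be checked with a short sketch. It assumes, as stated elsewhere in the thread, that roughly 0.1% of the genome is variable among humans; the variance shares (10%, 1.32%, 90%) are the Table 1 figures quoted in the comments, and the function name is purely illustrative.

```python
# Assumption taken from the thread: about 0.1% of the genome varies among humans.
VARIABLE_FRACTION = 0.001


def share_of_genome(share_of_variance, variable_fraction=VARIABLE_FRACTION):
    """Convert a share of the total variance into a share of the whole genome."""
    return share_of_variance * variable_fraction


# Variance shares as quoted in the comments (illustrative inputs, not new data).
between_regions_global = share_of_genome(0.10)     # c. 10% between world regions
between_regions_eurasia = share_of_genome(0.0132)  # 1.32% between Eurasian regions
within_populations = share_of_genome(0.90)         # c. 90% within populations

if __name__ == "__main__":
    rows = [
        ("between regions, global", between_regions_global),
        ("between regions, Eurasia", between_regions_eurasia),
        ("within populations", within_populations),
    ]
    for label, value in rows:
        print(f"{label}: {value * 100:.5f}% of the genome")
```

These percentages only restate how the variance is partitioned; on their own they say nothing about how well individuals can be assigned to populations, which is the point disputed in the replies.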

There's only some slight increase in the chance that a random Greek is closer to you in genetic terms than a Papuan or Pygmy. So there is very little (if any) advantage in supporting your "race" (racism) in terms of personal genetic success.

Can you still draw differences between populations? Sure, if you focus on that small fraction that is different, but emphasizing that is a subjective, ideological, choice.

The argument may be "old" (how old?) but is nevertheless critically important, as important as that "old" invention of the wheel, for instance.

Polak: races are "old myths". I am not repeating anything just explaining what is real about that myth and what is unreal (most), based mostly on this paper (curiously enough).

It's absolutely not true that a random European might be closer to a Nigerian or a Vietnamese than to another European when the entire genome is taken into account.

What you say is what is false, Polak. You are just measuring inter-population or inter-regional variance which is only 10% of the whole variance. It's in this paper, I'm not making up anything. All the data is in table 1.

All you need to do to realize this is take a high density genome wide test, using 500,000 to 1,000,000 SNPs.

At 23andMe, right? Those guys are scammers.

Read some scientific literature and, please, ignore what traders claim to sell their products because it's likely to be a mere lie.

Of course, you are also free to believe whatever you wish, be it God, UFOs, races or whatever. But if you're going to argue it, please do it in a scientific manner.

Maju, I'm referring to the raw data at 23andMe, which comes from illumina. Also, the tools at deCODEme using the same raw data tell the same story.

If you take half a million data points from across the genome, and then divide the genome into segments, in order to assess its overall character, it's absolutely impossible for a random European to be closer to a Sub-Saharan African or East or South Asian than to another European. Testing variance within and between populations doesn't change this fact.

Of course, that is a very strong argument for the existence of races, but honestly I couldn't care less whether people believe in that or not. I just don't want to hear nonsense like you're posting here when the facts speak for themselves. Anyone with half a brain can work it all out once they learn to use their raw data from 23andMe or deCODEme.

How come the variance inside populations, at individual level, does not change that fact? Once again, any process that only aims to look for geographically structured variance will ignore the variance that has no geographic structure whatsoever. Pretending otherwise is being naive... or worse.

Maju I'm just telling you how it is. Testing within and between population diversity is one thing, but comparing individuals from different gene pools is another.

You can see what I'm talking about in a comparison of random individuals when their genomes are divided into 1cM (centimorgan) segments. East Asians who aren't directly related to each other are as close as some Europeans who are cousins. However, all Europeans and West Eurasians are closer to each other (white area) than to East Asians (purple area). At the end come the West Eurasian - East Asian - African pairwise comparisons (aqua blue area).

http://www.box.net/shared/d7z0sblu06

So feel free to debate the race issue all you like, but you can't argue with the above facts. It's basic maths and biology, and easy enough for a child to understand.

Polak, I don't really gather what the colors may mean: after sorting by name1 and chk_sum, the first of the list are said to be siblings but are aqua blue.

I am not even sure what data exactly they are measuring either.

Whatever the case, I don't say that the odds favor that you are closer to any random Nigerian because that 10% of regional affinity makes it unlikely even if the other 90% would be perfectly distributed randomly through all Earth, what I say is that for sure you'll be more closely related genetically (even if not genealogically) to some Nigerians than to some Poles. It's a matter of mere chance: 10% of inter-regional differences alone can't distort the whole 100% picture so much.

It's like playing cards with 10 cards and knowing that all Europeans have an ace and all Africans a king instead... but the rest of the hand is just random. So you can perfectly end with a hand that is much more similar to some African person than to some Polish person.
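The card analogy can be tried out with a small simulation. This is only a sketch of the analogy itself, not of genetic data: it assumes 13 equally likely ranks drawn independently for each non-marker card, and the function name and parameters are hypothetical. One card in ten is the regional "marker" (ace versus king); the rest are random.

```python
import random


def cross_closer_rate(n_cards=10, marker_share=0.1, n_ranks=13, trials=5000, seed=0):
    """Estimate how often a 'European' hand shares more cards with an 'African'
    hand than with another 'European' hand.

    Marker cards always match within a region and never match across regions.
    The remaining cards are random ranks (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_markers = round(n_cards * marker_share)
    n_random = n_cards - n_markers

    def random_part():
        return [rng.randrange(n_ranks) for _ in range(n_random)]

    hits = 0
    for _ in range(trials):
        european_1 = random_part()
        european_2 = random_part()
        african = random_part()
        # Same-region hands share every marker card plus any chance matches.
        shared_within = n_markers + sum(x == y for x, y in zip(european_1, european_2))
        # Cross-region hands can only share chance matches.
        shared_across = sum(x == y for x, y in zip(european_1, african))
        if shared_across > shared_within:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n_cards in (10, 100):
        print(n_cards, "cards:", cross_closer_rate(n_cards))
```

Under these assumptions the cross-region hand comes out closer fairly often with 10 cards and only rarely with 100, even though the marker share stays at 10%. The size of the hand matters as much as the share of marker cards.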

Maju, the higher within population genetic diversity you're talking about is always structured in such a way as to make it impossible for a European to be closer in overall genetic structure to a Sub-Saharan African or East Asian than to another European. This is what the above spreadsheet clearly shows.

The fact that you don't understand this, or are unwilling to make steps to understand it, is not important. What is important is that it's a fact that is at odds with the false claims you've made here.

So please don't make false claims. If your argument against human races is so strong, there really should be no need for you to distort the truth.

That same table also shows the percentages at various levels of continental and subcontinental comparison and they are all in those levels or lower. So the truth seems to be that about 90% of the variability is among individuals at any scales (and to me it makes total sense because you can perfectly find individuals in Africa that look like individuals in Europe, even if one is black and the other white, which obviously are traits described by that 10% of the code variation that goes between world regions).

All the rest is mere wishful thinking by people who, obviously, have an ideology and an agenda.

"If there is an obvious difference between THE RESULTS FROM a genetic technique, such as mtDNA and YDNA testing alone"

If there has been mixture between originally geographically separated populations the mt and Y DNA might tell you nothing about such mixture, especially not the proportions of each. In the case of America it's perfectly possible for someone who looks totally African to have European haplogroups, and vice versa.

"that's what I mean: that subspecies (or races) are an arbitrary concept that has little (if any) scientific validity".

I don't think many taxonomists would agree with you on that. I agree that geographic boundaries tend to be more porous for the human species (largely because of our progressive technological developments) but often the same geographic boundaries serve to separate subspecies other than human subspecies. Even in the case of many birds, which one would assume can fly across them.

"But can you explain when South Asians migrated to Europe in any meaningful numbers other than in the depths of the early Upper Paleolithic?"

The green colour seems less associated with 'South Asia' in particular, than with the Iranian Plateau, from Baluchistan and the Pakistan hill country north into Afghanistan. Even the Brahui may have originally come from further north. So the connection with Europe becomes less difficult to explain.

Terry: it'd be interesting to compare the levels of genetic identity and diversity among populations and regions with other species. Maybe we could find some kind of threshold that would allow us to talk of subspecies in some cases and less so in others with some accuracy and not just best hunches.

The green colour seems less associated with 'South Asia' in particular, than with the Iranian Plateau, from Baluchistan and the Pakistan hill country north into Afghanistan...

All those peoples were sampled in Pakistan. Even if some cross into Iran and Afghanistan, they were not researched there, so I'm very much justified in associating that component with Pakistan specifically.

Maju, please show me evidence that a European individual has come out closer to a Sub-Saharan African or East Asian than to another European in terms of overall genome structure using high-density genome wide data.

It's not something I've ever seen at 23andMe, deCODEme, in any private comparisons using raw data from illumina, or in fact in any scientific reports.

The reason I haven't seen it is because it simply doesn't exist. So if you can show me an actual example, I'd be amazed. Thanks in advance.

Maju commits Lewontin's Fallacy (PDF). The fact that most genetic variation is common to all populations does not mean that the variation is randomly distributed. Rather, it is geographically structured. It is impossible for, say, a Pole to be genetically more similar to a Nigerian than to another Pole, and vice versa.

Thus the answer to the question “How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?” depends on the number of polymorphisms used to define that dissimilarity and the populations being compared... However, if genetic similarity is measured over many thousands of loci, the answer becomes “never” when individuals are sampled from geographically separated populations.
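The dependence on the number of polymorphisms described in this quoted passage can be illustrated with a toy simulation. Everything here is an assumption made for illustration: two hypothetical populations whose allele frequencies differ by a fixed small shift at every locus, independent loci, diploid genotypes, and dissimilarity measured as the summed absolute genotype difference. It is a sketch of the idea, not the method used in the quoted paper.

```python
import random


def make_frequencies(n_loci, shift, rng):
    """Per-locus allele frequencies for two hypothetical populations.

    Each locus gets a base frequency; the populations sit 'shift' above and
    below it, so most of the variation stays within populations.
    """
    freqs_a, freqs_b = [], []
    for _ in range(n_loci):
        base = rng.uniform(0.2, 0.8)
        sign = rng.choice((-1, 1))
        freqs_a.append(base + sign * shift)
        freqs_b.append(base - sign * shift)
    return freqs_a, freqs_b


def sample_individual(freqs, rng):
    """Diploid genotype per locus: number of copies (0, 1 or 2) of one allele."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def dissimilarity(x, y):
    """Summed absolute genotype difference over all loci."""
    return sum(abs(a - b) for a, b in zip(x, y))


def overlap_rate(n_loci, shift=0.1, trials=200, seed=1):
    """Share of trials in which a same-population pair is more dissimilar
    than a pair drawn from the two different populations."""
    rng = random.Random(seed)
    freqs_a, freqs_b = make_frequencies(n_loci, shift, rng)
    hits = 0
    for _ in range(trials):
        a1 = sample_individual(freqs_a, rng)
        a2 = sample_individual(freqs_a, rng)
        b = sample_individual(freqs_b, rng)
        if dissimilarity(a1, a2) > dissimilarity(a1, b):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n_loci in (10, 100, 1000):
        print(n_loci, "loci:", overlap_rate(n_loci))
```

In this toy setup the overlap is common with a handful of loci and becomes rare as loci are added, even though each single locus separates the two populations only weakly. How fast it falls in real data depends on the populations compared, as the quoted passage says.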

Maju, you can't prove the claims you're making here because they're absolutely false.

I know they're false, because the data I have shows it as plain as daylight.

Now, you might think it's OK to repeat these false claims because they fit your ideology. But what is your ideology worth if it's based on pseudo-scientific nonsense. I'm sure it does make some people feel better to think that random Europeans can be more similar to random Africans rather than other Europeans. But this clearly isn't the case, so why make such claims?

Dienekes: we are not talking of the whole genome, of which only c. 0.1% is variable among humans, but the fraction of this already tiny 0.1%.

So what? This is already millions of variable sites. We already know that variation at even a single site can have extreme functional significance, and when you have a million degrees of freedom in a system you can create pretty darned different people.

Your "argument" also fails to take into account the fact that we are interested in human differences. The non-variable portion of the genome includes parts that make us multicellular, give us a nervous system, genital organs, breasts, two eyes, and so on that we share with various other living organisms. It is in the specifically human variation that we are to see the existence (or not) of races.

If we're interested in whether cars can be grouped into categories like "sedan" "SUV", "limo" etc. we don't look at the things they have in common (doors, 4 wheels, steering wheel, etc.) but in what they don't.

This conversational seesaw over "racial" categories is exhausting. Of far more interest to me and, I dare say, to most people is not how different is a Yoruba from a Han Chinese, but how different is an Akan from an Ewe or a Tuscan from a Greek. The microlevel of genetic differentiation tells a story, a story of migration and/or bottlenecks, potentially of prehistoric and (for Dienekes) historic scope. The Out-of-Africa story is a rerun.

Sure, in the end, if you draw enough cards, the odds will eventually materialize as "law" and make the likelihood of greater similarity across well defined cluster cores (i.e. ignoring clines) practically null. But it's still a matter of odds, so "never" is not a correct term but a simplification (i.e. 99.99% is not the same as 100%, which only exists at an impossible and abstract limit).

Another important clue is to realize that clines do not only exist but that inter-cluster (i.e. "inter-race") clines include most of the population at global scale. See for instance Sarre & Paabo 2004.

... the fact that we are interested in human differences...

Maybe you are. But I am also interested in human similitudes (of course: I'm a humanist). And also in an adequate comparison with non-human intra-species similitudes and differences, particularly in order to assess the validity of the concept of subspecies (races) in humans.

Anyhow, this phrase of yours illustrates very well the likely fallacy of clustering strategies that tend to ignore the many similitudes across populations and regions. If we did things "properly", K-means diagrams would show not just the small fraction of the genetic variance that has regional structure (10%) but also the large fraction that does not (90%). This would result in diagrams that are harder to read for people only interested in regionally structured differences, as you are, but it would also show how the variance scatters across regions, which is interesting in itself.

Why is it interesting? For two reasons: (1) it better illustrates the real differences and similitudes, allowing people to assess them visually, and (2) it would allow us to track more precisely those fractions of the genome that have not been swept to fixation at regional or population levels. For example, in Coop et al. 2009, fig. 4, you can see that the pigmentation-related alleles that have swept to near fixation in Eurasians are also found as minority alleles in Africa. The same happens with other "less popular" Eurasian alleles. The data provided in this paper are not detailed enough, but potentially this could allow us to trace the origins of such alleles in Africa and to reach some conclusions about human prehistory.

Because it's prehistory, and not making an apology for the ideological concept of race, that I'm truly interested in. What about you?

If we're interested in whether cars can be grouped into categories like "sedan" "SUV", "limo" etc. we don't look at the things they have in common (doors, 4 wheels, steering wheel, etc.) but in what they don't.

Well, again, that's your position, not mine. When I am engaged in conversations about cars, I almost always say the same thing: "they have four wheels, don't they?" No joke: I have done that since I was a kid. I never shared that popular taste for the tiny commercial differences between cars, and I still think that a lot of work is wasted in emphasizing them, in this case for capitalist economic reasons (and I say "capitalist economic" because a true economy should be oriented to minimize work and ecological impact while maximizing the practical benefits for the people, but that is another story). Similarly, you and others waste a lot of effort emphasizing only those tiny differences between populations, and that way try to hide the much more essential similitude among us and the fact that, as happens with cars, we all have about the same potential.

"This conversational seesaw over "racial" categories is exhausting. Of far more interest to me and, I dare say, to most people is not how different is a Yoruba from a Han Chinese, but how different is an Akan from an Ewe or a Tuscan from a Greek. The microlevel of genetic differentiation tells a story . . ."

Royking2, thank you for the above statement.

Polak:

"Anyone with half a brain can work it all out once they learn to use their raw data from 23andMe or deCODEme."

Unfortunately, that does not appear to be happening right now. There is a perception developing in the general public that the whole story is in the mtDNA and Y-DNA.

I sense trouble.

Most people do not have enough background to understand or post process their raw 23andMe data. They can't even read their own mortgage agreement.

The weekly announcements from various entities about famous people's mtDNA and Y-DNA only stir the pot of confusion.

If you are going to sell someone an analysis of their genetics, you owe it to them to do the most complete analysis and presentation possible.

They shouldn't have to post process raw data in order to get an accurate picture of their genetic race or ethnicity.

And yes, their race or ethnicity should be determined by all their ancestors, not just their matrilineal and patrilineal lines. Stepping back in time, that method yields a vanishingly tiny fraction of their total genetic picture.
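To put a rough number on "vanishingly tiny": n generations back a person has 2^n genealogical ancestor slots, but only two of them lie on the strictly maternal (mtDNA) and strictly paternal (Y-DNA) lines. The sketch below counts pedigree slots only. It ignores pedigree collapse (the same person filling several slots) and says nothing about how much autosomal DNA any given ancestor actually contributed, so treat it as an illustration under those assumptions. The function name is hypothetical.

```python
def uniparental_fraction(generations: int) -> float:
    """Fraction of ancestor slots, a given number of generations back,
    that lie on the strict maternal or strict paternal line."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    total_slots = 2 ** generations  # ancestor slots in that generation
    uniparental_slots = 2           # one all-female line, one all-male line
    return uniparental_slots / total_slots


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        share = uniparental_fraction(n)
        print(f"{n:>2} generations back: {share:.6%} of ancestor slots")
```

At one generation the two lines cover both parents. The share then halves with every further generation, which is the sense in which the uniparental markers trace only a sliver of one's ancestry.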

"If we're interested in whether cars can be grouped into categories like 'sedan' 'SUV', 'limo' etc. we don't look at the things they have in common (doors, 4 wheels, steering wheel, etc.) but in what they don't".

Sums it up nicely.

I'll get back to Maju re 'it'd be interesting to compare the levels of genetic identity and diversity among populations and regions with other species'. All I have at the moment is some information about ducks and about cattle.

"A good analogy to think about is "castles in the sand". If you build a few castles in the sand next to each other there is clearly continuity between them, but they are also distinct."

"As time passes by, air and water will move sand grains from a castle to the beach and from beach to castle. Eventually the landscape will be flattened."

Some sandcastles have multiple piles of sand. And the slopes and sizes of different sand piles vary.

I hope we don't end up with a bunch of flattened out sand. How boring. It would be nice, instead, if some sand piles were saved from the incoming tide, while at the same time, some new sand piles were built from the old.

"And does this selection occur mainly when the chromosome is single (males) or double (females)?"

"And many genes on that X chromosome originate much earlier than do modern haplogroups. I strongly suspect that each of our chromosomes have sections differing in their variability, and their age."

Terry, do you have any thoughts on this, and if so, could you speak to it?

In a related topic, your "duck" paper suggests that dabbling ducks are under a particular kind of selection, because they move around so much. I couldn't make out the details, as they were quite technical. Any thoughts?

Quote, 'Now, if we go back to these hypothetical million people from 500 years ago and imagine their one million hypothetical descendants today it is easy to see that the various genes present in the original population will not be present in the modern population in the same proportions. There will be more of some; less of others, some will have been totally eliminated and new mutations may have appeared'.

Quote, 'If some new characteristic in an inbred population is advantageous to the species as a whole the genes will be able to spread into any incoming population by selection in the hybrid zone. A hybrid zone of just one fertile hybrid individual is sufficient to transfer genes of course. In some cases the incoming group may be inbred as well. The spread of the characteristic could then be helped by restoration of heterosis in hybrids between the inbred population and the incoming group'.

Considering any species as one single population we see that genes are emerging and becoming extinct all the time. So the genes needn't all arise at the one time. Each individual gene has its own history, as Karafet suggested as much as ten years ago.

Not interested in sand castles or castles in the sky. Let's be specific to the discussion.

I don't think 23andMe are flimflam artists. I don't think much of them, but they are sincere in what they try to do. They just failed for me. Also, 23andMe's customers are predominantly mixed Northern European Americans and Ashkenazi Jews. To Southern Europeans like me they offer nothing, and the Americans treat 23andMe as their private club, which unfortunately reduces the intellectual content to moronic levels.

I, personally, have found no insights from my SNPs except from looking up the results of the raw data and reading other sources of information like Promethease. Polako said something about the closeness of East Asians to each other. On 23andMe, East Asians (Chinese and Japanese) score high genetic similarity to each other, over 71%. Sub-Saharan Africans likewise score over 71% with each other. This is in unrelated individuals. Parent and child share on average 80%. Unrelated Europeans score lower genetic similarity, with Northern Europeans having more similarity to each other than Southern Europeans do with each other.

That seems a contradiction to the Africans are so diverse scenario when they share a high % with other unrelated Africans. Same with East Asians. Europeans are more genetically diverse to each other.

That seems a contradiction to the Africans [Sub Saharan Africans] are so diverse scenario when they share a high % with other unrelated Africans. Same with East Asians. Europeans are more genetically diverse to each other.

This should have to do with the homogenizing effects of the relatively recent true Negroid (Bantu?) and true Mongoloid expansions. A similar homogenizing expansion probably occurred in Europe (and also in West Asia and North Africa) with the Neolithic, but either its effects were more limited than those of these two expansions, or the expanding population(s) was/were genetically more diverse and varied than theirs. It could also be a mixture of these two factors.

Interestingly, these questions may have a connection to when and where the true Caucasoids first appeared and how they expanded throughout the different world regions.

I don't know much about the internal politics of 23andMe or any of the other companies offering a similar service.

My comments above are quite specific as to how the presentation of genetic information could be improved to lessen the confusion in the general public.

I also feel that the state-of-the-art in genetics often does not yet allow us to know the things we are most interested in.

Race is a loaded word. It can be quite divisive and is hard to define. Where do you draw the line? At a national boundary, at a continent? When?

At the same time, I am not saying that race, and even nuanced genetic differences between, say, the Scots and the English, do not exist. They do. Check out Dienekes' link for "Greek Autosomal DNA." Look at the Eigenvector 1 vs Eigenvector 2 graphs for Europeans. You can see ever so slightly separated distributions.

It would be odd to think of a world in which everybody looked and thought exactly the same.

Thanks for your comments on the closeness or not in genetic difference between various groups. To be honest, I think it is difficult at this point to know anything conclusively.

That's the problem. We need to come up with some neutral term, but any term we come up with would rapidly be misused in the same way. Subspecies is a possibility; it means basically the same thing. But the 'sub' bit, to many, immediately suggests 'inferior', so we're back to square one.

"these questions may have a connection to when and where the true Caucasoids first appeared and how they expanded throughout the different world regions".

There were Caucasoids nearly two million years ago, if you're referring to ancestral humans who lived near the Caucasus Mountains. What's more those first Caucasoids may have expanded as far as Flores.

Terry: Caucasoid (unlike Caucasian or Caucasic) means "similar to the people of the Caucasus", just as Mongoloid means "similar to Mongols", Negroid "similar to Negroes" (old term for Black people), etc. It's a purely anthropometric term.

IMO it's a process that began with the colonization of West Eurasia some 50 Kya. It is not a terribly homogeneous category but more like the "standard" morphotype of that West Eurasian population, which has certain homogeneity, nothing else. In fact, by extension it also applies to all or most South Asians (who are not that different), hence we could consider it has a South Asian origin after all.

Marnie: the term "Caucasian" as used in USA "racial categories" does not just lack of any anthropometrical precise meaning (excludes South Asians arbitrarily) but it's also a term that lacks cultural universality: it's nothing but parochial slang.

The term "Caucasoid" is at least better defined and has some pretense of scientific precision. I'm not much in favor of using racial categories but in anthropology sometimes becomes too convenient to avoid their use. But, if we're going to use them, let at least be precise.

I like the research for the conclusion drawn in the article (that race is both subjective when applied to looks and empirical when comparing different alleles). Yet I have a huge problem with the prevalence of DNA research and its ilk, which is so specific and specialist that I can't check any of it. Personally, I am quite convinced that this kind of information will always be subject to great bias by the original researchers. I know this might be off-topic, but it is the one thing that keeps me busy around the whole concept of genome sequencing. Perhaps the point is mostly moot in the context of archaeology; however, it has strong reverberations in social science, most pronouncedly in forensics. As such, I think it is important to bring it to your attention.

For example, apart from the uncommon and great ease of planting false evidence, for the interpretation of any such evidence we have no comparison other than asking the next ever-so-familiar institute, part of the exact same complex of industries, to check on its colleagues. A difference from similar fields of research, however, is that no one in the whole world except a few specialised institutes can check on them (1). I see so much room for abuse, in all too many familiar shapes, that I can't but desperately want to bring this to your attention. As an example (and I must say, in my lust for science, one that I have taken quite seriously), I would point to the research that concluded that Sarah (the mythical incubator for the mtDNA of the Jews) has been interpreted as the most recent mutation in mtDNA: compelling research, but with its Jewish authors it might be as reliable as the official story about AIDS.

Which is a fine example: if such research could be easily checked or replicated, the property of the AIDS virus of being a recombinant of several recognised cattle viruses (sheep visna virus, a bovine virus whose name I forget, and one more) would be common knowledge among people like you (us).

(1) Which is, as far as I am aware, not as much the case for any other known analytical method, e.g. chemical analysis (which is usually far less complex and intricate) or eyewitness testimony (which can be contradicted even outside the official channels).

"Marnie: the term "Caucasian" as used in USA "racial categories" does not just lack of any anthropometrical precise meaning (excludes South Asians arbitrarily) but it's also a term that lacks cultural universality: it's nothing but parochial slang."

Well, you might be interested to know that I never answer the racial questions on various forms that are put in front of me. Actually, my only choice is usually to check the "white" box or opt out of the questions.

Maybe it is just my scientist/engineering self, but there is something defacing about the word "white."

With regard to racial categorization, at least here in California, all people of predominantly European and Middle Eastern descent are lumped into the category "white." Caucasian is not used on forms.

"The term "Caucasoid" is at least better defined and has some pretense of scientific precision. I'm not much in favor of using racial categories but in anthropology sometimes becomes too convenient to avoid their use. But, if we're going to use them, let at least be precise."

To be honest, I am not clear on the difference between Caucasian and Caucasoid. I think people are somewhat aware that Caucasian is implied to mean "looking like" someone from the Caucasus. For the most part, I believe it stems from a linguistic categorization which is now somewhat outdated. It was thought that most Indo-European languages originated in the Caucasus.

Genetics has clearly surpassed linguistics in terms of its ability to group peoples. Unfortunately, neither society nor bureaucracy has caught up. The geneticists also seem not entirely sure.

"It is not a terribly homogeneous category but more like the 'standard' morphotype of that West Eurasian population, which has certain homogeneity, nothing else".

'West Eurasian' sounds like a better term than 'Caucasoid' then. Except many people now living in Western Eurasia have recent ancestors from outside that region. 'Indigenous West Eurasian', perhaps?

"i am quite convinced such kind of information will allways be subject to great bias by the original researchers".

Everyone brings their own prejudices and preconceptions to how they view the world. Those are largely a product of our upbringing. It's impossible to escape them, but they can alter with time as evidence accumulates.

"Considering any species as one single population we see that genes are emerging and becoming extinct all the time. So the genes needn't all arise at the one time. Each individual gene has its own history, as Karafet suggested as much as ten years ago."

Thanks Terry. I had a little time to think more about the idea you are illustrating here. "Genes are emerging and becoming extinct" all the time.

This morning, I came across an interesting article, to this effect, regarding diet and cultural evolution:


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.
