Focusing on European population genetics and modern physical anthropology.


Wednesday, March 15, 2017

Failure to replicate

Just in at bioRxiv:

We fail to replicate a genetic signal for sex bias in the steppe migration to central Europe after ~5,000 years proposed by Goldberg et al. PNAS 114(10):2657-2662. Estimation of X-chromosome steppe ancestry in the Bronze Age central European population with the qpAdm method (Haak et al. Nature 522, 207-11) does not indicate lower steppe ancestry on the X-chromosome than in the autosomes. We perform a simulation which indicates presence of estimation bias of -19.5% in the inference of X-chromosome admixture proportions using the method used by Goldberg et al., largely eliminating the observed sex bias.

Comparing the sex-specifically inherited X chromosome to the autosomes in ancient genetic samples, we (1) studied sex-specific admixture for two prehistoric migrations. For each migration, we used several admixture estimation procedures, including ADMIXTURE model-based clustering (2), comparing X-chromosomal and autosomal ancestry in contemporaneous Central Europeans, interpreting greater admixture from the migrating population on the autosomes as male-biased migration. For migration into late Neolithic/Bronze Age Central Europeans (BA) from the Pontic-Caspian steppe (SP), we inferred male-biased admixture at 5-14 males per migrating female. Lazaridis & Reich (3) contest this male-biased migration claim. For simulated individuals, they claim that ADMIXTURE provides biased X-chromosomal ancestry estimates. They argue that if the bias is taken into account, then X-chromosomal steppe ancestry is similar to our autosomal ancestry estimate, and that hence, steppe male and female contributions are similar. We conduct simulations of ancient and modern data under a range of conditions. We conclude that our inference of male-biased Pontic-Caspian steppe migration, seen using ADMIXTURE, STRUCTURE, mechanistic simulations, and X/autosomal FST, is robust. Our analysis further illuminates the impact of small haploid reference samples on ADMIXTURE; we look forward to refining sex-specific migration estimates as larger, higher-coverage ancient samples become available.
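The X-versus-autosome logic can be made concrete with the simplest textbook case: a single admixture pulse followed by random mating. In the long run, the autosomal admixture fraction is the plain average of the female and male source fractions, while the X chromosome weights females twice as heavily as males (the naive 2/3 and 1/3 expectation). The sketch below assumes that single-pulse model. It is not the mechanistic model Goldberg et al. actually fitted, and the function name and input values are hypothetical. Its purpose is to show why a modest bias in the X estimate matters so much for the inferred male:female ratio.

```python
"""Toy single-pulse model of sex-biased admixture (a sketch, not Goldberg et al.'s model).

Assumed long-run relations after one admixture pulse and random mating:
    H_A = (s_f + s_m) / 2        autosomal admixture fraction
    H_X = (2 * s_f + s_m) / 3    X-chromosomal admixture fraction
where s_f and s_m are the fractions of founding females and males from the migrant source.
"""


def sex_specific_contributions(h_auto: float, h_x: float) -> tuple[float, float]:
    """Invert the two relations above and return (s_f, s_m)."""
    s_f = 3.0 * h_x - 2.0 * h_auto
    s_m = 4.0 * h_auto - 3.0 * h_x
    for name, value in (("s_f", s_f), ("s_m", s_m)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} = {value:.3f} is outside [0, 1]; estimates are inconsistent with this model")
    return s_f, s_m


if __name__ == "__main__":
    # Hypothetical estimates: the same autosomal value, two slightly different X values.
    for h_x in (0.40, 0.45):
        s_f, s_m = sex_specific_contributions(h_auto=0.5, h_x=h_x)
        # Reading s_m / s_f as "males per female" assumes equal numbers of founding males and females.
        print(f"H_X = {h_x:.2f}: s_f = {s_f:.2f}, s_m = {s_m:.2f}, males per female = {s_m / s_f:.1f}")
```

With these made-up inputs, shifting the X estimate by 0.05 roughly halves the implied number of migrant males per migrant female. This is why a systematic underestimate of X-chromosomal ancestry can manufacture, or erase, much of an apparent sex bias.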

I knew this was the case, as Neolithic mtDNA is so different from modern European mtDNA. The Yamna and Neolithic Siberian groups had a higher frequency of H, which is why I knew this already. However, I still think the Yamna were a dead end according to current evidence, as some of them had C (which further proves my theory of steppe ancestry, but with the Yamna as a dead end). I am not certain why some folks thought Neolithic populations of Central Europe (LBK etc.) made an mtDNA contribution to Europe when their mtDNA was so different. The Hamangia, who had a high frequency of H (unlike other Neolithic groups), may have been dominated by R1b. Despite having a cultural connection to the Near East, they did not come from there, and they may have spoken Indo-European languages, which could be as old as 10 kya. According to scholars, Celtic languages are 2,600 years old; according to genetics, however, they are probably at least 3,800 years old in the British Isles alone. The scholars probably underestimated the ages of all Indo-European languages. Maybe people have been using wool for longer than scholars thought.

John Smith: "According to scholars Celtic Languages are 2,600 years old however according to genetics it is probably at least 3,800 in the British isles alone."

Language is still not a genetic attribute, so we can't say anything about the age of Celtic according to genetics. Also, Celtic had plenty of time to reach the British Isles later (and it probably arrived there well after it first formed), and that did not necessarily mean an easily detectable population turnover. The assumption that two populations were culturally identical and spoke the same language subgroup, just because they cluster closely on a PCA assembled from a few (and incomplete) DNA samples, is baseless. (I am not talking about IE in general, but about telling apart the garden varieties of IE from aDNA.)

"Language is still not a genetic attribute, so according to genetics we can't say anything about the age of Celtic. "

I'm not a linguist, but I would still bet money that genetic calculation is more in the scientific realm, whereas linguistics is highly theoretical. I'd never bet my money on linguists who can't even agree on anything, let alone prove anything beyond a doubt.

Awood: Genetics is indeed much more of a hard science than linguistics, but that is beside the point. However scientific genetics may be, languages are still outside the core expertise of the field, since the variations of modern human languages are not genetically coded but are purely cultural phenomena. Ancient DNA studies can serve only as auxiliary information. They can be useful, but you can't seriously expect them to answer questions like how old "Celtic" is. And yes, linguistics cannot give a hard scientific answer to such a question either. There are things we will never know in the way we know, for example, the DNA code in our mitochondria.

*Genetics* may be a hard science, but drawing linguistic conclusions from genetics is *not* genetics; it's sociolinguistics. "My data about population movements (or lack of them) comes from a hard science, therefore my speculation about the linguistic impact of these population movements (or lack of them) is hard science" - no.

Haven't read the paper yet, but as I commented at the time, using ADMIXTURE for this purpose seemed pretty questionable, and I'm glad to see they've pulled out qpAdm instead. With ADMIXTURE, different clusters might form on the autosomes vs. the X for reasons orthogonal to real ancestry. Comparing X Fst with autosomal Fst is similarly somewhat questionable, because differences in the general level of within-population diversity on each confound the comparison.

Not hugely surprising, and again I'm gonna call on researchers to test whether a bias exists in modern populations relative to a neutral outgroup.

The only finding looking solid at the moment to me is Chiang 2016 (http://biorxiv.org/content/early/2016/12/07/092148), that Sardinians have relatively more X chromosome EEF ancestry (Anatolian+Loschbour) compared to their autosome relative to Tuscans, Spanish, British and Finnish, who are all neutral to each other.

But this would mean that an alternative explanation will have to be found for the dearth of Y-DNA haplogroups G and I2 in recent Europeans. Founder effect? That is, admixture was not initially sex-biased, but *eventually* steppe chief (or at any rate, at least "Eastern European early IE adopter") Y-DNA replaced everyone else's in Europe to a large degree?

Beware: this doesn't mean there wasn't a sex bias in the steppe migration, only that the margin of error is so large that the bias falls within it. So it only means that the procedure used to detect it was not precise enough. I think it will be a major matter of discussion for the time being, because it's a core subject for archaeology. Who made it to Central Europe? Men only? Men with some women? Or men and women?

This is almost a complete reverse of the way the technique was first introduced, using only outgroups to West Eurasia and then carefully adding perhaps some ancient West Eurasians.

Lack of informativity about the West Eurasian ancients in non-West Eurasian outgroups was something I found questionable about the qpAdm technique almost from the beginning. (tl;dr version: the different WHG samples, for instance, seem to show a good deal of dispersion on East Asian-related stats, overlapping much of the range of present-day Europeans, so putting those stats in the driving seat seems likely to add noise to your conclusions.)

From the perspective of whether this tells us anything new about the autosomal ancestry in the Bronze Age, here are their autosomal fits with the population labels for the samples:

http://i.imgur.com/RFCFh8p.png

Take it (too) literally and ignore standard errors, and you have I0059 (LN2) as steppe mixing with a population that is 43% HG and 57% Anatolian_Farmer (which is practically a PWC or Blatterhohle cave sample). At the other end of the spectrum, RISE577 (Unetice) would fit as steppe mixing with a population that has 31% less HG than Anatolian Farmers had. It is an outlier, but other populations also come out as steppe plus AF with substantial negative HG.
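Negative coefficients like the "negative HG" in these fits fall straight out of any linear mixture fit that forces the weights to sum to one but does not force them to be non-negative. The sketch below is an ordinary least-squares toy, not qpAdm itself. qpAdm works with f4-statistics relative to a set of outgroups, uses a block jackknife for standard errors, and tests model fit, and none of that is reproduced here. The statistics matrix and function names are hypothetical stand-ins.

```python
"""Toy sum-to-one least-squares mixture fit: an illustration, NOT qpAdm."""


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("singular system: these statistics cannot separate the sources")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mixture(source_stats: list[list[float]], target_stats: list[float]) -> list[float]:
    """Least-squares weights w with sum(w) == 1 and signs unconstrained.

    source_stats[i][k] is (hypothetical) statistic i computed for source k;
    target_stats[i] is the same statistic computed for the target population.
    """
    k = len(source_stats[0])
    # Enforce sum-to-one by substitution: w_last = 1 - sum(w_0 .. w_{k-2}).
    x = [[row[j] - row[k - 1] for j in range(k - 1)] for row in source_stats]
    y = [t - row[k - 1] for t, row in zip(target_stats, source_stats)]
    # Normal equations (X^T X) w_free = X^T y for the k - 1 free weights.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k - 1)] for i in range(k - 1)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k - 1)]
    free = solve_linear(xtx, xty)
    return free + [1.0 - sum(free)]


if __name__ == "__main__":
    # Made-up statistics for three sources (say steppe, Anatolian farmer, HG) on four outgroup stats.
    sources = [[0.10, 0.02, -0.05], [0.03, 0.08, 0.01], [-0.02, 0.04, 0.09], [0.06, -0.01, 0.03]]
    true_w = [0.5, 0.7, -0.2]  # a deliberately "negative HG" mixture
    target = [sum(s * w for s, w in zip(row, true_w)) for row in sources]
    print([round(w, 3) for w in fit_mixture(sources, target)])
```

Nothing in the arithmetic stops a source weight from going below zero. A negative weight means only that the target sits "beyond" the sources along some statistic, which is one reason individual fits like RISE577 deserve their standard errors before being read literally.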

(labels from preprint of the paper this rejects: http://www.pnas.org/content/suppl/2017/02/20/1616392114.DCSupplemental/pnas.1616392114.sapp.pdf)

These seem quite different from the fits back in Haak 2015, where Halberstadt_LBA, Benzingerode_LN2 and Alberstedt_LN fit as mixtures of EEF+Yamnaya with no additional WHG, while Unetice picked up a good chunk of WHG; here it's quite the other way around. Presumably this reflects a much more informative qpAdm methodology with the populations they use in it?

This is great, science working as it should. I do worry about its impact on Goldberg, though; she's such a young researcher, and her methods papers are really good.

So this is why Reich is unhappy with her results.

It is definitely false that there was *no* sex bias, as the Q statistic, which uses basic Fst in the Goldberg paper, tells us that there *must* be sex bias somehow. The issue is with methods that attempt to detect proportions, such as ADMIXTURE (in Goldberg) and qpAdm (in Reich). The Q statistic is not analytically transparent.

@ Davidski, if you don't mind, do you know what the authors mean by "We ran qpAdm with allsnps: YES and Mota as the basis population", especially by "basis population"?

@ Ryukendo_K: Re: Fst, I'm not too happy that only a subset of the matrix of FstA and FstX was provided by Goldberg's paper for (AF,CE,BA,SP,HG). E.g. - http://i.imgur.com/rdQeqvG.png

Annoyingly incomplete, as it would have been good to see whether the various Q ratios that are not included (such as BA-CE, CE-SP, AF-SP, HG-SP) were even consistent with the 0.7-0.91 Q ratios that are present for all but BA-SP. We have less information about the range of the paired Q statistic in these ancient samples than would have been easy for them to compute and provide, for no apparent reason.
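For anyone wanting to compute the missing pairs, here is a hedged sketch of the drift arithmetic behind an X-versus-autosome ratio. It assumes pure drift with Fst = 1 - exp(-t/2N), so that ln(1 - Fst) is proportional to t/N and the ratio of the two logarithms estimates N_X/N_A. It then uses the standard effective-size formulas for the X and the autosomes to turn Q into a breeding sex ratio. Under those formulas, Q is 0.75 when breeding males and females are equal in number, and it can only range from 0.5625 (as breeders approach all-male) to 1.125 (all-female). The published papers may use a different Fst estimator and hence a different transformation, so treat the function names and any numbers as illustrative only.

```python
import math


def q_ratio(fst_x: float, fst_auto: float) -> float:
    """Estimate Q = N_X / N_A from X-chromosomal and autosomal Fst between the same two populations.

    Assumes pure drift with Fst = 1 - exp(-t / (2N)), i.e. -ln(1 - Fst) = t / (2N),
    so the ratio of the two logs cancels t and leaves N_X / N_A.
    """
    return math.log(1.0 - fst_auto) / math.log(1.0 - fst_x)


def female_fraction(q: float) -> float:
    """Fraction of breeders that are female implied by Q.

    Uses N_A = 4*Nm*Nf/(Nm+Nf) and N_X = 9*Nm*Nf/(4*Nm+2*Nf), which give Q = 9 / (8 * (2 - p)).
    Only values 9/16 <= Q <= 9/8 can be explained by this model.
    """
    if not 9 / 16 <= q <= 9 / 8:
        raise ValueError(f"Q = {q:.3f} is outside the range this model can explain")
    return 2.0 - 9.0 / (8.0 * q)


if __name__ == "__main__":
    q = q_ratio(fst_x=0.13, fst_auto=0.10)  # hypothetical Fst values
    print(f"Q = {q:.3f}, implied female fraction of breeders = {female_fraction(q):.2f}")
```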

Also, RK, on those autosomal Fst scores from Goldberg, one thing I found odd was that some of them seemed to depart from the Fsts reported in Lazaridis 2016 and Mathieson 2015: http://i.imgur.com/0YjwDXD.png, e.g. in terms of the differentiation of CE Neolithic from Anatolia Neolithic, and of WHG from other populations.

... and that implies different Q ratios - http://i.imgur.com/3gokcGC.png

So I wonder how these differences arise between their methods of calculating Fst. Might have to reread that bit of their paper more carefully.
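One ordinary source of such discrepancies is the choice of estimator and how per-SNP values are combined. The sketch below assumes Hudson's Fst in the form recommended by Bhatia et al. (2013). It shows that averaging per-SNP ratios and taking the ratio of summed numerators and denominators give different answers on the same data; the latter is the usual recommendation. Whether this explains the differences between these particular papers is not established here, and the frequencies and sample sizes are hypothetical.

```python
def hudson_parts(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Per-SNP numerator and denominator of Hudson's Fst.

    p1, p2: sample allele frequencies; n1, n2: sample sizes (must be >= 2).
    The numerator subtracts within-sample sampling noise from the squared frequency difference.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_ratio_of_averages(freqs1, freqs2, n1, n2):
    """Sum the numerators and denominators over SNPs, then divide once."""
    parts = [hudson_parts(a, b, n1, n2) for a, b in zip(freqs1, freqs2)]
    return sum(num for num, _ in parts) / sum(den for _, den in parts)


def fst_average_of_ratios(freqs1, freqs2, n1, n2):
    """Divide per SNP, then average; more sensitive to low-differentiation and rare-variant SNPs."""
    parts = [hudson_parts(a, b, n1, n2) for a, b in zip(freqs1, freqs2)]
    # SNPs fixed for the same allele in both samples have den == 0 and are skipped.
    ratios = [num / den for num, den in parts if den > 0]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    f1, f2 = [0.1, 0.5, 0.9, 0.3], [0.2, 0.4, 0.5, 0.3]  # hypothetical frequencies
    print(fst_ratio_of_averages(f1, f2, 20, 20), fst_average_of_ratios(f1, f2, 20, 20))
```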

@ Davidski, interesting stuff, I didn't know that anything like that mattered in qpAdm either.

In the traditional five-caste societies of antiquity (ENE/BA/EIA) there used to be a dynasty at the helm and stem of society.

This implies that the "royal seed" was distributed from the core of the biological and political society, through inheritance and inheritance only.

This means that there was ONE and only one royal descendant, as in the present-day title "crown prince" or "archduke".

The younger brothers/half-brothers of the crown prince would form the "Upper Nobility", serving as dukes/lords within the respective districts of the kingdom ('land'). Accordingly, they were to marry a number of women to produce the (next) local chieftains within their district. Consequently, each and every chieftain (earl, marki) would be a son's son of the old king.

Within the local communities, their respective earls had to marry women on EVERY farm in their constituency, to produce the next generations of farmers/peasants/ceorls as great-grandsons in the male line of the old king.

As each ceorl (husbond) came to maturity he would become the new farmer, and the ONLY man to reproduce on his respective farm/yard/gard. Consequently, EACH and EVERY child born on each and every farm in the kingdom was to be a great-grandchild of the old king himself. This paved the way for the old term "All-father", as a sacred as well as a "heavenly" term.

The first 'royal lines' spreading after the Ice Age, starting some 12,000 years ago, would create "extended families" that became "aets" or "ethnicities", thus creating genuine, 'homegrown' dynasties as a 'natural' (historical) consequence of space and time itself.

* One consequence was that some earls and peasants could marry up to 50 times, to produce the 50 children necessary to uphold or extend the population numbers of their jurisdiction.

* A second consequence is that every new generation was counted, and 'changed' only by the royal takeover, when the crown princes of the G, H, I, J and R1 dynasties, respectively, started a new generation (mutation) of their royal seeds.

* The ancient leapfrog pattern of the Y-DNA seed-lines seems to be a consequence of the agnatic inheritance known from the old Eurasian high cultures and their respective dynasties.

# As the Holocene Optimum started 8,000 years ago, the median temperature along Eurasia's 55th parallel became about one degree Celsius warmer than today, all the way up to Finland, Onega and Oleni Ostrov, where the oldest highlanders (R1a) and their cold-blooded domesticates occurred some 7,500 years ago. This was near the head of the Volga river system and the (already) established trade route to the Caspian Sea, the Aral and the Ural.

# Checking the cold-blooded horses (Przewalski, Fjord, Steppe), we find them overlapping with the spread of the cattle-breeding highlanders, creating what's known as "Scandinavian agriculture" vs. the "European", where the warm-blooded horses (tarpan) coincide with the spread of large cattle and the heavily lactose-persistent "lowlanders" (R1b). We find them as far north as the western Baltic, where, in fact, the oldest known burials of oxen and horses are found, as well as by far the world's highest density of LP.

# Looking at the spread of pottery, we find the Volga-Don area pivotal. From the 9,000-year-old pottery of Elshanka we arrive at the 7,700-year-old Sperrings/Narva ceramics in Karelia, Finland and Estonia, where it transitions into the advanced Asbestos ceramics of northern Fenno-Scandia and the well-known Pit-Comb Ware along the Volga and in the eastern Baltic. In parallel, we find the oldest ceramics in NW Europe in the western Baltic, where the first known EBK is dated to 8,000 BP.

# The early comb ceramics were paralleled by the Ertebølle/FB ceramics, which transitioned into the stroked, corded, belled and cordial Beaker ceramics of Britain and western Europe.

Obviously there's a coexistence between the western Beakers and the spread of warm-blooded domesticates and their herding "lowlanders" of the R1b dynasty. This dynasty obviously branched off into the eastern lowlands via the Vistulan transport zone, along the Dniester-Bug and Dnieper-Don (Taurians, Thyssagetae), south to Anatolia and east to Bactria, bordering the southern farmers of the Y-DNA G dynasty.

# The success of the cattle- and corn-breeding cultures took off as the Holocene warm period reached its optimum, and the former tundra and taiga of the Younger Dryas became covered with lush grasslands, all the way up to north-western Norway, southern Finland, western Russia, the Caspian steppes, the Tarim Basin, Mongolia and China. It may seem that an eastern branch of semi-nomadic herders/farmers branched off as Y-DNA Q.

# Accordingly, we have no need to explain the extensive multiplication and migrations of cattle- and horse-breeders as "conquerors", as the grassland areas they fertilized and grew were outside/beside the areas needed by the fishers and gatherers and their herds of goats, sheep and/or reindeer. This explains the whereabouts of the I2/I1 dynasties, as well as their eastern counterparts of Y-DNA N and O.

# Except for the latter two, we may suspect that ALL of the mentioned 'dynasties' and consequent ethnicities were familiar with the proto-IE tongue. That would explain why and how an IE stem could spread from Ireland and Spain to India and Tarim, even before the spread of 'heavy' farming and cattle-breeding.

I'm not hostile. I think it is great. Isn't that what I said? I just went back and checked: I said "great". So, just to clarify: in the past people couldn't publish papers like this easily, and it is great that some experts are cleaning up after less careful academics. It is great.

^^ Yes, but it's likely that CWC didn't mix with locals at first, and they were simply steppe/ANF. Admixture with various local MNE and WHG occurred later, as shown in the Unetice and post-Corded Baltic BA groups.

However, the mobile pastoralist economy probably has roots in east-central Europe (Baden, GAC) as much as on the steppe. The American analogy is needless and wrong, because the level of difference between colonials and Amerindians was marked (literally civilizations apart). This is very different from the situation in Copper Age Europe, where interaction occurred for 2,000 years before a 'snap valve' event occurred c. 3000 BC, resulting in a large westward thrust, even if eastward migration also occurred.

"This is a question for everyone: Can X chromosomes identify sex bias admixture?"

Looks like they can. Jeong et al's results about sex biased admixture in Sardinians relative to mainland Europeans (for example in figure 7 http://biorxiv.org/content/early/2016/12/07/092148) look solid. They're based on D-stats, not supervised ADMIXTURE/Structure like Goldberg et al. and the modern and ancient samples used are of higher quality than the steppe samples.
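To make the D-statistic approach concrete, here is a minimal sketch that computes D separately on X-chromosomal and autosomal SNPs from allele frequencies, using Patterson's frequency-based form of D(W, X; Y, Z). The standard error comes from a simple delete-one-block jackknife over contiguous SNP blocks of roughly equal size, which only approximates the weighted block jackknife that real tools use. The function names, block count and any inputs are assumptions for illustration, not the cited paper's pipeline.

```python
import math


def _d_parts(pw, px, py, pz, idx):
    """Numerator and denominator sums of D(W, X; Y, Z) over the SNP indices in idx."""
    num = sum((pw[i] - px[i]) * (py[i] - pz[i]) for i in idx)
    den = sum((pw[i] + px[i] - 2 * pw[i] * px[i]) * (py[i] + pz[i] - 2 * py[i] * pz[i]) for i in idx)
    return num, den


def d_statistic(pw, px, py, pz):
    """Patterson's D(W, X; Y, Z) from per-SNP allele frequencies of four populations."""
    num, den = _d_parts(pw, px, py, pz, range(len(pw)))
    return num / den


def d_jackknife_se(pw, px, py, pz, n_blocks=10):
    """Delete-one-block jackknife SE using contiguous, roughly equal blocks of SNPs (unweighted)."""
    n = len(pw)
    bounds = [round(b * n / n_blocks) for b in range(n_blocks + 1)]
    estimates = []
    for b in range(n_blocks):
        keep = [i for i in range(n) if not bounds[b] <= i < bounds[b + 1]]
        num, den = _d_parts(pw, px, py, pz, keep)
        estimates.append(num / den)
    mean = sum(estimates) / n_blocks
    return math.sqrt((n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates))


def x_vs_auto_z(d_x, se_x, d_auto, se_auto):
    """Z-score for D on X SNPs minus D on autosomal SNPs, treating the two estimates as independent."""
    return (d_x - d_auto) / math.sqrt(se_x ** 2 + se_auto ** 2)
```

In use, `d_statistic` and `d_jackknife_se` would be run once on the X-chromosomal frequency arrays and once on the autosomal ones, and the two results compared with `x_vs_auto_z`. The population order and sign convention would need to be matched to whichever paper is being checked.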

The issue appears to be that an ADMIXTURE-based approach produces questionable results. For example, are BedouinB *really* pure Neolithic-Bronze Age Levantines and at least some Druze pure Bronze Age Armenians like this supervised ADMIXTURE test from Marshall et al. suggests? http://www.nature.com/articles/srep35837/figures/5

Davidski: "Many of these estimates look plain wrong." There is quite a robust tendency with these outgroups for the populations here with low steppe to have a high HG:Anatolian Farmer ratio, and for populations with high steppe to have a low HG:Anatolian Farmer ratio - http://i.imgur.com/Jkvpzyc.png

Looking at the numbers for means (assuming individuals have some noise), could be just about consistent with a scenario where Corded Ware is a steppe group mixed with one of those low HG:AF Neolithic farmer groups (Lengyel+Hungarian Copper Age?) who are ~10:90 HG:AF and Unetice+other LNBA takes on more admixture from populations (in Germany / Baltic?) which are richer in HG:AF (~35:65)... But that still seems kind of an odd scenario.

Despite what I wrote upthread, maybe Lazaridis and Reich should be using a richer set of outgroups with ENA / Villabruna cluster related groups in as well (the latter notably lacking), after all. I'd have hoped they'd know either way though.

Maybe Lazaridis and Reich know something that we don't, for instance that there was a narrow band of pure Anatolian-like or Lengyel-like farmers still alive between the steppe and Germany, and that this is where the early Yamnaya-like Corded Ware were getting their farmer admixture from, as opposed to from the HG-rich typical Middle Neolithic German farmers and South Baltic foragers.

I think at least one of the problems might be having ancestral groups like AG3-MA1 and Villabruna in left pops and the derived Eastern_HG in the right pops. But I'm not sure.

I need to think carefully again about the outgroups that might be useful for different populations when using the new qpAdm. The strategy in this preprint seems to be too simple and cautious, but your models are too complex and risky.

I don't take seriously anything that Genetiker does unless it's backed up by someone more competent, so if you're referring to that R1a from Dnieper Donets, it has now been confirmed by the guy from YFull.

I'm jumping into the Y-HG {X-Chr; mt etc} debate late, but it may not be a simple model of dominance vs. migration. During times of stress, females produce more female offspring, which makes evolutionary sense. See: Preconception stress and the secondary sex ratio: a prospective cohort study, Chason et al., 2012.

Genetiker has Corded Ware as 70%-80% Gravettian on his admixture chart, lol. I can't take this guy seriously; some of his blog posts are very absurd and strange.

I thought he would change after the whole apology post he made not too long ago but nope.

"These results show that many of my beliefs about European genetic history were wrong. I thought that the Aurignacians belonged to Y haplogroup I, but the one Aurignacian sample was C1a2. I thought that the Gravettians belonged to R1, but four Gravettian samples were C1, I, IJ*, and C1a2. I thought that the Magdalenians were R1b, but two Magdalenian samples were I. These results also imply that my beliefs about Indo-European origins were wrong. I apologize for attacking others over their positions on these subjects."

so, regarding Genetiker: "so if you're referring to that R1a from Dnieper Donets, it has now been confirmed by the guy from YFull"

the guy finds it... but then someone else, who should have done the job in the first place, "confirms" it... and the original finder is crazy? - Ok

"These results show that many of my beliefs about European genetic history were wrong. I thought that the Aurignacians belonged to Y haplogroup I, but the one Aurignacian sample was C1a2. I thought that the Gravettians belonged to R1, but four Gravettian samples were C1, I, IJ*, and C1a2. I thought that the Magdalenians were R1b, but two Magdalenian samples were I. These results also imply that my beliefs about Indo-European origins were wrong. I apologize for attacking others over their positions on these subjects."

- The guy admits errors and apologizes for his mistakes... - Ok, completely wacko, I see.

Pretty skeptical of the models in this paper and I want to see whether adding any of the other ancient populations they didn't use will drive a new result, or not (at least whether adding Villabruna if there's no time for any of the others).

Science largely does progress by small incremental changes to the major view. Of course there are occasional major revelations, which are of much more interest than the day to day work that goes almost unnoticed by the general public.

"Genetiker has Corded Ware as 70%-80% Gravettian on his admixture chart lol I can't take this guy seriously, some of his blog posts are very absurd and strange.

I thought he would change after the whole apology post he made not too long ago but nope.

"These results show that many of my beliefs about European genetic history were wrong. I thought that the Aurignacians belonged to Y haplogroup I, but the one Aurignacian sample was C1a2. I thought that the Gravettians belonged to R1, but four Gravettian samples were C1, I, IJ*, and C1a2. I thought that the Magdalenians were R1b, but two Magdalenian samples were I. These results also imply that my beliefs about Indo-European origins were wrong. I apologize for attacking others over their positions on these subjects.""

What this implies is that the entire edifice built on the hypothesis of TWO separate refugia during the LGM, as a basis for ancient DNA analysis, is wrong. Quite simply.

This means there's (still) NO evidence for a Gravettian refugium separated from an Aurignacian/Magdalenian refugium during "the LGM" to explain the Mesolithic/Neolithic DNA of Eurasia.

In fact, there's ample evidence that the pre-LGM populations of Eurasia exchanged genes with each other.

Moreover, it's plain and clear that both mtDNA U/T and Y-DNA C-F survived the lean bottleneck of the LGM in various locations across the MILDER part of northern Eurasia, i.e. the Atlantic facade.

Finally, the glacial/postglacial distribution of ARCHAEOLOGICAL sites has proven that the final and hardest bottleneck happened during the Younger Dryas (YD). Thus we can see a change in the dominant haplogroups before and after the LGM, as well as before and after the YD.

The ancestors of present-day Eurasians all descended from the population(s) that survived the last, cataclysmic cold snap, when 2/3 of the larger land animals of Europe and arctic Asia died out.

This means that the Paleolithic mammoth hunters of the Franco-Celtiberian "Magdalenians" and the East European/Black Sea "Gravettians" were NOT points of causation explaining the variety and distribution of Y-DNA and mtDNA in today's Eurasia. In fact, we DO know that they were genetically as well as culturally connected. As were the last mammoths...

http://www.dandebat.dk/images/1514p.jpg

At least the commenter signing as "Genetiker" has the guts to admit that he's been misled, whether by facts, confusion or persuasion. Others shouting even louder about the same mistake obviously don't share the same civil courage, even though they've promptly been explaining R1b as caused by a "Celto-Iberian refugium" and R1a as "a result" of a "Centro-Carpathian" or "Trans-Caucasian" refugium, and even though there are still no signs or evidence that ANY of them did indeed PERSIST through the cataclysmic climate of the Older and the Younger Dryas.

To find an area of PROVEN persistence (subsistence) throughout the Younger Dryas, we have to look to the land where the sun sets, where the warm waves from the Caribbean tropics kept striking Eurasia's west coast with surface water and summer breezes above 12 °C.

@Davidski You have some neat scientific abilities, but your conclusory assumptions are shameful. Your knowledge of history is also quite weak. "The Bronze Age expansion was like Europeans in America?" That's a conclusion masquerading as an argument. You're better than that.

There are at least 10 different ways that Population B can have more descendants than Population A. Here are a few:

1. Population A simply starts out with fewer members.

2. Population A dies of diseases.

3. Population B has a CULTURAL attitude about having more babies, whereas Population A does not.

(Examples:

Mormons versus Episcopalians in Nevada,

Hispanics versus Whites in California,

Palestinians versus Israelis in Israel)

If you knew anything at all, you would know that at various times in history, it was THE ELITES who did NOT HAVE KIDS.

Example: the Roman patricians had so few kids, they often had to ADOPT to have an heir. The plebs had tons of kids.

Example: modern Israel. The Palestinians have far more kids than Israelis.

This is not dominance, Davidski, it's demography.

Here is Davidski in the year 3000 A.D. "Clearly the Palestinians had an advantage that enabled them to dominate Gaza. They were the elites." Sorry, no.

4. Example four: massive immigration into Population A's homeland because Population B's original homeland was economically depressed and couldn't support a large population.

Hello, America 2017? Do you read the news at all? Have you heard about Trump and his wall proposal?

5. Example five: Population B is driven out of their homeland because of a superior force waging war on them.

The Goths fled the Huns in the east, only to make war on the populations in the west. The picked-on became the bully.

A subset of this is a genuine refugee crisis. Hello, Syria?

So, Davidski, there are just five examples of how your "elite" theory is more than likely totally bogus.

@huijbregts @Davidski @FrankN @Matt @Alberto and @everyone Regards. Could you take a look at my last post here? http://eurogenes.blogspot.com.es/2016/12/early-indo-european-migrations-map.html?commentPage=2 Thanks

In that paper there is a diagram with an Armenian R1b-P312. It can create a false impression that it is an early branch of P312 in Armenia. This further can lead to speculations that L51 is from West Asia.

It is wrong! That P312 is the famous DF27 cluster in Khndzoresk (https://en.wikipedia.org/wiki/Khndzoresk#Genetics). The age of that cluster is 850 ybp, and all members of that cluster live in the same village. In general, P312 is present at a low level (<1%) in many West Asian countries. This can be the legacy of Romans, Galatians or Crusaders; the latter is more probable in this specific case. So NO, there is no P312* in Armenia, and no L51* either. Even the Albanian L51* is not confirmed.

In reality nothing has changed since Myres et al. The highest level of R1b-L11* is in the British Isles. There is no evidence that R1b-L51 formed in West Asia or moved to Europe from there.

With those extra outgroups and AG3 removed, the fits look consistent and convergent for Halberstadt_LBA and Corded Ware in the ratio of HG:AF.

For Halberstadt_LBA itself the fits are essentially unchanged, while what happens for the Corded_Ware samples is mostly a shift that takes away some of their steppe ancestry and a smaller proportion of AF, and increases HG.

So the main difference in the offsets suggests this set could be better at distinguishing a steppe+minority AF combination from HG. I guess this is what the Villabruna and Iran_N / CHG stats drive, and stats around Levant_N + Upper Paleolithic Siberia + other UP Europe don't quite do it.

All a bit richer in HG:AF ratio than MN usually seems to be, so it might be worth testing Iberia_Chal, Germany_MN, Hungary_CA and Remedello with this setup to see what they come out with. HG:AF in these is 38:62, but then again that's only around 30:70 if we allow for Hungary_HG being 15% AF as per the recent Lipson paper, so no big deal.
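The adjustment from 38:62 toward 30:70 is simple arithmetic: if the "HG" proxy is itself partly farmer-derived, that part of the fitted HG weight really belongs to AF. Here is a small sketch, where the 15% figure is the one quoted from Lipson and the function name is hypothetical. With 38:62 it gives about 32:68, in line with the rough "around 30:70".

```python
def repartition(hg_weight: float, af_share_of_hg_proxy: float) -> tuple[float, float]:
    """Move the farmer-derived part of an HG proxy's weight over to AF; returns (HG, AF)."""
    true_hg = hg_weight * (1.0 - af_share_of_hg_proxy)
    return true_hg, 1.0 - true_hg


if __name__ == "__main__":
    hg, af = repartition(0.38, 0.15)  # fitted 38:62, HG proxy assumed to be 15% AF
    print(f"HG:AF = {hg * 100:.0f}:{af * 100:.0f}")
```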

All this said, it shouldn't really affect their main findings for the preprint unless it's inconsistent on the X chromosome and there's a disproportionate change there.

"So, Davidski, there are just five examples how your "elite" theory is more than likely totally bogus."

None of your examples seem to explain the 'star-like' phylogeny of the Y-chromosomes of the migrants in the Bronze Age, while the mtDNA did not experience this phenomenon.

Davidski and others are saying that there was an elite culture AMONG the incoming population, but that the migrant population as a whole had a (separate from elitism) reproductive advantage over the locals, based on their culture and technology.

The 'star-like' phylogeny is hardly anything but a reflection of the ancient reproduction system, built on agnatic, five-stepped dynasties forming 'extended families' ("aets") and effectively kingdoms and ethnicities.

Using present-day, post-Christian monogamy as a standard model for the various stats and runs just won't do, either deterministically or stochastically.

From the facts known about the old civilizations, the dominant reproduction system was pyramidal and polygamous rather than flat and monogamous.

@Algan mardi You report that the results of your Global10/nMonte runs were very sensitive, and that weighting the model made them more coherent. Your suggestion is that the weighting has improved the quality of the estimations. That is not necessarily true. I don't doubt that penalizing the higher dimensions makes the results less sensitive, but the question is: what did you filter away, noise or relevant signal? I invite you to have a closer look at the highest dimension of the Global10. At the negative end of dimension 10 you will find 22 African samples; at the positive end of dimension 10 you will find 3 African samples, followed by a lot of EN samples. This definitely doesn't resemble noise. So by penalizing the higher dimensions of the Global10 you are excluding information from the calculation of the Euclidean distance. I don't think this is a sound practice.

@huijbregts First, thanks for the answer; second, congratulations on nMonte, it's fantastic! I don't fully understand how it works. AFAIK, the convergence works around colMeans(matAdmix). I agree with you that by penalizing the higher dimensions of the Global10 we are excluding information, but the point is whether the PCs of the PCA are scaled or not. If not, then maybe, by weighting them, the Euclidean distance between vectors better reflects the real distance between pops. I don't know, you are the master, I'm just learning. Great work.

I think it's all been debated there already. For some reason, the Global10 output seems to upscale the higher dimensions. I can only speculate that this might be done to make plots that are visually meaningful (if it gave the mathematically correct values and you tried to plot PC1 vs. PC8, the plot would look almost like a straight line due to the much higher variance in PC1, which is not visually informative).

So weighting with the square root of the eigenvalues seems to bring the PCA coordinates back to their mathematically correct values. This doesn't offer much of an advantage in most normal cases, but in a few it's clearly superior (since, after all, it's the correct values that give correct Euclidean distances between world populations, unlike the original, unweighted ones).
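A minimal numerical sketch of this point, assuming the "upscaled" output means each dimension has been rescaled to unit variance (the thread only speculates about how Global10 is actually produced). It runs a PCA through NumPy's SVD on random, made-up data and checks that multiplying each dimension by the square root of its eigenvalue recovers the pairwise Euclidean distances of the original centred data when all components are kept, while the unit-variance scores do not. The helper name `pairwise` is introduced here for illustration.

```python
import numpy as np


def pairwise(M):
    """Matrix of Euclidean distances between all pairs of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
# Made-up data: 20 samples x 5 variables with very unequal spread.
X = rng.normal(size=(20, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)  # centre each column

# PCA via thin SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n = Xc.shape[0]
eigenvalues = s ** 2 / (n - 1)  # variance carried by each component

# Assumed "unscaled" output: every dimension rescaled to unit variance.
unit_scores = U * np.sqrt(n - 1)
# Eigenvector scaling: multiply each dimension by sqrt(eigenvalue).
scaled_scores = unit_scores * np.sqrt(eigenvalues)

d_original = pairwise(Xc)
print(np.allclose(pairwise(scaled_scores), d_original))  # True
print(np.allclose(pairwise(unit_scores), d_original))    # False
```

When only the first few components are retained, as in a 10-dimension datasheet, the scaled distances approximate the original ones rather than reproducing them exactly.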

I've been running both side by side for a while. Most of the time one shouldn't worry about it, but overall I lean towards the weighted values being better (because they avoid those bad cases, and overall seem to be the correct ones). In any case, that's not something that should affect nMonte itself. It concerns the input data (that is, you should weight the values of the datasheet with the correct weights, also posted on that thread, not the ones first posted on Anthrogenica).

Re: "star-like structures": when we talk about "star-like structures", in the words of S Yan 2014 (the paper about Chinese Neolithic "super grandfathers"), we're really just talking about "multiple lineages branching off from a single node", which "indicat(es) strong expansion events", as opposed to a history of "bifurcations". No more and no less than that.

To some degree I think you see this in all lineages: are any of today's surviving lineages rooted in a simple history of bifurcations since the early Holocene? Kivisild 2017 seems to me the best map of this at the moment, at least for West Eurasia and North America, with quite different timings (the R1 clades being the latest). Older work that lacks much deep coverage outside Western and Northern Europe might be weaker at detecting when more and less "star-like" phases happened for groups outside R1.

(Though, on the note of the structure of pre-Indo-European populations, one thing I would say is that, going by Kivisild, the Y-DNA I2a I-L621 clade, which peaks in South East Europe today and is essentially the I2 survivor at high frequency, seems to have begun its population expansion around 7,200 years ago, roughly 2,500 years before the R1 subclade expansions.

OTOH, Sardinian I2a M26 shows an expansion around 10,000 years ago, at the same time as Sardinian G2a L166.

So not all of the Y-DNA groups we might assume to be pre-Yamnaya expanded at exactly the same time.)

@ Alberto & Algan, yeah, Alberto has said it all.

Talking about "weighting" the PCA has somewhat confused matters, because all we're really discussing is the pros of using an unscaled PCA vs. an eigenvector-scaled PCA. That is, an unscaled PCA with each dimension multiplied by the square root of that dimension's eigenvalue equals an eigenvector-scaled PCA. They're exactly the same thing: when you tell PCA software or an algorithm to do eigenvector scaling, that's all it does.
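Written out as a worked equation (the symbols are introduced here for illustration): let $u_{ik}$ be the unscaled score of sample $i$ on dimension $k$, $\lambda_k$ the eigenvalue of dimension $k$, $s_{ik}$ the eigenvector-scaled score, and $K$ the number of dimensions kept. Then:

```latex
s_{ik} = \sqrt{\lambda_k}\, u_{ik}
\qquad
d_{ij} = \sqrt{\sum_{k=1}^{K} \left(s_{ik} - s_{jk}\right)^2}
       = \sqrt{\sum_{k=1}^{K} \lambda_k \left(u_{ik} - u_{jk}\right)^2}
```

The right-hand form shows that a plain Euclidean distance on eigenvector-scaled scores is the same number as a weighted Euclidean distance on unscaled scores with weight $\lambda_k$ on dimension $k$, which is one way to see why the "weighting" and "scaling" descriptions in the thread can refer to the same operation.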

(I believe some, like huijbregts, were dubious of the idea of using "weighting" partly because the discussion gave the impression that we were looking at adding further "weighting" to an already eigenvector-scaled PCA, or weighting in a way that differed from applying eigenvector scaling.

I also think others, including Alberto, realised the correct eigenvector scaling method much faster than I did, as I was trying out some other scaling factors, which were wrong. But after comparing output from PAST3 in eigenvector-scaled vs. unscaled mode for PCA and Principal Coordinates Analysis, it's evident that the above is the correct way to do eigenvector scaling.)

As Alberto says, I think one finding during the experiments was that, funnily enough, the unscaled data still picks up the expected closest relatives in Euclidean distances and, with nMonte, the most likely ancestors. So the unscaled PCA does not necessarily give very wrong conclusions. I would say this is probably because, even without proper scaling, the relationships are evident in the structure of the dimensions, whatever the weights.
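The thread describes nMonte as fitting a target as a mix of source populations by Euclidean distance, with its convergence working on the column means of the admixture matrix. The following is a minimal, hypothetical Python sketch of that general idea: a batch of source picks is improved by random swaps, each kept only if it moves the batch's column means closer to the target. It is not huijbregts's R script and omits any penalty terms or other refinements that script may use; the function name `monte_carlo_fit`, its parameters and the coordinates are invented for illustration.

```python
import math
import random
from collections import Counter


def distance(a, b):
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def monte_carlo_fit(target, sources, batch_size=100, cycles=20000, seed=1):
    """Hypothetical, simplified Monte Carlo admixture fit (not nMonte itself).

    `sources` maps a population name to its coordinate vector. A batch of
    `batch_size` source picks is improved by random single-slot swaps, each
    kept only if it moves the batch's column means closer to `target`.
    Returns (proportions per source, final distance).
    """
    rng = random.Random(seed)
    names = list(sources)
    batch = [rng.choice(names) for _ in range(batch_size)]
    # Running column sums of the batch, so each trial swap costs O(dimensions).
    totals = [sum(col) for col in zip(*(sources[n] for n in batch))]

    def fit(tot):
        # Distance between the target and the batch's column means.
        return distance(target, [t / batch_size for t in tot])

    best = fit(totals)
    for _ in range(cycles):
        slot = rng.randrange(batch_size)
        old, new = batch[slot], rng.choice(names)
        trial = [t - o + v for t, o, v in zip(totals, sources[old], sources[new])]
        d = fit(trial)
        if d < best:  # greedy: keep the swap only if the fit improves
            batch[slot], totals, best = new, trial, d

    counts = Counter(batch)
    return {n: counts[n] / batch_size for n in names}, best


# Made-up 2-D coordinates: the target is an exact 70/30 mix of A and B.
sources = {"A": [0.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 2.0]}
props, dist = monte_carlo_fit([0.3, 0.3], sources)
print(props, round(dist, 4))
```

On this toy input the proportions should come out close to 0.7 A, 0.3 B and 0 C. Because the fit only ever sees Euclidean distances, whether the input coordinates were eigenvector scaled directly changes which mixtures count as "closest".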

But I personally favour using eigenvector-scaled PCA whenever possible, and whenever you use a PCA for distance-based methods like nMonte, tree building, etc., it is best to first establish whether the PCA has been eigenvector scaled or not.

@KarlK, OK, though I would be interested in your comment on Karmin 2015's cumulative Bayesian skyline plots of Y chromosome and mtDNA diversity by world region (Fig 2 - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381518/figure/KARMINGR186684F2/).

In neither case does the history of mtDNA population expansion match the Y-DNA expansion (there is no shared expansion). Population growth with the Neolithic and later revolutions shows up as expansion in Y-DNA, after an initial decline (earlier in the Near East, later in Europe), but there seems to be no change in mtDNA effective population size. From that, it seems there is *no* "star-like" expansion of mtDNA ever, not even in the early Neolithic due to generalised population growth.

The difference between Y and mt means that there were never any major (successful) population movements of modern humans, involving both men and women, that showed 'explosive' population growth. Most, like the out-of-Africa or Sahul cases, involved slow, steady growth in a new environment.

@Karl: But my point was that the Neolithic everywhere generally shows a pattern of explosive growth in males and a steady state in females, not just in Bronze Age Europe. That is, "The Bronze Age was (not) different (from the Neolithic generally)". Or am I reading these plots incorrectly?

@Matt, Alberto
As far as I know, eigenvector scaling implies that you simultaneously change the eigenvectors and eigenvalues in an appropriate way. That conserves the values of the resulting PCA scores. The weighting method of Sangarius/Eren weights the PCA SCORES before they are used in nMonte. This is a peculiar weighting method, which is different from eigenvector scaling, and I am very suspicious of its mathematics. But in many cases the results might not be too far from the unweighted results, because the 'Sangarius' weighting specifically affects the higher dimensions.
Now, it is possible that Alberto has found a correct scaling method that I could not imagine. Unfortunately he refuses to share the script with me, so I don't know. But I doubt it.

The time resolution of mtDNA is 20-30 times coarser than that of Y-DNA, so I don't think you could actually tell how fast the expansion was. For instance, M has something like 50 primary branches, which is probably equivalent to the successive branching levels of F-GHIJK-HIJK-IJK-K-K2-MPS-P, C-C1-C1b-C1b1-C1b1b, etc., over several millennia.

"The currently available dataset does not contradict the hypothesis that R-GG400 marks a link between the East European steppe dwellers and West Asians, though the route and *even direction* of this migration is disputable." xyyman's comment: they know the truth but won't say.

And do you know what "depigmented" actually means? Most Europeans do have pigment, just not as much as Africans. If they didn't have any, they wouldn't be able to tan. Albinism is a different thing altogether.

Stop burying your heads in the sand and living in La La Land. There is no mistaking what the evidence has shown us so far. Undeniably, Western Europeans are depigmented Africans. Stop deluding yourselves. WHG like La Braña and Loschbour carried ancestral alleles for skin pigmentation. In other words... they were BLACK. The La Braña AIM has a high frequency in Saharan populations. It is a FACT. Furthermore, aDNA from the Canary Islands showed that R1b-M269 was present in the Saharan migrants long BEFORE the appearance of the "Spaniards". Give it up, let's talk science and stop deluding yourselves. R1b-M269 was in Ireland by, what, 2000 BC? Significance?! Stop it, people!!

Instead of listening to these turnip brains who keep talking in circles, face the facts. O. Balanovsky 2017 states that he/she is not even sure of the "direction" of migration. No one came from the Steppes! It is a lie started by Reich and perpetuated by those who wanted to ride the Steppe coattails and cash in on the fantasy. O. Balanovsky 2017 states that even the direction is disputable for... yes... R-GG400! He obviously saw something in his data that he did not disclose until he was sure. Can't some of you read and understand?

"The time resolution of mtDNA is 20-30 times coarser than that of Y-DNA, so I don't think you could actually tell how fast the expansion is."

Yes, but if the same thing had occurred with mtDNA as with Y, the mtDNA would appear 20-30 times more uniform within populations. This is not observed. Look at Ireland. The men have R1b at a frequency of over 80%. Yet there is not a single mtDNA haplogroup with a frequency above 15% (even though mtDNA is 20-30 times less mutable).

"Are the depigmented Northeast Asians also from Africa less than 10,000 years ago?" Answer: no!

Anyone with half a brain knows there were essentially TWO migrations OOA. The first led to East Asians/Native Americans/Andamanese/Onge etc. The second led to modern Europeans and other Eurasians, as far as the Harappan Valley. Sergi is being proven correct.

East Asians are OLDER than Europeans and from a DIFFERENT stock OOA. That is why their "depigmentation" genes are different. Rickles, Norton, et al.

Europeans are as much as 80% African at K2 (Rosenberg et al.). Southern Europeans are as much as 80% "Neolithic" per Lazaridis et al. Will you guys get off the Steppe nonsense? The Steppe AIM is a subset of the Hunter-Gatherer AIM, i.e. a fraction of it, with increasing distance from Africa. It is called Isolation by Distance. This is not too hard to understand. There is no such thing as Steppe Ancestry. It is make-believe. lol!

No, Salden! You are not talking to an Afrocentric. You are talking to someone who has far better analytical ability than you could ever dream of. FYI, I have no clue who Yakub is, and I have only a vague idea of what a Moor is. Typical stupid... people. They can't prove me wrong on a scientific level, so they throw their hands in the air and stomp out of the room crying foul. You are out of your league. You are talking to someone who is above your pay grade. Face it! Dyck!

To those who don't get it: there is no such thing as Eurasian/non-African admixture! There never was and never will be. At K2, the "non-African"/Eurasian component is found in Africans from the Cape of South Africa to the tip of Morocco or Tunisia and the Suez. The "non-African" SNPs are ALSO of African origin; that is why they are found all throughout Africa. It is called Isolation by Distance, or Genetic Surfing. Understand that! No, modern Europeans are depigmented Africans.

@Ric and Salden, you know, I just thought about this. We know for a FACT that Villabruna had a tall, lean tropical body, black skin, most likely blue eyes, and R1b. Did these incoming, lighter-skinned Neolithic women from Africa have a preference for tall, black and handsome hunter-gatherer men? You know the phrase. Thoughts? You know where I am going with this, don't you?