North Ukrainian R1a-M417 sample I6561 (dated to approximately 3,960 BC) gives a different picture of R1a-M417 Corded Ware, as Corded Ware seems to have directly descended from its community, without the need for any additional Steppe DNA insertion from later Yamnaya or EEF DNA insertion from Neolithic North European communities.

It perhaps indicates that Corded Ware populations were tightly-knit - largely genetically uninfluenced not only by Yamnaya on the paternal side (unsurprisingly, as they had virtually wholly different yDNA), but also by EEF Neolithic communities on the maternal side. It looks like Corded Ware was both of uniform paternal lineage and largely endogamous - its best fit contribution from the EEF groups that it replaced (Funnel Beaker, GA and Baalberge etc.) looks to be about 2% on average, so I cannot see that its men took in too many outsider women as it expanded.

One possibly interesting aspect is that North Eastern Corded Ware (Latvia and Lithuania) is a little different in that it does appear to have a Yamnayan element - a 95% best-fit contribution in one Latvian sample, and a 17% average contribution in Lithuanian samples generally. It is perhaps striking that these North Eastern Corded Ware (Yamnayan-admixed) R1a populations were the only ones that survived and thrived in Europe after Corded Ware's collapse.

What program did you use? What dataset? And could you post the results? With standard errors and such?

I used the most extensive dataset I could find (Genetiker's), and selected whichever combination of prior-dated samples yielded the lowest percentage autosomal variance from the mean of the population in question. There were thousands of results for each test, and I've only retained the ones that yielded the best fits.

For instance, the orthodox hypothesis (that German Corded Ware = 75% Russian Yamnaya + 25% Funnelbeaker) yields a variance that is almost ten times that of the best fit combination, so it is passed over.
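For anyone wanting to try this kind of best-fit search outside a spreadsheet, here is a minimal sketch in Python. It is not the Excel tool described above: the distance measure (sum of squared differences between component proportions), the 1% weight grid, the restriction to two-way mixes, and the three-component vectors named `source_A`, `source_B` and `source_C` are all assumptions made up for illustration, not real ADMIXTURE output.

```python
from itertools import combinations

def mix(sources, weights):
    """Weighted average of component-proportion vectors."""
    k = len(sources[0])
    return [sum(w * s[i] for w, s in zip(weights, sources)) for i in range(k)]

def distance(a, b):
    """Sum of squared differences between two component vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_two_way_fit(target, candidates, step=0.01):
    """Try every pair of candidate sources on a weight grid; keep the closest mix."""
    best = None
    n_steps = round(1 / step)
    for (name1, v1), (name2, v2) in combinations(candidates.items(), 2):
        for i in range(n_steps + 1):
            w = i / n_steps
            d = distance(mix([v1, v2], [w, 1 - w]), target)
            if best is None or d < best[0]:
                best = (d, name1, w, name2, 1 - w)
    return best

# Hypothetical 3-component proportions, invented for this example.
candidates = {
    "source_A": [0.80, 0.15, 0.05],
    "source_B": [0.10, 0.70, 0.20],
    "source_C": [0.30, 0.30, 0.40],
}
# Build a target that is a known 75/25 mix, then check the search recovers it.
target = mix([candidates["source_A"], candidates["source_B"]], [0.75, 0.25])
result = best_two_way_fit(target, candidates)
print(result)
```

Comparing a fixed hypothesis (say 75% of one source plus 25% of another) against the best fit is then just a matter of calling `distance` on both mixes and comparing the two numbers.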

Here is another example result - for a German Bell Beaker best-fit: 68% Bulgaria Steppe-like Chalcolithic + 14% Khvalynsk R1a + 18% Globular Amphora. This is quite similar to German Corded Ware, except that its core EEF:EHG ratio is a bit larger and this distinction is accentuated by a best-fit admixture with a Globular Amphora population.

My reading of this is that the ancestors of German Corded Ware look most like a branched-off Eastern Ukraine Suvorovo, and the ancestors of German Bell Beaker look like Prut-Dniester Suvorovo who moved North West towards a GA population before Corded Ware got there. Corded Ware people look self-contained and largely endogamous; the ancestors of Bell Beaker people look to have mixed more with other EEF populations like GA.

Genetiker's dataset does not include RRBP, but an mtDNA best-fit for Bell Beaker suggests a degree of RRBP admixture. Perhaps advancing Corded Ware forced pre-Bell Beaker westwards across Southern Poland, Central Germany and into Northern France, from where it later resurged (as Bell Beaker) to challenge it?

and selected whichever combination of prior-dated samples yielded the lowest percentage autosomal variance from the average mean of the population in question. There were thousands of results for each test, and I've only retained the ones that yielded best fits.

Using what methods and tools? How did you calculate this?

Originally Posted by Pip

For instance, the orthodox hypothesis (that German Corded Ware = 75% Russian Yamnaya + 25% Funnelbeaker) yields a variance that is almost ten times that of the best fit combination, so it is passed over.

Where can I find that dataset? Using what methods and tools? How did you calculate this?

The dataset is under the heading K=14 admixture analysis, and is in graphical form, so is a little tricky to use. I wrote my own tool in Excel that calculates the percentage autosomal equivalence between the samples under investigation and different combinations of prior-dated samples. (Matching the data to specific samples, so that they can be dated, can also be quite tricky.)

I have also run some calculations for early Steppe DNA appearances in Southern Europe (ATP3 in Northern Spain 3,300 BC and Croatian Vucedol 2,800 BC). These look related to each other, but not directly related to Bell Beaker, Corded Ware or Yamnaya.

My reading of this is that the origin looks most like East Balkan Suvorovo that branched off early up the Danube, with ATP3 venturing as far as Spain and pre-Vucedol staying in the Northern Balkans and mixing with Cucuteni.

Why, in Genetiker's K=16 run, is the "Steppe" component of ATP3 the Teal/CHG one? Why would CHG outweigh EHG in the Steppe ancestry?

Not sure. I can't even find the K=16 data for ATP3 - only that in Genetiker's K=16, the teal seems to be identified as "Northern Middle Eastern". In recent years, Genetiker has worked from a K=14 plot, which has the most extensive dataset, so this is what I have used.

What markod says. Also, ADMIXTURE is quite a rough approach to genetic mixture, especially if you mix older populations with modern-day populations. Also, to fully appreciate what it states you need to weigh in other K values as well. ADMIXTURE goes a bit like this: when *forced* to be modeled as a combination of two of the provided samples, how would they look (that is K=2), and when *forced* to be modeled as a combination of three samples (K=3), etc. Afterwards the statistically best fit is chosen, if run in unsupervised mode. Even in the unsupervised case a lot of individual samples will simply be a forced bad fit.

The tools that the Reich lab created (f3-stats, D-stats and the rest of ADMIXTOOLS) are available but rather complicated to use. Also, you'd probably need the full samples for those, which take up a huge amount of disk space, but if you choose that path and are willing to face a steep learning curve you can download it here.

Because it's based on a supervised ADMIXTURE analysis. These have to be interpreted with some caution.

Yes, I agree that it has to be used with caution.
My data analysis provides only a rough guide to what is the best fit from the limited range of samples that we have.
However, what it does indicate is that the Yamnayan samples that we have (in combination with a variety of other ancient samples) provide such diverse readings to Bell Beaker and Corded Ware that they are clearly not the best explanation as their major genetic contributor. The 'Steppe' components within BB, CW, ATP3 and Vucedol match much more closely with preceding Steppe people (Khvalynsk), especially those that appear to have already been present in Bulgaria by the 5th millennium BC admixed with Anatolian/EEF.

If you can replicate that with D-stats, f3-stats or qpAdm, yes. But it could also simply be an artifact.

What markod says. Also, ADMIXTURE is quite a rough approach to genetic mixture, especially if you mix older populations with modern-day populations. Also, to fully appreciate what it states you need to weigh in other K values as well. ADMIXTURE goes a bit like this: when *forced* to be modeled as a combination of two of the provided samples, how would they look (that is K=2), and when *forced* to be modeled as a combination of three samples (K=3), etc. Afterwards the statistically best fit is chosen, if run in unsupervised mode. Even in the unsupervised case a lot of individual samples will simply be a forced bad fit.

The tools that the Reich lab created (f3-stats, D-stats and the rest of ADMIXTOOLS) are available but rather complicated to use. Also, you'd probably need the full samples for those, which take up a huge amount of disk space, but if you choose that path and are willing to face a steep learning curve you can download it here.

Thanks for this.
Does it produce substantially different results, do you know? And if so, what are they?
The striking thing about the autosomal analysis I have carried out is that it provides pretty similar results to yDNA and mtDNA analysis that I have undertaken using different methodologies.

I don't know. I do know that the 30% EEF + 70% Yamnaya model for Corded Ware pops up in many different approaches. But do take into consideration that more than one model may fit, as the different proposed source populations are themselves related to each other. For instance, Khvalynsk is often considered a partial ancestor of Yamnaya and thus may mop up lots of Yamnaya ancestry.
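The "mopping up" point can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the three-component vectors `older`, `other`, `younger` and `farmer` are invented numbers, not real samples, and the fit measure (sum of squared errors on a 1% weight grid) is an arbitrary choice. The `younger` source is built as 70% `older` plus 30% `other`, and the target as 70% `younger` plus 30% `farmer`.

```python
def mix(a, b, w):
    """Two-way mix: w of vector a plus (1 - w) of vector b."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def best_fit(target, a, b, steps=100):
    """Grid-search the weight on a that minimises squared error to target."""
    best_sse, best_w = None, None
    for i in range(steps + 1):
        w = i / steps
        sse = sum((m - t) ** 2 for m, t in zip(mix(a, b, w), target))
        if best_sse is None or sse < best_sse:
            best_sse, best_w = sse, w
    return best_sse, best_w

# Hypothetical component proportions, invented for this example.
older = [0.6, 0.4, 0.0]            # stand-in for an earlier source population
other = [0.2, 0.3, 0.5]            # a second contributor to the later source
younger = mix(older, other, 0.7)   # later source: mostly descended from `older`
farmer = [0.0, 0.1, 0.9]           # unrelated second source for the target

target = mix(younger, farmer, 0.7)  # the "true" history in this toy world

sse_true, w_true = best_fit(target, younger, farmer)
sse_alt, w_alt = best_fit(target, older, farmer)
print("true model:       ", w_true, sse_true)
print("alternative model:", w_alt, sse_alt)
```

In this toy setup the alternative model (`older` + `farmer`) still fits with only a small error and gives the older source a majority share, even though the target was built from the younger one. With noisy real data, the gap between two such models could easily be too small to tell them apart by fit alone.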

I did calculate the fit of all possible models, including potential source populations that are related to each other and combinations of these source populations. Khvalynsk provided a fit for early Steppe DNA in EC Europe that was substantially closer than its Yamnayan successor or indeed a mixture of the two.

As Khvalynsk evolved into Yamnaya, its autosomal changes moved it further away from the Steppe DNA that we see in core European populations. Similarly with EEF, the further we move from Anatolia towards the Corded Ware zone, the worse the fit that we see with the EEF in Corded Ware populations, suggesting that the EEF component in CW came from elsewhere.

The North Ukrainian R1a-M417 sample dated circa 4,000 BC provides the most striking evidence - it already had both the yDNA and autosomal mix typical of Corded Ware before Yamnaya had even come into existence. Its descendants did not need any Yamnayan admixture to turn them into what it seems they already were.

Unless there is evidence to indicate and explain why these best-fit results are likely to be incorrect, I would provisionally tend to go with them, rather than with possibilities that provide worse fits. Of course, we are only looking at a limited number of clusters of archaeological samples, and the real lineages of Chalcolithic populations are highly likely to be with people and communities for which we have no samples or data.

ADMIXTURE is not a good source and any model you get from it the way you do is pretty much useless unless replicated with another method. The big papers basically do an ADMIXTURE, verify with f3 or D-stats, and model with qpAdm. You may try and use your method and verify with nMonte, if you insist. Also, what you call a dataset is not that. It is basically *one* ADMIXTURE-run.

EDIT: One of the reasons ADMIXTURE is not very reliable the way you use it is that it uses Fst (genetic distance). That is fine, but differences in calculated genetic distance can be caused by more than simple ancestry. For instance, two populations (A and B), both highly drifted from a population C, will each show a higher Fst from C than a population resulting from a merge of A and B. That is because unique drift is leveled out once A and B merge: they have been separated from each other for as long as they have been separated from C, so genes that weren't drifted in A merge with genes that were drifted in B. This is why current-day Europeans have a lower Fst from Africans than any of their ancestors have. However, check with D-stats and no African population will choose current-day Europeans over any of their ancestors, which is a clear sign that the difference in Fst is not from extra African ancestry.
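The drift argument can be checked with a small simulation. This is a rough sketch, not what ADMIXTURE or ADMIXTOOLS actually compute: drift is modeled as Gaussian noise on allele frequencies (an assumption), the drift strengths 0.1 and 0.3 are arbitrary, and Fst is a simple ratio-of-sums estimate from population frequencies with no sample-size correction.

```python
import random

def drift(p, strength, rng):
    """Perturb an allele frequency; larger strength means more drift (toy model)."""
    sd = strength * (p * (1 - p)) ** 0.5
    return min(0.99, max(0.01, rng.gauss(p, sd)))

def fst(freqs1, freqs2):
    """Ratio-of-sums Fst from population allele frequencies (no sample-size correction)."""
    num = sum((p - q) ** 2 for p, q in zip(freqs1, freqs2))
    den = sum(p * (1 - q) + q * (1 - p) for p, q in zip(freqs1, freqs2))
    return num / den

rng = random.Random(1)  # fixed seed so the run is repeatable
ancestral = [rng.uniform(0.1, 0.9) for _ in range(5000)]

pop_c = [drift(p, 0.1, rng) for p in ancestral]  # C: lightly drifted
pop_a = [drift(p, 0.3, rng) for p in ancestral]  # A: heavily drifted
pop_b = [drift(p, 0.3, rng) for p in ancestral]  # B: heavily drifted, independently of A
merged = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]  # 50/50 merge of A and B

fst_a = fst(pop_a, pop_c)
fst_b = fst(pop_b, pop_c)
fst_m = fst(merged, pop_c)
print("Fst(A, C)      =", round(fst_a, 4))
print("Fst(B, C)      =", round(fst_b, 4))
print("Fst(merged, C) =", round(fst_m, 4))
```

The merged population comes out closer to C than either A or B does, although no ancestry from C was added at any point. The independent drift in A and B partly cancels when they are averaged.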

Yes, it could be; but, from the archaeological samples available, there were no fits so close as the ones I've identified. The matches with other core Pontic-Caspian samples like Sredny Stog and Yamnaya provide more divergent results; and the matches with EEF further away from the Bosphorus likewise.

The best-fit results actually match up well with what we know about the Steppe-like Suvorovo culture, which appears to have spread from Eastern Bulgaria between 4,300 and 4,000 BC in various directions northwards - to the Danube delta, and then (i) up the Danube into Northern Romania, (ii) up the Dniester into North Western Ukraine and (iii) up the Dnieper into East Central Ukraine.

I see, but, I mean, how likely is it that an R1b BB and an R1a CWC from, say, around 2500 BC would still have a very close match with a 5th millennium Bulgarian individual even after such intensive migrations, cultural changes and certainly lots of mixing with the local peoples (whom, especially in the case of BB, they didn't seem to replace overwhelmingly)? Would very little mixing and autosomal change have happened in more than 1500 years, even as the Bulgarian Suvorovo spread to lands very far away and already densely inhabited? I'd be very surprised if that did happen. I think the fact that the BB and CWC do not match as well with the Sredny Stog and the Yamnaya samples may just result from the very plausible fact that those were still "steppe proper", pre-expansion societies with much less admixture from the ANF+WHG mix dominant especially to the west of the Dniester.

I've seen some suggestion previously that Cernavoda, under Suvorovo-Novodanilovka influence, may have had some role in the spread of IE languages, but its very early dating and split from the steppe "proper" genetic/cultural horizon made people speculate it could have something to do with the Anatolian IE languages, because the non-Anatolian Late PIE stage has usually been assumed to have started splitting much later, around 3400-3000 BC. But they could be wrong, of course...

Never cite that quack Quiles. The crackpot believes that Corded Ware was Uralic; ridiculous. How about Yamnaya being Vasconic/Northwest Caucasian? See, I can come up with crank theories too!

It wouldn't be that huge a problem if he didn't also think that CWC had come from the Pontic-Caspian steppe (early Sredny Stog) where, just before the Neolithization of the region around 5000-4500 BC (approximate dates), PU and PIE would've formed a common homogeneous Indo-Uralic language that split later into the Khvalynsk PIE and Sredny Stog PU. That is, he really believes that PU and PIE were basically separated by just 1000-1500 years of linguistic divergence when they themselves started to split to form their own language families. :-o

Does he actually? That is really dumb, how can he believe that given the blatant correlation with Y DNA N1c?

Believe it or not, he thinks N1c and Siberian ancestry have nothing to do with the PU expansion. They're just later absorptions that took place in some Uralic-speaking areas, little more than a faint correlation. On the other hand, he thinks there is a strong correlation between PU and CWC and R1a-M417 (including the "Indo-Iranian" Z93, which according to him makes the Proto-Indo-Iranian community a mix of Yamnaya-derived PIE with Uralic CWC). It's a bit strange, given his clear knowledge about all the papers, that this really basic reasoning was missed by him: 1) with the exception of the clear outliers of the PU family (with a very "unusual" history, too), the Hungarians, all Uralic nations are rich in N1c or at least N1 and have at least some minor Siberian ancestry, but very few non-Uralic populations in Europe have a high frequency of N1, and all of them are neighbors to Uralic nations (what a coincidence); 2) and that CWC is found in heavy proportions in virtually all Uralic nations (Nganasans excluded), but CWC is also found in heavy or actually even heavier proportions in several non-Uralic nations, whereas Siberian ancestry is clearly found in stronger proportions in the Uralic nations than in other nations (even the N1-rich IE people like the Lithuanians have virtually no Siberian ancestry).

It wouldn't be that huge a problem if he didn't also think that CWC had come from the Pontic-Caspian steppe (early Sredny Stog) where, just before the Neolithization of the region around 5000-4500 BC (approximate dates), PU and PIE would've formed a common homogeneous Indo-Uralic language that split later into the Khvalynsk PIE and Sredny Stog PU. That is, he really believes that PU and PIE were basically separated by just 1000-1500 years of linguistic divergence when they themselves started to split to form their own language families. :-o

I believe the traditional argument that had Uralic & IE bordering each other on the steppe usually involved the weird IE forms in PU (*nimi– , *weti– etc.).

How can these be explained if Uralic expanded from a homeland presumably in the vicinity of Mongolia (under the Seima Turbino hypothesis)? They don't seem like words that would diffuse through trade or the like.

ADMIXTURE is not a good source and any model you get from it the way you do is pretty much useless unless replicated with another method. The big papers basically do an ADMIXTURE, verify with f3 or D-stats, and model with qpAdm. You may try and use your method and verify with nMonte, if you insist. Also, what you call a dataset is not that. It is basically *one* ADMIXTURE-run.

I think 'useless' is exaggerated, particularly as it is often verified with f3 or D-stats. In the absence of evidence to the contrary, it is at least better than nothing. It may be one admixture run, but it is a dataset in that the data has been obtained from many samples.

Originally Posted by epoch

EDIT: One of the reasons ADMIXTURE is not very reliable the way you use it is that it uses Fst (genetic distance). That is fine, but differences in calculated genetic distance can be caused by more than simple ancestry. For instance, two populations (A and B), both highly drifted from a population C, will each show a higher Fst from C than a population resulting from a merge of A and B. That is because unique drift is leveled out once A and B merge: they have been separated from each other for as long as they have been separated from C, so genes that weren't drifted in A merge with genes that were drifted in B. This is why current-day Europeans have a lower Fst from Africans than any of their ancestors have. However, check with D-stats and no African population will choose current-day Europeans over any of their ancestors, which is a clear sign that the difference in Fst is not from extra African ancestry.

If I understand you correctly - if C's descendants A and B later merge, rather than admix with other populations, then they are likely to show a greater proportional descent from C. This is what I am measuring, rather than separation times.