
Wednesday, March 1, 2017

qpAdm tour of Iberia and France

Next year we'll be seeing a major paleogenetic paper on the population history of Iberia (more info here). That's too long to wait, so let's preempt some of the results. In Iberia, Steppe_EBA ancestry seems to peak in northern Spain, Portugal and, interestingly, the Balearic Islands.

Some, like Spanish Murcia, have 9% combined Nganasan, Onge and Yoruba. That's a lot of noise, even if you assume some of it is real. I was also skeptical about the Siberian admixture in the Baltic; it looks like qpAdm tends to pick up all kinds of signals. I was also curious about the "low" Steppe_EBA in southern France: if that's the case, why do they cluster east of Spain in most PCAs? Compared to most of Spain, southern France has some extra WHG and that's it, very different results from what we've seen elsewhere.

The main difference between Iberian populations seems to be a swap of WHG for Steppe_EBA. I wonder if that may be an artifact of the reference samples being related to one another. Probably not, as it makes sense to an extent.

I think that's telling us something interesting though - the primary difference between Basques and their neighbours is that they swap some Steppe for WHG.

Suppose for a second there are two migrations that bring steppe ancestry - one migration of people very much like what we're used to for Yamnaya and such, and another that's got an inflated WHG component and consequently lower EHG and CHG. We could even call the second group Bell Beakers since they seem to fit the bill.

So you have the Basques who were affected more than other groups by the Bell Beakers, and less than other groups by later IE waves.

On the other hand, the rest of NE Spain gets hit just as hard as the Basques with Beaker ancestry, but just as hard as everyone else by later IE waves too. So the "extra" Steppe ancestry in Pais Vasco is actually because the pre-IE populations there had greater than normal Steppe ancestry to begin with thanks to Bell Beakers.

Thoughts?

I wonder how closely pooled R1b/R1a frequencies for each group would track Steppe_EBA and Western_HG.
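A minimal sketch of that check, assuming you had per-population Y-DNA frequencies alongside the qpAdm proportions: pool R1b and R1a, then compute a Pearson correlation against each ancestry column. The table layout and key names ("R1b", "R1a", "Steppe_EBA", "Western_HG") are hypothetical, and no real frequencies are included here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def r1_tracking(table):
    """Correlate pooled R1b+R1a frequency with Steppe_EBA and Western_HG.

    `table` uses a hypothetical layout: population name -> dict with keys
    "R1b", "R1a", "Steppe_EBA", "Western_HG" (all given as fractions).
    """
    pops = sorted(table)
    pooled = [table[p]["R1b"] + table[p]["R1a"] for p in pops]
    steppe = [table[p]["Steppe_EBA"] for p in pops]
    whg = [table[p]["Western_HG"] for p in pops]
    return {"Steppe_EBA": pearson_r(pooled, steppe),
            "Western_HG": pearson_r(pooled, whg)}
```

With only a dozen or so populations, a correlation like this is fragile, so it works as a rough sanity check rather than a test.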

Interesting, I had always believed that the French Basques are the purer ones. But here it looks very much like the Spanish Basques are the purest. They have the second lowest Steppe_EBA of all Iberians and at the same time the strongest WHG. Thus they cannot have got their high WHG by admixture with another population. And a low Steppe_EBA would make sense in light of their non-IE language.

The comparatively high Steppe_EBA in the Cataluna-Valencia-Baleares corner is also curious, because it used to be non-IE, Iberian-speaking in pre-Roman times. But there is archaeologically attested Urnfield influence, especially in Cataluna, and it might be from this. Those people were probably IE, but were assimilated by the Iberians.

According to Francisco Villar, proponent of the Late Basquization theory, Catalonia and the Pais Vasco are precisely the areas of the oldest IE settlements in Iberia, as attested by Old European (alteuropäisch) place names.

@Simon_W There seems to be swapping between Steppe_EBA and Western_HG, as Ryan pointed out. Basque_French, Basque_Spanish and Spanish_Pais_Vasco are all the same ethnically Basque population, and they always cluster together in PCAs. Also notice how low Galicia scores in Steppe but how high in HG, while the reverse is true for Portugal.

"The comparatively high Steppe_EBA in the Cataluna-Valencia-Baleares corner is also curious, because it used to be non-IE Iberian speaking in pre-Roman times."
That's true, and if you notice, in general the more steppe ancestry and Indo-European Y-DNA a population within Iberia has, the later it started speaking an Indo-European language. The explanation I've read, that a very high ratio of R1b males took native wives and their kids learned the language from their mothers, doesn't seem very convincing to me.

@Davidski What's the dataset you use, and how many samples does each population have? Thanks in advance.

The Aragonese were picked because they seem to have high WHG for Spain. Any of the other samples would have lower WHG.

(I know this probably seems a small thing to talk about, but small differences in HG amounts lead to large differences in PCA positions. There are relatively large differences in position between Iberia_EN and Iberia_MN on the PCA, and that is only a 10-15% difference in WHG ancestry.)
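To see why a 10-15% WHG difference can move a sample so far, here's a toy sketch assuming the common linear-mixing approximation, where an admixed sample projects roughly onto the weighted average of its sources' PCA positions. Under that assumption, moving Δw of ancestry from one source to another shifts the sample by Δw times the distance between the two sources, so a source sitting far from everything else (which WHG tends to do on West Eurasian PCAs) produces big shifts. The coordinates are made up, not real PCA values.

```python
from math import dist

def mix_position(weights, coords):
    """Weighted mean of source coordinates: the linear-mixing approximation
    for where an admixed sample lands on a PCA."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    dims = len(next(iter(coords.values())))
    return tuple(sum(w * coords[src][i] for src, w in weights.items())
                 for i in range(dims))

# Made-up 2-D coordinates, NOT real PCA values: WHG placed far from a farmer source.
coords = {"WHG": (0.0, 10.0), "EEF": (0.0, 0.0)}
less_whg = mix_position({"WHG": 0.10, "EEF": 0.90}, coords)
more_whg = mix_position({"WHG": 0.25, "EEF": 0.75}, coords)
# A 15-point WHG difference moves the sample 0.15 * 10 = 1.5 units.
print(dist(less_whg, more_whg))  # 1.5
```

Real PCAs deviate from this (drift, projection shrinkage), so it explains the direction of the effect rather than its exact size.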

Btw, when looking at the Balkans with various measures (disclaimer: admittedly *not* qpAdm), I found that they seemed to prefer almost unadmixed Anatolia_Neolithic + Steppe_EMBA (almost as much Steppe as is typical of Northwest Europe when allowed to fit with almost unadmixed Anatolian) and little additional WHG.

I don't know if the 'swap' has any particular meaning; maybe it's mostly the algorithm picking one over the other because either the WHG or the Lengyel references aren't the best? I don't think the Bell Beakers have anything to do with it. A few years ago Maju posted a map on his blog of Bell Beaker finds in Iberia, and it looked like this: http://2.bp.blogspot.com/-VMQfayP7ljs/T7vpTvToIeI/AAAAAAAAA8o/bMnqHi_dInU/s1600/BellBeakerIberia2.png
As you can see, there's no correlation between the WHG values and the geographic distribution of BB finds. The Portuguese in particular show ridiculously low amounts of WHG (3.8%), and even the Castilian-Leonese and the Manchegos score only 6.1% and 6.9% respectively, so I don't think that explanation is good enough.

@all

From what I've seen in other calcs, East Iberians (Catalonia, Valencia, Vascongadas, Aragón) have higher HG/Steppe and lower EEF than West Iberians (Galicia, Extremadura, Portugal, and Castilla y León, or at least part of it, which I assume is the Leonese part). Cantabria seems to go with the eastern ones, probably due to its historical proximity and relation to Vascongadas. That's not strictly true here, but it roughly holds if you add Steppe and WHG together.

Still, some results are odd. Was anyone expecting the Portuguese to have a Steppe amount closer to Eastern French than to Galicians? I certainly wasn't; I generally take Galicians as a proxy for northern Portugal and Extremadura for the southern part. The higher-than-usual SSA also struck me.
I guess I could get behind Portugal having higher Steppe due to recent history: particularly the Suebi settlement, which was mostly there (between the Douro and Minho, although there were known bishops with Suebi names as far "south" as Viseu), the immigrants brought in from other parts of western Europe, notably Burgundy, to repopulate the countryside, and perhaps knights and clergy during the 1100s-1200s... but I always thought those were fairly irrelevant in the grand scheme of things, and not unique to this part of the peninsula.

@David

OM has a point; we could help with labelling if you want. For us, the differences between North and South are very interesting, although where one ends and the other begins is not a matter of consensus.

I wouldn't take those differences very literally. Modeling using formal stats is noisy, and for very minor reasons it can give results that might not reflect anything real. Here I ran those populations using the most basic populations/components (from the Global 10 PCA data) to check for overall differences, and all Iberians (except Basques, who as expected have more WHG and less ANE) are within 1-3% of each other:

This is not to say that if I used Yamnaya, CHG, Bell Beaker, Iberia_EN/MN/ChL, etc., there wouldn't be bigger differences in the preferences, with maybe some of them reflecting some sort of reality. But there could also be some fake results from not having the right references, or from the quality of the ancient genomes, or whatever.

And yeah, to be clear, I'm assuming that western R1b = Bell Beakers (or Proto-Bell Beakers) = Vasconic languages, and that Iberian Bell Beakers (at least initially) were genetically similar to German ones. Lots of assumptions, but I think it fits with everything we know so far.

I was interested in knowing whether the Spanish_Cataluna samples were native ethnic Catalans or included immigrants from other parts of Spain. Couldn't find any info about it, but thanks anyway, Davidski.

@André de Vasconcelos
"Was anyone expecting the Portuguese to have a Steppe amount closer to Eastern French than to Galicians?"
I didn't either, especially because the Portuguese have a third more steppe ancestry. And I don't think Germanic migrations explain this, because from what I've seen Germanic Y-DNA is slightly higher in Galicia.

The same can be said about Spanish_Pais_Vasco having a third more steppe than Basque_Spanish, even though they're the same ethnicity from the same region and probably only a few km apart. I can't recall any historical event that explains this. Odd indeed.

It's because those samples (n=5) are from residents of the Spanish Basque Country, not necessarily native people. Notice that in the PCA above they cluster with French_South, who are Gascons who used to be Aquitanian-speaking but lost their language. Therefore I think the Spanish_Pais_Vasco samples are probably part non-Basque Spanish, and that's what pulls them west. I believe the Basque_Spanish sample was selected using stricter criteria: they had to have all Basque surnames and all 4 grandparents born there.

One can see the difference between Native Basques and Resident Basques in the latest R1b-DF27 study involving Basques:

I wonder if there were two different waves of R1b-DF27 into Iberia, with one moving through Italy or taking a coastal route from the Balkans (southern part of the Vucedol culture), and the other migrating with the majority R1b-L21 from roughly Hungary (northern part of the Vucedol culture) to Germany and the Nordwestblock, and then to Spain?

Not at all; even in your PCA, southern France (and all of France, for that matter) sits between Spain and northern Europe, and it clusters eastward, towards EHG and Bronze Age groups. It's easy to infer that, with other methods, southern France has more (or way more) Bronze Age admixture than anywhere in Spain or Portugal. It's hard for me to take this test too seriously when for years we've seen hundreds of results, including every paper showing admixture from Bronze Age groups, that are very different from these. To me, qpAdm can't properly distinguish WHG and Steppe; that seems pretty obvious when virtually identical Iberian populations differ so much. I'm sure that in the past you've done tests that are in line with what I'm saying.

I think the issue with the Spanish_Pais_Vasco samples is that they come from native Basques, but from a more southerly location than the other Basque samples (maybe from Álava?). Since the Basque Country is such a tiny territory with sharp genetic differentiation from its neighbouring populations, there must be a marked cline.

My doubt is still why some regions that should be very similar to each other have such different steppe and HG levels. If Indo-Europeans really had a greater impact in some regions, their Lengyel_LN should be reduced just like Western_HG is, but that doesn't happen. Compare Portuguese and Spanish_Galicia: the Portuguese have way more steppe ancestry and much lower Western_HG, but the opposite happens with Lengyel_LN. Maybe Bell Beakers in Iberia had higher Neolithic ancestry than BB from central Europe. We will know with the upcoming big paper.

But couldn't it really be the calculator mistaking some steppe for HG and vice versa? After all, the whole Iberian Peninsula is very similar genetically, so anything that's a bit off would look like a big change.
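One way such a swap can arise, shown with made-up 2-D coordinates: if one reference is itself roughly a mixture of others, several different weightings fit the target equally well, and the algorithm has no principled way to prefer one. This is a toy illustration of non-identifiability, not a claim about how Steppe_EBA and Western_HG actually relate; the function name is hypothetical.

```python
from math import dist

def mixture_distance(target, sources, weights):
    """Euclidean distance between the target and the weighted mix of the named sources."""
    mix = [sum(w * sources[name][i] for name, w in weights.items())
           for i in range(len(target))]
    return dist(mix, target)

# Made-up 2-D coordinates: C sits exactly halfway between A and B.
sources = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.5, 0.5)}
target = (0.5, 0.5)
for weights in ({"C": 1.0}, {"A": 0.5, "B": 0.5}, {"A": 0.25, "B": 0.25, "C": 0.5}):
    # every weighting fits perfectly, so each line ends in 0.0
    print(weights, mixture_distance(target, sources, weights))
```

With real data the overlap is partial rather than exact, so the fit surface is nearly flat instead of perfectly flat, and small amounts of noise can tip the weights one way or the other.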

@Ric Hern

I also think that there was more than one migration wave. There is something we still don't know that explains why the places that later picked up an IE language are the ones with higher R1b and steppe ancestry.

The PCA essentially confirms the qpAdm results for the Portuguese. They're more southern and more eastern than the Spaniards, implying a lower level of Western_HG and higher level of Steppe_EBA or something like it.

So if this is out of whack with reality, then it must be caused by sampling bias and not qpAdm.

Ric, I don't think we can make such conclusions (or educated guesses) simply based on the PCA, especially when no samples from such ancient populations have been tested.

I suppose David could be right, and sampling bias might be a factor; maybe they are mostly from, say, Castelo Branco/Guarda, since those locations are close to Extremadura/León respectively, and both regions show relatively high amounts of Steppe compared to Galicians. Their results do fit the PCA. I'm also curious how Asturians would come out; are there any samples that could be run?

@Josep

Spanish_Pais_Vasco are Spanish speakers in Vascongadas, which could mean they are immigrants from other places in Spain, locals who were never Euskaldun, native Basques who became Spanish speakers, or a combination of these. I agree with you, it's still a bit surprising they have higher Steppe than their close neighbours, as happens with the Portuguese. I'd like to read what Maju has to say on this.

@Ric - Re: route of R1b to Iberia, if you look at the distribution of subclades it seems pretty likely that L23 followed the Danube through the Balkans, and then radiated out from the headwaters of the Danube, with U106 as the northern/eastern Rhine branch, and S116 as the southern/western branch.

@Davidski As far as the Yoruba column goes, this has got to be associated with Maghrebi/"Berber" admixture. Given that Berbers are ~20% SSA, a lot of those populations should be in the ballpark of ~10-15% NA admixture. In your last qpAdm run, you added Armenia_Bronze (which accounts for extra CHG) to see how the fit improves, with impressive results. So, by analogy, would you be able to do the same here but with Mozabite or other modern NA genomes?
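The back-of-the-envelope logic here: if the whole Yoruba signal came in via a North African proxy that is itself some fraction SSA, the implied North African share is the Yoruba share divided by that fraction. A minimal sketch, assuming all of the SSA signal arrived through that single proxy with no separate route; the 2-3% Yoruba inputs are back-calculated from the 10-15% figure above, not read from the spreadsheet.

```python
def implied_na_fraction(ssa_signal, ssa_in_proxy):
    """North African share implied by an SSA-like signal, assuming the whole
    signal arrived through one North African proxy population."""
    if not 0 < ssa_in_proxy <= 1:
        raise ValueError("ssa_in_proxy must be in (0, 1]")
    if not 0 <= ssa_signal <= ssa_in_proxy:
        raise ValueError("the signal can't exceed what the proxy carries")
    return ssa_signal / ssa_in_proxy

# Illustrative inputs: a 2-3% Yoruba signal and a proxy that is ~20% SSA.
for yoruba in (0.02, 0.03):
    print(yoruba, round(implied_na_fraction(yoruba, 0.20), 3))
# 0.02 0.1
# 0.03 0.15
```

The result is very sensitive to the proxy: with the 25-30% SSA range discussed further down the thread, the same 3% signal would imply only about 10-12%.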

(Unfortunately we don't have medieval or ancient NA genomes at this time.)

@truth Indeed, it seems to be inflating it. The relative %'s do make sense, somewhat, though. In any case, there's a good chance that running qpAdm with actual Maghrebi genomes on hand will improve the statistical fit.

25-30% seems a bit high for northern Maghrebis. This range is characteristic of southern Moroccan groups and those living on the edge of the Sahara, methinks, in places like Tafilalt and Figuig.

See, this is why I was a bit surprised to see that Catalonia does not top the list comfortably. Perhaps there is some strong microvariation there (western hills vs eastern flatlands). Or perhaps major migrations from other parts of Iberia didn't just take place in the 20th century (few Catalans have all four grandparents Catalan).

But yes, Catalonia, despite being designated an Iberian area (as opposed to Celtic or Celtiberian) in antiquity, shows the strongest affinity of all peninsular regions to Iron Age cultures of Central Europe (like La Tène), as evidenced by sword finds there.

To add to this: I'm not particularly fond of the "phenotype classification" crowd and how they go about their "classifying", but there is something to be said about Catalan cranio-facial trends relative to the rest of the peninsula. It's generally easy for me to distinguish a group photo of native Catalans from one of other Iberians like Castilians, Andalusians or Galicians, while those three are often easily confused with one another, imo. I can't really put my finger on the traits that, on average, are more frequent in Catalans, but they often resemble, say, Occitan, French or northern Italian folks more than other Iberians. Yet another reason I'm surprised by their steppe DNA scores.

I wonder how Languedocians and North Catalans from Rosello, their northern neighbors, score.

"Catalonia despite being designated as an Iberian area(as opp to Celtic or Celtiberian) in antiquity shows strongest affinity from all peninsular regions to Iron Age cultures of Central europe(like La Tene) as evidenced by sword finds there."

Because of the 'espada de antenas' (antenna swords)? Those were mostly from the Meseta, not Catalonia, even though they were also present among Lusitanians and Gallaecians. The cultural package of "Iron Age Catalans" clearly did not have a stronger affinity to La Tène than that of the IE speakers.

I've been to nearly all regions of Spain in over 20 visits (planning to do the next in a few months if all goes well), and I cannot tell Catalans apart from Castilians, Valencians and whatnot (Basques are a different story). Other Spaniards have told me the same. In fact, not once have I been taken for a foreigner until I opened my mouth and "kind-of-broken Castilian" came out.

Andre, I was referring to this article: http://www.academia.edu/728177/_Patterns_of_interaction_Celtic_and_Iberian_weapons_in_Iron_Age_Spain_. It certainly does highlight Catalonia.

As for distinguishing: Catalonia was heavily influenced by internal migrations thanks to its booming industry over the last 80 years or so. This can explain what you said. Moreover, I don't want to exaggerate the distinctions between Catalans and other Iberians; there's a high degree of overlap. At the same time, I find they overlap more with "Gauls" than, say, Galicians or Asturians do.

Ric, as Andre already implied, such conclusions based purely on PCA are super risky to make. Consider how Egyptians and Saudis occupy nearby positions on the PCA. Does that mean Egyptians are descended from Arabians? Clearly not. PCAs are quite informative, however, when it comes to assessing similarity in the more basal ancestral components. In this case, Saudis and Egyptians share Levant_Neo ancestry (in turn made up of a certain ratio of VHG-like and BE, i.e. Basal Eurasian) as well as comparable SSA fractions. This explains why their PCA statistics place them in a similar position.

Interesting read, thanks. I didn't know that Catalonia during the Iberian period was influenced by La Tène. I guess it makes sense geographically, because if anything entered Iberia from Gaul or Central Europe during those times, there's a high chance it reached Catalonia first.

But I don't think these "cultural" exchanges had any effect on the genetic makeup of Catalonia. I see mixed signals. On one side, in most calcs Catalonia tops Iberia in steppe/northern-European-like admixture (though not in this one, which is why I asked about the source of the samples, to check whether they are really native Catalans). On the other side, almost all the R1b is the Iberian DF27 variety, and the identity and language of the people living here during those times weren't Celtic/IE. Whoever first brought steppe ancestry into the peninsula (probably BB) made the most impact in Catalonia and the northeast in general, and later cultures such as Hallstatt and La Tène affected the western half.

It's true that millions of Castilian immigrants came here during the last century, to the point that we're now a minority, and sadly many studies just sample anyone born in Catalonia instead of ethnic Catalans.
As for being able to distinguish natives from the others, I can too, but I guess that's something only a local knows, the way you can recognize your own people among outsiders. It's hard to explain, but in general, despite all the overlap, there are more northern-looking types among ethnic Catalans and more typically Mediterranean phenotypes among Castilian immigrants. The way people carry themselves gives clues too.

I guess the simplest explanation for this is, as I said before, a matter of geography. Catalonia being in the northeastern corner of the Iberian Peninsula makes its people the most similar to "Gauls", as you say. Since Indo-Europeans spread from the Corded Ware horizon, the further you go from there, the more steppe admixture diminishes. If you notice, the same pattern holds for most countries in Europe, where the northeastern part carries the highest IE ancestry.

@David Nice spreadsheet! Thank you for doing that. The Moroccan admixture gradient corresponds well to the frequency of E-M81. It really seems to give credence to what Andre and I were thinking: that NA admixture is rather ancient in Iberia, because it follows a geographical gradient opposite to that of the Islamic occupation. This in turn supports the idea that Muslim Iberians were overwhelmingly of native (Muladi) stock, as was the case in the eastern Islamic world. Averroes was likely as Iberian as Avicenna was Bactro-Sogdian! Now, a couple of things I wanted to remark on:
1) I see a clear west-east gradient as far as "Moroccan" admixture goes, but the absolute magnitudes seem a bit high. Have you considered trying Mozabites, for example?
2) Onge: can this be throwing things off at all?
3) If you have the time and desire, would you be so kind as to "complete" the Balkan spreadsheet Davidski posted a couple of months ago using either nMonte or qpAdm? It would be cool to see how Serbs, Montenegrins, Bosnians and Croats score on BronzeAgeArmenia. You totally don't have to, though. :-) So don't feel compelled.

I experimented with the Mozabites using the same set up. The Yoruba signal disappeared entirely and the Mozabite admixture was usually at around 10% for the Iberians. Otherwise the results were basically the same.

The Onge signal remained for most of the groups, and helped to improve the fit. I don't really know what it means, but the algorithm seems to find it useful.

I don't recommend trying to copy qpAdm models with nMonte, unless the data being fed into nMonte is based on formal stats, like D-stats.

If you're feeding nMonte PCA or ADMIXTURE data, then you need more proximate reference samples, because PCA and ADMIXTURE are affected by recent drift, so if you're missing an important reference sample, like North Africans in the case of Iberians, you might see pretty crazy results.
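For readers wondering what "fitting" means in the nMonte case: the target's coordinates are approximated as a non-negative, sum-to-one mix of source coordinates, with the weights chosen to minimize the distance. nMonte itself does this with a Monte Carlo search in R; the sketch below swaps in plain projected gradient descent onto the simplex, so treat it as an illustration of the objective, not a reimplementation. The function names and defaults (`steps`, `lr`) are hypothetical choices, and the coordinates are made up.

```python
from math import dist

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:  # holds for a prefix of the sorted entries; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]

def fit_mixture(target, sources, steps=5000, lr=0.1):
    """Non-negative, sum-to-one weights minimizing squared distance to target.

    target: sequence of coordinates (e.g. PCA dimensions)
    sources: dict name -> sequence of coordinates of the same length
    lr must be small relative to the scale of the coordinates to converge.
    """
    names = sorted(sources)
    S = [list(sources[n]) for n in names]
    k, d = len(S), len(target)
    w = [1.0 / k] * k
    for _ in range(steps):
        mix = [sum(w[j] * S[j][i] for j in range(k)) for i in range(d)]
        resid = [mix[i] - target[i] for i in range(d)]
        # gradient of sum(resid**2) with respect to each weight
        grad = [2.0 * sum(resid[i] * S[j][i] for i in range(d)) for j in range(k)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(k)])
    mix = [sum(w[j] * S[j][i] for j in range(k)) for i in range(d)]
    return dict(zip(names, w)), dist(mix, target)

# Toy 2-D example with made-up coordinates: the target is 30% A + 70% B.
weights, distance = fit_mixture((0.3, 0.7), {"A": (1.0, 0.0), "B": (0.0, 1.0), "D": (-1.0, -1.0)})
print({n: round(x, 3) for n, x in weights.items()}, round(distance, 6))
```

As the comment above says, the answer is only as good as the sources on offer: drop a needed source (North Africans, for Iberians) and the optimizer will still return a best fit, just a misleading one.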

Davidski: «Please note the position of the Portuguese set, between the Spaniards and North Italians from Bergamo. This is in line with their low Western_HG ratio in the qpAdm run.

The Portuguese I have are very homogeneous, so there's not much point in splitting them up. I can't anyway, because I don't know where in Portugal they're from.»

First of all, what is this «Neolithic» component based on? Is it the same as the Early European Farmers? If it is, then around 44% of it is of hunter-gatherer ancestry (Basal Eurasian, which is older than WHG). Probably, like all the others I know of, it is also based on the assumption that all H mtDNA in Europe came with the Neolithic (which is clearly false).

Second, if you don't know where the Portuguese samples came from (homogeneous or not, that's irrelevant to assessing how representative they are), then it's pointless to use and discuss them. They could even have different origins from the native Portuguese. It's also ridiculous that for Basques, for example, the people I talk with know almost all the villages the samples came from (and the same goes for Spain and Italy).

Third: there should be more hunter-gatherer ancestry in Iberia than just WHG (like Basal Eurasian, but not only that). Only a few Mesolithic individuals were used to build the WHG component (La Braña was almost Neolithic, btw), and Paleolithic individuals were never used. Ibero-Maurusian components, for example, are much older and have been mislabelled by some people as pathetic «Moor» ancestry, or even as SSA admixture in the case of some very old haplogroups like L3, L2 and L1b. These in fact constitute evidence of the oldest maternal background in Europe, and later studies have confirmed that most of this minority in Iberia actually has a genetic background related to a pre-Neolithic movement back and forth between southern Europe and North Africa. It's actually so old that Negroid types didn't even exist back then, and it's much more akin to the movements of Homo sapiens out of Africa. Some Taforalt samples had L3, for example, and the skulls were Caucasoid. These old L lineages are the ancestors of M, U, H, and so on.
We already know that the majority (2/3) of the L mtDNA in Iberia arrived before the Neolithic (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0139784), and these estimates are still somewhat conservative in my opinion, like those given for U6, which was recently found to be much older in Europe (Romania, Aurignacian) than previously thought.