Monday, September 5, 2011

'bat' calculator (Balkans-Anatolia-Turkic)

I have decided to make a new calculator for DIYDodecad that may be useful for individuals from the Balkans and Anatolia. You can download it from here at Google Docs (or here from sendspace). The terms of use are the same as for DIYDodecad v 2.0. To run it, simply extract the contents of the RAR file into your working directory, and use bat.par wherever the instructions use dv3.par.

The reference populations can be seen below. I have included all available Balkan populations, as well as Turks and Armenians. Moreover, I have included all available Turkic populations.

The marker set is the same as used in Dodecad v3. Three components emerge: one centered in the northern Balkans, one in eastern Anatolia, and one present in various proportions among all Turkic populations (see Turkic cline).

The components have been named accordingly, but please note that they do not necessarily reflect recent ancestors. For example, it is a good hypothesis that the Anatolia component was present in the Balkans even in ancient times, so one need not seek a recent Anatolian ancestor to explain its presence in a Balkan individual. Similarly, the Balkans component in Anatolia may reflect the diverse Balkan peoples who have settled in Anatolia since the dawn of history, so a present-day inhabitant of Anatolia need not seek a recent Balkan ancestor either.

Likewise, the Turkic component is only part of the genetic makeup of the Turkic speakers who arrived in Anatolia, since those probably also carried West Eurasian population elements picked up en route from Siberia to Anatolia.

The way to interpret your results is to see whether you have an excess or deficiency of any component relative to your ethnic group. For example, an Anatolian Greek may have a higher Anatolia/Balkans ratio than a Balkan Greek, and likewise an Anatolian Turk relative to a Balkan Turk; Turks may also have a variable Turkic component, which will reflect differential Central Asian input.
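A minimal Python sketch of this comparison, under stated assumptions: `compare_to_group` and `anatolia_balkans_ratio` are hypothetical helpers, the proportions are made-up fractions rather than real Project averages, and the calculator's own output format is not reproduced.

```python
def compare_to_group(individual, group_average):
    """Excess (+) or deficiency (-) of each component, in percentage points.

    Both arguments map component names ("Balkans", "Anatolia", "Turkic")
    to ancestry fractions between 0 and 1.
    """
    return {
        name: round(100 * (individual[name] - group_average[name]), 1)
        for name in group_average
    }


def anatolia_balkans_ratio(props):
    """Ratio of the Anatolia component to the Balkans component."""
    return props["Anatolia"] / props["Balkans"]


# Hypothetical numbers for illustration only (not real 'bat' results).
me = {"Balkans": 0.40, "Anatolia": 0.55, "Turkic": 0.05}
group = {"Balkans": 0.50, "Anatolia": 0.45, "Turkic": 0.05}

print(compare_to_group(me, group))  # {'Balkans': -10.0, 'Anatolia': 10.0, 'Turkic': 0.0}
print(anatolia_balkans_ratio(me) > anatolia_balkans_ratio(group))  # True
```

The point of subtracting a group average rather than reading raw percentages is that a given Anatolia score can be high for one ethnic group and ordinary for another.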

35 comments:

If bat.rar is extracted into the same folder as DIYDodecad 2.0, it works fine. Otherwise, if you extract it into a separate folder, it requires "standardize.r", which seems to be missing from bat.rar, and even if you include it, there does not seem to be any output.

As I see it, this is an unusual result for the Balkans, perhaps because my ancestry is from north of the Balkans: Bucovina is more Central/Eastern European than Balkan.

Your result is what is expected because you are from the north end of the North Balkan-East Anatolian cline.

Do those seem consonant to you?

Well, the distinction between the Balkans and Anatolia is multifaceted. Greeks have more "Med" than Anatolians, while people from the North Balkans have more "W/E Euro". Anatolians have more "W Asian". So, overall there are 3-4 major players in the equation, and 'bat' folds them into a single cline.

Is there any breakup of "Balkans, Anatolia, Turkic" that would not be surprising for a person from India, as that region is not exactly the target audience for the calculator?

Is there a reason why my Anatolian component is higher and Balkan lower than the average Turk?

Averages are just that: averages. Some Turks score higher and some lower than the average. People with ancestry from the Balkan provinces of the former Ottoman Empire would tend to score higher "Balkan", those from the more eastern parts of Anatolia lower.

Well, actually I have seen results on forums from other Turks who are not from the Balkan provinces of the former Ottoman Empire, e.g., from eastern or central Turkey, yet they still scored around 20% Balkan. So far, I have scored the most Anatolian of anyone I have seen.

I suspect that the West_Asian component is particularly high in the Laz, or in indigenous northeastern Anatolians in general.

For example, your West_Asian in dv3 is 67.5, which is slightly less than the average in Georgians, but if your Turkish grandparent deviates towards the average of Behar et al. Turks (45%), or Armenians (50-57%), then the Laz part of your ancestry may very well be something >70% or even close to 80%.
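The estimate above is a back-of-the-envelope mixing calculation. A minimal sketch, assuming ancestry proportions combine linearly across ancestors (only an approximation) and that the Turkish share of ancestry is known; `estimate_other_part` is a hypothetical helper, and a known fraction of 0.25 corresponds to the single Turkish grandparent the comment refers to.

```python
def estimate_other_part(observed, known_value, known_fraction):
    """Solve observed = f * known + (1 - f) * unknown for the unknown part.

    observed:       component percentage measured in the individual (e.g. 67.5)
    known_value:    assumed percentage in the known ancestral group
    known_fraction: share f of ancestry from the known group (0.25 = one grandparent)
    Assumes linear mixing of ancestry proportions, which is an approximation.
    """
    if not 0 <= known_fraction < 1:
        raise ValueError("known_fraction must be in [0, 1)")
    return (observed - known_fraction * known_value) / (1 - known_fraction)


# West_Asian 67.5 overall; the Turkish grandparent is set to the Behar et al.
# Turkish average (45), or to values in the Armenian range (50-57).
for turkish_value in (45, 50, 57):
    print(turkish_value, round(estimate_other_part(67.5, turkish_value, 0.25), 1))
# 45 75.0
# 50 73.3
# 57 71.0
```

Under these assumptions the non-Turkish three quarters come out at roughly 71-75% West_Asian, in line with the ">70%" estimate; a larger Turkish share would push the estimate higher.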

If you ever run across a full Laz individual, it would be worthwhile to compare against. Plus, we will hopefully have new Caucasus data from the new Yunusbayev et al. paper, so things may be even better defined in the coming year.

If you ever run across a full Laz individual, it would be worthwhile to compare against.

Also Pontian Greeks, as the great majority of northeastern Anatolians were Pontian Greeks rather than Laz before Islamization/Turkification, so northeastern Anatolian Turks in general must be much more descended from Pontian Greeks than they are descended from Laz. One of Dienekes' full Pontian Greek paternal relatives (Dienekes' paternal side consists of Pontian Greeks) would be nice to test.

BTW, b/o (you are basar, aren't you?), I have been corresponding via email for some months with Mait Metspalu, one of the authors of both the newly published Yunusbayev et al. 2011 paper and the Behar et al. 2010 paper. In our correspondence Mr. Metspalu informed me that the Turkish samples used in Yunusbayev et al. 2011 are exactly the same as those used in Behar et al. 2010 (19 in total) and, much more importantly, that all of the Behar et al. 2010 Turks (and consequently all of the Yunusbayev et al. 2011 Turks) were sampled from the region of Turkey historically known as Cappadocia.

This confirmed my suspicions. Ever since the publication of the Behar et al. 2010 paper I had suspected that its Turks came from a rather limited region of Turkey, and their contrast with the genetically much more heterogeneous Dodecad Turks, who are from all over Turkey, had increased my suspicions. Now Mr. Metspalu has confirmed them. So, as the Dodecad Turks are from all over Turkey while all the Turks used in the Yunusbayev et al. 2011 and Behar et al. 2010 papers are exclusively from the historic Cappadocia region, the Dodecad Turks are much more representative of ethnic Turkish genetic variation than the Turks used in those two papers.

According to the Dodecad Project's standard ADMIXTURE analysis results, the most noticeable differences between the Dodecad Turks and the Yunusbayev et al. 2011/Behar et al. 2010 Turks are these. First, the average of the summed Mongoloid component percentages is 5.2% for the Dodecad Turks (23 samples as of now) versus 6.9% for the Yunusbayev/Behar Turks (19 samples, as I said). Second, the average South Asian component percentage is 2.3% for the Dodecad Turks versus 3% for the Yunusbayev/Behar Turks; the Dodecad Turkish figure is lower than that of the Dodecad Armenians (2.8%, 20 samples as of now) and almost equal to that of the Dodecad Assyrians (2.2%, 12 samples as of now). Third, on average the Dodecad Turks have higher West European, Mediterranean and Southwest Asian and lower West Asian component percentages than the Yunusbayev/Behar Turks. Also according to the Dodecad Project's standard ADMIXTURE analysis results, the average summed Mongoloid percentage of the Dodecad Turks (5.2%) is very similar to that of the HGDP Adyghe (5.5%); incidentally, Mr. Metspalu informed me that these HGDP Adyghe are also the Adyghe samples used in the Yunusbayev et al. 2011 and Behar et al. 2010 papers.

Onur, the Dodecad Turkish samples are not much more representative of ethnic Turkish genetic variation, since some of them have mixed ethnic backgrounds. Lately I've spoken to a few Dodecad Turkish participants on 23andme and they all turned out to have mixed ethnic backgrounds. I don't know if these Turks are included in the Dodecad Turkish samples, but I do know of at least two Turks who are included but have mixed ethnic backgrounds. So I would be careful about dismissing the Behar samples and labeling the Dodecad samples as 'a good representative of the ethnic Turkish genetic variation'.

The question of representativeness is a side issue, because Turks do not form a homogeneous population. This is to be expected, as Turks (even ethnic Turks) are from a large geographical area and have diverse origins (converts to Islam of various ethnic origins, intermixed with the original Turkic invaders in various proportions). For example, there are self-identified ethnic Turks with near 0% of the Asian components, as well as those with >15%, and all are included or excluded under the same uniform inclusion policy.

Onur is right in that the Behar et al. Turks are not representative of Turkish variation, since they come from a small part of the Turkish world: to claim otherwise would be equivalent to saying that the Portuguese are representative of Southwestern Europeans, or that Romanians are representative of the Balkans.

All the Dodecad Turks have 4 Turkish grandparents, either in the ethnic sense (e.g., Turkish Cypriots or Balkan Turks) or in the civic sense of having 4 grandparents from Turkey who cannot be assigned to other ethnic populations (e.g., Armenian_D, Greek_D, Kurd_D, etc.). This is an adequate policy for obtaining a broad cross-section of people who normally call themselves, and are called, "Turks", and that is all I claim for the Turkish_D sample.

If, in the future, there are adequate sample sizes to split the Turkish_D sample along geographical or cultural lines (e.g., Alevi Turks, Balkan Turks, Turkish Cypriots, etc.) then it will be done.

Does Dienekes have a sampling method? Because if he doesn't, the only way he is going to have more representative results than scientific studies is by coincidence.

I am also a little sceptical of this Onur, who had made it clear he liked Dodecad better than Behar's before any mention of a supposed correspondence with the scientists. In any case, Cappadocia isn't a peculiar Turkish region in terms of ethnicity or religion or history.

Of course, ethnic groups and countries may largely coincide, but not absolutely so. For example, there are Italian speakers who live in Switzerland, Turks who live in Bulgaria, and so on.

There is no single formula to account for such cases. One has to use background knowledge.

For example, someone whose 4 ancestors were German-speaking citizens of Germany may either end up in German_D or Ashkenazi_D depending on the religion and/or self-identity of one's ancestors.

Someone may profess to have 4 Catholic Spanish ancestors from Spain, have a Spanish surname, etc. but his results may be an extreme outlier in the context of other Project participants and/or other non-Project Spanish samples; such cases are viewed with suspicion, and if the submitter cannot explain them adequately, are not included in the relevant (e.g., Spanish_D) population.
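The outlier screening above is described only qualitatively. As a hypothetical illustration (not the Project's actual procedure), one simple approach is to measure each sample's distance from the population's median admixture profile and flag distances that are unusually large on a robust, median-absolute-deviation (MAD) score. `flag_outliers`, the sample data and the 3.5 cut-off (a common rule of thumb) are all assumptions.

```python
from math import dist
from statistics import median


def flag_outliers(samples, cutoff=3.5):
    """Return IDs of samples whose admixture profile is far from the group.

    samples maps a sample ID to a list of component proportions (same
    component order for every sample). Distances are measured from the
    per-component median profile and scored with the MAD.
    """
    ids = list(samples)
    n_comp = len(samples[ids[0]])
    centre = [median(samples[i][k] for i in ids) for k in range(n_comp)]
    d = {i: dist(samples[i], centre) for i in ids}
    med = median(d.values())
    mad = median(abs(x - med) for x in d.values())
    if mad == 0:
        # Degenerate case: flag anything farther than the median distance.
        return [i for i in ids if d[i] > med]
    return [i for i in ids if (d[i] - med) / (1.4826 * mad) > cutoff]


# Invented three-component profiles; s5 is deliberately unlike the rest.
samples = {
    "s1": [0.50, 0.30, 0.20],
    "s2": [0.52, 0.28, 0.20],
    "s3": [0.48, 0.32, 0.20],
    "s4": [0.51, 0.29, 0.20],
    "s5": [0.10, 0.10, 0.80],
}
print(flag_outliers(samples))  # ['s5']
```

Medians are used instead of means so that the outlier itself does not drag the reference profile towards it; as the text says, a flagged sample is a reason to ask the submitter, not an automatic exclusion.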

Finally, there is the issue of genealogy. People who take these types of tests often have very deep knowledge of their genealogy. So, if some Swedish guy tells me that he has an 18th-century Danish ancestor, I will most likely include him in the Swedish_D population. My criterion is to include individuals with a single great-grandparent or less of "other" ancestry, as long as it is from a related group. For example, if someone has 4 Swedish grandparents and one of those grandparents was Swedish+Danish, he will be included in Swedish_D, whereas if that grandparent was Swedish+Egyptian, he will not be.
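The genealogical rule above can be written as a small predicate. A minimal sketch, assuming ancestry is recorded per great-grandparent; `passes_inclusion` and the set of "related" groups are hypothetical, and in practice the rule is applied with background knowledge alongside the outlier screening mentioned above, not mechanically.

```python
def passes_inclusion(great_grandparents, target, related, max_other=1):
    """Hypothetical encoding of the 'single great-grandparent or less' rule.

    great_grandparents: 8 ancestry labels, one per great-grandparent
    target:             the population being built, e.g. "Swedish"
    related:            set of groups considered related to the target
    max_other:          how many non-target great-grandparents are tolerated
    """
    if len(great_grandparents) != 8:
        raise ValueError("expected exactly 8 great-grandparents")
    others = [g for g in great_grandparents if g != target]
    return len(others) <= max_other and all(g in related for g in others)


related_to_swedish = {"Danish", "Norwegian"}  # assumed set, for illustration
swedish_danish = ["Swedish"] * 7 + ["Danish"]      # one grandparent Swedish+Danish
swedish_egyptian = ["Swedish"] * 7 + ["Egyptian"]  # one grandparent Swedish+Egyptian

print(passes_inclusion(swedish_danish, "Swedish", related_to_swedish))    # True
print(passes_inclusion(swedish_egyptian, "Swedish", related_to_swedish))  # False
```

Setting `max_other=0` gives a stricter rule in which every great-grandparent must belong to the target group.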

In conclusion, I take all adequate measures to create Dodecad samples that avoid possible pitfalls; I am sure my criteria are not perfect, but no criteria are perfect.

Even the published samples from academic projects contain ambiguous/suspicious cases, and a perusal of the population portraits can be used to detect such cases for either my Project or the reference populations.

I am also a little sceptical of this Onur, who had made it clear he liked Dodecad better than Behar's before any mention of a supposed correspondence with the scientists.

I have never said I "liked" either of the Turkish populations more than the other. If you mean representativeness: I had suspected ever since the publication of the Behar et al. paper that the Behar Turkish population (which is now also the Turkish population used in the newly published Yunusbayev et al. paper) came from a limited area of Turkey, but my suspicion gained strength only after I saw the Dodecad genetic results (especially the MCLUST results) of the Behar Turks and the Dodecad Turks together. Seeing them together allowed me to compare them, and my comparisons strengthened my earlier suspicion about the Behar Turks. So I wasn't very surprised when I learned from Mr. Metspalu within the last three months that the Behar (and consequently Yunusbayev) Turks are exclusively from the historic Cappadocia region of Turkey. I stress the word exclusively, as Mr. Metspalu also told me that all of the Behar/Yunusbayev Turks are autochthonous Turks of the historic Cappadocia region, so, for instance, they do not descend from emigrants from the Balkans or the Caucasus to that region. So the Behar/Yunusbayev Turks can at most represent Turks from the historic Cappadocia region. I say at most because Mr. Metspalu doesn't know which provinces of the historic Cappadocia region they are from; he only knows that they are all autochthonous Turks of that region. So they may be from a single province of the region or from several; according to Mr. Metspalu, only the sample collectors know these details.

You can ask Mr. Metspalu himself about these issues (his email address is on the internet), as he is a kind person and doesn't leave questions unanswered.

@Dienekes, your definition of ethnicity is fine, and so are your criteria for accepting data. But you see, for Dodecad to be as representative (of the average) as scientific studies, you would have to refuse some people who are fully members of a given ethnic group. Understandably, you don't have this luxury: you can't run a preliminary study and go to people; it's they who come to you, in limited numbers.

@Onur, I wouldn't want to bother anyone with so few things to ask. I don't know the context of your discussion. I don't even see much disparity between Dodecad and Behar's, at least nothing that could be attributed to Cappadocia. My opinion on the sampling question of Dodecad, or Eurogenes etc, stands.

@Onur, I wouldn't want to bother anyone with so few things to ask. I don't know the context of your discussion. I don't even see much disparity between Dodecad and Behar's, at least nothing that could be attributed to Cappadocia. My opinion on the sampling question of Dodecad, or Eurogenes etc, stands.

I have been corresponding with Mr. Metspalu for months, so there are indeed many things to ask if you know what to ask. And our discussion covered much more than the Behar/Yunusbayev Turks or any other Turks; in fact, they were a small part of it. You know, the world is big and Turks are just a small part of humanity.

As for the disparity between the Dodecad and Behar/Yunusbayev Turks and how much of it can be attributed to Cappadocian Turks, it is not clear where Cappadocian Turks stand within overall ethnic Turkish genetic variation (after all, how much do we really know about either Cappadocian Turkish genetic variation or overall ethnic Turkish genetic variation?), but one thing is clear: Cappadocian Turks are just a small part of the people now called "Turks" or "ethnic Turks". Ethnic Turks comprise almost* all people of Turkish-speaking Sunni Muslim and/or Alevi/Bektashi heritage dating back at least to Ottoman times; they are a large and quite heterogeneous group with diverse pasts and origins, from a wide area comprising all of Anatolia and the Armenian Highlands, many parts of the Balkans, Cyprus and the northwestern Levant, and a small part of northern Mesopotamia. So Cappadocian Turks cannot represent all of ethnic Turks in genetic or anthropological studies.

Population inclusion criteria of Eurogenes and Dodecad are different from each other. Do you know the population inclusion criteria of Eurogenes? I do. If you aren't satisfied with the population inclusion criteria of both Dodecad and Eurogenes, you can always start your own project on the internet. Nothing is stopping you from undertaking such a project. Blogging is free, and DIY tools can greatly help you in the technical issues.

* I said almost, as Turkish-speaking Muslim Gypsy communities are usually excluded from ethnic Turkishness, no matter how long they have been Turkish-speaking Muslim communities.

So Cappadocian Turks cannot represent all of ethnic Turks in genetic or anthropological studies.

But two dozen immigrants can? (For the record, some of them are basically my friends. I would say the same thing if I were among them. And I don't see much disparity between them and Behar's sample to begin with.)

Population inclusion criteria of Eurogenes and Dodecad are different from each other. Do you know the population inclusion criteria of Eurogenes? I do.

IIRC, it's the same. I couldn't find Polako's mission statement in a brief search, but here is something recent:

[Polako]: I'm particularly interested in people with all four grandparents from the following places... Unfortunately, I can no longer accept any mixed individuals

---

If you aren't satisfied with the population inclusion criteria of both Dodecad and Eurogenes, you can always start your own project on the internet. Nothing is stopping you from undertaking such a project. Blogging is free, and DIY tools can greatly help you in the technical issues.

The point is, ideally these things aren't done on the Internet but through field research by a team of academics and technicians. Don't get me wrong, I'm glad Dodecad and Eurogenes exist, both as standalone projects and as tools to apply to samples from scientific studies. I just don't think the representativeness of their samples is their strong point, that's all. AFAIK, not even Dienekes or Polako claimed this, only you.

How much do you know about their places of origin? Only Dienekes knows the places of origin of all of them. So please be serious. I am not a child you can fool.

IIRC, it's the same. I couldn't find Polako's mission statement in a brief search, but here is something recent:

[Polako]: I'm particularly interested in people with all four grandparents from the following places... Unfortunately, I can no longer accept any mixed individuals

They aren't the same. Recently David (Polako) revealed to me via email his population inclusion criteria:

- all four grandparents must be from the same ethnic group [emphasis mine]

- no unusual behavior on MDS plots

I specifically asked David whether he would include Basar (b/o), whose two grandparents are ethnic Turks and the other two grandparents are Laz from Turkey, in his Eurogenes Turkish population, and he said that he wouldn't.

AFAIK, not even Dienekes or Polako claimed this, only you.

I just said that the Dodecad Turks are much more representative of overall ethnic Turkish genetic variation than the Behar/Yunusbayev Turks, because the former come from all over Turkey while the latter come from a small part of it. I don't think Dienekes has any objection to this claim. The Eurogenes Turks, too, are much more representative of overall ethnic Turkish genetic variation than the Behar/Yunusbayev Turks, for the same reason.

Regarding the representativeness of the Behar/Yunusbayev Turks, Mr. Metspalu wrote the following to me: "Of course samples from one region of a big country are not well representative of all of the country. I have approached Dienekes to see if and how we can use the data he is using. We'll see how it goes." [emphases mine]

So Mr. Metspalu seems to acknowledge that the Dodecad Turks are much more representative of the overall ethnic Turkish genetic variation than the Behar/Yunusbayev Turks.

Regarding the representativeness of the Behar/Yunusbayev Turks, Mr. Metspalu wrote the following to me: "Of course samples from one region of a big country are not well representative of all of the country. I have approached Dienekes to see if and how we can use the data he is using. We'll see how it goes." [emphases mine]

I've also explained to Onur that any group of Turks can use my DIYDodecad tools to come up with any sort of average for "ethnic Turks" using whatever inclusion criteria they see fit. So, people who think that this or that individual should/should not be included, or who have Turkish data and don't want to submit to the Project, can still use my tools -for free- to come up with any alternative average they want.
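As a rough sketch of the "alternative average" idea, assuming each sample's calculator output has already been collected into a Python dict: `population_average`, the sample records and the inclusion predicate below are all hypothetical, and the numbers are invented for illustration.

```python
from statistics import fmean


def population_average(samples, include=lambda s: True):
    """Average each admixture component over the samples that pass `include`."""
    chosen = [s for s in samples if include(s)]
    if not chosen:
        raise ValueError("no samples pass the inclusion criterion")
    components = chosen[0]["admixture"]  # component names taken from the first sample
    return {c: fmean(s["admixture"][c] for s in chosen) for c in components}


# Invented records; "grandparents" lists each sample's four grandparents.
samples = [
    {"grandparents": ["Turkish"] * 4,
     "admixture": {"Balkans": 0.30, "Anatolia": 0.60, "Turkic": 0.10}},
    {"grandparents": ["Turkish"] * 4,
     "admixture": {"Balkans": 0.50, "Anatolia": 0.40, "Turkic": 0.10}},
    {"grandparents": ["Turkish", "Turkish", "Laz", "Laz"],
     "admixture": {"Balkans": 0.10, "Anatolia": 0.90, "Turkic": 0.00}},
]

everyone = population_average(samples)
four_turkish = population_average(
    samples, include=lambda s: all(g == "Turkish" for g in s["grandparents"])
)
print(everyone)
print(four_turkish)
```

Swapping the `include` predicate is all it takes to compare averages under different inclusion policies on the same data.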

The number of SNPs is sufficient. Around 100,000 SNPs are sufficient for fine-scale population structure using ADMIXTURE, and 150-200,000 are usually enough to remove most "noise" from the calculations. Depending on the threshold used for linkage-disequilibrium-based pruning (say, r² between 0.1 and 0.5), one ends up with about 110-200k SNPs.
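LD-based pruning is usually done with a tool such as PLINK (`--indep-pairwise`, which takes a window size, a step and an r² threshold). The sketch below is a simplified, greedy stand-in written only to show the idea, not PLINK's exact algorithm: genotypes are assumed to be 0/1/2 allele counts per SNP in chromosomal order, and `ld_prune` and `r_squared` are hypothetical helpers.

```python
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=50, r2_max=0.2):
    """Greedy pruning: keep a SNP only if its r^2 with every already-kept SNP
    fewer than `window` positions back is at most r2_max. Returns kept indices."""
    kept = []
    for i, g in enumerate(genotypes):
        recent = [k for k in kept if i - k < window]
        if all(r_squared(genotypes[k], g) <= r2_max for k in recent):
            kept.append(i)
    return kept


# Toy data: 4 SNPs x 6 individuals; SNP 1 duplicates SNP 0 (r^2 = 1).
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 1, 0, 0, 1, 2],
    [0, 0, 0, 2, 2, 2],
]
print(ld_prune(genotypes))  # [0, 2, 3]
```

Raising `r2_max` from 0.1 towards 0.5 tolerates more correlation between neighbouring SNPs, so more of them survive; that is why the choice of threshold moves the final SNP count within the range quoted above.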


You may cite, quote, or reproduce articles on this site for non-commercial purposes, provided that you attribute them to Dienekes Pontikos and provide a link either to the main page of this blog or to the individual blog entry you are referring to.