October 03, 2010

PopAffiliator: estimating origin with forensic autosomal STRs

Anders Pålsen alerts me to PopAffiliator, a neat little tool that guesses the origin of a sample from one of several major population groups:

The STR collection database used to train and evaluate the machine learning model encompasses data gathered from more than 40 different studies and contains a total of 56,222 individuals, distributed across 7 major geographical locations: East Asia, Eurasia, sub-Saharan Africa, North Africa, Near East, Central-South America and North America. The data is available here.

The tool uses 17 forensic autosomal STR markers. This may seem like too few in this day and age, but it is sufficient for the purpose at hand.

A few years ago, someone posted on the dna-forums site (I can't seem to find the topic today) a STRUCTURE-based calculator using mostly the same markers. That calculator contained 500 German, 500 Chinese, and 500 African individuals:

Group 3=Africans

Group 2=Chinese

Group 1=Germans

Back then, I ran STRUCTURE on the data, yielding the following results:

If we look at the 1,500 individuals, it turns out that a correct "guess" of a person's origin (i.e., his maximum inferred cluster membership coefficient corresponding to his real origin) occurred in 497/500 Africans, 491/500 Chinese, and 490/500 Germans.
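The "correct guess" criterion is simply the argmax of each individual's row in the STRUCTURE membership (Q) matrix, compared with the known origin. A minimal Python sketch, assuming the Q-matrix has already been parsed into rows and that the inferred clusters have already been matched to the populations (STRUCTURE numbers clusters arbitrarily); the function name is hypothetical:

```python
from typing import Sequence


def assignment_accuracy(q_matrix: Sequence[Sequence[float]],
                        true_labels: Sequence[int]) -> float:
    """Share of individuals whose largest membership coefficient is in their true cluster.

    Assumes cluster k of the Q-matrix has already been matched to population k.
    """
    correct = 0
    for q_row, label in zip(q_matrix, true_labels):
        # cluster with the maximum inferred membership coefficient
        best = max(range(len(q_row)), key=lambda k: q_row[k])
        if best == label:
            correct += 1
    return correct / len(true_labels)


# Pooled accuracy from the counts reported above (Africans, Chinese, Germans)
print(round((497 + 491 + 490) / 1500, 4))  # 0.9853
```

Pooling the three counts gives 1,478 correct assignments out of 1,500, about 98.5%.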

Pretty good! Typing hundreds of thousands of SNPs just to guess whether someone is East Asian, European, or sub-Saharan African is overkill, and there is already widespread forensic profiling of numerous human populations, so why not amortize all this data?

The problem with using so few markers isn't that they are insufficient to guess one's origin, but that they are insufficient to estimate admixture, if present. Here is the triangle plot from my aforementioned mini-experiment:

If hundreds of thousands of SNPs had been used, the red, green, and blue dots would be gathered in tight clusters in the three corners, with the occasional individual deviating in a different direction. Nonetheless, it is also clear that very few individuals fall beyond the 50% cutoff from their true origin, the point past which one might be fooled into assigning them to the wrong population.
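A triangle (ternary) plot places each individual according to their three membership coefficients, and the 50% check is a one-line filter. A minimal sketch, assuming a Q-matrix parsed into rows of three coefficients with clusters already matched to populations; the vertex placement and function names are illustrative choices, not taken from STRUCTURE or the original plot:

```python
import math
from typing import List, Sequence, Tuple


def ternary_xy(q: Sequence[float]) -> Tuple[float, float]:
    """Map three membership coefficients to 2-D coordinates inside an equilateral triangle.

    Cluster 0 sits at (0, 0), cluster 1 at (1, 0) and cluster 2 at (0.5, sqrt(3)/2).
    """
    if len(q) != 3:
        raise ValueError("expected exactly three membership coefficients")
    total = sum(q)  # renormalise in case the values in the Q file are rounded
    b, c = q[1] / total, q[2] / total
    return b + c / 2.0, c * math.sqrt(3) / 2.0


def below_cutoff(q_matrix: Sequence[Sequence[float]],
                 true_labels: Sequence[int],
                 cutoff: float = 0.5) -> List[int]:
    """Indices of individuals whose membership in their true cluster is under the cutoff."""
    return [i for i, (q, label) in enumerate(zip(q_matrix, true_labels))
            if q[label] < cutoff]
```

An individual above 0.5 in their true cluster is always assigned correctly by the maximum-membership rule, since no other cluster can then exceed it; one below 0.5 may or may not be misassigned, which is why the cutoff marks a danger zone rather than actual errors.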

Since I have some autosomal test data of my own, I decided to give PopAffiliator a try.

The link to PopAffiliator will go to the right sidebar of the blog.

International Journal of Legal Medicine, DOI: 10.1007/s00414-010-0472-2

PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal short tandem repeat genotype profile

Luísa Pereira et al.

Because of their sensitivity and high level of discrimination, short tandem repeat (STR) marker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes of up to 15–17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will prevent this picture from changing in the near future, notwithstanding technological developments. In this study, we present a free online calculator named PopAffiliator (http://cracs.fc.up.pt/popaffiliator) for individual population affiliation to the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provides a considerable amount of information for population assignment, in addition to being excellent for individual identification.
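The abstract does not spell out PopAffiliator's machine-learning model, so what follows is not its method. As a point of comparison, the classic way to assign a forensic STR profile to a population is to multiply Hardy-Weinberg genotype frequencies across loci in each reference population and normalise. A minimal sketch, assuming per-population allele-frequency tables; the locus names, frequencies and the floor value for unseen alleles are hypothetical toy values:

```python
import math
from typing import Dict, Tuple

AlleleFreqs = Dict[float, float]          # repeat count -> frequency at one locus
PopTable = Dict[str, AlleleFreqs]         # locus name -> allele frequencies
Profile = Dict[str, Tuple[float, float]]  # locus name -> the two observed alleles


def genotype_log_prob(alleles: Tuple[float, float], freqs: AlleleFreqs,
                      floor: float = 1e-3) -> float:
    """Log Hardy-Weinberg probability of one STR genotype.

    Alleles missing from the reference table get a small floor frequency (an ad hoc assumption).
    """
    a, b = alleles
    pa, pb = freqs.get(a, floor), freqs.get(b, floor)
    return math.log(pa * pa) if a == b else math.log(2.0 * pa * pb)


def assign(profile: Profile, populations: Dict[str, PopTable]) -> Dict[str, float]:
    """Posterior probability of each population under a flat prior, loci treated as independent."""
    log_liks = {
        pop: sum(genotype_log_prob(g, table[locus]) for locus, g in profile.items())
        for pop, table in populations.items()
    }
    top = max(log_liks.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(ll - top) for pop, ll in log_liks.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


# Toy two-locus reference tables (hypothetical numbers, not real STR data)
toy = {
    "Eurasian": {"LOCUS_A": {14: 0.5, 15: 0.5}, "LOCUS_B": {8: 0.6, 9: 0.4}},
    "East Asian": {"LOCUS_A": {14: 0.1, 15: 0.9}, "LOCUS_B": {8: 0.2, 9: 0.8}},
    "sub-Saharan African": {"LOCUS_A": {14: 0.2, 15: 0.2, 16: 0.6},
                            "LOCUS_B": {8: 0.3, 9: 0.3, 10: 0.4}},
}
posterior = assign({"LOCUS_A": (14, 14), "LOCUS_B": (8, 8)}, toy)
print(max(posterior, key=posterior.get))  # Eurasian
```

A model of this kind only ever chooses among single origins, so however many loci are multiplied together it says nothing about admixture, which is the limitation discussed in the post.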

25 comments:

I was a bit surprised because, being Greek from both sides of the Aegean, I expected to register at least some "Near Eastern" probability; but the populations in the Near Eastern group included Arabs and Iranians, and these are probably poor representatives of any West Asian component in my genome.

There is no "Near Eastern probability"; read the abstract and the site information carefully. "Near Eastern" is just one of the population categories in the database, but the results don't have a "Near Eastern" category, only the three categories appearing in your result, namely: 1- "Eurasian", 2- "East Asian" and 3- "sub-Saharan African".

You're welcome. Being an Anatolian/Balkan Greek mix, you must surely have much, much more Near Eastern DNA than East Asian or sub-Saharan DNA, even if you do have some DNA from these two regions so far away from the Balkans and Asia Minor. I wish they also tested for Near Easternness.

Hey Dienekes, wait until 23andme has another outreach sale (full package, $99). It will be great fun to see what your Ancestry Painting result is. I'll bet that your Mongoloid component is enough to make you wonder if you have any genes from the Karamanli or Yürüks. While you haven't said expressly which parts of the Ottoman Empire you derive, your comments make me think Cappadocia and Arcadia. I've discovered likely Venetian and Vlach ancestry no one in my family knew about. It's fun! What do you predict for your own EuroDNACalc result? I've seen 100% SE in peninsular Greeks, but not in any other Balkan person. I wish some Sicilians would post their results in the 23andme fora (another cool feature).

I don't consider 23andMe the acme of genetic profiling. The fora is as dumb as you can get. Americans!

Why the interest in Sicilians? I noted a couple of Sicilian Italians, and Southern Italians, both 100% European. My results, 100% European. I am Maltese, and that island is further south than Sicily. A Greek from Crete was 100% European, yet the Norwegians and some American Whites? have Asian admixture. Pocahontas? The Cherokee Princess?

Spy: While you haven't said expressly which parts of the Ottoman Empire you derive, your comments make me think Cappadocia and Arcadia.

Didn't Dieneke write earlier in his blog that his paternal side is completely from Pontus (Northeast Asia Minor)? As to his maternal side, it is completely from the Balkans but I don't know from which part.

As to the non-Caucasoid admixture, he needs to try much more detailed tests like the 23andMe ancestry test instead of very simple and rough ones like PopAffiliator to learn whether he really has non-Caucasoid admixture or not and how much he has.

"The fora is as dumb as you can get. Americans!" saith Ponto. In the fora, at least those Americans deploy complete subject-verb agreement. :)

Ponto, my interest in Sicilians has to do with their subracial breakdown, not their macroracial assignment. A recent paper had 37% of the male-line Sicilian ancestry as Greek. DNATribes and the defunct EurasianDNA both split off what I would call a Byzantine Cluster that stretches from Southern Italy through Asia Minor. Naturally, I expect EuroDNACalc's southeastern readings to peak in Greece.

Onur, thanks for your note about the Karamanlides. Turkification of Greek Christians is indeed a more likely explanation for their origin than resettled Turkish converts, but as long as we're speculating, why not source Asianness to a variety of possible sources? By the way, I don't see the Pontus as so out of the way. You could get Ukrainian or Laz genes that way! As to Dienekes' regions and surname, I always took those to be noms de plume. But if they're chosen to identify his local ancestry, then he's half Spartan!

Wow, I didn't know that Greek colonies survived that long outside of modern Greece and the Aegean. I thought that might be the case in Cyprus and parts of southern Italy as well, but I know Greek colonies got absorbed in most other places. I just looked up Pontus, and man, it's pretty.

I just checked out 23andMe and I feel quite iffy about it. How much of their research is based on consumer-initiated testing and self-descriptions given online? And what's up with them using what seem to be three races plus intermediates? It just looks sloppy to group NA, Chinese and Australians together. The other problem is that I couldn't get much detail about what descriptions you get back.

How useful would it be for me? I should be nearly all western European, from multiple nations. I guess that's good because I'm not really from intermediate areas (making the admixture more meaningful?) and I might be varied enough to make it interesting. I really want a result showing some "mixed" western European heritage, something like: I'm likely part Scottish, English and Swiss (or, I suppose, British Isles and central European). Or would it just average those into, say, Belgian? Also, I have a relative who says they are exactly 5 generations away from an NA ancestor. Can 23andme test that claim reliably like they say?

Lastly, how does 23andme compare to DNATribes? I have a better idea of how they work.

Dieneke, I forgot to mention Leucosyri as one of the ancestral peoples (maybe the most important one) to Pontian Greeks.

Correction for "Tsan": Tzan

Spy: why not source Asianness to a variety of possible sources?

Spy, what "Asianness" are you talking about? If you are referring to the PopAffiliator result of Dieneke, as I and Dieneke have pointed out above, PopAffiliator is a very poor tool for detecting admixture, so his result doesn't tell anything clear about Mongoloid or Negroid admixture, thus he can very well be actually 100% Caucasoid. 23andMe, on the other hand, using hundreds of thousands of SNPs instead of the only 17 (yes, only 17!) STRs of PopAffiliator, is far better in detecting the real or close to real admixture values.

Sorry, Dieneke, as I said, I made a serious mistake while reading your post, so I had to repost.

So I will post my previous comment in its last form:

Dieneke, I said maybe, didn't make a definite judgment. Besides, you didn't publish my previous comment, which was mentioning peoples ancestral to Pontian Greeks other than Leucosyri. So I will reframe my previous comment in its updated form:

Spy: By the way, I don't see the Pontus as so out of the way. You could get Ukrainian or Laz genes that way!

Pontian Greeks are essentially Hellenized/Romanized (in Ancient and Byzantine times) Leucosyri, Tzans, Laz, Georgians and other indigenous peoples of the Pontus area with a minor contribution of Greek colonizers (Ancient and Byzantine-era) mostly in the coastal areas.

Spy: I've discovered likely Venetian and Vlach ancestry no one in my family knew about. It's fun!

When people try to understand their ancestry going back hundreds to thousands of years, or the ancestry of certain nationalities like Italians, Greeks or Spaniards, they should leave the history books in the library and just go with the data from DNA testing, combined with evidence from archaeological digs. Someone said history is bunk; at least most of it is distorted and speculative. With present-day Jews, the Bible is used out of its religious context as some sort of reference. No one has proven that the Jews have any connection to any Biblical folk or that they come from the Levantine region. DNA shows present-day Jews to be essentially Southeast Europeans, hard to distinguish from Sicilian Italians, southern peninsular Italians, Anatolians and others in the eastern basin of the Mediterranean Sea. They could be from the Levant, Southern Italy, Anatolia or the Caucasus region. All my SNP results show is that I come from where I was born; it is a GPS indicator and very accurate. The results haven't told me who my ancestors were ethnically or racially, other than West Eurasians.

The point is: stop speculating about the origins of certain peoples unless you have proof. Don't just quote the Bible or use schoolbook history about Romans or Phoenicians to try to prove your points.

It is interesting that FTDNA has admitted, in their information about their Population Finder, that with their small battery of SNPs they cannot distinguish among Jews, Southern Italians, Anatolians, Greeks and Levantines. Their PF is not very sophisticated. This is from the firm that employs that Behar man who did a recent study on Jews! That makes his recent study utter rubbish.

Nuadha asks if 23andme locates one's ancestry down to the level of admixture of ethnies. No. Ancestry Painting calculates macroracial admixture and shows you on which chromosomes such components lie, whereas Global Similarity - Advanced reduces your allele frequencies to two composite variables correlating with west-east and north-south migration, and then places you on a graph with those two axes. So you should usually find yourself clustering with a geographically intermediate country.
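23andMe's actual Global Similarity procedure is not described here; a common generic way to collapse many genotypes into two composite axes is principal component analysis (PCA). A minimal pure-Python sketch, assuming genotypes coded as 0/1/2 allele counts per SNP; the function name and toy data are hypothetical:

```python
import math
from typing import List, Sequence


def pca_scores(genotypes: Sequence[Sequence[float]], k: int = 2,
               iters: int = 300) -> List[List[float]]:
    """Project individuals (rows of 0/1/2 allele counts) onto the top-k principal components.

    Power iteration with deflation on the SNP covariance matrix: fine for toy
    inputs, far too slow for genome-wide data.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    xc = [[row[j] - means[j] for j in range(m)] for row in genotypes]  # centre each SNP
    cov = [[sum(r[a] * r[b] for r in xc) / (n - 1) for b in range(m)] for a in range(m)]
    components = []
    for _ in range(k):
        v = [float(j + 1) for j in range(m)]  # arbitrary non-uniform starting vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        # deflate: remove this component's variance before searching for the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
        components.append(v)
    return [[sum(r[j] * v[j] for j in range(m)) for v in components] for r in xc]


# Toy data: two groups of three people that differ at the first three SNPs
toy = [[2, 2, 0, 1], [2, 1, 0, 1], [1, 2, 0, 1],
       [0, 0, 2, 1], [0, 1, 2, 1], [1, 0, 2, 1]]
for row in pca_scores(toy):
    print([round(x, 2) for x in row])
```

On real data it is the interpretation of the two axes (for example, west-east and north-south) that gives them geographic meaning; PCA itself does not guarantee such an interpretation.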

Your buddy could be mistaken about his putative NA ancestry. If his genome shows NO Asian admixture, then he surely has no such ancestor a mere five generations back.
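As a rough check on the "five generations" claim, and assuming "five generations away" means a great-great-great-grandparent (parents counting as generation 1), the expected autosomal share from a single ancestor halves with each generation:

$$E[f] = \left(\tfrac{1}{2}\right)^{g} = \left(\tfrac{1}{2}\right)^{5} = \tfrac{1}{32} \approx 3.1\%$$

This is only an expectation: recombination makes the realized share scatter around this value, and whether a component of that size shows up depends on the test.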

You can also run freeware on your Raw Data File for ancestry (think EuroDNACalc) and health-related applications.

Dienekes, I agree Sparta is not out of the way. Other than Arcadia, West Thessaly sounds like a good candidate for a non-coastal redoubt west of the Aegean.

Onur, one of the 23andme features is called 'Ancestry Finder'. It allows you to see which customers share 'half-identical runs' (HIRs) with you. So, for example, on the X chromosome I have HIRs of at least 5 cM with a Romanian. I have at least five other Romanians appearing on my autosomes who must also be distant cousins. Conclusion? Vlachs on my mother's side.
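23andMe's exact criteria for half-identical runs are not given here. A common simplified definition is a stretch of SNPs where two people never show opposite homozygotes (AA vs. BB), since at such a SNP they cannot share any allele. A minimal sketch under that assumption, with genotypes coded as 0/1/2 allele counts and SNP positions already converted to centimorgans (both assumptions; real pipelines also tolerate genotyping errors and require a minimum number of SNPs); the function name is hypothetical:

```python
from typing import List, Sequence, Tuple


def half_identical_runs(geno_a: Sequence[int], geno_b: Sequence[int],
                        positions_cm: Sequence[float],
                        min_cm: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start_cm, end_cm) stretches with no opposite homozygotes, at least min_cm long."""
    runs = []
    start = end = None
    for ga, gb, pos in zip(geno_a, geno_b, positions_cm):
        if {ga, gb} == {0, 2}:  # opposite homozygotes: no allele shared at this SNP
            if start is not None and end - start >= min_cm:
                runs.append((start, end))
            start = end = None
        else:
            if start is None:
                start = pos
            end = pos
    if start is not None and end - start >= min_cm:  # close a run reaching the last SNP
        runs.append((start, end))
    return runs
```

The default `min_cm=5.0` mirrors the 5 cM threshold mentioned in the comment.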

I also wrote to a Greek who shares my mother's maiden name. He informed me that he had traced it to Souli by way of Venezia. Its plausibility was enhanced by at least two historical events, by etymology, and by my having two distant Brazilian cousins in Ancestry Finder whose public profiles, luckily for me, listed a Venetian lady (d. 1840) as their common mitochondrial ancestor. (Otherwise, they were Portuguese and Amerindian.)

About Asianness: In my experience, when you compare classifiers against admixture calculators, the admixture percentage for minor components is greater than the classifier likelihood, which makes sense. I had a friend (and distant relative according to 23andme!) mistake EuroDNACalc1.0 for 1.1 and tell me that he was "98.7% Northwestern". Wait a minute! You only get whole numbers from that. EDC1.0 is saying that IF he had a single origin THEN it's 98.7% likely to be NW. He downloaded and ran EDC1.1 and obtained a Maximum Likelihood Admixture of 85% NW/15% SE.
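The distinction drawn here, between a classifier's probability of a single origin and a maximum-likelihood admixture estimate, can be reproduced on simulated data. A minimal sketch with two hypothetical source populations ("NW" and "SE") whose allele frequencies are drawn at random; this is not EuroDNACalc's actual model, marker set, or reference data:

```python
import math
import random
from typing import Sequence


def genotype_prob(g: int, p: float) -> float:
    """Hardy-Weinberg probability of carrying g copies (0, 1 or 2) of an allele with frequency p."""
    if g == 2:
        return p * p
    if g == 1:
        return 2.0 * p * (1.0 - p)
    return (1.0 - p) * (1.0 - p)


def log_lik(genos: Sequence[int], freqs: Sequence[float]) -> float:
    return sum(math.log(genotype_prob(g, p)) for g, p in zip(genos, freqs))


def single_origin_posterior(genos, f_nw, f_se) -> float:
    """P(NW | genotypes) if the person is assumed wholly NW or wholly SE (flat prior)."""
    d = log_lik(genos, f_se) - log_lik(genos, f_nw)
    if d > 0:  # numerically stable logistic of -d
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def ml_admixture(genos, f_nw, f_se, steps: int = 100) -> float:
    """Grid search for the NW share q maximising the likelihood of the mixed frequencies."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        mixed = [q * a + (1.0 - q) * b for a, b in zip(f_nw, f_se)]
        ll = log_lik(genos, mixed)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


# Simulate one person who is truly 85% NW / 15% SE at 2,000 independent SNPs
rng = random.Random(1)
n_snps = 2000
f_nw = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
f_se = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
genos = []
for a, b in zip(f_nw, f_se):
    p = 0.85 * a + 0.15 * b
    genos.append(int(rng.random() < p) + int(rng.random() < p))

print("single-origin P(NW):", round(single_origin_posterior(genos, f_nw, f_se), 3))
print("ML admixture, NW share:", ml_admixture(genos, f_nw, f_se))
```

The classifier answers "if this person had one origin, which is it?" and is essentially certain of NW, while the admixture fit recovers a share near the simulated 85%, so the minor component comes out far larger than the classifier's residual probability for SE.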

As for my own ethnicity and regions, I'm an all-Greek American with ancestry from areas fully redeemed in 1832.

Spy: In my experience, when you compare classifiers against admixture calculators, the admixture percentage for minor components is greater than the classifier likelihood, which makes sense. I had a friend (and distant relative according to 23andme!) mistake EuroDNACalc1.0 for 1.1 and tell me that he was "98.7% Northwestern". Wait a minute! You only get whole numbers from that. EDC1.0 is saying that IF he had a single origin THEN it's 98.7% likely to be NW. He downloaded and ran EDC1.1 and obtained a Maximum Likelihood Admixture of 85% NW/15% SE.

EuroDNACalc uses hundreds of markers, but PopAffiliator uses only 17! Such a small number of markers, no matter how carefully chosen, should always be viewed with suspicion. Maybe one day Dieneke will decide to take a much more detailed test like 23andMe and remove most of the doubts.

One other comment regarding the current population of Sarakatsans: I'm pretty sure their official number is low simply because Sarakatsans gradually abandoned their traditional way of life. Shepherding hasn't been profitable as a business for several generations, and many of the areas of traditional shepherding have also experienced significant out-migration.

So the current population of Sarakatsans is much lower than their traditional numbers of even two or three generations ago.

Vlachs, on the other hand, may have hung onto their ethnic designation in greater number by way of the fact that they speak a different language.

I visited this site out of curiosity about DNA markers in ancient cultures, namely Neolithic tribes that may or may not be the Sarakatsani and the Vlachs. Instead, I am amazed by some of the reasoning regarding ethnicity versus racial admixture and linguistic or cultural affinity. Has no one here read a book? None of these are related to each other; there are tribes in China that show Greek DNA, there are populations of Scandinavians with Mongoloid admixture, and sub-Saharan Negroids with Caucasoid markers. Humans get around and have been doing so for thousands of years. If we trace DNA all the way back to the first humans, they were of African, Negroid physiognomy and DNA! Why all this fuss over what brass ring? I don't get it. It's one thing to be obsessed with racial origin and quite another to be enamored with cultural background. All in all, a vapor and nothing more than mass insanity.


Dienekes' Anthropology blog is dedicated to human population genetics, physical anthropology, archaeology, and history.

You are free to reuse any of the materials of this blog for non-commercial purposes, as long as you attribute them to Dienekes Pontikos and provide a link to either the individual blog entry or to Dienekes Anthropology Blog.

Feel free to send e-mail to Dienekes Pontikos, or follow @dienekesp on Twitter.