Tuesday, July 19, 2011

The Dodecad Oracle v1

Here is a fun little tool that tests the Dodecad v3 admixture proportions of an individual against all the reference populations, and also against the best pairwise combinations of these populations.

You need to install R to use it, then download the program and double-click the file DodecadOracleV1.RData, which can be found inside the rar file. You will then see a command prompt where you can enter the following commands:

Examining which populations are available

Just enter

X[,1]

You will see a list of 227 populations. You can use these population IDs in the next section.
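With 227 labels, scanning the list by eye is tedious. Here is a minimal sketch of searching it with base R's grep, assuming (as above) that the first column of X holds the population IDs. The helper name find_pops is hypothetical, and the toy X is only a stand-in so that the snippet runs on its own; in the Oracle workspace X is already loaded and that assignment should be skipped.

```r
# Hypothetical stand-in for the Oracle's reference matrix; in the real
# workspace X is already loaded, so skip this assignment.
X <- cbind(c("Armenian_D", "Armenians_16", "Assyrian_D", "Spaniards", "IBS"),
           c("1", "2", "3", "4", "5"))

# Case-insensitive search of the population IDs in column 1.
find_pops <- function(pattern, ref = X) {
  grep(pattern, ref[, 1], ignore.case = TRUE, value = TRUE)
}

find_pops("armen")   # labels containing "armen"
find_pops("_D$")     # labels ending in _D
```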

You can specify k=30 if you want the 30 top matching populations instead of the default 10.

Mixed Mode

You use mixed mode by adding mixedmode=T in any of the commands. The program then considers all pairs of populations, and for each one of them calculates the minimum distance to the sample in consideration, and the admixture proportions that produce it; population pairs where the distance to one of the two populations is smaller than to any admixture of the two are ignored.
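The pairwise search described above can be sketched as a projection onto the line segment between two reference populations. This is only an illustration under stated assumptions: it uses Euclidean distance between percentage vectors, which may not be the Oracle's actual metric, and the function name best_mix and the toy numbers are made up.

```r
# Best two-way mix of reference populations a and b for sample x.
# All three are numeric vectors of admixture percentages in the same order.
# Assumes Euclidean distance; the Oracle's actual metric may differ.
best_mix <- function(x, a, b) {
  d <- a - b
  denom <- sum(d * d)
  if (denom == 0) return(NULL)          # identical populations: nothing to mix
  w <- sum((x - b) * d) / denom         # optimum weight of a on the line through a and b
  if (w <= 0 || w >= 1) return(NULL)    # one endpoint is at least as close: ignore the pair
  m <- w * a + (1 - w) * b              # the best-fitting mixture
  list(prop_a = w, prop_b = 1 - w, distance = sqrt(sum((x - m)^2)))
}

# Toy three-component example (made-up numbers, not Dodecad data).
a <- c(80, 20, 0)
b <- c(0, 20, 80)
x <- c(60, 20, 20)
best_mix(x, a, b)   # x lies on the segment: 75% a + 25% b
best_mix(a, a, b)   # NULL, because a itself is the closest point
```

Running this over every pair of rows and keeping the smallest distances would give a ranking of the kind shown in the examples that follow.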

The mixed mode should be used with caution; more than anything else, it shows how similar apparent "mixes" can be achieved by different combinations of ancestry. Nonetheless, it may prove somewhat useful. For example, the above results suggest that Pathans can be viewed as a mix of other South Asian populations and populations from the eastern Caucasus, a suggestion that the Project arrived at independently using different methods.

Here is another example:

DodecadOracle("Assyrian_D",mixedmode=T)

      [,1]                                    [,2]
 [1,] "Assyrian_D"                            "0"
 [2,] "83.9% Armenians_16 + 16.1% Yemen_Jews" "1.7829"
 [3,] "89.1% Armenian_D + 10.9% Saudis"       "2.1624"
 [4,] "84.3% Armenians_16 + 15.7% Saudis"     "2.2884"
 [5,] "88.9% Armenian_D + 11.1% Yemen_Jews"   "2.2983"
 [6,] "83.8% Armenian_D + 16.2% Bedouin"      "4.1579"
 [7,] "72.2% Armenian_D + 27.8% Syrians"      "4.1841"
 [8,] "23.4% Georgians + 76.6% Iraq_Jews"     "4.2418"
 [9,] "76.2% Armenians_16 + 23.8% Bedouin"    "4.332"
[10,] "61.5% Armenians_16 + 38.5% Syrians"    "4.4019"

This reaffirms the close relationship of Assyrians to Armenians that has been noticed in the project and by others, and it also shows that Assyrians differ from Armenians in a Southwestern Asian direction, consistent with their Semitic language.

Or, African Americans:

DodecadOracle("ASW",mixedmode=T)

      [,1]                              [,2]
 [1,] "ASW"                             "0"
 [2,] "81.3% Hausa + 18.7% N._European" "2.3891"
 [3,] "18.4% Orkney_1KG + 81.6% Hausa"  "2.4031"
 [4,] "18.5% Argyll_1KG + 81.5% Hausa"  "2.4268"
 [5,] "18.4% Orcadian + 81.6% Hausa"    "2.4657"
 [6,] "80.5% Igbo + 19.5% N._European"  "2.5031"
 [7,] "80.6% Brong + 19.4% N._European" "2.523"
 [8,] "18.6% CEU + 81.4% Hausa"         "2.5938"
 [9,] "19.1% Argyll_1KG + 80.9% Brong"  "2.6197"
[10,] "19% Orkney_1KG + 81% Brong"      "2.6274"

I don't know that much about the slave trade, but I believe that Ghana was an important part of it?

Another thing to watch for is that some populations have more than one sample available, so they appear to be mixtures of themselves, which is not very informative, e.g., Spanish_D:

DodecadOracle("Spanish_D",mixedmode=T)

      [,1]                               [,2]
 [1,] "Spanish_D"                        "0"
 [2,] "7.9% French_Basque + 92.1% IBS"   "0.8713"
 [3,] "68.9% IBS + 31.1% Spaniards"      "1.0377"
 [4,] "98.8% IBS + 1.2% Irish_D"         "1.2959"
 [5,] "1.2% British_Isles_D + 98.8% IBS" "1.3018"
 [6,] "1.2% British_D + 98.8% IBS"       "1.3019"
 [7,] "99% IBS + 1% Norwegian_D"         "1.3046"
 [8,] "1.2% Cornwall_1KG + 98.8% IBS"    "1.3048"
 [9,] "98.8% IBS + 1.2% Kent_1KG"        "1.3142"
[10,] "2.2% French_D + 97.8% IBS"        "1.3179"

To deal with these problems, you must "edit" the X matrix if you want to exclude some populations. For example, if you want to exclude "Spaniards" and "IBS", you must enter:

X <- X[setdiff(1:227,which(X[,1]=="IBS" | X[,1]=="Spaniards")),]

Note, however, that you must relaunch the program if you want to get the original matrix back, or alternatively save a copy of it before editing.
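A minimal sketch of one way to keep a copy and restore it afterwards. The name X_full is hypothetical, and the toy X only stands in for the reference matrix that the Oracle loads; in the real workspace that first assignment should be skipped. The row filter uses %in% rather than fixed row numbers, so it keeps working after X has already been shortened.

```r
# Hypothetical stand-in; in the Oracle workspace X is already loaded.
X <- cbind(c("IBS", "Spaniards", "French_Basque", "Irish_D"),
           c("1", "2", "3", "4"))

X_full <- X                                   # keep an untouched copy first

# Drop the unwanted populations; drop = FALSE keeps the matrix shape
# even if only one row survives.
X <- X[!(X[, 1] %in% c("IBS", "Spaniards")), , drop = FALSE]

# ... run the Oracle on the reduced X here ...

X <- X_full                                   # restore the original matrix
```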

Doug McDonald is now giving people three-way admixtures, though from perhaps a more compact set of population references. Might you alter the program to maximize the likelihood on 3-way or N-way admixtures by adjusting a parameter?

Might you alter the program to maximize the likelihood on 3-way or N-way admixtures by adjusting a parameter?

I'm not sure what Dr. McDonald does; this program uses a "geometric" approach to find the optimum admixture ratio for pairs of populations, and it ignores the population pairs where admixture isn't as good as one of the two populations.

The approach used can be extended to 3 populations, although the geometry becomes more complex, and I will have to double-check it before I release a new version. Interestingly, in my experiments so far it seems that many individuals are better described as 2-way mixes than as 3-way mixes, although I've seen a few (such as a Mexican) who are best described as a 3-way mix.

I am having a problem typing commands and getting good results. If I copy the commands from the post above it works fine, but if I type the same commands in the R Console it doesn't work; I just get a "+" and a blinking cursor. What am I doing wrong?

Well, this is ... interesting. From what research I've been able to do in the past year, all I can say for certain is I'm very American as I have both paternal and maternal ancestors that trace back to revolutionary times.

My paternal grandfather is believed to have been mostly Irish. My paternal grandmother's father was Azorean and German. My maternal grandfather seems to be mostly a mix of English and either Scottish or Irish, and maybe Native American. My maternal grandmother is a complete mystery. My maternal haplotype is H4a1a which appears most commonly in Polish/Irish.

I’ve only been reading your blog for a short time and I’m still trying to make sense of a lot of it. If I’ve done everything correctly, it seems to me the Dodecad Oracle results lean, rather unexpectedly, in a French direction rather than Irish. I have yet to encounter any French ancestors in my searches.

Teresa, some of your Irish, Scottish and English ancestors probably were Anglo-Norman (of Norman French origin). All the 'Fitz' names (Fitzwilliams, Fitzgerald, Fitzhugh, etc.) are of Norman French origin. There were also many nobles who helped William the Conqueror take England in 1066 from other areas of France who were granted lands in England. Later when the Scottish lowlands and Ireland were conquered the same thing happened. And shortly after England was conquered the Angevins came to dominate the Kingship; they were French from Anjou, not Norman. The Norman French/French nobles often intermarried with Anglo-Saxon, Scottish and Irish nobles who were conquered--this legitimized their presence to the populace, especially as the next generation would be mixed--and they also continued to intermarry with French nobles for political alliances. There was a ton of interaction across the English Channel during the Medieval period. The French nobles had also intermarried with German, Dutch, Flemish, Spanish etc. nobles for hundreds of years; the whole European landscape kept shifting over the centuries with different cultural groups dominating, conquering, mixing with those they conquered or retreating and leaving some behind, just as some Romans and people of mixed British and Roman blood stayed in England after the fall of the Roman Empire. Under the Roman Empire a lot of intermarriage occurred between nobles of different European regions as well. The best site to follow your lines back is www.geni.com where they're building the world family tree.

So if you have noble Norman French & French blood through England, Scotland and/or Ireland you're likely to find a huge mix that may even go back to Spain, Rome, Egypt and the Middle East as well as Scandinavia, the Ukraine and Eastern Europe.

When you run Dodecad Oracle on the admixture proportions of an individual, what do the numbers mean that follow each population in the resulting ranking of populations? In the results that others have posted here, the numbers usually start out quite low. For me, even the top-ranked option seems to have a relatively high number following it (the top result is 12.8833). Do these numbers represent some sort of measure of how well each population fits the individual's admixture proportions?

I am curious about this because my mother is from Brazil and my father is from the United States. I likely have very admixed ancestry, and was wondering if the higher relative numbers indicate that none of the populations are a very good match for me.

Yes, the lower the number, the closer the fit. You can compare your results to the top few populations or to the weighted averages of pairs of populations to see in which components you are different from them.

Individuals with more than 2 ancestries or people from very variable groups can usually expect not to see a tight fit.
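To make the comparison described above concrete, here is a minimal sketch. It assumes the reported number is a Euclidean distance between percentage vectors, which is not stated in the post. The component labels C1 to C4, the function name fit_distance, and all the numbers are made up for illustration.

```r
# Made-up percentages over four hypothetical components (each sums to 100).
me   <- c(C1 = 50, C2 = 20, C3 = 20, C4 = 10)   # the individual
pop  <- c(C1 = 56, C2 = 12, C3 = 20, C4 = 12)   # a reference population average
pop2 <- c(C1 = 30, C2 = 50, C3 = 10, C4 = 10)   # a second reference population

# Assumed metric: Euclidean distance between the two percentage vectors.
fit_distance <- function(x, ref) sqrt(sum((x - ref)^2))

fit_distance(me, pop)        # overall misfit; lower means a closer fit
sort(me - pop)               # per-component gaps, largest deficit first

# Weighted average of a pair of populations, as in mixed mode.
mix <- 0.8 * pop + 0.2 * pop2
fit_distance(me, mix)        # compare with the single-population distance
```

The sorted differences show which components pull the individual away from the reference population, which is the comparison suggested in the reply above.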

Can you give some information about the "Slovenian" population? I thought my parents were the only 2 Slovenians in the Dodecad project, but the "Slovenian" population comes second for both of them in Oracle, while "Hungarians" is first. I thought my parents were typical Slovenes, and they even come from 2 regions that were politically separated for centuries and part of 2 different countries. Interestingly, they are extremely similar genetically, and much closer to Hungarians than to other Slovenians. But it may be possible that the "Slovenian" population in Oracle is small or not representative of the actual Slovenian population. I would be curious to know more.

The Slovenian population's percentages are based on a smaller number of markers, so potentially they are not as accurate as the Hungarians. The number of SNPs is listed in the Dodecad spreadsheet.

Another possibility is that the distribution of Slovenians and Hungarians overlap so that particular Slovenians may be closer to the Hungarian average and vice versa.

My question is: could such a good fit be caused by just one or two (thus far unknown) ancestors from the West of Scotland from (well) before 1800?

A couple of ancestors before 1800 would not determine your average. Also, the Argyll sample is small, and, moreover, we should all bear in mind that not all populations are distinguishable from each other, either on the basis of this analysis or in general. If you run Oracle with "Argyll_1KG" you will see that there are several populations quite close to it.

LS, see my response to Teresa above. I have ancestors that came to the Massachusetts Bay Colony in the 1620s and '30s and some of them trace back to English nobility & royalty who were a HUGE mix tracing back all over Europe (including Eastern Europe & Scandinavia) and to the Middle East. This is because for 1500 years the nobles & royals kept marrying those from other countries for political alliances. There were also a lot of conversions over the centuries from Judaism to Christianity and vice-versa for various reasons, especially in Germany and probably in Switzerland as well.

On my father's side I'm English/Irish and on my mother's Norwegian/English, so these results appear to make sense, although I'm not too sure about the Mediterranean or Asian parts. If anyone has additional insight as to what these mean (or if I've even done it right!), I'd appreciate it.

My results would appear to not mean much since my grandparents are from very different places. I am a little surprised that the "asian" component of my result didn't pull me further east. I estimate I am 10-15% Native American.

I wish the instructions were easier to understand. :/ I feel like a chimpanzee trying to interpret a legal document sometimes with this Dodecad stuff. I've got R running, managed to get Oracle and K12 in the folder (or working directory). What do I type in to see what percent of the samples I am? Hopefully my question made sense.

Also when it says "DodecadOracle(c(4.6, 16.7, 33.6, 0, 23.2, 0.4, 0.6, 1.6, 0.7, 14.1, 4.5, 0.2))" Where are all those numbers coming from? I hate to seem daft, but I really don't understand this stuff. I try to read it, and the readme text files, but my brain seems to refuse to grasp it. (Oh, the joys of ADD -___-)

Also am I correct in understanding that in the example below that I am more closely related to the Kent_1KG sample than the Orkney_1KG because the number is lower?

      [,1]              [,2]
 [1,] "British_D"       "0"
 [2,] "Kent_1KG"        "0.4"
 [3,] "Cornwall_1KG"    "1.631"
 [4,] "British_Isles_D" "1.7117"
 [5,] "CEU25"           "2.6665"
 [6,] "Irish_D"         "2.7331"
 [7,] "Dutch_D"         "3.1016"
 [8,] "Argyll_1KG"      "3.6579"
 [9,] "Orcadian"        "3.755"
[10,] "Orkney_1KG"      "4.279"

What do I type in to see what percent of the samples I am? Hopefully my question made sense.

First you need to get your admixture percentages. To do that you must first run standardize (see README) and then you must run DIYDodecadWin:

http://dodecad.blogspot.com/2011/09/do-it-yourself-dodecad-v-21.html

Also when it says "DodecadOracle(c(4.6, 16.7, 33.6, 0, 23.2, 0.4, 0.6, 1.6, 0.7, 14.1, 4.5, 0.2))" Where are all those numbers coming from?

These numbers will be reported by DIYDodecadWin when you run it on your sample (see above).

Once you get these numbers, you input them in the same order in DodecadOracle as above. Note, however, that this post is about Dodecad Oracle v1, which works with the "calculator" dv3 that is bundled with the DIYDodecad program. If you use the more recent K12b calculator (http://dodecad.blogspot.com/2012/01/k12b-and-k7b-calculators.html), you must use the Oracle designed for it, which can be downloaded from the same page. The most recent calculators/Oracles of the Project will always be available from the bottom right of the blog under "Project Links".
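Since the order and number of values matter, a quick sanity check before calling the Oracle can catch copy-and-paste slips. This is a minimal sketch: check_proportions is a hypothetical helper, not part of the Oracle, the count of 12 is taken from the example call quoted above, and the tolerance of 1 percentage point is an arbitrary allowance for rounding.

```r
# Sanity-check a vector of admixture percentages before passing it on.
# n_comp = 12 matches the twelve values in the example call; tol is an
# arbitrary allowance for rounding in the reported percentages.
check_proportions <- function(p, n_comp = 12, tol = 1) {
  stopifnot(is.numeric(p), length(p) == n_comp, all(p >= 0))
  if (abs(sum(p) - 100) > tol) {
    stop("percentages sum to ", sum(p), ", expected about 100")
  }
  invisible(p)   # return the vector unchanged so it can be passed along
}

p <- c(4.6, 16.7, 33.6, 0, 23.2, 0.4, 0.6, 1.6, 0.7, 14.1, 4.5, 0.2)
check_proportions(p)
# DodecadOracle(p)   # then, inside the Oracle workspace
```

The check cannot tell whether the values are in the right order; that still has to match the order in which DIYDodecadWin reports them.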

Also am I correct in understanding that in the example below that I am more closely related to the Kent_1KG sample than the Orkney_1KG because the number is lower?

I have been reading the posts for a few days now and trying to make sense of my results. I cannot find instructions on how to read my basic results. I apologize for my genealogy ignorance. Dienekes, you must be a genius!

Anyhow, I will use Africa 9 Oracle. I understand Admix Results. I sort of understand Single Population sharing: that the smaller distance means more of a match? But in Mixed Mode Population Sharing, which percentage do I use since they are all so close? Also, am I supposed to compare the distance with the Secondary Population and the closest is the probable match? Such as 64.2% Morocco_Jews to 35.8% French_Basque with distance of 1.67. Or do I consider the Tuscan with 93.2% to Mozabite 6.5%. Or since I'm showing large percentage matches with mostly North_Italian do I go with that?

I'm sorry but I am truly completely green at this. Thanks for your patience.

Maiysa

PS I am a member of a Native American Tribe with a lot of Scandinavian and French, so the Italian is really confusing me, but very exciting, since I'm in love with Italy!! But not sure if it means anything.

Discounting #1 and #3 (I have no Slovenian / Hungarian / Moroccan Jewish ancestry at such levels), if I understand correctly, this means that compared to the AJs who provided samples I have a higher amount of North African admixture, indicating a small North African component together with my Ashkenazi Jewish ancestry?

Hello, my Dodecad V3 results don't show any South-West Asian or South Asian component, but using your spreadsheet for V3, I see that most ethnicities have it; even Selkup has a value of 0.1.

Dodecad World9 has the smallest distances in mixed mode and it shows 90% Russian, which I mostly am, but I've heard that this test was created only to discover Native American ancestry. I got results like:

86.8% Russian (Dodecad) + 13.2% TSI30 (Metspalu) @ 0.96
87.1% Russian (Dodecad) + 12.9% Tuscan (HGDP) @ 1

I am so lost. I can't make sense of what these results are showing. I'm 67 and smart, but certainly challenged by the lack of plain English interpretation and explanations here. I feel as though the answers to my lifelong genealogy questions hang behind a curtain of confusion. I show a higher than what I would expect percentage of middle eastern, french and german in my results. I have no known people of those ethnic populations back to ggg grandparents, so ... was there infidelity, adoption, something else to explain this. I am overwhelmed by trying to understand the computer-language answers here. I have great appreciation for the person who has taken all the time and effort to develop this program, but please ... hire someone who can make reading the results easy.... I figure I only have 20 years left to find the answers to this lifelong passionate journey. And it won't get any easier from here.

Known ethnicities:

Father's father: Northern Italian.
Father's mother: From Italy, but varied as she came from a family who probably married the women off to husbands from other countries for political and financial reasons. Her father was reputed to have some Spanish (also Jewish?) as his name was Catalano. (Catalan origin?)
Mother's father: Portuguese/old New England.
Mother's mother: Scots/Irish/English and maybe German.

My ID number is: M632230 if anyone would like to have a go at interpretation. A better email for me is: mamasi@comcast.net

Thank you for your help and understanding.

Melody

Hi, has anyone answered you yet? Since you're new to this, are you aware that Italians, Jews, Spanish, and Portuguese often have some ties to the Middle East? If you're wondering why you have some Middle Eastern, Spaniards and Portuguese had some Moroccan Arab DNA moved up into them, they also had some Sephardic Jew DNA sometimes, the Italians often have Levantine (Jews and Arabs of the areas in the Levant known as Israel, Jordan, Lebanon, etc) moved into Italy at points in history. These are some basic examples, Dienekes could explain much better about the facts behind admixture and history than I can. Humans are often surprised to see what other racial backgrounds are in them because they don't often know the facts about where their ancestors were long ago. It's quite easy to be a "White" person and still have some ancient DNA links to Arabs, Jews, South Asians, etc. It's more common than you would think because just being from Europe doesn't mean you're 100% "White", you can have "Off-white" and "brown" in you too. Skin color is a poor way of determining race and Whites think they're White but they're really unable to know what was in them 500 years ago. One time a Mexican girl on YouTube said "My DNA test says I'm part Spanish, I'm not though, I'm from Mexico, why would there be Spanish in me?" She thought there was a race called "Mexican", she didn't know that Spaniards often moved around in Mexico mixing with Native Americans. The people in the comments section ripped her up a bit but I think it's better to teach someone without being rude.

When I enter the following to the Gedmatch program:

Admixture (heritage)
Eurogenes
Admixture Proportions (With link to Oracle)
My kit number
Eurogenes K13
My ethnicity: German
Oracle

I see:

Single Population Sharing
Population Distance
South_Dutch 2.48
West_German 2.57
French 6.74
Southeast_English 7.16
and so forth.

This seems fine since my ancestors, back to the late 1500s, are from Germany.

But the next section for 'Mixed Population Sharing' gives:

97.7% South_Dutch 2.3% Sakilli
97.7% South_Dutch 2.3% Chamar
97.7% South_Dutch 2.4% North Kannadi

I don't understand why tribes from India have a connection to the Dutch. Is this an error in the program?

No, because you have the Indo-Aryan migration, which was people migrating long ago from India to Western Europe. Also, the Dutch and German languages are considered Indo-European. It is common for Germans to have Indian results due to the Indo-European contact from Eurasia.

Excuse my ignorance but I'm trying to understand my results below. My father's family has lived in America for almost 300 years now. There is some dispute whether his direct paternal ancestors emigrated to America from one of the German states or from the British Isles. My dad's mother came from Scottish/Irish emigrants to America. My mother was born and raised in the Birmingham, England area. Very little is known about her father's family as he was an illegitimate child and I do not know who his biological father was. His mother's family was also from Birmingham going back several generations. Any help you can give regarding my results below would be greatly appreciated.

I am new to this and I'm having difficulty interpreting my results below from MDLPK23b.

My father's paternal ancestors with the same surname as mine have lived in America for 300 years. It is a matter of some dispute whether they emigrated from one of the German states or from Great Britain. Do the following results shed any light on the answer to that dispute? My father's mother's family has been in America for almost as long and is believed to have emigrated here from Scotland or Ireland.

My mother was born and raised in Birmingham, England from English parents. Her father's paternity is unknown as the identity of his biological father is unknown. His mother's family is thought to have also been from the West Midlands of England.

I'm assuming the Scandinavian is coming from Viking invasions of Britain but I don't know.

Ancestry.com DNA analysis found I was 15% Iberian Peninsula but I see none of that here.

The Polynesian/Pygmy/East African are completely unexpected and baffling. Are these of such small amounts that they may be a single ancestor thousands of years ago?


You may cite, quote, or reproduce articles on this site for non-commercial purposes, provided that you attribute them to Dienekes Pontikos and provide a link either to the main page of this blog or to the individual blog entry you are referring to.