Saturday, January 21, 2012

Please refer to the previous analysis on the Balkans/West Asia for more information about the interpretation of this type of analysis.

I am very pleased with the way this analysis of Afroasiatic groups has turned out, revealing an exceptional degree of resolution. I invite eligible individuals from the Near East and Africa to submit their data, so that they can be included in future runs of this kind.

I have also added the full IBD sharing matrix which lists how many Morgans of sequence are estimated to be IBD with probability greater than 10^-6 between all pairs of individuals.

You can google any non-Project sample IDs to get some more information about their origin. For example, GSM536710 is an Iraqi Jew who shares about half his genome with GSM536714, also an Iraqi Jew. These two samples are almost certainly first-degree relatives. Or, GSM537032, a Samaritan shares 740-1,480cM with the other 2 Samaritans, an exceptional amount in this small and probably highly inbred population.

You can manipulate this matrix in R. After you download it and unzip it, you can load it into R as follows:

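# Load the IBD sharing matrix; the first column holds the sample IDs and becomes the row names
X <- read.table("afroasiatic_ibd_sharing.txt", row.names = 1, header = TRUE)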

Then, you can, for example, sort the IBD sharing for a particular individual, as follows:
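A minimal sketch of such a sort follows. It is not the original command: the toy matrix and its IDs (A to D) are made up so that the example runs without the download, and the helper name top_sharers is hypothetical. With the real file, the X loaded by read.table above would take the place of the toy matrix, and an ID such as DOD215 would replace "A". Dropping the self-comparison and converting Morgans to cM (multiplying by 100) are conveniences assumed here, not something the matrix requires.

```r
# Toy symmetric matrix in Morgans; IDs and values are made up for illustration.
ids <- c("A", "B", "C", "D")
X <- as.data.frame(matrix(
  c(0.00, 0.30, 0.05, 0.10,
    0.30, 0.00, 0.02, 0.07,
    0.05, 0.02, 0.00, 0.01,
    0.10, 0.07, 0.01, 0.00),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)))

# Return the n individuals sharing the most IBD with `id`, in centiMorgans.
top_sharers <- function(X, id, n = 3) {
  row <- unlist(X[id, ])                        # one row as a named numeric vector
  row <- row[names(row) != id]                  # drop the self-comparison
  head(sort(row, decreasing = TRUE), n) * 100   # Morgans -> cM
}

res <- top_sharers(X, "A")
print(res)
```

The result is a named vector, so the names give the sample IDs and the values give the sharing in cM, largest first.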

"Or, GSM537032, a Samaritan shares 740-1,480cM with the other 2 Samaritans, an exceptional amount in this small and probably highly inbred population."

Probably?

Benyamim Tsedaka, who is a Samaritan, says that "first cousin marriage is the most noble kind of marriage" (per the Torah) and that aside from his Ashkenazi maternal grandmother, he comes from a long line of first cousin marriages ...

Also realize that up until 1924, Samaritans were forbidden to marry anyone outside the "community" of "Israel" ("the Israelites") and so surviving Samaritans who did not leave the community had a smaller and smaller choice of marriage partners. After the Great Samaritan Revolt of 512, only about 500 Samaritan families were documented, but in 1618 there were only 5 families who reassembled in Nablus. One of these male lines died out in 1912. In 1912, there were only 117 Samaritans left on earth, and the population only began to reexpand after intermarriage with Jews was permitted in 1924.

The Samaritans are probably the most highly inbred population isolate on Earth that does not come from a small geographically isolated island or mountain region.

They are also a remnant of a totally unadmixed pre-Hellenistic, pre-Arab Conquest Levantine population, numerically almost all descendants of Judeans who formed the Samaritan religious community after 444 BCE. The reason we can be pretty certain that they are unadmixed is that those who did in fact marry out of the community (including with Jews) left, so all Samaritans today descend from the ones who remained and observed the Biblical commandment against marrying outside the community.

We have an additional 3 unadmixed Samaritans in the Samaritan DNA Project, and one goal here is to get Family Finder data for them so we can add them to the Dodecad Project. Contributions are welcome! (I myself paid to test the Cohen-Levite for 67 STRs.)

Dienekes, could you explain why there are so many Palestinian clusters, and what this means for their relatedness to nearby populations?

Also, I'm a bit skeptical about the lack of overlap between Palestinians and Jordanians/Syrians/Lebanese. Over half of the population of Jordan is of Palestinian origin. If roughly the same percentage of people in the Jordanian sample are of Palestinian origin, it would mean that the Palestinian sample is unrepresentative (i.e. if the Palestinians in Jordan are mostly in cluster 5, but less than 10% of the Palestinian sample is). If I'm not wrong, the Palestinian sample you're drawing from was taken in Gaza. Is that right? This may be the reason for the lack of overlap between Palestinians and Jordanians/Syrians/Lebanese.

I don't see why the Behar et al. Jordanian sample would consist of inhabitants of Jordan of recent Palestinian origin. Also, the Palestinian HGDP sample is listed as Central Israel, so they are probably not from Gaza.

I'll also include the Palestinian_D sample next time I run this. In any case, I don't see any particular problem with the lack of overlap, which is what one expects for geographically separate groups with this type of fine-scale analysis.

Dienekes, can you do the following, although it may be a bit ambitious initially?

First, tell us what linkage map you are using to define the Morgans. These linkage maps are in fact quite different between different populations. Is there not an updated set of linkage maps from HapMap and from the newly phased 1K Genomes data?

Then, can you create a document of IBD sharing in Morgans for each Dodecad participant with *all* your academic samples?

These can be in a Google "Fusion" table, which apparently doesn't have the limitations of Google spreadsheets.

Here is the one I just created for the Afroasiatic sharing:

https://www.google.com/fusiontables/DataSource?snapid=S3711500Yft

10,000 x 10,000 may be a lot, but it may be possible to create a separate table for each participant, and then do a database "join" in a query.
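The per-participant approach could look roughly like this in R, where merge plays the role of the database join. This is only a sketch: the table layout, the column names (sample, P1, P2), and the values are hypothetical, and it does not use the Fusion Tables API.

```r
# Hypothetical per-participant tables: one row per reference sample,
# one column holding that participant's IBD sharing in Morgans (toy values).
p1 <- data.frame(sample = c("ref1", "ref2", "ref3"), P1 = c(0.10, 0.02, 0.00))
p2 <- data.frame(sample = c("ref1", "ref2", "ref3"), P2 = c(0.01, 0.25, 0.03))

# Join on the shared sample ID; all = TRUE keeps samples missing from one table.
joined <- Reduce(function(a, b) merge(a, b, by = "sample", all = TRUE),
                 list(p1, p2))
print(joined)
```

Adding another participant means appending one more table to the list, so no single table ever has to hold the full matrix.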

I use the HapMap recombination maps. It may be possible to use population specific recombination maps, but that would require a few more steps. One would have to pre-cluster individuals into a small number of groups (e.g., using Clusters Galore on unlinked data), examine a posteriori the resulting clusters, and then cluster the subsets using fastIBD and a population specific linkage map for each of the subsets. Also, there are no population-specific recombination maps for the vast majority of populations, except at the very highest level (African-Asian-European).

Dienekes, yes, hence the problem. We can't be sure a Morgan is really a Morgan for any individual population. No one has done this yet. Maybe there's a way to figure out population-specific linkage maps with the phased 1K Genomes samples?

Do you think you can create a Google Fusion table for "everyone with everyone", all of DOD and all of the reference samples?

It sounds gigantic, but each field will be only a floating point number, 4 bytes, so each participant's row of 10,000 values will only be about 40 KB, which isn't all that much. The full 10,000 x 10,000 table would come to about 400 MB.
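A quick check of this storage arithmetic, as a sketch assuming 10,000 individuals and 4-byte single-precision values (both figures taken from the discussion above; the variable names are only illustrative):

```r
n <- 10000             # individuals, as in the 10,000 x 10,000 figure
bytes_per_value <- 4   # one single-precision float per pair

row_bytes  <- n * bytes_per_value                 # one participant vs. everyone
full_bytes <- n * n * bytes_per_value             # everyone vs. everyone
tri_bytes  <- n * (n - 1) / 2 * bytes_per_value   # each unique pair stored once

cat(row_bytes / 1e3, "KB per participant row\n")
cat(full_bytes / 1e6, "MB for the full matrix\n")
cat(tri_bytes / 1e6, "MB for unique pairs only\n")
```

Because the sharing matrix is symmetric, storing each pair once roughly halves the full-matrix figure.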

The above Afroasiatic table and visualizations have proven to be incredibly useful. For example, I (DOD215) am the highest sharer with GSM537033 Samaritan #2 (after the other two Samaritans) at 28.75 cM, an Iraqi Jew GSM536713 at 27.79 cM, a Negev Bedouin HGDP00625 at 27.01 cM, followed by DOD712 at 26.45 cM. Why?


You may cite, quote, or reproduce articles on this site for non-commercial purposes, provided that you attribute them to Dienekes Pontikos and provide a link either to the main page of this blog or to the individual blog entry you are referring to.