Friday, August 19, 2011

A few comments on the use of DIYDodecad 2.0

Here are some observations that might be useful to people, especially for the new byseg and target modes:

1. Finding the origin of shared segments

Until now, when you had a segment match with another customer of your testing company, you had no way of knowing the origin of the shared segment. Suppose, for example, that a Russian and a German share some sequence in a region X. This could be:

Russian-like ancestry in the German individual

German-like ancestry in the Russian individual

Third party ancestry in both individuals

Using the new modes, if the German saw an excess of East_European in that region (relative to his usual average), then he'd pick the first scenario; if he saw nothing remarkable, the second; if an excess of some component rare in both Russians and Germans (e.g., West_Asian), the third.
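The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, the 15-point margin, and all the percentages are assumptions made up for the example, and nothing here reads actual DIYDodecad output.

```python
def classify_shared_segment(segment, genome_avg, partner_typical, rare_in_both,
                            margin=15.0):
    """Pick one of the three scenarios for a shared segment.

    segment, genome_avg: dicts mapping component name -> percentage.
    partner_typical: components typical of the match's population.
    rare_in_both: components rare in both populations.
    margin: hypothetical threshold (percentage points) for calling an excess.
    """
    def excess(component):
        # How far the region deviates from the person's usual average.
        return segment.get(component, 0.0) - genome_avg.get(component, 0.0)

    if any(excess(c) > margin for c in rare_in_both):
        return "third-party ancestry in both individuals"
    if any(excess(c) > margin for c in partner_typical):
        return "partner-like ancestry in you"
    return "your-like ancestry in the partner"


# Made-up numbers for the German in the example.
german_avg = {"West_European": 50.0, "East_European": 15.0,
              "Mediterranean": 25.0, "West_Asian": 10.0}
region_x = {"West_European": 20.0, "East_European": 60.0,
            "Mediterranean": 15.0, "West_Asian": 5.0}

print(classify_shared_segment(region_x, german_avg,
                              partner_typical=["East_European"],
                              rare_in_both=["West_Asian"]))
```

The margin matters a great deal in practice, since small regions are noisy; treat any fixed threshold as a starting point rather than a rule.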

This is extremely important, as there is a noticeable confirmation bias in some individuals toward interpreting the unusual as evidence of exotic ancestry. For example, an individual in search of Jewish ancestry may interpret segment matches with Jews as evidence for that ancestry: if he sees high Southwest_Asian ancestry in such segments, then that's a reasonable interpretation, but the shared segments could very well be interpreted as non-Jewish ancestry in the Jewish individual if they happen to be, e.g., East_European.

2. With parents' DNA

It is important to remember that each region includes both paternal and maternal DNA, and that you got a random draw of the segments your parents inherited from their own parents (your grandparents).

So, if you try to figure out where your region X came from, remember that it came from two places. If you see an unusual combination (e.g., Northeast_Asian + Northwest_African) that doesn't correspond well to any known population, this may mean that you got half of it from one parent, and the other half from the other.

Note also that while in a genomewide analysis a child's ancestral components will often (but not necessarily) be intermediate between those of his parents, this is not the case when looking at small segments. Suppose parent A is 50% West_Asian and 50% Mediterranean in a particular region, and parent B is 50% West_Asian and 50% West_European in the same region.

Then the child may end up with West_Asian near 100% in that region (if he happens to inherit the West_Asian segments from both parents) or near 0% (if he happens to inherit the Mediterranean/West_European ones).
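To make the parent A / parent B example concrete, here is a small Python sketch that enumerates what the child can receive. It rests on simplifying assumptions that are mine, not DIYDodecad's: each parent carries two "pure" haplotypes in the region, there is no recombination inside it, and the child gets one haplotype from each parent. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

# One label per haplotype in the region (idealized, no within-region mixing).
parent_a = ["West_Asian", "Mediterranean"]
parent_b = ["West_Asian", "West_European"]


def child_outcomes(haps_a, haps_b):
    """List the child's possible admixture in the region, one dict per draw."""
    outcomes = []
    for from_a, from_b in product(haps_a, haps_b):
        counts = Counter([from_a, from_b])
        # Each inherited haplotype contributes half of the region's total.
        outcomes.append({comp: 100.0 * n / 2 for comp, n in counts.items()})
    return outcomes


for outcome in child_outcomes(parent_a, parent_b):
    print(outcome)
```

Under these assumptions the four draws give the child 100%, 50%, 50%, or 0% West_Asian in the region, even though both parents are 50% West_Asian there.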

3. With Dodecad Oracle

In general, I discourage the use of Dodecad Oracle with chromosome or segment results, for two reasons:

Small segments may appear more mixed than they are, because there may not be any informative SNPs in a particular region to distinguish between some of the ancestral components. So, the scale of the noise may be higher. As an experiment, you can average your segments, weighted by either the number of SNPs or their physical length, and you will come up with something close to your "genomewide" average, which will, however, be off because of this factor.

From a different perspective, segments may appear less mixed, because it is less likely that you got genetic material from all ancestral populations in a small section of your DNA. Your genomewide admixture may have several non-zero components, but you are unlikely to have many non-zero components in a small region (barring the aforementioned noise), and in some regions you could very well see percentages above 80% for a single ancestral component.
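The averaging experiment mentioned in the first reason can be tried with something like the following Python sketch. The record layout (`n_snps`, `length_bp`, `admixture`) and the numbers are hypothetical; you would have to fill them in by hand from your own segment results.

```python
def weighted_average(segments, weight_key="n_snps"):
    """Average per-segment admixture, weighted by SNP count or length.

    segments: list of dicts, each with a numeric weight under `weight_key`
    and an "admixture" dict mapping component name -> percentage.
    """
    total = sum(seg[weight_key] for seg in segments)
    if total <= 0:
        raise ValueError("total weight must be positive")

    components = set()
    for seg in segments:
        components.update(seg["admixture"])

    # A component missing from a segment counts as 0% there.
    return {
        comp: sum(seg[weight_key] * seg["admixture"].get(comp, 0.0)
                  for seg in segments) / total
        for comp in sorted(components)
    }


# Made-up segments, for illustration only.
segments = [
    {"n_snps": 100, "length_bp": 2_000_000,
     "admixture": {"West_Asian": 100.0}},
    {"n_snps": 300, "length_bp": 3_000_000,
     "admixture": {"Mediterranean": 100.0}},
]

print(weighted_average(segments))                          # weighted by SNPs
print(weighted_average(segments, weight_key="length_bp"))  # weighted by length
```

Comparing the two weightings with your genomewide result gives a rough sense of how much the per-segment noise shifts the average.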

How do I interpret this, is it that the segment that the individual I match with has an origin somewhere between West Asia and Southeast Asia, like Central Asia or North India?

No, because 45% West Asian 45% SEAsian does not refer to _a_ segment but to _two_ segments, and the numbers are suggestive that there is a West Asian and an East Eurasian segment in whatever matching region you have. Whether the match is on the West_Asian or SEAsian segment is a different issue.

Oh OK - so I should carry out more analysis of narrower range in the target area and try to determine the limits of the 2 shared chunks in the segment - in the hope that I might get a purer West Asian and a purer Southeast Asian segment?

You are thinking in terms of a SINGLE segment, but humans are diploid, and it's quite possible, and indeed likely, that in that particular region the West_Asian segment comes from one parent, and the Southeast_Asian from another, and, hence, the average reported by DIYDodecad.

A narrower range will determine whether there are contiguous segments that can be assigned to different origins, or a superposition of segments coming from different parents.
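The difference between the two cases can be illustrated with a toy Python model. It assumes something you do not actually have with unphased data, namely a known ancestry label for each of the two haplotypes at every position; the function name and the 50/50 layout are invented for the example.

```python
def window_admixture(hap1, hap2, start, end):
    """Percentage of each label over both haplotypes in positions [start, end)."""
    labels = hap1[start:end] + hap2[start:end]
    return {comp: 100.0 * labels.count(comp) / len(labels)
            for comp in sorted(set(labels))}


# Case 1: contiguous segments. Both haplotypes switch origin halfway along.
contig_h1 = ["West_Asian"] * 50 + ["Southeast_Asian"] * 50
contig_h2 = ["West_Asian"] * 50 + ["Southeast_Asian"] * 50

# Case 2: superposition. One parent contributed each origin along the whole region.
super_h1 = ["West_Asian"] * 100
super_h2 = ["Southeast_Asian"] * 100

# Over the whole region the two cases look the same.
print(window_admixture(contig_h1, contig_h2, 0, 100))
print(window_admixture(super_h1, super_h2, 0, 100))

# Over the left half only, they differ.
print(window_admixture(contig_h1, contig_h2, 0, 50))
print(window_admixture(super_h1, super_h2, 0, 50))
```

In the contiguous case the narrower window resolves into a single origin, while in the superposition case it stays an even mix no matter how far you narrow it.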

You may cite, quote, or reproduce articles on this site for non-commercial purposes, provided that you attribute them to Dienekes Pontikos and provide a link either to the main page of this blog or to the individual blog entry you are referring to.