>
> How, for dummies, does degree of matching results convert to time of MRCA ?
>
> I presume statistical - year range with a certain degree of confidence. How
> is this year range affected by number of test sites. Could results from
>

Yes, it's VERY statistical <g>. I have written a small program which will
calculate the number of generations to the Most Recent Common Ancestor
(MRCA). It gives the median (that is, there is a 50-50 chance that two people
will find their common ancestor within that number of generations) and the
95% confidence interval (that is, there is a 95% chance that their common
ancestor lies within that range). The 95% CI covers a very wide range because
rare events such as mutations are inherently unpredictable.
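For readers who want to see the mechanics, here is a minimal Python sketch of how a median and a 95% range can be read off a posterior distribution. It is not the calculator's actual code and not necessarily Walsh's exact formulation. It assumes that each marker still matches after t generations (counted from each man back to the MRCA) with probability exp(-2*mu*t), and a flat prior on t. It then sums the posterior over a fine grid of t values. The function name mrca_quantiles and its parameters are hypothetical.

```python
import bisect
import math
from itertools import accumulate


def mrca_quantiles(n_markers, mismatches, mu,
                   probs=(0.025, 0.5, 0.975), t_max=2000.0, step=0.01):
    """Approximate posterior quantiles (in generations) of the time to the MRCA.

    Assumptions (illustrative only, not Walsh's exact method):
      * each marker still matches after t generations with probability
        q = exp(-2 * mu * t), i.e. 2t meioses separate the two men;
      * flat prior on t over the grid 0..t_max.
    """
    n, k = n_markers, mismatches
    ts = [i * step for i in range(int(t_max / step) + 1)]
    weights = []
    for t in ts:
        q = math.exp(-2.0 * mu * t)
        # Binomial likelihood of k mismatches out of n; the C(n, k) factor
        # is constant in t and cancels when we normalise.
        weights.append(q ** (n - k) * (1.0 - q) ** k)
    cdf = list(accumulate(weights))      # running (unnormalised) posterior mass
    total = cdf[-1]
    # First grid point at which the cumulative mass reaches each probability.
    return [ts[bisect.bisect_left(cdf, p * total)] for p in probs]


low, median, high = mrca_quantiles(n_markers=10, mismatches=0, mu=0.002)
print(f"median ~{median:.1f} generations, 95% range ~{low:.1f} to {high:.1f}")
```

For a 10/10 match at .002 this gives a median of about 17 generations and a central 95% range of roughly 0.6 to 92 generations. The 95% range here is the interval between the 2.5% and 97.5% points. A one-sided "within N generations" figure would use probs=(0.95,) instead.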

Warning -- the following paragraphs with technical background may make your
eyes glaze over. If so, just read it once over lightly but do persevere to
the conclusions.

The calculator is based on a method outlined by Bruce Walsh. (In his credits,
he mentions that he was prodded to write the paper by Bennett Greenspan of
Family Tree DNA.) The full text of his paper is available online at http://www.genetics.org/cgi/reprint/158/2/897.pdf.

Many of the equations and technical details are beyond me, but I think it's
worthwhile for everyone to look at the summary and the graphs and tables in
the paper. Table 1 lists the number of generations to the MRCA for two people
with varying numbers of mismatches out of 5/10/20/50/100 markers, assuming a
mutation rate of .002 per locus per generation.
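The zero-mismatch case can be worked by hand. This uses simplifying assumptions chosen for illustration: a flat prior, and a per-marker match probability of exp(-2*mu*t). The figures therefore need not agree exactly with Walsh's table. Under these assumptions, a perfect match on n markers gives an exponential posterior. The number of generations within which the MRCA lies with probability p is then -ln(1-p)/(2*n*mu). The following Python sketch covers the marker counts used in Table 1. The function name is hypothetical.

```python
import math


def perfect_match_generations(n_markers, mu, prob):
    """Generations within which the MRCA lies with probability `prob`
    for two men who match on all n_markers.

    Assumes a flat prior and per-marker match probability exp(-2*mu*t),
    so the posterior is exponential with rate 2*n_markers*mu.
    """
    return -math.log(1.0 - prob) / (2.0 * n_markers * mu)


MU = 0.002  # mutation rate per locus per generation, as in Table 1
for n in (5, 10, 20, 50, 100):
    median = perfect_match_generations(n, MU, 0.5)
    within95 = perfect_match_generations(n, MU, 0.95)
    print(f"{n:3d}/{n} match: median {median:5.1f}, "
          f"95% within {within95:5.1f} generations")
```

In this simple model, doubling the number of markers halves every figure.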

Walsh poses some objections to the MLE (Maximum Likelihood Estimate or Most
Likely Estimate) method used by Family Tree DNA and Oxford Ancestors. The MLE
gives you the mode, that is, the single most likely value, but it doesn't
convey the wide range of possible values. Also, when you apply the MLE to
finding the common ancestor (rather than to predicting the percentage of
descendants with mutations), a perfect match is most likely at zero
generations, so the MLE for two samples that match exactly is zero. That is
to say, you match yourself!
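This small Python sketch shows why the MLE lands on zero for a perfect match. It assumes that each marker still matches with probability exp(-2*mu*t). This is a simplification for illustration, not necessarily the exact model used by any testing company. Under that assumption, the likelihood of k mismatches out of n markers peaks where the match probability equals the observed fraction of matching markers. For a perfect match, that point is t = 0. The function names are hypothetical.

```python
import math


def likelihood(t, n_markers, mismatches, mu):
    """Binomial probability of the observed mismatches after t generations."""
    q = math.exp(-2.0 * mu * t)          # per-marker chance of still matching
    k = mismatches
    return math.comb(n_markers, k) * q ** (n_markers - k) * (1.0 - q) ** k


def mle_generations(n_markers, mismatches, mu):
    """Closed-form MLE: solve exp(-2*mu*t) = (n - k) / n for t."""
    if mismatches >= n_markers:
        return math.inf                  # no matching markers: no finite maximum
    return math.log(n_markers / (n_markers - mismatches)) / (2.0 * mu)


# Brute-force check over whole generations 0..200 for a 10/10 match:
best = max(range(201), key=lambda t: likelihood(t, 10, 0, 0.002))
print(best, mle_generations(10, 0, 0.002))   # both point to zero generations
print(round(mle_generations(10, 1, 0.002), 1))
```

With one mismatch out of 10 the MLE moves to about 26 generations. A single "most likely" value still says nothing about the spread.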

Instead, Walsh uses a branch of statistics called Bayesian analysis, which
takes into account what you already know or can assume; in this case, prior
knowledge about populations. Walsh's chief assumption is that the population
base consisted of at least 250 people!
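Walsh's own prior is not reproduced here. The following minimal Python sketch only shows one way a population assumption can enter through the prior. As a hypothetical illustration, it assumes the prior on t is exponential with mean N generations. That is roughly what coalescent theory gives for two random men from a constant population of N males. Combined with a perfect-match likelihood of exp(-2*n*mu*t), the posterior is again exponential, with rate 2*n*mu + 1/N. The value N = 250 simply echoes the figure above. This is not a claim about how Walsh uses it.

```python
import math


def perfect_match_median(n_markers, mu, pop_size=None):
    """Median generations to the MRCA for a perfect match on n_markers.

    pop_size=None -> flat prior.
    pop_size=N    -> hypothetical exponential prior with mean N generations.
    """
    rate = 2.0 * n_markers * mu        # from the likelihood exp(-2*n*mu*t)
    if pop_size is not None:
        rate += 1.0 / pop_size         # the prior exp(-t/N) multiplies in
    return math.log(2.0) / rate


print(round(perfect_match_median(10, 0.002), 1))                # flat prior
print(round(perfect_match_median(10, 0.002, pop_size=250), 1))  # N = 250
```

A smaller assumed population pulls the estimate toward the present. As N grows, the result approaches the flat-prior value.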

My husband (also known as "MathMan" in my household) wrote out the solutions
for definite integrals for 0, 1 and 2 mismatches (from equation 12 in the
paper). The solutions are complex polynomial equations which my program
solves by a trial and error method. Walsh mentions that he used a symbolic
algebra program called Mathematica.
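Here is a minimal Python sketch of the same idea. It rests on simplifying assumptions rather than a transcription of Walsh's equation 12: a flat prior and a per-marker match probability of exp(-2*mu*t). Expanding (1 - x)^k binomially turns the definite integral of the posterior from 0 to T into a polynomial in x = exp(-2*mu*T). A bisection search, which is a tidy form of trial and error, then finds the T that gives a chosen probability. The function names are hypothetical.

```python
import math


def posterior_cdf(t, n_markers, mismatches, mu):
    """P(MRCA within t generations), flat prior, match prob exp(-2*mu*t).

    Expanding (1 - x)**k binomially turns the integral into a polynomial in
    x = exp(-2*mu*t); the common factor 1/(2*mu) cancels in the ratio.
    """
    n, k = n_markers, mismatches
    if k >= n:
        raise ValueError("need at least one matching marker")
    x = math.exp(-2.0 * mu * t)
    num = den = 0.0
    for j in range(k + 1):
        m = n - k + j                    # exponent of this term
        c = math.comb(k, j) * (-1) ** j  # binomial coefficient with sign
        num += c * (1.0 - x ** m) / m    # integral from 0 to t (up to 1/(2*mu))
        den += c / m                     # integral from 0 to infinity (same factor)
    return num / den


def solve_generations(prob, n_markers, mismatches, mu, hi=10_000.0):
    """Trial and error by bisection: find t with posterior_cdf(t) == prob.

    Assumes the answer lies between 0 and `hi` generations.
    """
    lo = 0.0
    for _ in range(100):                 # each pass halves the bracket
        mid = (lo + hi) / 2.0
        if posterior_cdf(mid, n_markers, mismatches, mu) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for k in (0, 1, 2):
    print(f"{k} mismatches out of 10: median "
          f"{solve_generations(0.5, 10, k, 0.002):.1f} generations")
```

For zero mismatches the polynomial reduces to 1 - x^n, so no search is really needed. The bisection earns its keep for one or two mismatches.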

Using my calculator, you can enter any mutation rate and number of markers,
so it is a supplement to Table 1.

One thing to note is that changing the mutation rate really affects the
outcome. The value of .002 is based on a 1997 paper by Heyer. The
sample used in that paper was 42 men descended from 12 founding fathers, with
a total of 213 generations, so you can see we genealogists could augment that
number considerably!

Since then, other articles have found somewhat higher mutation rates,
closer to .003. Ordinarily that wouldn't seem like a significant
difference, but it greatly affects the calculator's results.
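How much does the rate matter? In a simplified model with a flat prior and a per-marker match probability of exp(-2*mu*t), the rate and the generation count enter only through their product mu*t. This is an assumption of the sketch, not a claim about Walsh's full treatment. As a result, every figure scales as 1/mu, and moving from .002 to .003 cuts each estimate by a third. The following short Python check uses a 10/10 match. The function name is hypothetical.

```python
import math


def generations_within(prob, n_markers, mu):
    """Perfect-match bound: flat prior, per-marker match prob exp(-2*mu*t)."""
    return -math.log(1.0 - prob) / (2.0 * n_markers * mu)


for mu in (0.002, 0.003):
    print(f"mu={mu}: median {generations_within(0.5, 10, mu):.1f}, "
          f"95% within {generations_within(0.95, 10, mu):.1f} generations")

ratio = generations_within(0.5, 10, 0.003) / generations_within(0.5, 10, 0.002)
print(f"ratio {ratio:.3f}")   # 2/3: a 50% higher rate means one-third fewer generations
```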

Also, it should be noted that the mutation rate is averaged over several
markers, and the calculator doesn't take into account the fact that different
markers might have different mutation rates. Walsh's paper gives equations to
use if mutation rates are known for each marker, but we're not at that level
of refinement yet. He also has an extensive discussion on how to handle the
possibility of parallel mutations, back mutations, and multiple mutations at
one marker. For now, I think it's best to count a two-step change (e.g. from
14 to 16 repeats) as two separate mutations.
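Counting a two-step change as two mutations is easy to automate. In this minimal Python sketch, each haplotype is a dictionary mapping marker names to repeat counts. The marker names are real Y-STR loci, but the repeat values are hypothetical. The sketch adds up the absolute repeat differences on markers both people were tested on. It does not attempt Walsh's handling of parallel or back mutations.

```python
def count_mutations(hap_a, hap_b):
    """Return (mutation_steps, markers_compared) for two haplotypes.

    Only markers tested for both people are compared; a difference of
    two repeats (e.g. 14 vs 16) counts as two separate mutations.
    """
    shared = sorted(hap_a.keys() & hap_b.keys())
    steps = sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    return steps, len(shared)


# Hypothetical repeat counts for illustration only.
person_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}
person_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 16, "DYS391": 10}
print(count_mutations(person_1, person_2))   # (3, 4): one one-step + one two-step change
```

The first number plays the role of the mismatch count, and the second is the number of markers to enter in a calculator.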

----

Conclusions:

1) More markers are better. And yes, it's possible to use more than one
company to expand the number of markers.

2) The range of possibilities for finding the MRCA is still very broad, even
with many markers. Don't focus too much on the single value of the MLE or
median.

3) Estimates of mutation rates are based on small sample sizes and subject to
change with more data.

4) Don't be unduly discouraged by #2. All of these calculations assume that
the two people in question are randomly selected. Surname projects (or two
people who have ancestors who lived in the same time frame and locality) are
"biased" samples which stack the odds in favor of finding the MRCA sooner.
But we don't have methods for quantifying that yet.
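Conclusions 1 and 2 can both be seen in one small calculation. Take a perfect match with a flat prior and a per-marker match probability of exp(-2*mu*t), which are simplifying assumptions for illustration. Adding markers shrinks the gap between the median and the 95% bound in absolute generations. Even so, the 95% bound stays about 4.3 times the median (ln 20 / ln 2) however many markers are tested. The range therefore remains broad relative to the estimate. Here is a Python sketch at .002:

```python
import math

MU = 0.002
for n in (10, 20, 50, 100):
    rate = 2.0 * n * MU                  # posterior rate for a perfect n/n match
    median = math.log(2.0) / rate        # 50% point
    within95 = math.log(20.0) / rate     # 95% point, since -ln(0.05) = ln(20)
    print(f"{n:3d} markers: 95% bound is {within95 / median:.2f} x the median "
          f"({within95 - median:.1f} generations beyond it)")
```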