Mark, what would you have to do with your formula to get a TMRCA of 6000 years for U106?

If this isn't what you meant, please let me know. There are several factors to consider.

TMRCA in years is the chosen years per generation times the number of generations. I have been using 30 as the standard, which I feel may be an average of, let's say, thirty years per generation from today back to 1 AD and a 20-year rate for the BC time frame. So if the 20-year figure were higher, the TMRCA age would increase, of course.

The number of generations is based on the markers used: the sum of the individual marker variances divided by the sum of each marker's mutation rate.
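To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The haplotypes and mutation rates are made-up toy values, the function name is mine, and taking the variance about the sample mean with n-1 is an assumption; the actual engine may measure variance differently.

```python
from statistics import variance

def generations_from_haplotypes(haplotypes, mutation_rates):
    """Sum of per-marker sample variances divided by the sum of per-marker rates."""
    # Transpose so each tuple holds one marker's repeat counts across all men.
    marker_columns = list(zip(*haplotypes))
    # statistics.variance is the sample (n-1) variance about the mean.
    marker_variances = [variance(column) for column in marker_columns]
    return sum(marker_variances) / sum(mutation_rates)

# Hypothetical data: 4 haplotypes x 3 markers, rates per generation.
toy_haplotypes = [
    (13, 24, 14),
    (13, 25, 14),
    (14, 24, 15),
    (13, 23, 14),
]
toy_rates = [0.002, 0.004, 0.001]

generations = generations_from_haplotypes(toy_haplotypes, toy_rates)
years = generations * 30  # 30 years per generation, as used in this thread
print(round(generations, 1), round(years))  # about 166.7 generations, 5000 years
```

With real data the list would hold hundreds of haplotypes and the 94 markers kept after removing the multi-copy ones.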

Variance is affected by the quantity of haplotypes, all assumed to be in the same subclade or under a common subclade farther down the tree trunk.

This assumes the haplotypes used are either tested to the above criteria or match others in the same clade, such as varieties with one or more members tested positive for a specified SNP.

With the extended panel (68-111) the confidence increases significantly, and it actually appears that as a male's age increases, so does the number of mutations that can be transmitted. This increases the variance and the mutation rate, which causes the MRCA in generations to decrease.

Mark, you are brilliant at maths, but if the TMRCA for U106 is 6000 then the formula is wrong. I think someone posted recently that M269 is 9000 ybp; how would the TMRCA of U106 be between 3000 and 4000 ybp?

By someone else's calculation you say that U106 is 6K years before present.

I am using 111-marker haplotypes, a quantity of 320, with the latest mutation rates by Marko via KenN's Generation111T engine. The 17 multi-copy markers are removed, leaving a net of 94 and getting rid of any saturation effects.

U106's True Sample (n-1) variance gives a TMRCA of 101.4 generations. At 30 years per generation that equals 3,042.0 +- 631.4 years before present with a 68.27 percent (1-sigma) spread. Even at the upper 2-sigma bound, it is 4.3K years old.

The 111-marker panels are showing nearly the same TMRCA as the 67-marker panels using the same data set, and a younger TMRCA than the 37-marker set.

When I ran M269 on Busby's 15 (14 used) markers with n=1035 using the same tool, the estimate was 143.0 +- 62.3 generations. At 30 years per generation, the True Founders TMRCA was 4,289.8 +- 1,870.0 years before present. That is about 1,200 years between M269 and U106.

And when I ran L21 on Busby's 15 (14 used) markers and on the 111 (94 used), I got nearly the same number of generations, 113.2 and 113.6 respectively. P312 comes to 127.1 with Busby's markers.
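For anyone who wants to check those U106 figures, a small Python sketch of the conversion follows. The function name and layout are mine; the inputs (101.4 generations, +- 631.4 years, 30 years per generation) are the ones quoted in this post.

```python
YEARS_PER_GENERATION = 30  # the standard used in this thread

def age_range(generations, sd_years, n_sigma=1,
              years_per_generation=YEARS_PER_GENERATION):
    """Return (low, central, high) age in years for a +- n_sigma spread."""
    central = generations * years_per_generation
    return (central - n_sigma * sd_years, central, central + n_sigma * sd_years)

low1, central, high1 = age_range(101.4, 631.4, n_sigma=1)
low2, _, high2 = age_range(101.4, 631.4, n_sigma=2)

print(round(central, 1))                 # 3042.0 years before present
print(round(low1, 1), round(high1, 1))   # 1-sigma range: 2410.6 to 3673.4
print(round(high2, 1))                   # upper 2-sigma bound: 4304.8, i.e. about 4.3K
```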

I would say L21 is an older clade than U106 by about a dozen generations.

I'm not sure how precise we can get, but I've consistently gotten that P312 is older than U106 (STR diversity wise) and that U152 is as old as P312 with L21 close behind, so this actually makes sense if you are getting them roughly the same age.

Anyway, it just feels good to find someone else seeing the same thing.

U106 is only 25% of R1b, yet it has as many SNPs downstream and is more widespread. That, for me, makes it older. Secondly, say a 50-year-old man has a 37, 67, and 111 test. No matter which formula you use, his age will still be 50. M269 was supposed to be born in the Neolithic, 9000 ybp.

Additionally, P312* could still contain many younger subclades that we don't yet know about therefore making it appear slightly younger than U152.

You bring up a good point. Groups of large younger subclades do affect the variance results. Using 111 markers, L21 All (n=1020) had a variance of 25.99, but when I removed the large M222 subclade, leaving n=873, it gave a smaller variance of 25.65. The generation difference was only one.

MikeW was correct in calculating the interclades between subclades and reporting that age data.

The problem with SNPs is that they have very low mutation rates. This makes them very useful for very deep SNP lineages (thousands of generations), but not very informative for very recent genealogies (tens to hundreds of generations). Hence, SNP markers are not very informative at these shorter time depths.

Mutation rates likely differ among microsatellites; those with the highest mutation rates will provide the most information.

I don't think there are enough data points available to calculate ages from SNPs across short periods of time.

So, you trust your math based modeling more than the widely accepted notion that a parent must be older than all of his offspring? I really don't get this reasoning.

There's always a margin of error in these calculations, and in the identity of parents occasionally :)

Anatole K often confuses people on this. I think he has even said U152 is older than P312. Technically, we can't really say that, but this is even more than a margin of error thing.

This is why I use the words "U152 appears to be as old as P312." U152 had to have occurred after P312. It's younger from a first occurrence viewpoint. It has to be.

However, U152 could still have higher STR diversity than P312 and this could just be a margin of error problem.

Another situation could be that U152's TMRCA (most recent common ancestor of all surviving U152 people) is older than the TMRCA for P312*. This is quite possible and not illogical.

Another situation which also relates to error margins is that the STR diversity for all of P312 might be lower than all of U152 because of bias in the data. Let's say L21 is significantly younger than U152 and there is several times more L21 in the data than U152. The data is not representative and L21 could be dragging P312's diversity down a bit.

This shows us why interclade calculations are better than intraclade.

Yep I agree with all of that.

Something a lot of people seem to have trouble with is just how tenuous Y lines can be. It's not at all unreasonable for even quite old and large families to disappear without a trace, or indeed to leave only one surviving line.

The whole point of my posting ages of various clades using several different numbers of markers (Busby 15, 111 markers, etc.) is to show the ages are reasonably set using subclade comparisons.

I didn't create the variance calculation engine nor the mutation rates used. I just expanded the output to report standard deviation information and removed several more multi-copy markers that should not have been used. I also used internal Excel functions to guarantee better numeric performance and modified the interclade results using a Pooled SD formula.
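For readers unfamiliar with a pooled SD, here is a minimal Python sketch of the textbook two-sample formula. This is an assumption about the general form only; the exact formula in the modified spreadsheet isn't given in this thread, and the numbers in the usage line are invented.

```python
from math import sqrt

def pooled_sd(sd1, n1, sd2, n2):
    """Textbook pooled standard deviation of two samples.

    Each sample's variance is weighted by its degrees of freedom (n - 1).
    """
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

# Hypothetical inputs: two clades with equal sample sizes.
print(round(pooled_sd(2.0, 10, 4.0, 10), 4))  # sqrt(10), about 3.1623
```

The weighting means a clade with many more haplotypes pulls the pooled value toward its own SD.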

Of course a son cannot be older than the parent, but overlapping confidence intervals (1-sigma at 68.27%) mean the difference in ages is not established as fact, only reasonably assumed. If we take the older clade at -2 SD and the younger clade at +2 SD (95.5% confidence) and there is still a gap between them, then we have very solid assurance, via statistical significance, that the ordering of the ages is correct.

It is tempting to look at whether two error bars overlap or not, and try to reach a conclusion about whether the difference between means is statistically significant or not.

This statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance.

Useful rule of thumb: If two 95% CI error bars do not overlap, and the sample sizes are nearly equal, the difference is statistically significant with a P value much less than 0.05 (Payton 2003).
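As a rough illustration of that overlap check, here is a Python sketch using the U106 and M269 figures posted earlier in the thread (3,042.0 +- 631.4 and 4,289.8 +- 1,870.0 years). The helper names are mine, and treating +- 2 SD as the ~95% interval is an assumption.

```python
def interval(mean, sd, n_sigma=2):
    """Return the (low, high) bounds of a mean +- n_sigma * sd interval."""
    return (mean - n_sigma * sd, mean + n_sigma * sd)

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

u106 = interval(3042.0, 631.4)    # years before present, from the earlier post
m269 = interval(4289.8, 1870.0)   # years before present, from the earlier post

overlap = intervals_overlap(u106, m269)
print(overlap)  # True: the 2-sigma ranges overlap
```

Note the rule of thumb only runs one way: non-overlapping bars indicate a significant difference, but overlapping bars do not by themselves show that the difference is insignificant.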

I was working to include a t-test in the interclade section, as the t-test takes sample size into account.

There was some question as to why Ken's Age Generator is producing younger ages. The mutation rates Ken utilized are from Marko Heinila.

Marko could explain his specific methodology in much more detail. But I did have a question or two, so I asked Marko to explain how and why his 67-marker study appears different from his newest 111-marker one. He explained in a recent email that:
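Here is a minimal sketch of how a t-test brings sample size in, using Welch's unequal-variance version computed from summary statistics. Choosing Welch's variant is my assumption (the post doesn't say which t-test was planned), and the input numbers are invented.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom.

    Takes sample means, sample standard deviations and sample sizes.
    """
    a = sd1 ** 2 / n1  # squared standard error of the first mean
    b = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical example: two groups of 16 with equal spread.
t, df = welch_t(10.0, 2.0, 16, 8.0, 2.0, 16)
print(round(t, 3), round(df, 1))  # 2.828 30.0
```

Larger samples shrink the standard errors, so the same difference in means gives a larger t.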

"The 111T mutation rates are certain kind of averages for the about 4,000 sample dataset of 111 STR data. This is based on method (somewhat similar to Chandler's) that does not put critical importance on tree construction accuracy except to reduce statistical double counting in some degree in case of the "weighted-pair" estimate set. With the 111 rate estimates, it is assumed that per locus rates are approximately constant. The method is indifferent to multisteps.

The webpage that considers linearity of the 67 set, uses much more complex mutation models than the 111 estimates. The difference is that the mutation rate is assumed to depend on repeat number unlike in the case of the 111 estimates. Accurate tree constructions are needed in this case to find the related parameters. It is also necessary to model the multisteps and to be able to date the trees accurately enough. The 67 estimates there reflect the behavior in the estimated 67 STR trees.

The comparison of these two sets of results obtained by very different methods gives a rough idea of the uncertainties coming from the detailed mutation modeling and also says something about tree constructions. (The tree estimation problem is much more complex than the mutation rate estimation problem.)..."

Until someone else completes a much larger 111 marker mutation rate study with similar methods, I feel that Marko's estimation methods and results should be considered as a standard to be utilized.

Quote from Mark Jost: "Generations as in number of, is based on the markers used and their summation of individual marker variances and divided by the sum of each marker's mutation rate." MJost

This is a fundamental flaw in the process. Assuming a reasonable estimate of the mutation rates for the individual markers is known, why would you lump them all together to produce a single constant?

The model uses statistical analysis to determine the variance of each marker then proceeds to use simple arithmetic for the final step. Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast moving markers, with the slow movers having little or no effect.

To the extent that this is true (and I'm not technically competent to argue it) there's also an effect of whether a marker is moving up or down; it can mutate in either direction. The "modal" numbers are based on whatever direction the markers took that were most successful in reproducing males -- and therefore have the present day appearance of having been the prehistoric norm. And I've seen no scientific reason really to believe that. The WAMH, etc. may or may not have been modal a few thousand years ago, whenever pappy L11 (as one example) had his sons. It's now modal for a majority of their West Atlantic survivors.

This post doesn't mean to deny that pappy L11 had a haplotype -- but questions how sure we can be what that was, until we have had the opportunity to dig up a few more really old guys, and test their Y-DNA to a phylogenetically meaningful level.

The basic idea of how to combine input variances and arrive at the variance in the output is that you add the variances. Then you take the square root of the resulting variance to compute the standard deviation. But this additive property relies on the linearity of the equation relating the distributions together. KenN removes multi-copy markers that do affect linearity, and MikeW and I agree. Removing fast mutators at the 67 and 111 marker level could further smooth out the lines, but you would be removing data points that can provide additional history.

Things get more complicated when the transfer function is not linear, for example when it involves trigonometric or higher-order functions. Now we need a Six Sigma or stats person to explain more and devise more complete testing.
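A tiny Python sketch of that rule, assuming the inputs are independent and combine linearly (the function name and the numbers are only illustrative):

```python
from math import sqrt

def combined_sd(variances):
    """SD of a sum of independent quantities: add the variances, then take the root."""
    return sqrt(sum(variances))

# Two inputs with SDs of 3 and 4, i.e. variances of 9 and 16.
print(combined_sd([9.0, 16.0]))  # 5.0, not 3 + 4 = 7
```

The point is that variances add but standard deviations do not.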

Kenneth Nordtvedt told me that 'Since we don’t know the internal structure of the general interclade tree, we can’t actually produce the SD; we can only give a most pessimistic case.'

After you read my last post on summing variances then:

Yes, you could calculate the age of each marker by taking each variance and dividing by that marker's mutation rate. But then you have to add each marker's result and divide by the number of markers to get an average age. And you are still left with the old method of calculating the SD using the sum of variances.

Using 67-marker haplotypes, the number of generations for L21 (n=1020) and P312xL21 (n=1638) is 143.5 and 174.6 respectively using your suggested method. The basic way of summing the variances produces 114.7 and 122.3 generations.
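The two approaches being compared can be sketched side by side in Python. The variances and rates below are invented toy values (three markers only), not the L21 or P312 data, and the function names are mine.

```python
def ratio_of_sums(variances, rates):
    """Sum of marker variances divided by sum of marker mutation rates."""
    return sum(variances) / sum(rates)

def mean_of_ratios(variances, rates):
    """Average of the per-marker ages (variance / rate for each marker)."""
    ages = [v / r for v, r in zip(variances, rates)]
    return sum(ages) / len(ages)

def rate_weighted_mean_age(variances, rates):
    """Per-marker ages averaged with each marker's rate as its weight."""
    ages = [v / r for v, r in zip(variances, rates)]
    return sum(r * a for r, a in zip(rates, ages)) / sum(rates)

# Hypothetical markers: one fast, one medium, one slow.
toy_variances = [0.90, 0.10, 0.05]
toy_rates = [0.006, 0.001, 0.0002]   # per-marker ages: 150, 100, 250 generations

summed = ratio_of_sums(toy_variances, toy_rates)
averaged = mean_of_ratios(toy_variances, toy_rates)
weighted = rate_weighted_mean_age(toy_variances, toy_rates)
print(round(summed, 1), round(averaged, 1), round(weighted, 1))  # 145.8 166.7 145.8
```

Algebraically, the sum-over-sum result is the same thing as the per-marker ages averaged with the mutation rates as weights, which is why the fast marker (150 generations here) pulls it toward itself, while the plain average gives every marker an equal vote.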

The Model does not calculate the estimated age of a Haplogroup, i.e. SNP or cluster. It calculates the estimated age of a group of Haplotypes, a sample of the Haplogroup in question. The result will only be representative of the Haplogroup if the sample is representative of the Haplogroup, and the assumed individual marker mutation rates are also representative. The question of whether or not the assumed mutation rates are representative is easily cross-checked by adding a line into the Calculation as follows.

For each Marker calculate:- Variance of marker/Assumed Mutation rate for the marker.

The result is the estimated age in generations predicted by each individual marker.

The accuracy of the assumed mutation rates can be determined by the consistency of the ages according to the individual predictions.
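A minimal Python sketch of that cross-check, with invented variances and rates. Using the coefficient of variation as the consistency measure is my choice for illustration; the post doesn't specify one.

```python
from statistics import mean, stdev

def per_marker_ages(variances, rates):
    """Age in generations predicted by each marker: variance / mutation rate."""
    return [v / r for v, r in zip(variances, rates)]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; 0 means perfect agreement."""
    return stdev(values) / mean(values)

toy_rates = [0.006, 0.001, 0.0002]           # hypothetical per-generation rates
consistent_variances = [0.60, 0.10, 0.02]    # every marker predicts 100 generations
scattered_variances = [0.90, 0.10, 0.05]     # markers predict 150, 100, 250

cv_consistent = coefficient_of_variation(per_marker_ages(consistent_variances, toy_rates))
cv_scattered = coefficient_of_variation(per_marker_ages(scattered_variances, toy_rates))
print(round(cv_consistent, 3), round(cv_scattered, 3))  # 0.0 0.458
```

With real haplotypes, some scatter is expected from sampling noise alone, so a wide spread does not by itself prove the rates are wrong.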

As I have previously posted, the process of dividing the sum of the Variances by the sum of the Mutation rates is fundamentally flawed.

Ok, I am not a math guy. Here is Ken's paper on the subject to review.