But regardless, do you have simulations where you can show us the impact of this perceived flaw?

Yes, Ken's latest TMRCA calculation includes this procedure.

Mark has completed calculations on various haplogroups, and his data includes the contribution that each marker makes to his results. Perhaps he can advise us what contribution the fast markers make, say the 10 fastest. He can do this by summing the 10 highest variance values and dividing the result by the sum of all of the variances.

However, look at it this way. Would you expect to get an accurate prediction of the upcoming election by polling a sample of 100 voters from Salt Lake City and 1000 voters from Detroit and summing the resulting voting intentions? Excuse the politics.
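The share calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the variance values are made up for the example and do not come from Mark's data.

```python
def top_k_variance_share(variances, k=10):
    """Fraction of the summed variance contributed by the k largest per-marker variances."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    top = sorted(variances, reverse=True)[:k]
    return sum(top) / total


# Hypothetical per-marker variances, for illustration only (not real haplogroup data).
example_variances = [0.90, 0.75, 0.60, 0.10, 0.05, 0.04, 0.03, 0.02, 0.005, 0.005]
share = top_k_variance_share(example_variances, k=3)
print(f"Top 3 markers contribute {share:.1%} of the total variance")
```

With real data you would pass the full row of per-marker variances and k=10.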

What I'm saying is I've read posts where Nordtvedt has done simulations and concluded that aggregating the markers versus individually running them caused insignificant differences. I'm pretty sure I've read Klyosov's response to a similar objection and it was a similar answer. This is an insignificant issue. I'm pretty sure Vineviz said the same.

How are you showing that your objection causes a significant impact? That's my question to you. Voters and STRs on the Y chromosome are not similar in behavior as far as I know.
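For anyone who wants to test the question directly, here is a minimal simulation sketch. It is not Nordtvedt's or anyone else's simulation from this thread. It assumes an idealized model: independent lineages descending from one founder, a symmetric single-step mutation at each marker, and mutation rates that are known exactly. The rates, the true age and the function name are all hypothetical. Under these assumptions both estimators aim at the same age, so any large gap seen in real data would have to come from something the model leaves out (for example mismatched rates or non-independent lineages).

```python
import random


def simulate_variances(rates, generations, lineages, seed=0):
    """Simulate independent lineages under a symmetric single-step mutation model.

    In each generation a marker gains or loses one repeat with total
    probability equal to its rate. Returns, for each marker, the mean squared
    offset from the founder value across all lineages.
    """
    rng = random.Random(seed)
    variances = []
    for mu in rates:
        total_sq = 0
        for _ in range(lineages):
            offset = 0
            for _ in range(generations):
                r = rng.random()
                if r < mu / 2:
                    offset += 1
                elif r < mu:
                    offset -= 1
            total_sq += offset * offset
        variances.append(total_sq / lineages)
    return variances


# Hypothetical rates (per generation) and true age, chosen only for illustration.
RATES = [0.0005, 0.001, 0.002, 0.004, 0.008]
TRUE_GENERATIONS = 100

v = simulate_variances(RATES, TRUE_GENERATIONS, lineages=400)
pooled = sum(v) / sum(RATES)                                          # sum of variances / sum of rates
mean_of_ratios = sum(x / mu for x, mu in zip(v, RATES)) / len(RATES)  # mean of per-marker ages
print(f"true={TRUE_GENERATIONS}  pooled={pooled:.1f}  mean of per-marker ages={mean_of_ratios:.1f}")
```

Changing the seed, or deliberately feeding the estimators rates that differ from the ones used to simulate, shows how sensitive each estimate is.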

Lumping the variances together and dividing the result by the sum of the mutation rates produces an estimate dominated by the fast-moving markers, with the slow movers having little or no effect.

To the extent that this is true (and I'm not technically competent to argue it) there's also an effect of whether a marker is moving up or down; it can mutate in either direction. The "modal" numbers are based on whatever direction the markers took that were most successful in reproducing males -- and therefore have the present day appearance of having been the prehistoric norm. And I've seen no scientific reason really to believe that. The WAMH, etc. may or may not have been modal a few thousand years ago, whenever pappy L11 (as one example) had his sons. It's now modal for a majority of their West Atlantic survivors.

This post doesn't mean to deny that pappy L11 had a haplotype -- but questions how sure we can be what that was, until we have had the opportunity to dig up a few more really old guys, and test their Y-DNA to a phylogenetically meaningful level.

If you calculate the modal values for U106, U152 and L21 over 67 loci you will find the results are very similar. WAMH may not be the exact ancestral values, but it can't be far off!

What is the impact of your objection? Can you demonstrate? Others say they have demonstrated it is not material.

I wasn't comparing the behaviour of Voters v STRs. I was trying to demonstrate the effects of bad processing and interpretation of the results.

Analysis of the R-L21 haplogroup using a sample of 5633 (67-marker) haplotypes from the latest of Mike's spreadsheets and Ken N's 67-marker TMRCA calculator with the original mutation rates produces the following basic result.

Using the same data but taking the mean of each marker's variance divided by its own mutation rate, instead of dividing the sum of the variances by the sum of the mutation rates, produces the following result.

The difference between the results using Marko H mutation rates has reduced but is still significant.

What is more significant is the contribution made to the results by fast and slow markers when the sum of the variances is divided by the sum of the mutation rates.

Using Marko H mutation rates, the fastest 7 markers contribute 50% of the total while the slowest 10 contribute just 1%.

The fastest marker predicts an MRCA of 705 generations or 21,150 YBP, while the slowest marker predicts 27 generations or 810 YBP. Clearly the method being used is flawed and, furthermore, the mutation rates are not representative of the R-L21 haplogroup sample.
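To make the two procedures compared in this post concrete, here is a small Python sketch. The function names and the three-marker data set are invented for illustration. They are not taken from Mike's spreadsheet or Ken N's calculator.

```python
def pooled_generations(variances, rates):
    """Sum of the per-marker variances divided by the sum of the mutation rates."""
    return sum(variances) / sum(rates)


def mean_marker_generations(variances, rates):
    """Mean of each marker's variance divided by its own mutation rate."""
    ages = [v / mu for v, mu in zip(variances, rates)]
    return sum(ages) / len(ages)


# Hypothetical data: one fast, one medium and one slow marker.
variances = [0.8, 0.3, 0.02]
rates = [0.008, 0.002, 0.0005]  # mutations per generation (made up)

pooled = pooled_generations(variances, rates)
equal_weight = mean_marker_generations(variances, rates)
print(f"pooled: {pooled:.1f} generations, equal-weight mean: {equal_weight:.1f} generations")
```

In this toy case the per-marker ages are roughly 100, 150 and 40 generations. The pooled figure sits close to the fast marker's age because that marker supplies most of both sums, while the equal-weight mean gives the slow marker the same say as the others.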

The 5400 ybp for L21 looks more realistic and that means they could have been involved with the building of Newgrange.

I take it you feel pretty good about the 3400 years before present? ... so that gets us to 1500 BC and maybe 2000 BC.

Unless someone comes up with another method, then yes, it is very likely, with a max of 4,069 years using the data's standard deviation. What I didn't show is that using a confidence level of 95.45% gives +/-987 years, adding just 300 years or so to the best calculated probability of +/-668 years.

Just to test a question I had, I removed the two fastest markers in the 68-111 panel, STRs 712 and 710, and reran the numbers. This effectively did not change the number of generations; only the variance changed, causing the standard deviation in generations to increase. I was expecting this GenSD to decrease, not the opposite.

Mark, so that we are singing from the same sheet, can you run Ken N's Gen7.1 using 67 markers of Mike's latest 111-marker set (N=1134) and advise if you agree with the following results?

MRCA G=133 SigmaG = 10

MikeW's Age Estimator Gen7.1

Mike's latest spreadsheet list is 1048 Hts

GA = 132.3, SigmaGA = 10.244

Yep we are.

MJost

Mark, if we start by examining how the above result was derived, we have already agreed that the age in generations is arrived at by dividing the sum of the variances by the sum of the mutation rates, the sum of the mutation rates for this version of the model being fixed at m=0.12635.

In our example the sum of the variances is ~16.7 (16.7/0.12635 = 132).

If we now look at how the sum of the variances is made up in our example, we can examine the values in row 536 (Vara) of the KNCalcs sheet. This shows that 60% of the variance sum is contributed by just 10 of the fastest markers while just 1% is contributed by the slowest 10 markers.

Thus even though the analysis involves 55 separate markers, the result is dominated by a few fast-moving markers, with slow movers contributing close to zero.
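One way to see where the dominance comes from is to rewrite the pooled estimate as a weighted average. The symbols are introduced here only for illustration: V_i is the variance of marker i, \mu_i its mutation rate, G_i = V_i/\mu_i the age that marker would give on its own, and n the number of markers.

```latex
G_{\text{pooled}}
  = \frac{\sum_i V_i}{\sum_i \mu_i}
  = \sum_i \frac{\mu_i}{\sum_j \mu_j}\,\frac{V_i}{\mu_i}
  = \sum_i w_i\, G_i ,
  \qquad w_i = \frac{\mu_i}{\sum_j \mu_j},
\qquad\text{whereas}\qquad
G_{\text{mean}} = \frac{1}{n}\sum_i G_i .
```

So the pooled figure (here 16.7/0.12635, about 132 generations) is an average of the per-marker ages weighted by mutation rate, while the plain mean gives every marker the weight 1/n. The two agree only when the per-marker ages G_i are all about the same.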

I suggest that the procedure used does not provide equal representation to each of the markers and is therefore not a reliable model.

In order to provide equal representation to each marker we can modify the procedure to carry out a statistical analysis across the markers as well as on the individual markers.

We do this as follows.

1) Insert an additional function in cell C550: =C536*1000/C5
2) Copy the function to all markers on row 550, from D550 to BR550.
3) Insert in cell BS550 the function =AVERAGE(C550:BR550)

Cell BS550 displays the MRCA in generations.

This procedure also allows us to examine how the individual markers in the haplogroup have behaved in relation to the mutation rates. If the mutation rates are representative of the haplogroup, the values in cells C550 to BR550 should be reasonably consistent.

In our example the values vary significantly between 38 generations and 701 generations (1,140YBP to 21,030YBP) suggesting that the mutation set used is not appropriate for R-L21.
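For anyone without the spreadsheet to hand, the row 550 steps and the consistency check can be mirrored in Python. This is a sketch under assumptions. The factor of 1000 in the cell formula suggests that row 5 holds the mutation rate per 1000 generations, but the sheet itself is not shown here, so treat that as a guess. The 30 years per generation is the figure implied by the post (38 generations = 1,140 years). The input rows below are made up so that the output reproduces the quoted extremes.

```python
from statistics import mean

YEARS_PER_GENERATION = 30  # implied by the post: 38 generations -> 1,140 years


def row_550(variance_row, rate_row_per_1000):
    """Mirror of the cell formula =C536*1000/C5 copied across the marker columns.

    Assumes the rate row is expressed per 1000 generations (an assumption).
    """
    return [v * 1000 / r for v, r in zip(variance_row, rate_row_per_1000)]


def summarize(ages):
    """Mean per-marker age (the =AVERAGE(C550:BR550) cell) plus spread diagnostics."""
    lo, hi = min(ages), max(ages)
    return {
        "mean_generations": mean(ages),
        "min_generations": lo,
        "max_generations": hi,
        "max_over_min": hi / lo,
        "min_years": lo * YEARS_PER_GENERATION,
        "max_years": hi * YEARS_PER_GENERATION,
    }


# Hypothetical rows chosen so the per-marker ages come out at 38, 150 and 701 generations.
variance_row = [0.019, 0.30, 5.608]
rate_row = [0.5, 2.0, 8.0]

ages = row_550(variance_row, rate_row)
summary = summarize(ages)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

A large max_over_min ratio is the signal discussed above: either the rates do not suit the sample or the simple variance model is breaking down for some markers.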

The results using Marko H mutation rates do not differ significantly.

I have been following this entry and would like to support your position. The present variance approach, in my judgment, underestimates TMRCAs. I've said this for quite a period of time, but it is difficult to prove because examples are rare where known dates are available.

Goldstein and Stumpf, in their review paper published in Science, March 2001, used a different approach. For each DYS locus they computed the TMRCA by dividing the ASD by the mutation rate for that locus. This weights each locus equally.

The problem has been identified, but not accepted, by this community. The data shows that there is very little variance contributed by the slower mutators; they mutate around the modal. Faster mutators have a limited range of values they can assume, greater than +/- 1. Therefore they contribute some ASD/variance. However, they saturate after a while also. In my opinion variance/ASD does not model the mutational process.

What one needs to do is count mutations, but that is very difficult due to hidden mutations for fast and medium mutators. For longer durations, slow mutators can be used, but this requires care.

> How it is possible to count mutations between the two older haplotypes, if in a period of about several thousand years in one rapidly mutating locus may be 5-10 mutations?
> How it is possible to count mutations, if they are parallel in the same locus in both haplotypes?
> It is puzzling me.

MY RESPONSE:

1. It does not make sense to count mutations between two (!) haplotypes "in a period of about several thousand years". Who on Earth would want to do it and for what purpose? Mutations are governed by statistics, and two haplotypes do not provide any good statistics with their mutations.

2. If you suspect many back and forth mutations in some loci, just exclude them. As simple as that. For example, for thousands of years back I employ 22 marker haplotypes, in which one mutation happens in several thousand years. This 22 marker panel is described in the Adv. Anthropology (2011) v. 1, No. 2, 26-34.

3. With the slowest 16 marker haplotypes I can calculate timespans to a common ancestor of man and chimpanzee.

4. "How it is possible to count mutations, if they are parallel in the same locus in both haplotypes?" - please elaborate. Your question is hard to understand. However, please remember that you cannot work reliably with two haplotypes. You cannot toss a coin two times only and hope to calculate something out of this "statistics".

I have been writing about this for many years, but my principles were three (at least):

1) mutations happen around the modal
2) there is a convergence to the modal as time passes
3) sometimes a mutation goes off on a tangent

DYS391 mutates mostly around the values 10 and 11.
DYS439 mutates mostly around 11-12-13, etc.

Of course I am speaking of hg. R, but the same principle explains all the other haplogroups, which diverged only because they started from different values that had gone off on a tangent, but frequently from the same value, and these values are almost the same even in very distant haplogroups.

My theory of the ancientness of hg. R in Europe presupposes this and I think it will come out winning.

I agree with Anatole's general point here that doing things like TMRCA estimates between just two people is not reliable. You need more data to apply statistical averages.

I don't agree or disagree on his point about the 16 slowest markers, but apparently he thinks they stay linear for a few million years.

His method is relatively simple and seems to go as follows.
A) Choose a set of appropriate markers to suit the approximate age of the haplogroup: fast markers for recent, slow markers for several thousands of years and very slow markers for longer periods.
B) Discard any of the markers where the mutations on an individual allele are suspected of having gone non-linear due to back (reverse) mutations.
C) Count the total number of mutations from the modal of the haplogroup applicable to the chosen markers.
D) Apply a constant to the total number of mutations for back (reverse) mutations, which is derived from probability calculations, to give a new (increased) total.
E) Divide the new total of mutations by a single mutation rate derived from haplogroups of "known?" age/generations. The result is the TMRCA in generations.
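Steps C to E above can be sketched roughly as follows. This is my own reading of the outline, not Anatole's published procedure. In particular, I am assuming that a "mutation" is one repeat-count step away from the modal, that the back-mutation correction is a plain multiplier (the real constant is not given here), and that the total is normalized by the number of haplotypes before dividing by a mutation rate per haplotype per generation. Steps A and B (choosing and discarding markers) are left to the user. All names and numbers are hypothetical.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common repeat count at each marker position."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*haplotypes))


def count_mutations(haplotypes, modal):
    """Step C: total repeat-count steps away from the modal, over all haplotypes."""
    return sum(abs(allele - m) for hap in haplotypes for allele, m in zip(hap, modal))


def generations_to_mrca(haplotypes, rate_per_haplotype, back_mutation_factor=1.0):
    """Steps D and E under the assumptions stated in the lead-in.

    back_mutation_factor is a placeholder multiplier, not a published constant.
    rate_per_haplotype is mutations per haplotype per generation (hypothetical).
    """
    modal = modal_haplotype(haplotypes)
    observed = count_mutations(haplotypes, modal)
    corrected = observed * back_mutation_factor
    return corrected / (len(haplotypes) * rate_per_haplotype)


# Four made-up 3-marker haplotypes, two of them one step off the modal.
sample = [(13, 24, 14), (13, 24, 14), (13, 25, 14), (12, 24, 14)]
modal = modal_haplotype(sample)
mutations = count_mutations(sample, modal)
generations = generations_to_mrca(sample, rate_per_haplotype=0.005)
print(modal, mutations, round(generations, 1))
```

A real run would need many more haplotypes than this, for the reasons given earlier in the thread about small samples.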

That sounds very promising. What would the implication of this be for R1b?

Indeed, it looks like progress. I retained a basic formula of 50 years per SNP. As R1b has the highest density of SNPs on the phylogenetic tree, it is a matter of counting SNPs between clades and multiplying by 50 to estimate the age. A nice simple rule of thumb. I like the fact that Anatole is broadly agreeing with the methodology, which gives me greater confidence re the checks and balances.

As we are currently experiencing a rapid expansion in the phylogenetic tree and number of new SNPs discovered, this will be of great benefit in calculating rough migration routes and timelines.

We should get an updated tree (I hope) in the next few weeks with the release of Geno 2.0. That should be a good opportunity to test the theory.
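The rule of thumb is simple enough to write down as a one-liner. The function name is made up and the SNP count in the example is hypothetical. The 50 years per SNP is the figure quoted above, and it only applies to whatever set of SNPs it was calibrated on.

```python
YEARS_PER_SNP = 50  # rule-of-thumb figure quoted in the post


def clade_age_years(snp_count, years_per_snp=YEARS_PER_SNP):
    """Rough age between two clades: number of SNPs separating them times years per SNP."""
    if snp_count < 0:
        raise ValueError("snp_count must be non-negative")
    return snp_count * years_per_snp


# Hypothetical example: 20 SNPs between two clades.
print(clade_age_years(20))  # 1000
```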