World Families Forums - The age of R-M343 calculated by Dienekes

ARCHIVAL COPY

You've mentioned a lot of things that aren't true, and this is one. Modal values of populations change for a host of reasons, including genetic drift.

Just because you don't understand them doesn't mean they aren't true. Also, as I said before, genetic drift doesn't really happen on STRs. Well, it does happen, but over a timeframe far longer than 1/mu; drift certainly wouldn't manifest itself in 7 STRs within 2,000 years. For more info, refer to the post below:

That post explains why, under a directionally neutral model, the modal value shouldn't change over time even if bottlenecks occurred, and why, under a directionally biased model, it does change over time regardless of demography.

No, variance is not the same as counting mutations. It is counting squared differences. The distinction matters mathematically a great deal.

Yeah, I know variance is the standard deviation squared; thanks for refreshing the concept, though. Nonetheless, my point was that a lot of people call "the average number of mutations per marker" variance. Of course that isn't statistical variance, but it is a measure of variation nonetheless.

You quoted one study which says it does NOT use the mode, then claim it uses the MEDIAN, all in support of a conclusion that MODE is the most commonly used method. You aren't making sense.

Myres et al. (2010) picked the median haplotype for each population and then counted mutations relative to that haplotype, so their "average variance" is not in fact variance (as in squared difference from the mean value) but simply the average number of mutations from the so-called median. So yes, they did count mutations. In fact, I replicated the same average variance using a model that picks the modal value for each STR independently and then counts mutations from that modal value.
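To make the distinction concrete, here is a minimal Python sketch on a made-up sample of four men at two markers (hypothetical data, not from Myres et al.). It picks the modal value at each marker independently, as described above, rather than the paper's median-haplotype procedure, and compares three quantities that are easy to confuse: the average number of repeat steps from the modal value, the average squared distance from the modal value, and the ordinary per-marker variance around the mean.

```python
from statistics import mode, pvariance

def per_locus_modal(haplotypes):
    """Most common repeat count at each marker, chosen independently per marker."""
    return [mode(column) for column in zip(*haplotypes)]

def avg_steps_from_modal(haplotypes):
    """Average number of single-repeat steps between each allele and the modal value."""
    modal = per_locus_modal(haplotypes)
    steps = [abs(h[j] - m) for h in haplotypes for j, m in enumerate(modal)]
    return sum(steps) / len(steps)

def avg_squared_from_modal(haplotypes):
    """Average squared distance from the modal value (mode treated as ancestral)."""
    modal = per_locus_modal(haplotypes)
    squares = [(h[j] - m) ** 2 for h in haplotypes for j, m in enumerate(modal)]
    return sum(squares) / len(squares)

def avg_variance(haplotypes):
    """Population variance around each marker's mean, averaged over markers."""
    columns = list(zip(*haplotypes))
    return sum(pvariance(col) for col in columns) / len(columns)

# Hypothetical sample: 4 men x 2 markers, with one two-step outlier at each marker.
sample = [[13, 24], [13, 24], [15, 24], [13, 22]]
print(avg_steps_from_modal(sample))    # 0.5
print(avg_squared_from_modal(sample))  # 1.0
print(avg_variance(sample))            # 0.75
```

The three numbers differ because a two-step difference counts twice as much as a one-step difference when counting steps, but four times as much when squaring, and because the mean need not coincide with the mode.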

Besides, as I said before, it doesn't matter. If Busby is calculating TMRCA wrong, then so what? I'm not using his conclusions and neither are you. My original point was that TMRCA estimates using strict self-variance calculations are not influenced by any of this. The variance provides a TMRCA estimate that is not affected in the least by any sort of directional bias, even if such a bias existed.

Who is talking about whether Busby et al. calculate TMRCA correctly? I cited them to show that most scientists used the ASD methodology when estimating TMRCA. Well, OK, self-variance calculations aren't influenced by any of this, but then what mutation rates do you use? The ones calculated from pedigrees assuming that the modal value is the ancestral value, or the ones measured from father-son pairs, which are basically an average of different mutation rates, since the mutation rate of a given STR depends on its repeat value? Therefore the so-called self-variance methodology, which is the interclade methodology, fails to account for variation in the mutation rate as a function of repeat value. So yes, in a sense the self-variance (interclade) method isn't subject to modal variation over time; however, it fails for other reasons, such as the mutation rate being a function of repeat value.

EDIT: I changed the text above: the self-variance methodology does assume that the mean value is the ancestral value. In fact, Stumpf and Goldstein (2001) discuss the self-variance method (using the average squared difference), and this is what they say:

Quote from: Stumpf.et.al.2001

For a single locus, the squared distance is given by $\Delta_i = (l_i - l_a)^2$,

where $l_a$ denotes the ancestral allele length and $i$ refers to a present chromosome.
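A minimal Python sketch of the squared-distance estimator quoted above, assuming the symmetric single-step mutation model, under which the expected squared distance from the true ancestral allele after t generations is μt, so t ≈ mean Δ / μ. The haplotypes, the rate of 0.0025 per locus per generation and the 25-year generation are illustrative assumptions, not values from the thread or the paper, and the function names are hypothetical. Feeding in either the modal allele or the per-locus mean as the "ancestral" allele shows how much the answer depends on that choice.

```python
from statistics import mean, mode

def modal_haplotype(haplotypes):
    """Most common allele at each locus (the 'most common allele' option)."""
    return [mode(column) for column in zip(*haplotypes)]

def asd_tmrca(haplotypes, ancestral, mu, years_per_gen=25):
    """Average of (l_i - l_a)^2 over chromosomes and loci, divided by mu.

    Assumes a symmetric single-step mutation model with rate mu per locus
    per generation. Returns (generations, years).
    """
    deltas = [(h[j] - a) ** 2 for h in haplotypes for j, a in enumerate(ancestral)]
    generations = mean(deltas) / mu
    return generations, generations * years_per_gen

# Hypothetical data: 4 chromosomes x 3 loci.
haps = [[13, 24, 14], [13, 23, 14], [14, 24, 14], [13, 24, 15]]

# l_a taken as the modal allele at each locus.
print(asd_tmrca(haps, modal_haplotype(haps), mu=0.0025))

# l_a taken as the per-locus mean ("self-variance" style).
locus_means = [mean(col) for col in zip(*haps)]
print(asd_tmrca(haps, locus_means, mu=0.0025))
```

With this toy sample the modal choice gives about 100 generations and the mean-based choice about 75, so the estimate shifts directly with whichever allele is treated as ancestral.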

More from Stumpf.et.al.2001:

Quote from: Stumpf.et.al.2001

In addition to the mean square errors of the estimates for TMRCA using the real ancestral alleles in Eq. 3, we also show the errors that result if the most common allele is assumed to be ancestral.

[...]

Uncertainty concerning the mutation rate and process presents a serious limitation for model-free approaches. In the case of microsatellites, deviations from the SMM (39, 40) could result in substantial biases that are hard to detect in model-free analyses. Important possible deviations from the SMM include variable step size ($\sigma^2 > 1$) that has been observed at low frequency (46), directional bias in the mutation process (47), length dependence in the mutation rate (48) and step size, and a dependence on the size of the repeated motif.

Finally, it is clear that microsatellite allele length is constrained, in part as a result of the mutation process (49, 50), and this will influence the dynamic of distance measures.

So back in 2001, Stumpf and Goldstein were already aware of directional bias and of the dependence of the mutation rate on repeat value.

This is what they said about variation in the mutation rate with repeat value:

Quote from: Stumpf.et.al.2001

Length-dependent mutation rates.

A general mutation parameter may thus be written in the form

$\mu = \mu(l)\,\sigma^2(l)$ (5)

(39), where $\mu(l)$ and $\sigma^2(l)$ are the mutation rate and variance of step size, respectively. In the simplest instance of a length-dependent mutation process, the functional form will be linear; quite generally this form also describes more complicated functional dependencies to first order:

$\mu(l) = \mu_0 + \mu_l\, l$ (6)

To describe length dependence, we use the results of (48) for a set of 10 microsatellite loci in a large sample of Y chromosomes. This mutation rate model was implemented in the coalescent simulations as follows. In the generation of the simulated data sets we use μ= 0.0028 but assume that this corresponds to the average allele length (i.e., l= 16.96) in (46), and we adjust the mutation rate after each mutation event according to the new repeat length. When we estimate the genealogical depth of a haplogroup from Eq. 4, we calculate a length-adjusted mutation rate for each locus from Eq. 6 by using the average repeat size at that locus in the haplogroup. This procedure yields reliable estimates for TMRCA (Fig. 2B). If, however, the mutation rate is assumed to be constant, estimates for TMRCA may significantly deviate from their true values. As expected, accuracy again increases with the number of loci.
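The length-adjusted procedure in the quote can be sketched in Python. This sketch writes Eq. 6 around the quoted reference point (μ = 0.0028 at l = 16.96), so μ0 = 0.0028 − slope × 16.96 and μ_l = slope. The slope value is a hypothetical placeholder, not a figure from the paper or from reference (48), and combining loci by a simple average of ρ_j/μ_j is an assumption, not necessarily the paper's Eq. 4. Function names and data are illustrative.

```python
from statistics import mean

MU_REF = 0.0028   # rate quoted for the average allele length
L_REF = 16.96     # average allele length quoted
SLOPE = 0.0002    # HYPOTHETICAL mu_l: change in rate per extra repeat unit

def length_adjusted_rate(avg_len, mu_ref=MU_REF, l_ref=L_REF, slope=SLOPE):
    """Eq. 6, mu(l) = mu0 + mu_l * l, with mu0 = mu_ref - slope * l_ref, mu_l = slope."""
    rate = mu_ref + slope * (avg_len - l_ref)
    if rate <= 0:
        raise ValueError("linear model gives a non-positive rate at this length")
    return rate

def tmrca_generations(haplotypes, ancestral, length_dependent=True):
    """Average over loci of (mean squared distance from ancestral) / rate."""
    estimates = []
    for j, column in enumerate(zip(*haplotypes)):
        rho_j = mean((x - ancestral[j]) ** 2 for x in column)
        # Rate from the average repeat size at this locus, or one constant rate.
        mu_j = length_adjusted_rate(mean(column)) if length_dependent else MU_REF
        estimates.append(rho_j / mu_j)
    return mean(estimates)

# Hypothetical data: a short locus (~11 repeats) and a long locus (~24 repeats).
haps = [[11, 24], [11, 24], [12, 25], [10, 23]]
ancestral = [11, 24]
print(tmrca_generations(haps, ancestral, length_dependent=False))
print(tmrca_generations(haps, ancestral, length_dependent=True))
```

With these made-up numbers the short locus gets a lower rate and the long locus a higher one, so the combined estimate moves away from the constant-rate figure; the size and direction of that shift depend entirely on the assumed slope.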

I realize that a lot of people have invested a lot in Y-STRs, and they served their purpose for a while, but they're now a historical relic. Of course people might still find some use for the abundant datasets that have accumulated, but personally I see little reason to perfect the oil lamp after we've discovered electricity.

To me, it seems more like seeing little reason to look at a watch since the Aztecs perfected the calendar. And that has more to do with why one wants to know the time, than what time it is.

Soon everyone will have a list of about 20,000 SNPs (e.g., the road from Adam), including 2-3 private SNPs of their own.

Of course, we have all developed our theories with the means at our disposal, and we will see who has won and who has lost. These SNPs, once available, will serve our purpose like aDNA: they will show the line of descent of the various haplogroups. But science will lose its humanity. It will become a mere technique.

Having a rather unusual STR sequence in my first 12 markers, getting an exact 12/12 match, and belonging to the ht-35 project, I was very excited [393-12/.00076][389II-31/.00242]. However, the person was of no recent relation {500-900}, even though he comes from the same region.

Perhaps a combination of extended STRs and SNPs will be the way to go in the future.

It is not the science but the man behind the science; it cuts both ways. IMO science is the great equalizer for humanity. I look forward to the day when someone who has been looked down upon by society in a caste system, like an "untouchable", is proven to share the same genetics as everybody else, showing we are all related directly or indirectly!

So very very wrong. The quest for ancient ancestors is of course interesting but it is secondary to genealogical DNA testing. The original and still main use of DNA testing by genealogists is to obtain STR haplotypes for comparison with other researchers' haplotypes to find relationships and possibly get around those stone walls the paper trails run into. We need STRs for genealogy - I hope we don't lose sight of that and I hope the academics keep using STRs too.

I have a GD of 40 from an Irish U106 man and a GD of 17 from an Irish L21 man. Which man am I more closely related to? The U106 man. How are Y-STRs reliable? A difference at one marker and you are in a different subclade, for example DYS492. A null at DYS439 is another example.

STRs can vary up or down so you can get convergence. This makes them unreliable, in and of themselves, as single markers for a subclade. For instance, you can't really say YCAII=18,23 marks a single subclade. That's why many people look at STR signatures or patterns of multiple unusual STR values.

This does not mean STRs are unreliable for genealogy purposes or measuring diversity for populations.

The Rootsweb conversation between Anatole Klyosov, Dienekes Pontikos and Argiedude is becoming amusing. I don't like to copy posts from other forums too much but this is too good to resist.

I really don't care if what they call "R1b1a2a1a1a5" is 7K ybp or 3.5K ybp. I don't care if Anatole is right or Dienekes is right, but it will be a good thing if we can understand SNP counting methods a little better and it's definitely good fun.

LOL.

Quote from: Argiedude

I obtained identical estimates to yours for all the ages you posted, but I didn't take this issue into account (dividing the result by 2 to account for mutations accumulating on 2 chromosomes after they split apart). For example, take I2a1. Your graph gives an age estimate of 23 kya, by comparing L158 with L178. I compared ...(L158) with ...(L178) and obtained the following:

Total SNPs...18692
Valid SNPs (didn't have a no call in either chromosome)...17146
Variant SNPs...240
...(the rate used by Dienekes for his estimate)...3x10^-8
Generation used by Dienekes...25 years
Total bases in 1000Genomes data (approximately)...9,300,000

240 / {[9,300,000 x (17146 / 18692)] x 3x10^-8} x 25 = 23.4 kya

I didn't divide by 2 at any point and my result was the same.
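Argiedude's arithmetic can be reproduced with a short Python function. The inputs are the figures he lists (240 variant SNPs, 17,146 of 18,692 SNPs callable, about 9,300,000 bases, 3x10^-8 per base per generation, 25-year generations). The optional halving is the step under debate, applied when the variants are differences between two chromosomes that each accumulated mutations after splitting apart. The function and parameter names are illustrative, not anyone's actual code.

```python
def snp_age_years(variant_snps, valid_snps, total_snps, total_bases,
                  mu_per_base, years_per_gen, halve_for_two_lineages=False):
    """Age estimate from SNP differences between two Y chromosomes.

    Scales the sequenced bases by the fraction of SNPs called in both
    chromosomes, converts the variant count to generations with the
    per-base rate, and optionally halves the result because mutations
    accumulate on both lineages after the split.
    """
    callable_bases = total_bases * (valid_snps / total_snps)
    generations = variant_snps / (callable_bases * mu_per_base)
    if halve_for_two_lineages:
        generations /= 2
    return generations * years_per_gen

inputs = dict(variant_snps=240, valid_snps=17146, total_snps=18692,
              total_bases=9_300_000, mu_per_base=3e-8, years_per_gen=25)

print(round(snp_age_years(**inputs) / 1000, 1))                               # 23.4
print(round(snp_age_years(**inputs, halve_for_two_lineages=True) / 1000, 1))  # 11.7
```

The unhalved figure matches the 23.4 kya above and the halved one matches the 11,700 ybp figure; which one is appropriate depends on how the published estimate was computed.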

Basically, Argiedude was asking Dienekes if he remembered to divide by 2.

Quote from: Dienekes

I remembered.

Quote from: Anatole K

MY RESPONSE:

VERY interesting. The "Argiedude" data shed light on a mystery in the SNP-based calculations conducted by Pontikos. If his response "I remembered" was truthful, then he must have obtained 46,000 ybp for I2a1, divided by 2, and finally obtained 23,000 ybp. However, for the same set of SNPs Argiedude obtained 23,400 ybp, which, after division by 2, should result in 11,700 ybp.

Since Argiedude has clearly shown how he obtained the data, and Pontikos has not, it is time to learn how he did it.

This can explain the error in the 7,000 ybp "age" Pontikos obtained for R1b1a2a1a1a5: he apparently did not divide it by 2 to obtain 3,500 ybp, the likely "age" of the subclade, as I noted earlier. He did not want to admit it, and never answered the question about the origin of that 7,000 ybp, which Didier and I repeatedly put to Pontikos.

If so, most or all of the figures on the Pontikos list are in error and represent doubled figures (100% error). If so, I retract my statement that the SNP-calculated numbers by Pontikos give the same numbers as our STR-based calculations, because his numbers are erroneous, either most of them or all of them.

There is only one way for him to explain the situation: present the calculations for the R1b1a2a1a1a5 subclade and for some other haplogroups and subclades on his list. I would not have pressed the matter; however, Pontikos poured a lot of venom on my STR-based calculations, on his blog and on this forum. Now it turns out that HIS data are highly questionable.

Too bad Ken Nordvedt wasn't involved in this "soap opera" drama. Maybe he could be J.R. Ewing. I like nothing more than to read or hear about really brilliant men arguing their theories, pointing out flaws in each other's math and poking holes in each other's theories. I often wondered what a celebrity deathmatch would be like with Stephen Hawking, Leonard Susskind, Lisa Randall, Brian Greene, Michio Kaku and Lawrence Krauss. I would put my bets on Leonard or Lisa.