Good, finally someone who understands the various gyrations that Busby went through and will defend them. If Busby's right, that's fine with me, but I don't "get it" yet. My understanding is that "bootstrapped" means "resampled" so the Figure 2 charts a, b and c on the right from the main body of the paper are based on Busby's resampling, not on "observed" variance.

It is bootstrapped because it includes the dataset from Myres et al(2010) plus new populations included in the new study. It is still random datasets and not chosen datasets to fit some purpose. So yes, this is observed variance in a set of populations from the Myres et al(2010) dataset, and the new paper. Refer to table-S1 in the supplementary excel file for a list of populations used and the sources of the study that first sampled the datasets. Again, we are not talking about sampling at convenience but about amplifying the Myres et al(2010) dataset with new populations.

The Supplementary Information is intended to support the primary body of the paper and show the details. The Supplementary Information Figure S2 panels A through E go through the gyrations of the actual observed variance calculations.

It is critical to Busby that he handles the Turkish and particularly the Irish samples differently to get what I guess are the appropriate outcomes.

Supplementary S2 ( http://rspb.royalsocietypublishing.org/content/early/2011/08/18/rspb.2011.1044/suppl/DC1 ) Panel A is just a straight look at "observed variance" from what the data is. This is Balaresque's view and is the straw-man proposal that Busby appears to be after. The Irish sample shows the lowest variance and the three Turkish samples show the highest (with most of the continent in between), resulting in the clear east/west cline.

Panel B is where the Turkish samples are thrown out because "Removing the Turkish samples can be justified on the grounds that, in our dataset c.90% of Anatolian samples were un-derived at SNP S127." The east/west cline diminishes significantly but is still there slightly because of the low diversity in the Irish sample. I don't see how it is valid to throw out the Turkish sample because it contains so little L11(S127). Do multiple analyses at multiple levels of the phylogeny if that is what you want, but why throw out a location altogether?

Panel C throws out the Irish sample but leaves the Turkish in. The east/west cline is present even without the low diversity Irish sample.

Panel D introduces something new. This is where the "bootstrapping" comes in. The observed Irish variance is now replaced by the resampled variance. The Irish diversity is now higher than all of the European samples and is now as high as the highest Turkish sample. There is still a slight east/west cline, but apparently this is deemed as "insignificant". I'm okay with that determination, I just don't trust the resampling.
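To make the panel A through D comparison concrete, here is a minimal sketch of how a longitude-versus-variance correlation shifts when one region is dropped. The population names, longitudes and variances below are made-up placeholders, not values from Busby or Balaresque, and I'm assuming a plain Pearson correlation; the papers may have used a different statistic or significance test.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL rows: (population, longitude in degrees east, STR variance).
# These numbers are invented for illustration only.
populations = [
    ("Ireland", -8.0, 0.21),
    ("England", -1.5, 0.27),
    ("France", 2.5, 0.29),
    ("Germany", 10.0, 0.30),
    ("Italy", 12.5, 0.31),
    ("Turkey West", 29.0, 0.38),
    ("Turkey Central", 33.0, 0.40),
    ("Turkey East", 40.0, 0.42),
]

def cline_r(rows, exclude=()):
    """Correlation of longitude with variance, skipping populations whose
    name starts with any of the given prefixes."""
    kept = [(lon, var) for name, lon, var in rows
            if not any(name.startswith(prefix) for prefix in exclude)]
    lons, variances = zip(*kept)
    return pearson_r(lons, variances)

r_all = cline_r(populations)                               # like panel A
r_no_turkey = cline_r(populations, exclude=("Turkey",))    # like panel B
r_no_ireland = cline_r(populations, exclude=("Ireland",))  # like panel C
print(r_all, r_no_turkey, r_no_ireland)
```

The point is only that the correlation is sensitive to which end points are kept, so each exclusion needs its own justification.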

Quote from: Busby et al(2011)

Testing the variance calculations from the Irish population... We note, however, that 17-STR haplotypes, including the 9 STRs used in Balaresque et al’s analysis, are available for 681 Irish R-M269 derived individuals in Moore et al (3), which is, in fact, the study which Balaresque et al use to estimate R-M269 frequency in Ireland. A subset of the Moore et al samples were re-analysed in the current study for SNPs downstream of R-M269, and the original haplotype data are used here to calculate variance. To test if the Ysearch haplotypes were representative of the Irish R-M269 in Moore et al, we independently re-sampled the Moore et al dataset 10,000 times, selecting sub-samples of 75 haplotypes from which we estimated the variance using the same 9 STRs used in the Balaresque et al paper. The median variance of these 10,000 repetitions was 0.354 with a 95% CI of (0.285-0.432). The lowest variance value out of the 10,000 samples was 0.242, which is still higher than the figure observed in the Balaresque et al Ysearch sample (0.208). We therefore believe our estimate of Irish R-M269 variance to be a more robust representation of the true variance than that estimated by Balaresque et al. However, we note that the positive correlation between longitude and variance is still present after removing only the Irish and retaining the Balaresque et al Turkish populations. If we replace the variance calculated by Balaresque et al with that calculated from our repetitions, then the correlation is no longer significant, independent of whether or not we remove the Turkish samples (Figure S2).
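For anyone who wants to see what that resampling step amounts to, here is a rough sketch under my own assumptions: haplotypes are tuples of repeat counts, "variance" is the sample variance per STR averaged over the 9 loci, and the 75 haplotypes are drawn without replacement. The quote doesn't spell out any of those details, and the pool below is synthetic data standing in for the 681 Moore et al. haplotypes, so the numbers it prints mean nothing genetically.

```python
import random
from statistics import median, variance

def mean_locus_variance(haplotypes):
    """Average the per-locus sample variance of repeat counts across all STRs."""
    n_loci = len(haplotypes[0])
    per_locus = [variance([h[i] for h in haplotypes]) for i in range(n_loci)]
    return sum(per_locus) / n_loci

def subsample_variances(haplotypes, size=75, reps=10_000, seed=0):
    """Draw `reps` random subsamples of `size` haplotypes (without replacement,
    an assumption) and return the variance of each subsample."""
    rng = random.Random(seed)
    return [mean_locus_variance(rng.sample(haplotypes, size)) for _ in range(reps)]

def summarize(values):
    """Return (median, (2.5th percentile, 97.5th percentile), minimum)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    ci = (ordered[int(0.025 * last)], ordered[int(0.975 * last)])
    return median(ordered), ci, ordered[0]

def fraction_at_or_below(values, observed):
    """Share of resampled variances that are at or below an observed value."""
    return sum(v <= observed for v in values) / len(values)

# SYNTHETIC stand-in for the 681 Irish haplotypes typed at 9 STRs.
_rng = random.Random(1)
pool = [tuple(_rng.choice([12, 13, 13, 14]) for _ in range(9)) for _ in range(681)]

runs = subsample_variances(pool, size=75, reps=1000)  # fewer reps to keep it quick
med, (ci_low, ci_high), lowest = summarize(runs)
print(med, ci_low, ci_high, lowest, fraction_at_or_below(runs, 0.208))
```

With the real data, `fraction_at_or_below(runs, 0.208)` would be the check of how unusual the single Ysearch-based value is relative to the resampled distribution.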

Even though there are thousands of high resolution SNP Irish samples available, Busby resamples 75 at a time for only 9 STRs. He uses Moore's data. I don't understand why they had to use the Moore "A Y-Chromosome Signature of Hegemony in Gaelic Ireland," 2006. L11(S127) wasn't even tested for. Even if they don't like Ysearch, YHRD or whatever, there is plenty of deeply tested data from Ireland.

...anyway, after the "bootstrapping", Busby's "robust" representation of Irish R-M269 now zooms the Irish to the top of STR variance, jumping by Britain and Continental Europe and pulling even with the highest Turkish sample. Does this sound right? Should the hypothesis be that two different forms of R-M269 survived a bottleneck and exploded to take over Europe from both directions - one from Ireland and the other from Turkey?

I just don't like comparing data across regions with different treatments per region. If "robust bootstrapping" is good for Ireland then it should be done for continent and Turkey too.
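Here is a sketch of what I mean by treating every region the same way. It is only one possible approach and not something any of the papers did: every population is subsampled down to the size of the smallest one, and the median of the subsample variances is reported for each. The population names, sizes and allele pools are invented, and the variance definition (per-locus sample variance averaged over loci) is my assumption.

```python
import random
from statistics import median, variance

def mean_locus_variance(haplotypes):
    """Per-locus sample variance of repeat counts, averaged over loci."""
    n_loci = len(haplotypes[0])
    return sum(variance([h[i] for h in haplotypes]) for i in range(n_loci)) / n_loci

def resampled_median(haplotypes, size, reps, rng):
    """Median variance over `reps` subsamples of `size` haplotypes."""
    runs = [mean_locus_variance(rng.sample(haplotypes, size)) for _ in range(reps)]
    return median(runs)

def uniform_resampling(populations, reps=300, seed=0):
    """Apply the same subsampling to every population, using the smallest
    population's size as the common subsample size."""
    rng = random.Random(seed)
    size = min(len(haps) for haps in populations.values())
    medians = {name: resampled_median(haps, size, reps, rng)
               for name, haps in populations.items()}
    return size, medians

def synthetic_population(n, alleles, seed, n_loci=9):
    """Build a made-up population of n haplotypes (illustration only)."""
    rng = random.Random(seed)
    return [tuple(rng.choice(alleles) for _ in range(n_loci)) for _ in range(n)]

# HYPOTHETICAL populations of different sizes.
populations = {
    "A (large)": synthetic_population(681, [12, 13, 13, 14], seed=1),
    "B (medium)": synthetic_population(120, [12, 13, 13, 13, 14], seed=2),
    "C (small)": synthetic_population(40, [13, 13, 13, 14], seed=3),
}

size, medians = uniform_resampling(populations)
print(size, medians)
```

Note that the smallest population is effectively not resampled at all (every draw is the whole sample), which is one reason a study might only subsample its very large samples.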

As noted above, Busby states this condition on his analysis.

Quote from: Busby

variance calculated by Balaresque et al with that calculated from our repetitions

However, this particular quote didn't complete the condition. The condition is he replaced the variance calculations for Ireland only, and with a different data set to boot. I'm not a brilliant statistician but I know that apples and oranges don't add up.

um.... I fear this is a problem with almost any of these Pan-European studies. They are piecing together data from multiple sources, but I hope that at least they will be consistent within their respective studies in their treatment of the data.

Let me give you a quote from the Balaresque et al(2010) study, it is located under Materials and Methods:

Quote from: Balaresque et al(2010)

A total of 2,574 DNA samples from European males, assigned to populations based on two generations of residence, were typed for the SNP M269 [17], defining hgR1b1b2. Following PCR amplification using the primers 5′-TAAAGATCAGAGTATCTCCCTTTG-3′ and 5′-ATTTCTAGGAGTTCACTGTATTAC-3′, the T to C transition was analysed by digestion with BstNI, which cleaves M269-C-allele chromosomes only. Samples from the Iberian peninsula were typed using the SNaPshot (ABI) procedure [31]. Haplotype data were obtained for up to 20 Y-specific microsatellites [32,33]. Data from the Ysearch database (http://www.ysearch.org) for Germany (GE) and Ireland (IR) were added, together with published data for Turkey, subdivided into East, West, and Central subpopulations based on published sampling information [14]. To avoid a bias from very large samples of hgR1b1b2 (GE and IR), these were randomly subsampled to give sample sizes of 75. This allowed a comparison of nine-locus haplotypes (DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, and DYS439) for 849 hgR1b1b2 chromosomes, subdivided into 23 populations. Greek and Serbian samples were too small for population-based analyses, but were included in Network analysis.

So the only reason why Busby sampled sets of 75 random haplotypes at a time was because Balaresque et al(2010) team did the exact same thing, and the only reason why they did not break down the resolution of R1b1b2 on the Balaresque et al(2010) study, was because the Balaresque et al(2010) study did not do it. So what the Busby et al. team basically did on the Supplementary file was to show that even using the erroneous methodology of the Balaresque et al. study, even without increasing the resolution of the sub-clades of R1b1b2 there isn’t any East-West gradient as the Balaresque et al. team said. Now the only reason why they didn’t do it for Turkey, or for any other European country, is because Balaresque et al. didn’t choose 75 random samples for Turkey, or any other European country.

This in fact goes to show what I said before: you guys so gladly took in the Balaresque et al. study and conclusions that you didn't even question their sampling methodology. While you harshly criticized the Busby et al(2011) team for what you consider to be bias, you didn’t even realize that the Balaresque et al(2010) team did the exact same thing, with the only difference being that not once did the Busby et al(2011) team get such a low variance for the Irish sample when extracting 75 haplotypes at random 10000 times. This actually says a lot about the Balaresque et al(2010) study: it says that they only took 75 haplotypes at random once, and didn’t even bother to run a few hundred sampling scenarios to see if the observed variance fell within the average variance.

All I can say is that 10000 random samples are far better than a single random sample.

Yes. Slavery was inherent in Irish society as well. When the Irish were not fighting the Norse (and there was not a universal Irish coalition against the Norse, as there were alliances with them), they were busy fighting each other.

This also makes me think of a combined Irish/Norse exodus of some after the Battle of Clontarf. How can one explain the high numbers of R1b in Iceland? The Faroes may be similar too.

The high numbers of R1b in Iceland always perplexed me and of course when I mentioned it, some of the HG I guys stated that HG I is the most predominant (obviously). Even though there is a ton of data on Iceland for a "Celtic" genetic connection, however, the R1b numbers were discounted. Also, I always wondered about the sagas and how they originated. Perhaps it took a literate Celt to teach an intelligent barbarian how to write.

We don't know what the intent of the authors was in their "robust re-analysis". I'm not sure what subjective terms like "convenience" or "amplifying" mean in relation to a scientific study. They played with throwing the Irish out, throwing the Turkish out and then resampling the Irish.

I understand that bootstrapping relates to resampling, but does it seem right to you to only do resampling on the Irish and then compare them with other geographies that weren't resampled?

Does it seem right to you that the Irish "resampled" or bootstrapped variance was suddenly higher than everything except the highest Turkish sample? I'm not asking you to make any assumptions about the Turkish sample but what about the continental samples in between? Are you and Busby saying that R-L11 originated in Ireland? That's okay, but I don't think they are saying that either. All I gather is they are saying Balaresque is wrong. That's why I call their approach a "strawman" attack. They don't really have a strong counter-hypothesis, just a strong attack against the opponent's strawman that they choose.

Of course, they also messed around with STR selections to find ones that were "fit" although they weren't really through the Neolithic period, according to their own standards.

To be clear, as I've said before, I think the Balaresque study has errors too and is not comprehensive. I know I've said in the past their sampling is not a true cross-sectional representative sample. I know I've said their STR lengths were too limited. Perhaps you forgot or never saw those kinds of comments. I think that Balaresque's study includes serious errors. Is that harsh enough?

I am not sure how we measure harshness or gladness but perhaps my attitude towards Busby is harsh because their approach does not include a strong hypothesis but just a series of attacks (not necessarily coordinated) to "move the ball back down the field" on the momentum started by Myres and Balaresque. I think they even said as much in a press interview.

I think the Balaresque, Myres and Busby studies should be viewed in toto, versus just individual elements.

I am very glad of all three studies. The more spotlight on R1b, the better.

Ok it seems you have some misunderstanding about the Busby et al(2011) study. First of all, there are three different analyses carried out throughout the study, you are mixing them all together either by accident or purposely to fit some point. Analysis-1 was done on the main study, and it was using the Myres et al(2010) dataset, and a newly collected dataset by Busby et al(2011) to analyze the bootstrapped variance of the new set which was made up of the old sets. Now if you want to question the legitimacy of the sampling techniques feel free to expose your reasons. I on the other hand having looked at table-S1 see no reason to question that the new populations would somehow drive the variance up or down, they are as random as the Myres et al(2010) dataset, and the combined set would still be as random. During this Analysis-1 there was no throwing out the Turkish, or re-sampling the Irish, and it was during this analysis that it was shown that the greatest variance of RxS127 occurs in Central Europe, and that there is no East-West variance gradient for R-S127+, refer to figure-2b) and figure-2c) in the main study, not the supplementary file.

The second analysis was to test the effect of microsatellite choice on age estimates, and this again has nothing to do with analysis-1 or analysis-3, and if one looks at figure-4, the results are obvious.

The third analysis, which was done in the supplementary text-S1, has nothing to do with analyses 1 and 2, and it consisted of a series of steps. First they removed the Turkish samples on the basis that 90% of Turkish samples from Myres et al(2010) were RxS127, which they extrapolated to the Balaresque et al(2010) dataset, and assumed the Turkish samples would be mostly RxS127 as well. When they removed the Turkish sample, the East-West gradient disappeared. The next thing they did was to use the same approach that the Balaresque et al(2010) team took: they randomly sampled 75 Irish haplotypes and calculated the variance, but unlike Balaresque et al(2010), who did it once, they did it 10000 times, and not once did they find variance as low as Balaresque et al. did. So of course, given that they randomly sampled the set 10000 times, it is obvious that their variance, which was the median variance of the randomly sampled sets, was way more robust than that of the Balaresque et al. Irish sample.

Of course it seems right that the Irish “resampled” variance is higher than it was in the Balaresque et al. sample. In fact, it is not a question of whether it should be higher or lower; it is a question of whether it is more accurate or not. My answer: yes, it is by far more accurate, as they actually took the time to resample 10000 random sets. Again, there is no reason to re-sample the continental samples, because none of them, except for the Germans, were sampled using the methodology of choosing 75 random haplotypes and then calculating the variance. I’m getting the sense that by presuming that Busby et al(2011) would be implying that R-L11 originated in Ireland, you sir are engaging in reductio ad absurdum. First and foremost, nowhere in his primary analyses does it show that the variance of R-S127 is higher in Ireland, so there is no reason to think that. Secondly, if you presume that the analysis done in the supplementary text-S1 can be used as a clear-cut indication of origin based on variance, then it would mean that R1b1b2 in general, not R-L11, has peaks of variance in both Turkey and Ireland. Since all you gather is that they are saying that Balaresque et al. must be wrong, and you completely missed the first and second analyses done in the main study, where the Balaresque et al. team isn’t even touched, it is not surprising that you have done the exact same thing that you claim the Busby et al. team did: a “strawman” attack. You don’t really have a strong counter-argument to analysis-1 and analysis-2, so you decide to attack analysis-3 in the supplementary text S1; then, based upon the misinterpretation of analysis-3, you engage in reductio ad absurdum and try to generalize your conclusions about analysis-3 to the whole study. Because if you look at it, you will see that even if the analysis they did on the Balaresque et al. dataset in text S1 were wrong, that wouldn’t make their analysis of the Myres et al.+Busby et al. combined sets wrong, nor would it invalidate the findings of the appreciable effect of microsatellite choice on age estimates. So yes, there is a strawman attack here, but it is not coming from the Busby et al. team towards the Balaresque et al. study, but from you towards the new study.

Ok it seems you have some misunderstanding about the Busby et al(2011) study. First of all, there are three different analyses carried out throughout the study, you are mixing them all together either by accident or purposely to fit some point.

I admit I am confused by their analyses and I actually do appreciate you really reading through these studies. You are the first person to really discuss the details of their numbers so I appreciate that. Thank you.

As far as any point I have on this, at least at this time, we just got here (conversation-wise) because on some other threads STR diversity has been attacked, so I just defend it, which usually gets into counter-arguments based on Busby's work. Let me repeat, I think the Busby study was useful.

Let me go back and reread all three studies to look at their data sampling, etc.

Quote from: JeanL

...So yes, there is a strawman attack here, but is not coming from the Busby et al. team towards the Balaresque et al. study, but from you to the new study.

I agree that I'm attacking some of the holes in their study. I'm also not writing a scientific paper.

I agree there really are holes in the Myres and Balaresque studies and Busby is picking on them.

The reason why I think the Busby approach is a strawman approach is that they are doing various analyses that aren't consistent with each other to make their counter-points, but do not really construct an alternative hypothesis (at least a comprehensive one.) It seems Busby's focus is on Balaresque's proposals and not much on Myres', actual age estimates, or the whole concept by Wells of a Central Asia origin for P, Q and R. To me that is not looking at the whole picture which is also why I consider their attack as a strawman approach.

I'm actually not as anti-Busby as you think. I don't know if you noticed, but I've never disagreed with Busby that R1b-S127(L11) has insignificant differences in STR variance across Europe. I've agreed that is a fair point... just that my interpretation is that L11 spread across Europe quickly.

I've also agreed that STR evaluation is useful. I just think that using limited numbers like 10 or 15 is not enough. That's what I see when I do my own comparisons on hundreds of long haplotypes anyway. I also think Busby's application of STRs does not match their own linear duration standards. That is an attack, but perhaps I just don't understand. Can you explain?

I've attacked the points I don't understand to see how they are defended, and to see if I can be convinced. That's really all that I'm doing. I'm fine with R-L11 having a Central European origin if that is the ultimate conclusion. R-P312's observed variance is highest in SE France and U106's is highest along Poland and Baltic States. Central Europe definitely is a possibility for L11. I'm not sure about R-L23* though.

Does it seem right to you that the Irish "resampled" or bootstrapped variance was suddenly higher than everything except the highest Turkish sample? I'm not asking you to make any assumptions about the Turkish sample but what about the continental samples in between? Are you and Busby saying that R-L11 originated in Ireland? That's okay, but I don't think they are saying that either. ....

... I’m getting the sense that by presuming that Busby et al(2011) would be implying that R-L11 originated in Ireland, you sir are engaging in reductio ad absurdum.

I didn't presume that Busby is saying R-L11 originated in Ireland. I asked that question. I'm just pointing out that the treatment of different data by different methods by Busby (and by Myres or whoever) leads to unusual results.

BTW, a reductio ad absurdum argument is not necessarily a logical fallacy. It's only a logical fallacy "Where such an argument is premised on a false dichotomy, the ostensible proof is a logical fallacy." (Wikipedia) ... so such an argument is only a bad thing if STR diversity does not have a relationship with time (that would be the false dichotomy if that is the case, but we can see that both Balaresque and Busby do a great deal of analysis based on the hypothesis that STR diversity is related to time.)

I've also agreed that STR evaluation is useful. I just think that using limited numbers like 10 or 15 is not enough. That's what I see when I do my own comparisons on hundreds of long haplotypes anyway. I also think Busby's application of STRs does not match their own linear duration standards. That is an attack, but perhaps I just don't understand. Can you explain?

You are right; they showed that there is a significant effect of microsatellite choice in age estimates that they should have used that finding when calculating TMRCA of R-S127 haplogroup which is on figure-4a. However, in figure-2 they did not calculate TMRCA in generations, but explored the bootstrapped variance, and in fact they do not seem to think that variance is affected by choice of STR, which is why they used 10 STRs on figure-2. In a nutshell they showed that microsatellite choice can have an effect on age estimates, but still used a combined set of 10 STRs to explore variance. Perhaps they think one should choose the STRs when calculating TMRCA based on similarity on mutations rates and the presumed time span for common ancestry, i.e. use the average mut/marker for the slowest or fastest STRs depending on the presumed TMRCA, but not the average mut/marker for the whole set, but if you want to calculate variance use the combined set of STRs.
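To illustrate why marker choice can move an age estimate, here is a small sketch using the common variance-over-mutation-rate estimator (under a stepwise mutation model and a star-like genealogy, expected variance is roughly mutation rate times generations). I'm not claiming this is the exact estimator Busby et al. used, and the variances and mutation rates below are invented numbers chosen so that the fast markers look partly saturated.

```python
def tmrca_generations(locus_variances, mutation_rates):
    """Estimate generations to the common ancestor as
    (mean variance across loci) / (mean mutation rate across loci)."""
    mean_var = sum(locus_variances) / len(locus_variances)
    mean_mu = sum(mutation_rates) / len(mutation_rates)
    return mean_var / mean_mu

# HYPOTHETICAL per-locus variances and per-generation mutation rates.
slow_var, slow_mu = [0.10, 0.12], [0.0005, 0.0006]
fast_var, fast_mu = [0.45, 0.50], [0.004, 0.005]

slow_only = tmrca_generations(slow_var, slow_mu)
fast_only = tmrca_generations(fast_var, fast_mu)
combined = tmrca_generations(slow_var + fast_var, slow_mu + fast_mu)
print(slow_only, fast_only, combined)
```

With these made-up inputs the slow markers give the oldest estimate, the fast markers the youngest, and the combined set falls in between, which is the kind of spread the marker-choice analysis is about. The raw variance itself doesn't involve a mutation rate at all, which may be why the combined 10 STRs were still used for figure-2.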

Perhaps Busby wrote part 1, et wrote part 2, and Al wrote the summary. Or some other committee-based procedure. Each person who took part in the study may have had a clear methodology -- and followed it, as far as he or she went. It has multiple authors, not to mention multiple sources of the sampled DNA; if there are faults, they probably aren't all Busby's.

Anyway that's how it appeared to me last summer, when I read it... and agreed with Mike, at the time, that the paper as a whole seemed internally disjunct.

Yes, but I'm not sure Myres and Balaresque are all that clearer. I'm trying to understand that better, but at least I'm able to repeat some of their calculations. In the Busby paper, I can't get the same results for some of these things. Unless I'm missing something, I think Busby (and Myres) should include their "resampling" runs as the supplementary information. Maybe I'm just missing it though. I sure as heck don't feel comfortable with data collected with different methods. I have some renewed admiration for the National Genographic Project.... but I guess the difference is they have money.

In the Busby paper, I can't get the same results for some of these things. Unless I'm missing something, I think Busby (and Myres) should include their "resampling" runs as the supplementary information. Maybe I'm just missing it though.

The populations that Busby et al. used from Myres have their 10 STRs listed on Table-S2, it has the population codes and haplogroups. The new populations that Busby sampled are on Table-S3, it has the repeat value for 15 STRs, the population codes, and the haplogroups. Though they did a very poor job when it comes to haplogroups, they only identify R-M269, S116 and S21, for example there are 2-French Basques who are R-M269(xS127) but we don't know if the other people that are listed there as R-M269 are all R-M269(xS127) or if there are some R-S127(xS116,S21). Feel free to analyze it.

I know I can duplicate some of the values provided by Myres et al. in the supplementary tables, I don't get the exact same values of variance, but something close to it. For example Myres et al.(2010) got that the variance for R-L23(xM412) on Turkey was 0.277, I repeated the same procedure using my modal calculator, and I got the variance for R-L23(xM412) for Turkey to be 0.2828. Their variance for the Caucasus R-L23(xM412) was 0.292, mine was 0.3063.
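Small gaps like 0.277 versus 0.2828 could come from the convention used rather than from the data. As a guess only (I don't know which convention Myres et al. used), here is a sketch comparing three ways of computing per-locus variance: deviations from the mean with an n-1 denominator, deviations from the mean with an n denominator, and deviations from the modal allele. The six haplotypes are invented.

```python
from collections import Counter

def variance_about_mean(values, ddof=1):
    """Mean squared deviation from the mean; ddof=1 gives the sample variance,
    ddof=0 the population variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

def variance_about_mode(values):
    """Mean squared deviation from the most common (modal) allele."""
    mode = Counter(values).most_common(1)[0][0]
    return sum((v - mode) ** 2 for v in values) / len(values)

def mean_over_loci(haplotypes, per_locus):
    """Apply a per-locus variance function to each STR and average the results."""
    n_loci = len(haplotypes[0])
    return sum(per_locus([h[i] for h in haplotypes]) for i in range(n_loci)) / n_loci

# HYPOTHETICAL 3-STR haplotypes, for illustration only.
haps = [(13, 24, 14), (13, 23, 14), (12, 24, 15),
        (13, 24, 14), (14, 25, 14), (13, 24, 16)]

sample_var = mean_over_loci(haps, variance_about_mean)
population_var = mean_over_loci(haps, lambda v: variance_about_mean(v, ddof=0))
modal_var = mean_over_loci(haps, variance_about_mode)
print(sample_var, population_var, modal_var)
```

Measuring from the modal allele can never give a smaller value than measuring from the mean with the same n denominator, so a modal-based calculator would tend to come out slightly higher, which is at least consistent with the direction of the differences above.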

Thanks for looking at the data. I posted this elsewhere today but it belongs here.

.. I did go back to the Myres and Balaresque papers and reread where they got their data from and how they put it together. I can repeat the variance calculations, etc., but I can't at all defend how they got their data from different sources. It makes me nervous.

I should add that I can't defend that it is representative. I don't think anyone can say we have truly representative data.