Interesting that the sample from SE France was nearly 20% L21+ (S145+), while U152 (S28) was about 13%. P312xL21,U152 was 29% there, but that would be divided up among the subclades not tested for this paper and would probably not all be P312*.

Here are some interesting facts from the Busby paper relative to L21, facts that are hardly discernible, if at all, from the L21 (S145) map in the supplementary info section.

Rennes in Bretagne had a frequency of 40% L21xM222 (not surprising).

Poitiers had 14%.

Marseille was 11% L21xM222.

Lille, up in the northeast near the Belgian border, had 10% L21xM222 (that one did surprise me - it is higher than I thought it would be).

Paris had 10% L21xM222.

Chalon-sur-Saône, in the east, near Dijon, had 8% L21xM222.

These frequencies are percentages of the total y-dna. That means L21 is actually huge all over France. I mean, 8% of the total y-dna in an area is pretty significant, let alone 10% and up.

If the shading scale on Busby's L21 map had been like those of its U106 and U152 maps, the true L21 picture in France would have been more readily apparent.

Yes, it is weird the way things seem to conspire against continental L21. I would imagine over 10% of the population translates into something like 15-20% of R1b in France.
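For what it's worth, that guess can be run backwards. Here is a rough Python sketch (the function name is mine, and the only inputs are the 10% and 15-20% figures from the posts above): if L21 is about 10% of all Y-DNA and that amounts to 15-20% of R1b, then R1b as a whole would have to be somewhere between half and two-thirds of French Y-DNA.

```python
def implied_r1b_share(l21_share_of_all, l21_share_of_r1b):
    """R1b share of all Y-DNA implied by two L21 percentages (hypothetical helper)."""
    return l21_share_of_all / l21_share_of_r1b


# 10% of all Y-DNA, taken as 15% and then 20% of R1b.
for share_of_r1b in (0.15, 0.20):
    total = implied_r1b_share(0.10, share_of_r1b)
    print(f"L21 = {share_of_r1b:.0%} of R1b -> R1b = {total:.0%} of all Y-DNA")
```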

I haven't looked at the detail for Germany. Where is the peak in the Rhineland? Did they sample the Rhineland?

One out of 83 is the number of M222+ they found in that sample. There were five L21xM222 guys, apparently.

I would say that singleton M222 is an outlier. You can't say much more than that when you only find one.
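To put a number on how little a singleton tells you, here is a small Python sketch (my own helper, nothing from the paper). It treats the 83 men as independent random draws, which is an assumption, and asks how likely it is to catch at least one M222 for a few hypothetical true frequencies.

```python
def prob_at_least_one(true_freq, sample_size):
    """Chance of seeing one or more carriers in a random sample."""
    return 1.0 - (1.0 - true_freq) ** sample_size


SAMPLE_SIZE = 83  # size of the sample discussed above
for freq in (0.001, 0.005, 0.01, 0.02, 0.05):  # hypothetical true frequencies
    p = prob_at_least_one(freq, SAMPLE_SIZE)
    print(f"true frequency {freq:.1%}: P(at least one in {SAMPLE_SIZE}) = {p:.2f}")
```

A fairly wide range of true frequencies gives a realistic chance of turning up a hit in a sample this size, which is why one find cannot be pinned down any further.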

But these arguments apply equally to the Norwegian sample.

One or two does not a trend make, but we definitely should not dismiss the idea that M222 was in Norway or the Midi-Pyrénées or Germany from prior to the M222 heyday times in NW Ireland and the lowlands of Scotland.

I know the STRs of the M222 in Norway but not those in Germany and southern France. It would be interesting to see if they are different from standard M222.

It is an area that Z196 and SRY2627 might have passed through... both of those are found in Germany, so I think an accompanying M222 or two is a reasonable possibility along the way towards the Pyrenees (if Jean M's proposals are correct). The old languages of the Midi-Pyrénées are Gaulish, I believe. At least Occitan is, which is where some think Catalan came from (Catalonia is where SRY2627 is high).

If at least approaching 10% of the population throughout France is L21 it does raise the question as to how it was able to infiltrate everywhere to a significant level. It actually suggests a rather more widespread significant input in France than even the project data suggested. The widespread nature does make origin and spread direction harder to infer.

As for the very high L21 in Rennes: Rennes is in the part of Brittany which spoke a Latinate dialect, so automatically linking this with P-Celtic speakers from Britain may be making an assumption. As I have said before, the best evidence that NW France was already very high in L21 before the Bretons is the very high L21 in Wales, Ireland, western Scotland etc. That high L21 would not have happened repeatedly in the Atlantic British Isles if the origin populations were not also high in L21.

bestauthun

Only in isolation. There are at least two M222 in the Busby Norwegian data (a tad weak, I'll admit, but as many as in the much larger French sample), and I know of another (Skaar, kit N55657) from the R-M222 Project. There are at least four L159.2 Norwegians that we know of from the R-L159.2 Project. There is another Norwegian in the R-L21 Plus Project who matches the Scots Modal.

That seems like a fair amount of British Isles-localized stuff for such a small, under-sampled country.

I know not everyone appreciates Anatole Klyosov but I do. I'm not saying I agree with him and his approach all of the time, even most of the time, but he is smart, and I find arguments that he is involved in entertaining.

O.K., since the readers are silent, let me explain why the Fig. 4 from the paper ("Here is a figure from the paper, showing age estimates of sub-haplogroups R-S21 vs. R-S116") is not adequate, mildly speaking.

First, they took totally wrong mutation rates - from Ballantyne et al (2010) [from father-son transmission]. The data are based on just a few mutations, and awfully unreliable. Let me show it.

Everyone who has worked with haplotypes and their mutations knows that DYS393 is a very slow marker, and DYS390 is a fairly fast one. Indeed, Chandler's table, the most reliable one for the first 12 markers, shows the respective mutation rate constants as 0.00076 and 0.00311 (mutations per marker per generation), a fourfold difference.

What do we see in Ballantyne's data? 0.00211 and 0.00152 (!) DYS393 is FASTER than DYS390 (!!). Utter nonsense. How did it happen? Very simple: Among 1758 father-son pairs Ballantyne et al observed just 3 mutations in DYS393, and 2 mutations in DYS390, and they took it (!) as a solid base for their absurd mutation rate constants.

This is applicable to all their "mutation rates". The reason is that among those almost 2000 father-son pairs, there were 3, 2, 7, 5, 3, 6, 0, 0, 6, 9, 1, 6 mutations in the first 12 markers. It just cannot be used for mutation rate estimates.
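A quick way to see how loose such counts are is to put an interval around each raw rate. The Python sketch below is only an illustration: the Wilson score interval is my choice of method (not anything Ballantyne et al or Klyosov used), it assumes every one of the 1758 pairs was typed at every marker, and it labels markers by position because the list above does not name them. The raw count/pairs rates it prints are also not the same thing as the published rate constants quoted above.

```python
import math


def wilson_interval(mutations, pairs, z=1.96):
    """Approximate 95% Wilson score interval for a per-generation mutation rate."""
    p = mutations / pairs
    denom = 1 + z * z / pairs
    centre = (p + z * z / (2 * pairs)) / denom
    half = z * math.sqrt(p * (1 - p) / pairs + z * z / (4 * pairs * pairs)) / denom
    return max(0.0, centre - half), centre + half


PAIRS = 1758  # father-son pairs, as quoted above (assumed the same for every marker)
COUNTS = [3, 2, 7, 5, 3, 6, 0, 0, 6, 9, 1, 6]  # mutations per marker, as quoted above

for position, count in enumerate(COUNTS, start=1):
    low, high = wilson_interval(count, PAIRS)
    print(f"marker {position:2d}: {count} mutations, "
          f"rate {count / PAIRS:.5f}, interval {low:.5f} to {high:.5f}")
```

With 3 and 2 observed mutations, the two intervals overlap almost completely, so counts like these cannot say which of the two markers is faster.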

Enough? Not quite. Just a minor thing in this context. When one works with "fast" markers, a correction for back mutations is MUCH higher. Otherwise one obtains a systematic deviation in TRCAs from slow to fast markers.

So, the conclusion number one: forget about Fig. 4 and the "principal conclusions" of the Busby et al paper. They are all wrong.

Now, I can present here data on "age estimates of sub-haplogroups R-S21 vs. R-S116", based on a much better approach. This is an important question, because it likely sets a good DNA-related time estimate for Bell Beaker movements from Iberia north into continental Europe.

I think, Dienekes, that since you presented the negative "side of the coin" here, you, in your fairness, would like to see its positive side. Wouldn't you?

Thank you.

Anatole Klyosov

Monday, August 29, 2011 3:21:00 PM

I do agree with Anatole that Busby et al picked certain STRs that didn't match very well with what they said their intent was, which was the main thrust of their argument. I still agree with Busby that STR selection is important, not just the limited individual STRs and number of STRs they chose.

I think some have gone a bit overboard in declaring the death of STR variance and TMRCA estimations. As Mark Twain would say,

Quote

reports of my death have been greatly exaggerated.

The exchange of posts between Anatole and Dienekes before that was...ahem...

It's like the old Oscar Wilde quip that 'the rumours of my death have been greatly exaggerated'

I was always told the northern Irish and the Norwegian Vikings were BFFs, and that they both hated the Danes and Dalcassians and refused to fight at the Battle of Clontarf. I mentioned somewhere else that a BBC program had some guy saying that Iceland's mtDNA was something like 70% Irish, and not from slaves? Of course 100% of the Y-DNA was Nordic. If anyone knows what that is, please tell.

"The Icelanders are one of the most studied populations in human genetics [1]–[5]. According to historical and archaeological sources, Iceland was settled roughly 1100 years ago by a mixture of people that originated primarily from Scandinavia and the British Isles [6],[7]. Studies of mtDNA variation indicate that contemporary Icelanders trace about 37% of their matrilineal ancestry to Scandinavia, with the remainder coming from the populations of Scotland and Ireland [1],[8],[9]. In contrast, Y-chromosome analyses suggest that 75–80% of their patrilineal ancestry originated in Scandinavia"

Would it be likely that the Viking slave traders were not all Norse? Dublin and York seem to be the two main centers, and they seem to have been fairly cosmopolitan; e.g. at the Battle of Clontarf the opposition to Brian Boru was a Leinster/Viking mix.

Yes. Slavery was inherent in Irish society as well. When the Irish were not fighting the Norse (and there was not a universal Irish coalition against the Norse, as there were alliances with them), they were busy fighting each other.

This also makes me think of a combined Irish/Norse exodus of some after the Battle of Clontarf. How can one explain the high numbers of R1b in Iceland? The Faroes may be similar too.

Alan has brought this proposal up several times and I see that Busby referred to a similar alternative - A mid to late Neolithic expansion for R-L11 groups from continental Western and Central Europe.

When I originally read "The peopling of Europe and the cautionary tale of Y chromosome lineage R-M269" by Busby et al. I passed over this because I couldn't find the source document referenced and was more focused on STR diversity/clines rather than frequency charts.

Quote from: Busby

A recent analysis of radiocarbon dates of Neolithic sites across Europe [46] reveals that the spread of the Neolithic was by no means constant, and that several ‘centres of renewed expansion’ are visible across Europe, representing areas of colonization, three of which map intriguingly closely to the centres of the sub-haplogroups foci (electronic supplementary material, figure S3).

Figure 3 is the one that shows R-S145(L21), R-S28(U152) and R-S21(U106) frequency maps and what Busby calls localized centers or sub-haplogroup foci.

Maybe they were on to something here? I still have concerns about how L23* or L11* would have had to slip into Europe to initiate or ride population explosions from multiple localized centers pretty much simultaneously.

When you see the different outlays of U106 (really should think of it as L48, U198, Z18, L1) and P312 (U152, L2, L21, Z196, SRY2627 - they are all about the same age) are these the remnants of a series of separate explosions or just a single explosion where the actual center has been obliterated or obscured?

Mike, I think your last sentence is quite possible. There were distinct sudden expansions in the middle Neolithic after a long hiatus period after LBK. I find in particular the similarity between U106 and TRB/Funnel Beaker very interesting. U152 could represent expansion in the Alpine area and L21 the (again similar period) expansion into the wet Atlantic areas. The only smoking gun that ties into that period of middle Neolithic expansion was the spread of developed dairy pastoralism. However, as I have said before, if this happened, its spread through central Europe must have been low visibility and is not represented by a handy culture with a typical pot trail. All that can be said is that the old LBK zone was transformed in that period, took up dairying and expanded into new zones that were previously shunned.

"ANNAGASSAN IN Co Louth is home to one of the world’s most important Viking sites, a local curator has claimed.

The Vikings over-wintered in two places in Ireland: one would become Dublin, the other was believed to have been lost in time. Not any more.

A year after test trenches were dug on the “virgin” site, the results of radio-carbon testing on some of the artefacts recovered have confirmed that “Linn Duachaill” exists and is perfectly preserved underneath farmland."

If this site turns out to be as expected, it could further strengthen the historical links between the Vikings and Irish and move their sphere of influence further north to the borders of Ulster.

I agree. As the maps on your linked document show, the Vikings were raiding along the east and south coasts and the large rivers Shannon and Erne. I remember reading somewhere that some made biannual trips to the Isles from Norway, after the spring sowing and following the late summer harvest. I believe the overwintering sites such as Dublin, Wexford and Waterford led to the establishment of these towns. The Louth site never developed in this way and hopefully is in a relatively undisturbed condition.

edit:"Raiding was often a part-time occupation. Chapter 105 of Orkneyinga saga describes the habits of Sveinn Ásleifarson. In the spring, he oversaw the planting of grain on his farm at Gáreksey. When the job was done, he went off raiding in the Hebrides and Ireland, but he was back to the farm in time to take in the hay and the grain in mid-summer. Then he went off raiding again until the arrival of winter."

The demographic process detected in the works of Bocquet-Appel is that after an initial demographic expansion in the early Neolithic in Europe there is a deep recession. As this process is found in both LBK and Cardial Neolithic and in sites spread all over Europe, the cause should be in the Neolithic culture itself. Bocquet-Appel thinks the cause is the spread of diseases associated with domestic animals.

Shennan & Edinborough mention the zoonoses of domesticated animals as being a possible cause; in the 2009 paper, however, Shennan is still unclear as to the cause but discusses warfare. Changes in climate, rainfall, temperature etc. affect harvests in Europe, and there was much to be learned in terms of keeping sufficient seed stocks and keeping them in good condition. I should imagine more than one group found the following year's seed stock had gone mouldy when it came to the growing season. Prevention of infectious zoonoses from domesticated animals is of course a major problem even today. In the period that we are talking about, they had it all still to learn.

Dr. Anatole A. Klyosov has posted several new English entries in the Russian DNA Proceedings, with three articles. One is a reposting of an article that was earlier published in Proceedings of the Russian Academy of DNA Genealogy, vol. 3, No. 1, 2010 (in Russian) and was later translated into English. The second is Anatole A. Klyosov's discussion of the Busby et al (2011) article in Proc. of the Royal Soc. (B) and of Dienekes Pontikos' essay. The third is his response to letters.

I don't like how Busby jacked around with removing data from the samples. They didn't like Balaresque's conclusions based on R-M269, which show Anatolia (Turkey) with significantly higher variance. They resample and deselect from the Irish and then just plain throw away Turkey to get the outcomes they are looking for.

Let's go over to the Busby thread that Razyn pointed out if you want to debate Busby.

Quote from: Mikewww

Are you talking about the Supplementary Information where they refer to Figure S2 panel B?

Look a little closer and read the notes.

Quote from: Busby

Figure S2: Reanalysis of Balaresque et al R-M269 Samples. The first panel, A, shows the significant correlation between longitude and mean (observed) variance found in their original dataset. The second panel, B, shows the same data, including the Irish group but without the three Turkish (TK) populations. Removing the Turkish samples can be justified on the grounds that, in our dataset c.90% of Anatolian samples were un-derived at SNP S127, whereas the majority of individuals in European populations were derived at this SNP.

No, I’m referring to figure-2b in the main study, the one that shows a map with the frequency of RxS127 on the left and a graph of bootstrap variance relative to a normalized longitude on the right. ....Well it wasn’t quite like that with the Irish, here is their explanation as to what they did:

Quote from: Busby et al(2011)

..... However, we note that the positive correlation between longitude and variance is still present after removing only the Irish and retaining the Balaresque et al Turkish populations. If we replace the variance calculated by Balaresque et al with that calculated from our repetitions, then the correlation is no longer significant, independent of whether or not we remove the Turkish samples (Figure S2).

My understanding is that "bootstrapped" means "resampled" so the Figure 2 charts a, b and c on the right from the main body of the paper are based on Busby's resampling, not on "observed" variance.

The Supplementary Information is intended to support the primary body of the paper and show the details. The Supplementary Information Figure S2 panels A through E go through the gyrations of the actual observed variance calculations.

It is critical to Busby that he handles the Turkish and particularly the Irish samples differently to get what I guess are the appropriate outcomes.

Supplementary S2 ( http://rspb.royalsocietypublishing.org/content/early/2011/08/18/rspb.2011.1044/suppl/DC1 ) Panel A is just a straight look at "observed variance" from the data as it is. This is Balaresque's view and is the straw-man proposal that Busby appears to be after. The Irish sample shows the lowest variance and the three Turkish samples show the highest (with most of the continent in between) - resulting in the clear east/west cline.

Panel B is where the Turkish samples are thrown out because "Removing the Turkish samples can be justified on the grounds that, in our dataset c.90% of Anatolian samples were un-derived at SNP S127." The east/west cline diminishes significantly but is still there slightly because of the low diversity in the Irish sample. I don't see how it is valid to throw out the Turkish sample because most of it is un-derived at L11 (S127). Do multiple analyses at multiple levels of the phylogeny if that is what you want, but why throw out a location altogether?

Panel C throws out the Irish sample but leaves the Turkish in. The east/west cline is present even without the low diversity Irish sample.

Panel D introduces something new. This is where the "bootstrapping" comes in. The observed Irish variance is now replaced by the resampled variance. The Irish diversity is now higher than all of the European samples and is now as high as the highest Turkish sample. There is still a slight east/west cline, but apparently this is deemed as "insignificant". I'm okay with that determination, I just don't trust the resampling.
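To show the kind of sensitivity I mean across panels A to D, here is a toy Python sketch. Loud caveat: only the two Irish variance figures (0.208 observed, 0.354 resampled median) come from the Busby text quoted in this thread. Every longitude and every other variance below is invented for illustration, the labels "Pop A" to "Pop E" and "TK1" to "TK3" are hypothetical, and a plain Pearson correlation is my assumption about what a "cline" test boils down to.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (label, rough longitude in degrees east, mean STR variance).
# Invented numbers, except the Irish variance of 0.208.
POPULATIONS = [
    ("Ireland", -8.0, 0.208),
    ("Pop A", -4.0, 0.30),
    ("Pop B", 2.0, 0.27),
    ("Pop C", 10.0, 0.31),
    ("Pop D", 12.0, 0.28),
    ("Pop E", 21.0, 0.30),
    ("TK1", 29.0, 0.36),
    ("TK2", 33.0, 0.34),
    ("TK3", 39.0, 0.37),
]
TURKISH = ("TK1", "TK2", "TK3")
IRISH_RESAMPLED = 0.354  # median of the resampled Irish variance, as quoted


def cline(populations, drop=(), irish_variance=None):
    """Correlation of variance with longitude, optionally dropping or swapping rows."""
    rows = [row for row in populations if row[0] not in drop]
    xs = [lon for _, lon, _ in rows]
    ys = [irish_variance if name == "Ireland" and irish_variance is not None else var
          for name, _, var in rows]
    return pearson_r(xs, ys)


panels = {
    "A: everything, observed": cline(POPULATIONS),
    "B: Turkish dropped": cline(POPULATIONS, drop=TURKISH),
    "C: Irish dropped": cline(POPULATIONS, drop=("Ireland",)),
    "D: Irish resampled": cline(POPULATIONS, irish_variance=IRISH_RESAMPLED),
}
for label, r in panels.items():
    print(f"{label}: r = {r:+.2f}")
```

With made-up inputs the r values mean nothing in themselves. The point is only that dropping or swapping one or two end points can move the result a long way, so every region ought to get the same treatment.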

Quote from: Busby et al(2011)

Testing the variance calculations from the Irish population... We note, however, that 17-STR haplotypes, including the 9 STRs used in Balaresque et al’s analysis, are available for 681 Irish R-M269 derived individuals in Moore et al (3), which is, in fact, the study which Balaresque et al use to estimate R-M269 frequency in Ireland. A subset of the Moore et al samples were re-analysed in the current study for SNPs downstream of R-M269, and the original haplotype data are used here to calculate variance. To test if the Ysearch haplotypes were representative of the Irish R-M269 in Moore et al, we independently re-sampled the Moore et al dataset 10,000 times, selecting sub-samples of 75 haplotypes from which we estimated the variance using the same 9 STRs used in the Balaresque et al paper. The median variance of these 10,000 repetitions was 0.354 with a 95% CI of (0.285-0.432). The lowest variance value out of the 10,000 samples was 0.242, which is still higher than the figure observed in the Balaresque et al Ysearch sample (0.208). We therefore believe our estimate of Irish R-M269 variance to be a more robust representation of the true variance than that estimated by Balaresque et al. However, we note that the positive correlation between longitude and variance is still present after removing only the Irish and retaining the Balaresque et al Turkish populations. If we replace the variance calculated by Balaresque et al with that calculated from our repetitions, then the correlation is no longer significant, independent of whether or not we remove the Turkish samples (Figure S2).
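As I read it, the procedure in that quote amounts to something like the Python sketch below. This is my reconstruction, not Busby's code. Assumptions: "variance" means the sample variance of repeat counts averaged over the markers, sub-samples are drawn without replacement, the 95% interval is taken from percentiles, and the haplotypes are synthetic stand-ins since I don't have the Moore et al data loaded. All function names are made up.

```python
import random
import statistics


def sample_variance(values):
    """Unbiased sample variance of a list of repeat counts."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def mean_str_variance(haplotypes):
    """Average of the per-marker variances across a set of haplotypes."""
    n_markers = len(haplotypes[0])
    return sum(
        sample_variance([h[m] for h in haplotypes]) for m in range(n_markers)
    ) / n_markers


def resample_variance(haplotypes, subsample_size=75, reps=10_000, seed=0):
    """Median, 95% percentile interval and minimum of the sub-sample variances."""
    rng = random.Random(seed)
    results = sorted(
        mean_str_variance(rng.sample(haplotypes, subsample_size))
        for _ in range(reps)
    )
    lower = results[int(0.025 * reps)]
    upper = results[int(0.975 * reps) - 1]
    return statistics.median(results), (lower, upper), results[0]


def make_synthetic_haplotypes(n=681, n_markers=9, seed=1):
    """Fake 9-STR haplotypes scattered around a modal value of 13 (illustration only)."""
    rng = random.Random(seed)
    return [
        [13 + rng.choice((-1, 0, 0, 0, 1)) for _ in range(n_markers)]
        for _ in range(n)
    ]


if __name__ == "__main__":
    haplotypes = make_synthetic_haplotypes()
    median, (lower, upper), minimum = resample_variance(haplotypes, reps=2_000)
    print(f"median {median:.3f}, 95% interval ({lower:.3f}-{upper:.3f}), min {minimum:.3f}")
```

Nothing in it is specific to Ireland: the same function can be pointed at any population's haplotypes, provided the sample is larger than the sub-sample size.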

Even though there are thousands of high resolution SNP Irish samples available, Busby resamples 75 at a time for only 9 STRs. He uses Moore's data. I don't understand why they had to use Moore's "A Y-Chromosome Signature of Hegemony in Gaelic Ireland" (2006). L11(S127) wasn't even tested for. Even if they don't like Ysearch, YHRD or whatever, there is plenty of deeply tested data from Ireland.

...anyway, after the "bootstrapping", Busby's "robust" representation of Irish R-M269 now zooms the Irish to the top of STR variance, jumping by Britain and Continental Europe and pulling even with the highest Turkish sample. Does this sound right? Should the hypothesis be that two different forms of R-M269 survived a bottleneck and exploded to take over Europe from both directions - one from Ireland and the other from Turkey?

I just don't like comparing data across regions with different treatments per region. If "robust bootstrapping" is good for Ireland then it should be done for the continent and Turkey too.

As noted above, Busby states this condition on his analysis.

Quote from: Busby

variance calculated by Balaresque et al with that calculated from our repetitions

However, this particular quote didn't complete the condition. The condition is he replaced the variance calculations for Ireland only, and with a different data set to boot. I'm not a brilliant statistician but I know that apples and oranges don't add up.

um.... I fear this is a problem with almost any of these Pan-European studies. They are piecing together data from multiple sources, but I hope that at least they will be consistent within their respective studies in their treatment of the data.