Sunday, April 19, 2015

When I talked to the website owner of Khazaria.com and asked him about the testing of Khazar skeletons, I received the following reply, in part:

Why are we still talking about the Khazars? They aren't involved in our ancestry at all and archaeologists and historians say it may be difficult to distinguish Khazars proper from the other peoples of Khazaria, plus I'm not aware of anybody who has tested Khazar skeletons or plans to, but you are welcome to ask around now that Russians have successfully tested many populations like the Yamnaya and the Mal'ta. Based on the latest evidence I would say the Khazars are Volga Finnic intermixed with East-Central Asian Turks and other assorted peoples, and their Turkic element is the same one found in other Turks and Mongolians around Eurasia, a particular affinity never found in Ashkenazim.... In lieu of ancient DNA, modern populations have proven to be good proxies to determine ethnicity. Did you see my recent article "The Chinese Lady who Joined the Ashkenazic People"? http://issuu.com/jewishtimesasia/docs/mar2015/19 Some Ashkenazim are also descended from a Korean-related people, from a more recent Asian-Ashkenazic marriage.

Also, by the way, I compare Dr. Himladevi "Himla" Soodyall to "Dr." Eran Elhaik. I don't know what agenda "Dr." Soodyall has, although I can ascertain that she attempted to delegitimize the Lemba as much as "Dr." Elhaik attempted to delegitimize Ashkenazi Jews.

PS My dad's Ancestry atDNA, even in Analysis 2.0*, does in fact show a very slight amount of Middle Eastern atDNA. It also shows a tiny bit of East Asian, Melanesian, Scandinavian, and Finnish/Northwest Russian atDNA. The Melanesian atDNA is probably related to the East Asian atDNA, and the Scandinavian atDNA to the Finnish/Northwest Russian atDNA.

*"We create estimates for your genetic ethnicity by comparing your DNA to the DNA of other people who are native to a region. The AncestryDNA reference panel (version 2.0) contains 3,000 DNA samples from people in 26 global regions."

Regional Polygon Construction

As described above, we divide the globe into 26 overlapping geographic regions. Each region represents a population with a somewhat distinct genetic profile. Where possible, we use the known geographic locations of our samples to guide the delineation of regional boundaries. Figure 3.6 shows an example of the information used to define regional polygons.

For a more accurate panel, each of the 26 regions should have 115 or 116 samples (3,000 ÷ 26 ≈ 115.384615). Also, the samples should not be "carefully selected as described". The selection needs to be as random as possible. This cannot be accepted:

Before using the reference set to estimate ethnicities of AncestryDNA customers, we perform several experiments to lend support to the quality of this new reference set. This involves testing the performance of our ethnicity estimation procedure on the reference set of samples. (See Section 4 below for details regarding the statistical method used for ethnicity estimation.)

First, we use the new panel to do a leave-one-out analysis. In this experiment, we remove one sample from the reference panel and then use the remaining panel to estimate the ethnicity of the sample that has been removed. We repeat this process for every sample in the panel and then look at the average predicted ethnicity for each region in the set. Figure 3.4 shows the results of this experiment as a box plot.

Figure 3.4: Leave-one-out analysis of the V2 reference panel. Here we plot the results of an experiment in which each sample is removed from the reference set one-by-one and its ethnicity is estimated using the remaining panel samples. Each bar represents the average correctly predicted ethnicity for all samples from a given region. It is clear from this graph that for the majority of samples in each region, we predict at least 80% of the genetic ethnicity to be from the correct region. However, there are exceptions. In particular, our average prediction accuracy for samples from Great Britain, Western Europe, Iberian Peninsula, and Mali are not quite as high. There are many factors affecting the accuracy of these numbers, most importantly the number of reference samples in the panel for each region and the genetic distinctness of each region.

The purpose of this analysis is twofold. First, reference panel samples with poor performance in the leave-one-out analysis were removed. This included samples from individuals whose leave-one-out ethnicity did not represent their ethnic group of origin. (See for instance, Figure 3.5) Second, the leave-one-out plots allow us to define population boundaries and demonstrate our ability to accurately estimate the ethnicities of our reference panel samples using our method (see next section).

Figure 3.5: Removing Reference Panel Candidates. Leave-one-out estimation for a Reference Panel Candidate with 8 terminal ancestors from the Ivory Coast and Ghana region. While this sample was initially included as a candidate of the reference panel for the Ivory Coast/Ghana region, the sample’s leave-one-out ethnicity estimation reveals primarily Benin/Togo ancestry. As a result, this sample was removed from the reference panel.
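The leave-one-out procedure quoted above can be sketched roughly as follows. This is only an illustration under loud assumptions: the toy panel, the two made-up coordinates per sample, and the nearest-centroid classifier are all hypothetical stand-ins, not AncestryDNA's actual data or statistical method (their Section 4 is not reproduced here). Instead of dropping a mismatched candidate, the sketch only flags it, so it could be reported in a separate category.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy panel: (claimed region, two made-up genetic coordinates).
PANEL = [
    ("A", (0.0, 0.1)), ("A", (0.2, 0.0)), ("A", (0.1, 0.3)), ("A", (0.9, 1.0)),
    ("B", (1.0, 1.1)), ("B", (1.2, 0.9)), ("B", (0.9, 1.2)),
]

def centroids(samples):
    """Mean coordinate of each region in `samples`."""
    by_region = defaultdict(list)
    for region, point in samples:
        by_region[region].append(point)
    return {r: tuple(mean(axis) for axis in zip(*pts))
            for r, pts in by_region.items()}

def predict(point, cents):
    """Region whose centroid is closest to `point` (stand-in classifier)."""
    return min(cents, key=lambda r: sum((a - b) ** 2
                                        for a, b in zip(point, cents[r])))

def leave_one_out(panel):
    """Predict each sample's region from the panel with that sample removed."""
    results = []
    for i, (region, point) in enumerate(panel):
        rest = panel[:i] + panel[i + 1:]
        results.append((region, predict(point, centroids(rest))))
    return results

def accuracy_by_region(results):
    """Share of samples per claimed region that were predicted correctly."""
    hits = defaultdict(list)
    for claimed, predicted in results:
        hits[claimed].append(claimed == predicted)
    return {r: sum(v) / len(v) for r, v in hits.items()}

results = leave_one_out(PANEL)
# Indices of candidates whose prediction disagrees with their claimed region.
flagged = [i for i, (claimed, predicted) in enumerate(results)
           if claimed != predicted]
print(accuracy_by_region(results), flagged)
```

In this toy panel, the fourth sample is labeled "A" but sits among the "B" samples, so it is the one that gets flagged, much like the Ivory Coast/Ghana candidate in Figure 3.5.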

There are two sources of error that limit generalizability: sampling error (chance variation) and sample bias (constant error) which results from inadequate research design. Sampling error (but not sample bias) can be taken into account using statistics.
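The difference between the two kinds of error can be seen in a small simulation. This is a hedged sketch with invented numbers: the population of 1 to 1,000, the sample size, and the "upper half only" design are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

def compare_sampling(seed=0, draws=200, k=50):
    """Draw repeated samples under a random design and a biased design."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    population = list(range(1, 1001))      # hypothetical measurements
    true_mean = mean(population)
    # Random design: every value has the same chance of being selected.
    random_means = [mean(rng.sample(population, k)) for _ in range(draws)]
    # Biased design: only the upper half of the population can be selected.
    upper_half = [x for x in population if x > 500]
    biased_means = [mean(rng.sample(upper_half, k)) for _ in range(draws)]
    return true_mean, random_means, biased_means

true_mean, random_means, biased_means = compare_sampling()
print("true mean:", true_mean)
print("average of random-sample means:", mean(random_means))
print("average of biased-sample means:", mean(biased_means))
```

The random-sample means scatter above and below the true mean and average out close to it (sampling error), while every biased-sample mean is too high by a large, steady amount that more draws do not fix (sample bias).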

Probability samples are representative of the population. They permit generalization to the population from which they are drawn. There are two types of probability samples: Random and stratified.

Random - each individual in the population has an equal chance of being selected for the sample.

Stratified - a miniature representation of the larger population with regard to proportions within selected strata (e.g., gender, education, socioeconomic level). Individuals are randomly selected within strata.

A table of random numbers or the random number function in Excel can be used to select a random sample from a population.
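The same kind of draw can be done in a few lines of Python instead of Excel. This is a minimal sketch: the list of 500 labeled people and the function name `simple_random_sample` are hypothetical, and the seed is fixed only so the draw can be repeated.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)   # sampling without replacement

# Hypothetical sampling frame of 500 people.
population = [f"person_{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(population, k=20, seed=2015)
print(sample)
```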

If a sample is thus "poor", it should be put in an "Indeterminable" or a "Poor Sample" category. Some would argue, "Well, what about other studies that don't have very balanced numbers?" Given that numerous studies on Ashkenazi Jews, Lemba Jews, and other groups have been done over time, and most have shown similar or equal results, the studies balance the numbers at least somewhat in the end. Therefore, the argument about "other studies that don't have very balanced numbers" is moot at this point.

** Stratified Sampling – This technique divides the population into meaningful homogeneous or similar groups based on a certain characteristic (e.g., gender, race, socioeconomic status) and then selects a simple random sample from each group. [For example, if you were interested in the effects of student motivation on academic achievement, particularly by grade level, you would divide the population into their respective grade levels and then randomly select an equal number of 9th, 10th, 11th, and 12th graders.]
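The grade-level example can be sketched in Python as follows. This is only an illustration: the student roster, its sizes, and the function name `stratified_sample` are made up, and the sketch uses the "equal number per group" variant from the example rather than proportional allocation.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group the population into strata, then draw a simple random
    sample of `per_stratum` members from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    # rng.sample raises ValueError if a stratum has fewer than per_stratum members.
    return {s: rng.sample(members, per_stratum)
            for s, members in sorted(strata.items())}

# Hypothetical roster: (student name, grade), with unequal grade sizes.
grade_sizes = {9: 120, 10: 100, 11: 90, 12: 80}
roster = [(f"student_{grade}_{i}", grade)
          for grade, size in grade_sizes.items()
          for i in range(size)]

chosen = stratified_sample(roster, stratum_of=lambda s: s[1],
                           per_stratum=5, seed=2015)
for grade, students in chosen.items():
    print(grade, [name for name, _ in students])
```

Each grade contributes the same number of students no matter how large the grade is, which is what keeps one large group from dominating the sample.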