Genetic Structure Methodology

After I say this rule, the first person who says the word "race" has to go home.
That is not because it's a bad word; it is a very nice word, with nice
etymological roots and a rightful place in the human mind. It simply is not what the thread is about.
------------------
This thread is about Genetic Structure Methodology, as illustrated in the recent article in Science (21 May 2004, vol. 304, p. 1160):

Genetic Structure of the Purebred Domestic Dog

http://www.akcchf.org/news/press/releases/2004/dogbreeds.pdf

Algorithms are modern-day monsters. When an algorithm is created, we do not know where it will lead: usually to nothing bad, though, just to unexpected stuff. We are talking about a genetic structure algorithm.
At present it must be helped along by humans, but in principle it could
be entirely programmed: give it blood samples from 85 different breeds of dog, and the unaided computer builds a tree.

It isn't that good yet; it is still very rudimentary. The first tree (see Figure 2) resolved only about 10 breeds of dog, plus a catch-all category for the rest.
--------------------

With a new technology it is always tempting to point out lots of reasons why it cannot succeed, cannot generalize, or will never be able to do this or that. But that is basically a poor idea. Humans are ingenious, and you just have to wait and see. Please go along with me in doing two contradictory things: (1) recognize that a new algorithm is a potential monster that could take us where we don't want to go, and (2) don't be afraid; just try to see how it works.

The italicized word "structure" is the name of a computer program.
When most of us were children, the names of computer programs were set in all caps; these people use italics. It is less jarring and makes for a nicer-looking page. In this article the italicized word appears many times.
---------------------

It is always tempting to be a passive consumer of science: let "them" do the hard stuff and tell me the results. In this case, strangely enough, the method fascinates me more than the results. The topic, Purebred Domestic Dogs, is both solemn and a bit ludicrous. It is fun, but I don't have any urge to take it seriously. The method, though, is another matter.

The next quote is from page 1162:

"We first used standard neighbor-joining methods to build a majority-rule consensus tree of breeds (Fig. 2), with distances calculated using the chord distance measure (26)....The tree was rooted using wolf samples."

This sentence has a pleasant clanking sound of contented jargon.
One wants immediately to know what the "chord distance measure" is. Was it perchance invented by Mr. Cavalli-Sforza? Yes, it was, seemingly in 1967.
The revered and still-active gentleman was then 45 years old. It could be his most famous contribution to Dog Genetics.
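For anyone else who wants to know what the chord distance measure actually computes, here is a small Python sketch. The idea, as I understand it: take a breed's allele frequencies at one locus, take their square roots, and you have a point on a sphere; two breeds are two points, and the distance is the straight-line chord between them. The version below is one commonly quoted form, averaged over the r loci: D = (2 / (pi * r)) * sum over loci of sqrt(2 * (1 - sum_i sqrt(x_i * y_i))). Other normalizations of the same idea are in use, and I have not checked which one reference (26) in the paper used, so read it as an illustration only. The function names, the input layout, and the genotypes are all invented.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Per-locus allele frequencies for one breed.

    genotypes: one entry per dog; each entry is a list of (allele1, allele2)
    pairs, one pair per locus. Returns one {allele: frequency} dict per locus.
    """
    freqs = []
    for j in range(len(genotypes[0])):
        counts = Counter()
        for dog in genotypes:
            counts.update(dog[j])  # count both allele copies at locus j
        total = sum(counts.values())
        freqs.append({allele: c / total for allele, c in counts.items()})
    return freqs


def chord_distance(x, y):
    """Chord distance between two populations, averaged over loci.

    x, y: lists of per-locus {allele: frequency} dicts (same loci, same order).
    Per locus: cos(theta) = sum_i sqrt(x_i * y_i); chord = sqrt(2 * (1 - cos(theta))).
    Result: (2 / (pi * r)) * (sum of the chords over the r loci).
    """
    total = 0.0
    for fx, fy in zip(x, y):
        cos_theta = sum(math.sqrt(p * fy.get(allele, 0.0)) for allele, p in fx.items())
        cos_theta = min(cos_theta, 1.0)  # rounding can push this a hair above 1
        total += math.sqrt(2.0 * (1.0 - cos_theta))
    return 2.0 / (math.pi * len(x)) * total


if __name__ == "__main__":
    # Invented genotypes at two microsatellite loci (alleles named by repeat length).
    breed_a = [[(150, 152), (200, 200)], [(150, 150), (200, 204)]]
    breed_b = [[(152, 154), (204, 204)], [(154, 154), (200, 204)]]
    fa, fb = allele_frequencies(breed_a), allele_frequencies(breed_b)
    print(fa[0])  # {150: 0.75, 152: 0.25}
    print(round(chord_distance(fa, fb), 3))
```

Identical breeds come out at distance 0, and two breeds sharing no alleles at all reach the maximum, 2 * sqrt(2) / pi, about 0.9.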

"...The tree was rooted using wolf samples." This sentence is too beautiful for comment. Some poet writing for the New Yorker might steal it.
--------------
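The sentence from page 1162 names two more steps, neighbor-joining and rooting with wolves, and both fit in a bare-bones Python sketch. Neighbor-joining (Saitou and Nei, 1987) starts with every breed as a separate tip, repeatedly joins the pair that minimizes the Q criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and replaces that pair with a new internal node. The result is unrooted: it says which breeds cluster together but not which split came first. Rooting with an outgroup supplies the direction: you include something known to lie outside the group (the wolves) and hang the tree from the branch leading to it. This is my own illustrative code, not the software used in the paper. It builds a single tree with no bootstrapping or consensus step; the names `neighbor_joining`, `root_with_outgroup` and the internal labels `node0`, `node1`, ... are made up; and the five-taxon distance table is a toy example, not dog data.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted neighbor-joining tree.

    names: leaf labels; matrix: symmetric distances with zeros on the diagonal.
    Returns an adjacency dict {node: {neighbour: branch_length}}.
    Internal nodes are labelled node0, node1, ... (leaf names must not clash).
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    adj = {a: {} for a in names}
    nodes = list(names)
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Join the pair minimising Q; ties go to the first pair found.
        f, g = min(((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        # Branch lengths from f and g to the new internal node u.
        lf = d[f][g] / 2 + (r[f] - r[g]) / (2 * (n - 2))
        lg = d[f][g] - lf
        adj[u] = {f: lf, g: lg}
        adj[f][u], adj[g][u] = lf, lg
        # Distances from u to every node still waiting to be joined.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (f, g):
                d[u][k] = d[k][u] = (d[f][k] + d[g][k] - d[f][g]) / 2
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    a, b = nodes  # connect the last two nodes directly
    adj[a][b] = adj[b][a] = d[a][b]
    return adj


def root_with_outgroup(adj, outgroup):
    """Root the tree on the branch leading to the leaf `outgroup`.

    Returns nested tuples: (outgroup, rest_of_tree).
    """
    (inner,) = adj[outgroup]  # a leaf has exactly one neighbour

    def clade(node, parent):
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node  # a leaf
        return tuple(clade(c, node) for c in children)

    return (outgroup, clade(inner, outgroup))


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    tree = neighbor_joining(names, matrix)
    print(root_with_outgroup(tree, "a"))  # ('a', ('b', ('c', ('d', 'e'))))
```

Here taxon "a" plays the wolf. Breaking ties by taking the first pair found is an arbitrary choice; real programs handle ties in their own ways.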

There is a very nice passage in the middle of page 1161. I will try to copy it with the mouse so I don't have to type it in myself.

Don't you get the feeling that we ought to understand the main steps in the algorithm?
(Those of us who, like me, don't yet.)

------excerpts----
Here, we show that microsatellite typing of a diverse collection of 85 breeds, combined with phylogenetic analysis and modern genetic clustering methods (11, 12), allows the definition of related groups of breeds and that genetic relatedness among breeds often correlates with morphological similarity and shared geographic origin.

Strong genetic differentiation among dog breeds suggests that breed membership could be determined from individual dog genotypes (9). To test this hypothesis, we first applied a Bayesian model–based clustering algorithm, implemented in the program structure (11, 12, 21), to the microsatellite data. The algorithm attempts to identify genetically distinct subpopulations on the basis of patterns of allele frequencies. We applied structure to overlapping subsets of 20 to 22 breeds at a time (22) and observed that most breeds formed distinct clusters consisting solely of all the dogs from that breed (Fig. 1A). Dogs in only four breeds failed to consistently cluster with others of the same breed: Perro de Presa Canario, German Shorthaired Pointer, Australian Shepherd, and Chihuahua.

The tree was rooted using wolf samples. The deepest split in the tree separated four Asian spitz-type breeds, and within this branch the Shar-Pei split first, followed by the Shiba Inu, with the Akita and Chow Chow grouping together. The second split separated the Basenji, an ancient African breed. The third split separated two Arctic spitz-type breeds, the Alaskan Malamute and Siberian Husky, and the fourth split separated two Middle Eastern sight hounds, the Afghan and Saluki, from the remaining breeds. The first four splits exceeded the majority-rule criterion, appearing in more than half of the bootstrap replicates.

-----end quotes-----
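To make the first excerpt a bit more concrete: in structure's simplest (no-admixture) model, described by Pritchard, Stephens and Donnelly (2000), each of K clusters has its own allele frequencies at every locus, and each dog's alleles are drawn from its cluster's frequencies. If those frequencies were known, the probability that a given dog belongs to cluster k would be proportional to the product, over loci, of the frequencies of its two alleles in cluster k. The program does not know the frequencies; it runs a Markov chain Monte Carlo sampler that alternates between assigning dogs to clusters and re-estimating the frequencies from those assignments. The Python sketch below does only the assignment step. It assumes equal prior weight on each cluster and uses a small `floor` value for alleles a cluster has never shown (my crude, hypothetical stand-in for the program's prior on frequencies). The function name and the numbers are invented.

```python
import math


def membership_probabilities(dog, cluster_freqs, floor=1e-3):
    """Posterior probability that one dog belongs to each of K clusters.

    dog: list of (allele1, allele2) pairs, one per locus.
    cluster_freqs: for each cluster, a list of per-locus {allele: frequency} dicts.
    floor: frequency used for an allele a cluster has never shown (a crude,
        hypothetical stand-in for the prior on allele frequencies).
    A uniform prior over the K clusters is assumed.
    """
    log_likes = []
    for freqs in cluster_freqs:
        ll = 0.0
        for (a1, a2), locus in zip(dog, freqs):
            # Both allele copies drawn independently from the cluster's frequencies.
            ll += math.log(locus.get(a1, floor)) + math.log(locus.get(a2, floor))
        log_likes.append(ll)
    best = max(log_likes)  # shift before exponentiating to avoid underflow
    weights = [math.exp(ll - best) for ll in log_likes]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two invented clusters, two loci.
    cluster_a = [{150: 0.9, 152: 0.1}, {200: 0.8, 204: 0.2}]
    cluster_b = [{150: 0.1, 152: 0.9}, {200: 0.2, 204: 0.8}]
    dog = [(150, 150), (200, 204)]
    print(membership_probabilities(dog, [cluster_a, cluster_b]))
```

With 96 or so loci instead of two, these products become overwhelming very quickly, which is presumably why most breeds formed clean clusters and only a handful of dogs wandered.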

These interesting-sounding terms are probably good ones to understand (in some cases they are almost self-explanatory from context, but it wouldn't hurt to be more explicit).
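Two of those terms, "bootstrap replicates" and "majority-rule", fit in a few lines of Python. In the usual phylogenetic bootstrap, the loci are resampled with replacement to make many pseudo-data sets of the same size, a tree is built from each, and every group of breeds is scored by the fraction of those replicate trees in which it reappears. A majority-rule consensus keeps only the groups found in more than half of them. I am assuming the paper resampled loci, which is the common practice with microsatellite data; the excerpt does not say. The sketch treats trees as rooted nested tuples (with a fixed outgroup such as the wolves, counting groups this way matches counting splits of the unrooted tree), leaves the tree building to a hypothetical `build_tree` function, and uses invented example trees.

```python
import random
from collections import Counter


def bootstrap_loci(n_loci, rng):
    """One bootstrap replicate: n_loci locus indices drawn with replacement."""
    return [rng.randrange(n_loci) for _ in range(n_loci)]


def clades(tree):
    """Groups of leaves (frozensets) under each internal node of a nested-tuple tree."""
    found = []

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found[:-1]  # drop the root: it holds every leaf and is trivially in every tree


def majority_rule_clades(trees):
    """Groups found in more than half of the trees, mapped to their support."""
    counts = Counter(c for tree in trees for c in set(clades(tree)))
    return {c: n / len(trees) for c, n in counts.items() if n > len(trees) / 2}


def bootstrap_support(loci_data, build_tree, n_reps=100, seed=1):
    """Resample loci, rebuild a tree per replicate, keep the majority-rule groups.

    build_tree: any (hypothetical) function turning a list of per-locus data
    into a nested-tuple tree, e.g. chord distances followed by neighbor-joining.
    """
    rng = random.Random(seed)
    trees = []
    for _ in range(n_reps):
        sample = [loci_data[i] for i in bootstrap_loci(len(loci_data), rng)]
        trees.append(build_tree(sample))
    return majority_rule_clades(trees)


if __name__ == "__main__":
    # Three invented replicate trees over five taxa.
    trees = [
        (("A", "B"), ("C", ("D", "E"))),
        (("A", "B"), (("C", "D"), "E")),
        ((("A", "B"), "C"), ("D", "E")),
    ]
    for group, support in majority_rule_clades(trees).items():
        print(sorted(group), round(support, 2))
```

Groups that each appear in more than half of the trees can never conflict with one another, so they nest directly into the consensus tree; everything below the threshold collapses, which is one way an unresolved catch-all group arises.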


Er, marcus, were you aware that structure has been used to analyse human genes?

In fact, that "Rosenberg et al." paper, which was referenced several times in the "Is there a scientific basis for 'human races'?" thread, is a paper giving the results of just such an analysis. Here's hissquad's post, bringing it to our attention.

Also, the two papers (both with Cavalli-Sforza among the authors) that iansmith provided links to (well, to the abstracts) seem to have used techniques similar to the structure algorithm. Oh, and btw, C-S et al. in their 1994 book include a) a detailed description of how to construct trees (yes, they talk about the difference between 'rooted' and 'rootless' trees), and b) many trees for Homo sap., based on their analyses of the genetic information available to them at the time. The first paper also illustrates well the advances of the past decade or more, e.g. "For the first time, with biparentally transmitted markers, the microsatellite tree also shows that the San are the first branch of the human tree before the branch leading to all other Africans".