Posted by Soulskill on Tuesday October 29, 2013 @07:00PM
from the turns-out-several-million-people-married-their-cousins dept.

ananyo writes "Using data pulled from online genealogy sites, a renowned 'genome hacker' has constructed what is likely the biggest family tree ever assembled. The researcher and his team now plan to use the data — including a single uber-pedigree comprising 13 million individuals, which stretches back to the 15th century — to analyze the inheritance of complex genetic traits, such as longevity and facial features. In addition to providing the invitation list to what would be the world's largest family reunion, the work presented by computational biologist Yaniv Erlich at the American Society of Human Genetics annual meeting in Boston could provide a new tool for understanding the extent to which genes contribute to certain traits. The pedigrees have been made available to other researchers, but Erlich and his team at the Whitehead Institute in Cambridge, Massachusetts, have stripped the names from the data to protect privacy."

Wow, everybody's a hacker these days. It started to go downhill with the whole "lifehacker" meme. Perhaps I should be called a "garbage hacker" instead of the prior preferred term, "sanitation engineer".

I think "garbage hacker" would be appropriate if, instead of taking it to the dump, you did something interesting with it... Like a large-scale model of Ballmer, for some zozobra-like action. You know, to commemorate his years of (dis)service to Microsoft.

Outside of the obviously slanderous term, this is not "Science". The guy pulled data which people input on genealogy sites. There are lots of biases here, because everyone wants to be related to someone famous and/or of historical significance. I can't tell you how many of the people on these sites claim to be related to famous people like King Henry or Napoleon (I have met many). Funny that so few claim to be related to Ben Franklin, who fathered 10 times the number of children they had.

Lisa Cannon-Albright, a geneticist at the University of Utah in Salt Lake City, urges caution when using self-reported genealogical data. She has worked extensively with a large Utah genealogy database that is linked to some medical information. “Everyone wants to trace their family back to royalty,” she says. “For these giant pedigrees, we just don’t believe them beyond a certain date.” Cannon-Albright says that she cuts off her data at the year 1500.

It's also kind of pointless because, as TFA points out, even if you assume the unlikely case that the data is accurate, what the hell good does it do?

For now, it is unclear how the huge pedigrees generated by Erlich and his team will be useful. Some scientists at the meeting expressed enthusiasm for the project, but were hard-pressed to come up with a specific experiment using the data.

I am sorry, but it still proves nothing; it is just hearsay. Until you can show hard evidence (such as DNA proof), any of it could still be false for the reason the GP gave: a false claim to be related to a historically important person.

If you look at the math, chances are almost everyone is related to royalty, if you go back far enough. Do you have any idea how many descendants my 35 times great grandfather has? Especially remembering that 75 years ago and earlier everyone had big families and lots of kids?
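One way to look at that math is from the ancestor side rather than the descendant side. The sketch below only counts slots in a family tree; the function names are made up for illustration, and it assumes the usual convention that an "n times great" grandparent sits n + 2 generations back.

```python
def generations_back(times_great):
    """Generations between you and an 'n times great' grandparent.

    Parents are 1 generation back, grandparents 2, great-grandparents 3,
    so an n-times-great grandparent is n + 2 generations back.
    """
    return times_great + 2


def ancestor_slots(generations):
    """Number of positions in your tree at a given generation (2 per child)."""
    return 2 ** generations


gens = generations_back(35)
print(gens, ancestor_slots(gens))
```

For a 35 times great grandfather that is 37 generations and 137,438,953,472 slots, far more people than have ever been alive at one time. The same individuals must therefore fill many slots, which is why shared ancestry with almost anyone from that era is plausible without any of the online trees being right about the specific links.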

The online family trees they're using as source material are notoriously unreliable. They don't include sources, errors are copied from tree to tree by name collectors, and many links are incorrect. I can't believe they think they can draw a conclusion from any of it. Respectable genealogists would laugh at this endeavor.

Even if all the information were a 100% accurate representation of the actual records and all links were correct, the original records likely contain numerous errors or important omissions; to take the most obvious point, there is likely to be almost no way to verify whether children were legitimate or not. So its usefulness for genetic study seems doubtful to me, as many generations later I suspect those sorts of effects are difficult to pick up or isolate properly in living people's genes.

What's worse, in some historical periods it would not have been uncommon for some children to be biologically unrelated to either of their legal parents - e.g. lovechild of an affair the man had with a woman who was also sleeping with other men (but who claimed he was the father as he represented the best economic/social prospect of the possibilities), after which the man might take responsibility and raise the child as his own.

Errors is putting it kindly. People lie about paternity. They have skeletons they want firmly kept in closets. And they have no compunction about falsifying all the records. Genealogists were disliked for being nosy, and the entire field was slandered as something only weirdos could find interesting. There was a medical study done back in the 1940's that as an aside also could determine paternity. What they found was a whopping 10% of the babies were fathered by someone other than the husband. They k

If you're using a maximum likelihood analysis, your model can allow for unreliable data. E.g. you could assign a 10% chance that the paternity is not as recorded. Then you would have probability calculations like: P(child inherited gene from father) = 0.9 * P('father' (according to genealogy) had the gene) + 0.1 * P(random male in the population had the gene).
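As a rough sketch, the mixture in the comment above is a one-liner. The function and parameter names here are hypothetical, and the 10% default is just the example rate from the comment, not a measured value.

```python
def p_gene_via_father(p_recorded_father, p_population, false_paternity_rate=0.10):
    """Mix the recorded father's probability with the population frequency.

    With probability (1 - false_paternity_rate) the recorded father is the
    biological father; otherwise a random male from the population is.
    """
    return ((1.0 - false_paternity_rate) * p_recorded_father
            + false_paternity_rate * p_population)


# Recorded father certainly had the gene; 5% of males in the population do.
print(p_gene_via_father(1.0, 0.05))
```

Note that when the recorded father is known to carry the gene, the result drops below 1, and when he is known not to carry it, the result stays above 0. That is the whole point of the correction: no single recorded link is treated as certain.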

You can even make the 'false paternity rate' a parameter in your model, so the data itself will tell you what value is best. However, if the data is too unreliable, all that your maximum likelihood analysis will tell you is "we can't conclude anything from this data". (Assuming you correctly model the unreliability. If you don't, your analysis is liable to give false results.)
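Here is a minimal sketch of letting the data pick the false paternity rate, under assumptions that are mine and not from TFA: a marker passed strictly from biological father to child (think of a Y-chromosome haplotype), a known population frequency q, and made-up counts. A brute-force grid search stands in for a real optimizer.

```python
import math


def log_likelihood(r, q, k1, n1, k0, n0):
    """Log-likelihood of false paternity rate r.

    k1 of n1 children whose recorded father has the marker carry it;
    k0 of n0 children whose recorded father lacks the marker carry it.
    """
    p1 = (1.0 - r) + r * q  # recorded father is a carrier
    p0 = r * q              # recorded father is not a carrier
    return (k1 * math.log(p1) + (n1 - k1) * math.log(1.0 - p1)
            + k0 * math.log(p0) + (n0 - k0) * math.log(1.0 - p0))


def fit_false_paternity_rate(q, k1, n1, k0, n0, steps=1000):
    """Return the grid value of r in (0, 1) with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda r: log_likelihood(r, q, k1, n1, k0, n0))


# Made-up counts, chosen to match a true rate of 10% with q = 0.2.
r_hat = fit_false_paternity_rate(q=0.2, k1=920, n1=1000, k0=20, n0=1000)
print(r_hat)
```

With these invented counts the estimate lands on 0.1 by construction. With real pedigree data the likelihood surface could be much flatter, which is the "we can't conclude anything from this data" outcome described above.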

Maximum likelihood is not always computationally feasible, depending on the model you're trying to fit and how much data you have.