The odds of knowing your cousins: 23andme Part 1

Bizarrely, Jonathan Zittrain turns out to be my cousin -- which is odd because I have known him for some time and he is also very active in the online civil rights world. How we came to learn this will be the first of my postings on the future of DNA sequencing and the company 23andMe.

23andMe is one of a small crop of personal genomics companies. For a fee (ranging from $400 to $1000, but dropping with regularity) you get a kit to send in a DNA sample. They can't sequence your genome for that amount today, but they can read around 600,000 "single-nucleotide polymorphisms" (SNPs), which are single-letter locations in the genome that are known to vary among different people and are the subject of much research into disease. 23andMe began by hoping to tell its customers how their own DNA predicted their risk for a variety of diseases and traits. The result is a collection of information -- some of which will just make you worry (or breathe more easily) and some of which is actually useful. However, the company's second-order goal is the real money-maker. They hope to get the tested people to fill out surveys and participate in studies. For example, the more people report their weight in surveys, the more likely it is that somebody will notice, "Hey, all the fat people have this SNP, and the thin people have that SNP, maybe we've found something."

However, recently they added a new feature called "Relative Finder." With Relative Finder, they will compare your DNA with all the other customers, and see if they can find long identical stretches which are very likely to have come from a common ancestor. The more of this they find, the more closely related two people are. All of us are related, often closer than we think, but this technique, in theory, can identify closer relatives like 1st through 4th cousins. (It gets a bit noisy after this.)

Relative Finder shows you a display listing all the people you are related to in their database, and for some people, it turns out to be a lot. You don't see the name of the person but you can send them an E-mail, and if they agree and respond, you can talk, or even compare your genomes to see where you have matching DNA.

For me it showed one third cousin, and about a dozen 4th cousins. Many people don't get many relatives that close. A third cousin, if you were wondering, is somebody who shares a great-great-grandparent with you, or more typically a pair of them. It means that your grandparents and their grandparents were "1st" cousins (ordinary cousins.) Most people don't have much contact with 3rd cousins or care much to. It's not a very close relationship.

However, I was greatly shocked to see the response that this mystery cousin was Jonathan Zittrain. Jonathan and I are not close friends, more appropriately we might be called friendly colleagues in the cyberlaw field, he being a founder of the Berkman Center and I being at the EFF. But we had seen one another a few times in the prior month, and both lectured recently at the new Singularity University, so we are not distant acquaintances either. Still, it was rather shocking to see this result. I was curious to try to figure out what the odds of it are.

I've tried to reach about a dozen cousins in this system, and only 4 have responded. The 2nd turned out to be Jonathan's mother, whose account he handles. (This results in some more curious conclusions I will detail later.) The 4th came last week, and amazingly it was another friend. My newly discovered cousin Asya is also not a close friend but somebody I have known entirely in social circles for over 15 years, knowing each other well enough to have attended parties at each other's houses and so on. What can the odds of this be?

There is one important detail to add to this. My grandmother was an Ashkenazi Jew out of Vitebsk (Belarus), and both cousin JZ and cousin Asya are more fully of Ashkenazi ancestry from Eastern Europe. If you have the Jewish DNA on 23andMe, it finds a lot of cousins for you -- around 800 to 1,000 from their database of around 35,000 customers. Non-Jews tend to match only about 100 to 200. Quite simply, the Ashkenazi community was tight knit, with a fair bit of inbreeding. Thanks to this, we share more DNA with others in our group than average people do, and just about everybody is some level of cousin -- the only question is how close. Still, only a few will share enough DNA to be ranked as 3rd cousins. They rank two people as third cousins when about 0.75% of their DNA shows as being "identical by descent." (For 2nd cousins it requires about 3%, and for 1st cousins about 12%. Siblings share about 50%, each being a different mix of their 4 grandparents. Each descent cuts the identical DNA in half, and each more distant level of cousin involves two descents.)
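The halving rule in that parenthetical can be written down in a few lines. This is only a sketch of the textbook expectation, assuming full cousins who share a pair of common ancestors and no inbreeding; the function name is made up for illustration, and it is not 23andMe's actual ranking code.

```python
def expected_shared_fraction(cousin_degree: int, common_ancestors: int = 2) -> float:
    """Expected fraction of DNA identical by descent for full n-th cousins.

    Assumes the usual textbook model: the cousins share a pair of common
    ancestors and there is no other relationship between them.
    """
    # Each cousin sits (degree + 1) generations below the shared ancestors,
    # so the connecting path has 2 * (degree + 1) descents, each halving the DNA.
    descents = 2 * (cousin_degree + 1)
    return common_ancestors * 0.5 ** descents


for degree in (1, 2, 3, 4, 5):
    fraction = expected_shared_fraction(degree)
    print(f"cousin degree {degree}: about {fraction:.2%} shared")
```

The exact expectations are 12.5%, 3.125% and about 0.78% for 1st, 2nd and 3rd cousins, which is where the rounded 12%, 3% and 0.75% come from.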

As a test, in fact, I did a DNA comparison with another friend of Jewish ancestry (call him cousin B) and we did indeed have a small match, branding us as cousins of unknown distance. Because 23andMe knows about the Ashkenazi inbreeding, they don't try to make claims on low-level matches among us. For cousin B, my 0.11% level of sharing would typically make us suspected 4th or 5th cousins. So it is possible that JZ and Asya are slightly more distant cousins than the tool predicts.

You want more? The first cousin Asya contacted turned out to be somebody she knew! However, in this case she did not know him well; she knew him as a moderately well-known blogger on whose site she has participated, and with whom she has had online conversations.

Working out the probabilities of this is difficult. However, in all the cases above, there is no physical connection between me and my cousins. My family left Vitebsk in the late 19th century, and moved to England and then Canada. JZ's family emigrated earlier, and while there are Russian Jews in his family tree we did not find a link to the 2 surnames I know. Asya was born in Russia and is a modern immigrant. She has recent ancestors from near Vitebsk. Cousin B was also born in Russia, and he does have recent ancestors from Vitebsk. So in all cases the migrations were disjoint. This is common for the diaspora from Eastern Europe. Indeed, the one place I probably don't have many relatives is Vitebsk itself. In 1941, the murderers of Einsatzgruppe B made a base in Vitebsk, following the invading military and slaughtering all the Jews in that town, including, presumably, what remained there of my family and the families of these cousins. I never knew them, of course, but it is still disturbing to think about.

This diaspora is spread over Europe, Canada, Israel, the USA and a number of other countries. There are perhaps 600,000,000 or more people living in places the descendants of my (and JZ's and Asya's) g-g-g-grandparents migrated to. They did not settle in any one place, nor do I live in any particular place of concentration, and JZ lives in Boston, not the Bay Area (where Asya, Cousin B and I all moved).

So the next question is, how many 3rd cousins does a typical person have? This is hard to answer, and it's an answer that changes quite a bit each generation. To come up with a general answer, you need to figure out how many successful children a typical couple has. Successful children are children who themselves have children and grandchildren. A study in Iceland came up with numbers that were quite low -- around 8 to 9 grandchildren per couple, or 3 per generation. The reason this seems low is that when I look in my family tree I see plenty of large families 2 generations back. My grandfather had 8 brothers. My Jewish grandmother was one of 11. Good Jewish families were expected to be fecund, and I also think there was a surge around the start of the 20th century when people suddenly found all their kids were thriving, when before you might have 12 and only see half of them thrive. I have not found a good demographer's report to come up with real numbers. In my immediate family though, I see that my mother's parents only had 5 genetic grandchildren who themselves bred, and my father's parents had 8 such grandchildren, more like the Icelandic numbers.

It's not a good assumption, but if you assume a fecundity number per generation of 3, you get around 600 3rd cousins. My general formula is (2f)^(c+1)/2, where f is breeding children per generation and c is the cousin number (3 for 3rd cousin.)
So with f=4 it's 2000 3rd cousins, f=5 yields 5,000. f=5 is quite a lot -- 25 successful grandchildren, each the start of their own line, per pair of grandparents.
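For anybody who wants to play with other values of f, here is the same formula as a small function. It is just the arithmetic above, so it inherits the same crude assumption that every couple in every generation has exactly f breeding children; the function name is my own invention.

```python
def cousin_count(f: float, c: int) -> float:
    """Rough number of c-th cousins, using (2f)^(c+1) / 2.

    f: breeding children per couple per generation (assumed constant)
    c: cousin number (3 for 3rd cousins)
    """
    # 2**c ancestral couples, each with f**(c+1) descendants in your
    # generation, which simplifies to (2f)**(c+1) / 2.
    return (2 * f) ** (c + 1) / 2


for f in (3, 4, 5):
    print(f"f={f}: about {cousin_count(f, 3):.0f} 3rd cousins")
```

With f=3, 4 and 5 this gives 648, 2048 and 5000, which I round to 600, 2000 and 5,000 above.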

Thus the issue: If 2000 cousins are truly and randomly spread among 600,000,000 people, then one in 300,000 people will be a cousin. If you have a circle of "friends" of 1,000 then the odds of even one of them being a cousin are 1 in 300 -- again if all things are evenly distributed. They probably aren't, but there are no obvious clumping factors, other than the suggestion that Jews are prevalent in the high-tech, higher-prosperity circles from which an abnormal number of my friends are drawn. Because I am only ancestrally Jewish, and was not raised that way, it is not the case that most of my friends are Jewish. Though it is true that Jews are well over-represented in my circle for whatever reason. However, even if 500 of my 3rd cousins are among the world's 12 million Jews, it's still 24,000 to one that a random Jew I meet is my 3rd cousin. And if you include partial Jews like me, it's even less likely.
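The arithmetic in that paragraph is easy to check. This sketch assumes, as the paragraph does, that cousins are spread evenly through the population and that friends are picked independently, which is surely not quite true; the function names are made up for illustration.

```python
def one_in(cousins: int, population: int) -> float:
    """Return N such that a random person is a cousin with odds 1 in N."""
    return population / cousins


def chance_at_least_one(cousins: int, population: int, friends: int) -> float:
    """Chance that at least one of `friends` random people is a cousin."""
    p = cousins / population
    return 1 - (1 - p) ** friends


# 2000 cousins among 600 million people
print(one_in(2000, 600_000_000))
# chance that a circle of 1,000 friends contains at least one of them
print(chance_at_least_one(2000, 600_000_000, 1000))
# 500 cousins among 12 million Jews
print(one_in(500, 12_000_000))
```

The middle number comes out very slightly under 1 in 300, because the exact calculation does not double count the rare case of two cousins in the same circle.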

But now we get to the part that seems the most crazy. If the odds of knowing a 3rd cousin at all are one in 300 (or even 1 in 100) what are the odds that the very first cousins encountered in the small database would be ones I knew? That seems through the roof. However, there is a big factor affecting this. Both JZ and I are in 23andMe for a bunch of similar reasons. First of all, we are friends with Anne, a co-founder of 23andMe. I did some consulting for them and got in as a free customer. We are both early adopters with keen interest in privacy-related issues and genetics. As such we were also participants in the Beta release of the Relative Finder. That reduces the strangeness of JZ being my first contact, though not entirely. Cousin Asya, however, joined 23andMe for its original "medical-scan" purpose, as did the blogger she knew. Together it still seems quite bizarre, both by intuition and through math. Nobody else in the online forums of 23andMe has reported a story like this. (In fact, they mostly complain about how non-responsive the cousins they do try to contact in the database are. Most members seem to not want to contact distant cousins, particularly if they signed up primarily for medical reasons.)

As you might guess, Relative Finder has a number of rather big consequences to privacy and the future of genetics. 23andMe is not the only one doing this. Family Tree DNA, a company whose prime focus is genealogy, has begun a program of relative finding through DNA just recently. For years people have also been misled into thinking they can do genealogy through haplogroups which are groupings read from either the Mitochondrial DNA (which is passed down largely unchanged from mother to child) or the Y-chromosome DNA, which is passed down from father to son. Because these can be traced back through all-maternal (or all-paternal) lines into the distant past, and are easy to read, people get very excited about them, but in truth they reveal almost nothing about genealogy. For example, for your 5th cousins, only 1/2000th of them will share your maternal haplogroup because they are your 5th cousins. Far more of those who share it do so simply by chance, especially in some ethnic groups.

In part two, I write about some of those privacy questions, and the coming (in just a few years) exposure of almost all deep family secrets (adoptions, sperm donations, and children-by-infidelity). And I'll wonder if anything can be done about it, because it seems difficult to imagine what. And I'll explore even more details of my relationship with Jonathan Zittrain.

A month or so after we learned this, Jonathan happened to sit on the couch next to mine in a lounge in Davos, Switzerland. (He was speaking, I was just party-crashing.) The coincidences keep coming. His interpretation? He wonders which of us will first ask the other for money. The bar was free so we don't yet know, but I'm happy to buy him a drink next time.

And is because in a room of 30 people there are 30 birthdays, not just one, so people get a wrong intuition on it. If there are 600M people and 3000 are your cousins, the odds that any random person is your cousin are indeed 3000/600M.

The point of the birthday example is not that a specific person is looking to match their birthday, but whether you can find such a match amongst any two people in the room. Same idea here (where the room has 35,000 people). As with the birthday example, the person with the match feels as if they beat the odds, but in fact the odds were good that *someone* in the room would have that event.
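To make the "someone in the room" point concrete, here is a small sketch. The birthday function is the standard calculation; the per-person probability in the second part is a made-up number purely for illustration, not an estimate of anything in the 23andMe database.

```python
def chance_shared_birthday(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


def chance_someone_has_event(p_per_person: float, people: int) -> float:
    """Chance that at least one of `people` experiences a rare event,
    assuming each person independently has probability p_per_person."""
    return 1 - (1 - p_per_person) ** people


print(chance_shared_birthday(30))
# Hypothetical: each customer has a 1 in 10,000 chance of a striking coincidence.
print(chance_someone_has_event(1 / 10_000, 35_000))
```

Even with a per-person chance as small as that hypothetical 1 in 10,000, a room of 35,000 makes it very likely that at least one person has a story to tell.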

I spent several years finding, and reaching out to, my 3rd and 4th cousins, using what I guess you'd call ordinary stalking techniques. (That is, no DNA yet). I haven't counted my 3rd and 4th cousins, but I have 28 second cousins, and I think my parents are about the same. I'd guess, then, that third cousins come in between 100 and 125. If I had to estimate fourth cousins -- I certainly haven't found them all -- I'd put the number closer to 400 than 600. With my fourth cousins (all of whom are descendants of one of sixteen couples) the shared ancestors were born between 1798 and 1815. 24 of the 32 common ancestors were born in the US, four each in Canada and Scotland. A couple of 4th cousins of mine -- also fourth cousins of each other -- know each other. I've never met a 3rd or 4th that I already knew.

None of the 32 were Jewish, and I don't know of any Holocaust deaths in my cousin set. The American Civil War thinned out the set to a considerable extent, however.

As noted, I worked out a formula based on an average "breeding children per generation" number, but that number seems to vary quite widely.

When it comes to 3rd cousins (4 generations back) you have 8 pairs of great-great-grandparents. At 3 per generation that's 81 g-g-gkids per set, or 648 3rd cousins -- actually less because 108 of them are 2nd cousins or closer if you want to get technical.

I am assuming based on what I have learned that we had a bit of an explosion around 1900 with very large families where most of the children lived, and in the 20th century started moving to smaller families. But even with that assumption and much bushier numbers, it seems quite odd to find friends in the first cousins I am matched with.

In the case of my Jewish grandmother and her parents from Vitebsk, I know their names but nothing else. My mother never met her grandparents because they lived on another continent in the days before air travel. As such I will be curious if all the Jewish cousins 23andme finds for me will fill in those names. This gets harder because they anglicized their name upon moving to England and so I can't be sure what form it had over there, if it was written in the Roman alphabet at all!

I had a similar, but more dramatic experience at 23andme. When the program was in beta, I had contacted about a half-dozen putative 3rd cousins with no clear genealogical outcome other than that it was plausible that we were related. (Many of the folks on 23andme seem to know little about their ancestry).

When the feature emerged from beta I sent off some invites to four putative (and anonymous) third cousins. Next thing I knew, I received an email from one of my oldest and closest friends saying "Hi cousin!" We had first met each other in eighth grade in South Africa, where we both grew up. We both now live near each other in Newton (Boston suburb), having independently emigrated to the US in the 1980's. We had no clue that we might be related - our putative common ancestors likely lived in Lithuania in the mid-19th century, before their descendants emigrated to South Africa in the late 19th century.

I am full Ashkenazi Jewish - hence I have about 60 potential 3rd cousins on the 23andme system. My friend has an Ashkenazi mother and a non-Jewish father.

The coincidence is perhaps lessened some by the fact that I suggested to him that he sign up for 23andme. A fair portion of the South African Jewish population also originated in Lithuania. That we became friends (in a school of over 1,000 people) was fairly natural given we were both among the smartest kids in our grade (he's now a full Harvard Professor).

One thing that is common from people reporting 23andMe cousins is they do not have too much luck finding the common ancestor. Did you find them? Now with 3rd cousins, we are talking great-great-grandparents. You have 8 sets of great-great-grandparents. Everybody knows 2 of the surnames, and most people know 4 of them. I know the surnames of the two Jewish ones (though not the maiden names of the two great-great-grandmothers) and Jonathan has most of his names but we found nothing.

Finding nothing can mean many things -- unknown kids, adoptions, infidelities etc -- but by far the most likely explanation at 3rd cousin level is that the relationship is probably more distant. The more distant it is, the less improbable our bizarre connections are.

So many are wondering if the algorithm being used has some real issues, especially among those of us with the Ashkenazi DNA. You're another Litvak, perhaps it's even worse there.

60 3rd cousins seems far too large a number considering 23andMe's database is only about 35,000, and probably has somewhat of a Silicon Valley skew.

3rd cousins are not particularly close relatives of course, and in the modern world many people don't have any direct knowledge of many such cousins, and they don't have a history of socializing with them. However, one thing that is true with the 3rd cousins marked in 23andMe is that you do share as much DNA with them as you would with a real 3rd cousin, even if they are more distant. So if it is the shared DNA which is of interest, you can treat them like 3rd cousins. Which is still pretty limited.

I tried a Polish-Jewish friend on 23andMe and found no common DNA, so that cancels out the "we're all related" theory, but it may still be true for all the Jews in particular local areas, such as my case of Vitebsk.

I'm not sure how the 23andme algorithm can distinguish between 3rd cousins and 4th cousins multiple times over.

A really interesting computer science project would be to combine some partial family trees with this genetic information. I suspect it would be possible to fill out the gaps in the trees. For example, once my first cousin (on my paternal side) gets her results from 23andme, I should be able to determine whether my new-found putative 3rd cousin is related on my paternal side or maternal side.

Actually, if some distant ancestor had enough descendants on the system, it might be possible to recreate big chunks of that ancestor's genes. To the extent multiple HIR's overlap, you could combine them into longer sequences, just as was done when the genome was first sequenced.

You and your brothers are all different mixes of your parents (and thus your grandparents.) You are all related to this cousin through either mom or dad, or more to the point through one of your grandparents. All 3 of you got different mixes of your 4 grandparents and you obviously got more of the segments in question from that grandparent than your brothers did.

So these numbers make sense, but this is also why they say "predicted 3rd cousin" and "Could be 3rd to 4th," and also say that the odds of error can be high. However, with a lot of DNA matching (like 80 cM) the theory is you have to be close, and could be even closer. Your brother B is of course just as close but he shares less, so he might get a prediction of more distant cousin but "could be closer."

But yes, they don't account well for the idea of being a 4th cousin through 4 different paths, which would create the same DNA sharing as a 3rd cousin would have. If it turns out, for example, that JZ's parents are both my 3rd cousins, then he is my 3rd cousin once removed, but has as much DNA shared as a 3rd cousin.
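Here is a sketch of why several distant paths can look like one closer one. It assumes the expected sharing from independent paths simply adds up, which is only an approximation, and uses the textbook expectation for each path rather than anything from 23andMe; the function names are made up.

```python
def path_fraction(cousin_degree: int, times_removed: int = 0) -> float:
    """Expected shared DNA through one cousin relationship
    (a pair of common ancestors, no other relationship assumed)."""
    descents = 2 * (cousin_degree + 1) + times_removed
    return 2 * 0.5 ** descents


def combined_fraction(paths: list[tuple[int, int]]) -> float:
    """Add up expected sharing over several (degree, times_removed) paths."""
    return sum(path_fraction(degree, removed) for degree, removed in paths)


third_cousin = path_fraction(3)
# 4th cousins through 4 different paths
four_paths = combined_fraction([(4, 0)] * 4)
# 3rd cousin once removed through both of the other person's parents
via_both_parents = combined_fraction([(3, 1), (3, 1)])

print(third_cousin, four_paths, via_both_parents)
```

All three come out at 1/128, about 0.78%, so a tool that looks only at the total amount of shared DNA cannot tell these cases apart.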

This is exactly what I was wondering about the main post: the cousins did not find a real common ancestor, which suggests either errors in the reporting or very distant cousin relationships (say Neolithic; could this explain Jewish and non-Jewish connections in some cases?). For example, there are Hispanics who claim or are looking for a crypto-Jewish history and who show up as 5th cousins or more distant to database-verified Jews. Could this be an ancient Neolithic connection?

Presumably more will be learned the more samples they get with real connections behind them. Though in theory a lot of that should have been learned from the Icelandic studies. And full sequencing will tell more.

The issue with SNPs is that in order to spot two strands as identical by descent, they have to amass a large string of SNPs that could be the same. For each location, you have two values, and we don't know (unless they are very close I presume) which strand any given value is from.

So if you have say AG and the other person has CT, you can say for sure this section is not identical by descent. If you have AG and the other person has AT, you have something which might be identical, but might not be. If you both have the same pair it doesn't tell you much more, you only learn from the negatives.

If you get a long string with no negatives, it starts to be such that it is not likely to be from chance. Two unrelated people should not "not mismatch" -- yes two "nots" here for a long string.
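A toy version of that idea, as a sketch only: it treats a location as a "negative" when the two people's unordered pairs have no letter in common, and looks for the longest stretch with no negatives. The genotype strings and function names are made up for illustration, and this is certainly not 23andMe's actual algorithm, which would also have to deal with read errors and with how far apart the SNPs are.

```python
def can_share_allele(genotype_a: str, genotype_b: str) -> bool:
    """True if two unordered genotypes (like "AG" and "AT") have a letter in common."""
    return bool(set(genotype_a) & set(genotype_b))


def longest_compatible_run(person_a: list[str], person_b: list[str]) -> int:
    """Length of the longest stretch of consecutive SNPs with no negatives."""
    best = current = 0
    for genotype_a, genotype_b in zip(person_a, person_b):
        if can_share_allele(genotype_a, genotype_b):
            current += 1
            best = max(best, current)
        else:
            # A definite mismatch: this location cannot be identical by descent.
            current = 0
    return best


me = ["AG", "AG", "CC", "TT", "AG"]
other = ["CT", "AT", "CT", "CT", "GG"]
print(longest_compatible_run(me, other))  # first location is a negative, the rest are not
```

A real match would need a run of hundreds or thousands of SNPs before chance stops being a good explanation.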

But who knows, maybe there are patterns that stay together for various reasons and make fairly unrelated people keep them intact.

Your understanding of using Y-DNA testing for genealogy seems uninformed. People get excited about it because it is in fact quite useful. People do NOT use haplogroups to find relatives. They use the number of STR matches. Your haplogroup will tell you about your ancient ancestry (what one of your ancestors was doing ~20,000 years ago). Matching someone exactly for 40+ markers on Y-DNA means that you share a common ancestor within a few generations. The verdict is still out on what time frame an exact match means for mtDNA if you test HVR1, HVR2, and the coding region, but it is within 4,000 years, so within the time frame of Judaism.

I think the reason most people don't respond to the cousin finding feature of 23andMe is because most people conclude that their algorithm is incorrect, and that most of those people aren't your cousins.

How many is a few? As you note, for mtDNA exact matches can mean thousands of years, which is irrelevant for genealogy. Y-DNA should of course match surname in theory, though not always in practice, but 23andMe doesn't give you that level of Y-DNA matching, just haplogroups.

Brad, you have the math right.. "24,000 to one that a random Jew I meet is my 3rd cousin.", and also this "the Ashkenazi community was tight knit, with a fair bit of inbreeding. Thanks to this, we share more DNA with others in our group than average people do, and just about everybody is some level of cousin — the only question is how close."

The likely conclusion is that 23andMe's prediction algorithm is overly aggressive in predicting the closeness of Ashkenazi cousins -- not that you are against all odds so closely related to others that you know who also tested.

If you think that's wacky, then how about this: according to 23andMe, my mother-in-law turned out to be my "fourth cousin"! Well, not literally -- as a long-time genealogy buff, I know both of our family trees and family surnames going back 150+ years (sometimes 220+ years in my case) and we do not actually share any known relatives, nor do our families even come from similar areas. I'm an Eastern European Ashkenazi mutt (Ukrainian/Polish/Moldovan) and she's a Mediterranean Sephardic mutt (Spanish/Sicilian/Italian/Greek/Turkish/Israeli) whose family fled the Inquisition and settled on an island in the middle of the Aegean Sea for the last 500 or so years. Our common ancestor(s) must have lived before Ashkenazim and Sephardim split off into separate ethnic and linguistic groups.

I mean, they do say that some men marry their mothers, and she and I do indeed share a lot of characteristics and personality traits, but really, this is really a bit much. :-)

Of course this means that my husband and I are cousins too -- and we rank as "fourth" cousins, not "fifth" as you might expect, because I very slightly match his (Ashkenazic) father on a few HIR's (half-identical regions), some of which my husband inherited, keeping our amount of shared DNA still fairly high. What's funny is that his father's ancestry is half northwestern Romanian and half northeastern Polish, and I have no known ancestry from those areas.

Takeaway point from all this: we Jews are apparently REALLY REALLY inbred, and not just the Ashkenazic ones. 23andMe is a fun service that really drives that point home, sometimes uncomfortably so.

If you had married in the old country, where in small communities marrying a 4th cousin was a pretty regular event. It's not too uncommon even today among people who don't leave the nest and go to other states and countries.

What makes these connections surprising is they are post-diaspora, that somehow medium distance relatives found one another even though the two forks of the family took entirely different journeys through different countries at different times.

Now it does seem clear that a lot of Jews came to Canada and the USA since Europe started showing a growing amount of hostility, culminating in the horror of WWII.

These things seem so unlikely that there is reason to suspect that the cousin algorithm is either too generous, or is fooled by inbreeding. I do share as much DNA with JZ as 3rd cousins should share, but it may be it's because we are, through different paths, a set of 4th cousins, 5th cousins and 6th cousins all at once, rather than one direct 4 generation family.

"somehow medium distance relatives found one another even though the two forks of the family took entirely different journeys through different countries at different times"

My husband and I met in college. He happened to see me in a comedy group show, nudged his friend who was sitting with him, and whispered to him if he knew me, could he introduce us? The friend did know me, I agreed to be set up on a (semi-)blind date, and we went out not too long afterwards. So I guess he saw something he liked, but whether it was thanks to that teensy bit of shared DNA that made me look appealing to him is unknown...

But before it sounds like those tiny bits of DNA were calling out to another from across the cosmos, let us not forget that Penn is a 30% Jewish school, and so the odds of meeting a nice Jewish guy there were pretty high, and the odds of that Jewish guy being distantly related to me are, as we have seen, more common than most people would believe.

Oddly enough, though, there is a recent study that says that couples who are third cousins are slightly more prolific (i.e. have more children together) than both more and less related couples: http://www.eurekalert.org/pub_releases/2008-02/dg-dlc020408.php

And keep in mind that the 23andMe Relative Finder algorithms were calibrated using Utah Mormons, most of whom have Northern European pedigrees, as their standard population set. Obviously, there is much less inbreeding and many fewer historical bottlenecks there than in the Jewish community, or other semi-closed groups like the Amish, Mennonites, Icelanders, Druze, etc. And speaking of bottlenecks, the post-Black Death anti-Jewish riots (in 14th Century Germany) and the Bogdan Khmelnitsky pogroms (in 17th Century Ukraine) both did a real doozy on the number of surviving European Jewish lineages in those areas, along with many less known and less widespread events. The handful of survivors of such events may have been relatives, close or distant, who would have been forced to marry within the small remaining gene pool, which would probably increase the homozygosity of their offspring. Again, hello inbreeding!

Has anyone considered whether Ashkenazi Jews in general are more likely to join 23andMe for medical testing purposes? After all, there is a higher rate of some genetic diseases in this population, and genetic testing prior to marriage is rather more common than in the wider population. So, as a whole, this community might be overrepresented among those who choose to have genetic testing for medical reasons and therefore you're more likely to find relatives.

The Ashkenazim are indeed overrepresented, but that doesn't seem to be as big a factor as the pedigree collapse. The realization is that the ratio (number of "genetic 3rd cousins")/(number of actual 3rd cousins) is much higher than expected. It's still very rare to find someone (even in a self-selected group) with a common great-great-grandparent; but for an inbred population, it's not that uncommon to find someone you know in that group with enough common DNA segments that it looks as if you had a common great-great-grandparent.

I don't think the Ashkenazi Jewish over-representation among customers at 23andMe is due to the medical tests 23andMe offers. Ashkenazi Jews are also very over-represented in FamilyTreeDNA.com's database, and FTDNA doesn't offer disease testing like 23andMe. And testing for the "Ashkenazic diseases" panel (Tay-Sachs, Canavan, Gaucher's, Maple Syrup Urine Disease, etc.) is offered by Quest Diagnostics and other companies as a blood test that many people choose to take at their doctor's offices before having children. So I think the motivation behind the numbers at 23andMe is two-fold:

1) Many Ashkenazim only know their family histories for the last hundred years or so, and just in the US or Canada or wherever, and do not know their family histories within "the Old Country". They believe -- usually incorrectly! -- that all the old vital records in Poland or Hungary or wherever were destroyed during the Holocaust or under communism, or are inaccessible to laymen. So they start off their genealogical sleuthing with a DNA test, hoping to get enough clues to find their distant family.

From a genealogical perspective, this is frustratingly backwards; they should be using DNA as an adjunct or as a confirmation to standard research techniques, which can be much more fruitful than they realize. How many of them, for example, know about the online JRI-Poland vital records database, which is totally free? (http://www.jri-poland.org). How many of them know about JewishGen, the massive Jewish genealogy portal with all its mailing lists and databases and such? (http://www.jewishgen.org) How many know that all the Ellis Island immigration records are online, which can usually give them the name of their ancestor's specific hometown? (http://stevemorse.org/ellis2/ellisgold.html) From discussions I've had with matches at 23andMe, almost none of them know about these key genealogical research tools, and wanted to just jump in to genetic testing without putting in the regular legwork. Thankfully, the testees at FTDNA are usually far better informed.

2) These tests aren't cheap, and they are marketed almost exclusively in the English-speaking Western world. Most US Jews have a higher-than-average income level and live in the English-speaking Western world and therefore can make use of them in higher numbers than other ethnic groups.

And note that Sephardic, Romaniote, and Mizrahi Jews are all under-represented in the major genetic genealogy databases. The number of Persian Jews in the FTDNA database is almost nil, despite the number of Persian Jews living in the US.

You wrote: Next week, I will write about some of those privacy questions, and the coming (in just a few years) exposure of almost all deep family secrets (adoptions, sperm donations, and children-by-infidelity).

A few years?! Uh, I think you're a little late to the party. Check out the copious discussions of the delightfully inclusive term "NPE" -- "Non-Paternal Event" -- on the handful of genetic genealogy mailing lists and fora (such as http://archiver.rootsweb.ancestry.com/th/index/GENEALOGY-DNA , for example). Many project administrators at FamilyTreeDNA.com, which has a much more genealogically-knowledgeable userbase, have already had to deal with breaking the news to the occasional project member that their long-studied family lineage doesn't match up with their genetic realities. There have already been situations where brothers or first cousins have tested and stumbled across Mom's or Grandma's little secret the hard way.

On the other hand, both FamilyTreeDNA and 23andMe have been excellent tools for adoptees searching for their biological families. That's the flip-side.

All the posts seem to share a strong faith in 23andMe's algorithm. Why so? 600,000 common SNPs is pretty crap coverage, and I certainly would not go on faith that anyone with a couple of shared SNPs was any kind of cousin. These old-timey arrays were not designed for this purpose and don't genotype the most informative loci for this sort of thing. Aren't you reading a bit too much into the data? Has anyone done a controlled experiment?

And may disprove these algorithms. However, the stretches are not a couple of shared SNPs.

JZ and I share 5 segments of DNA declared "Identical by descent" by 23andMe's algorithms. It is a total of 54 "centimorgans" covering 36 million base pairs and 6275 SNPs.

However, within a few years, with cheap full-genome sequencing available, the doubts that stem from the limitations of the gene chips will go away.

Here is their FAQ on it:

Relative Finder looks for segments of DNA in pairs of individuals that have come from a common ancestor (i.e. the two segments are Identical By Descent, or IBD). It does this by taking advantage of the fact that if two people have "opposite homozygotes" at a given SNP, where say one person is an AA at this SNP and the other person is a GG, then we know that at this SNP location it is not possible for these two people to have inherited either letter (allele) from a common ancestor. Stretches of DNA that contain hundreds of SNPs that lack opposite homozygotes, then, are evidence for IBD. We report only matches that have a longest segment of at least 7 cM (centiMorgans, a measure of genetic distance) and at least 700 SNPs. Additional segments need to be at least 5 cM and have at least 700 SNPs.
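
To make the FAQ's description concrete, here is a rough sketch in Python of the "no opposite homozygotes" idea. It is not 23andMe's actual code, which they have not published here. The function names, the two-letter genotype strings, and the list of cM positions per SNP are all my own assumptions for illustration; only the thresholds (700 SNPs, 5 cM, 7 cM) come from the FAQ text.

```python
from typing import List, Tuple

# (start_index, end_index, snp_count, length_in_cM) -- a hypothetical layout
Segment = Tuple[int, int, int, float]


def is_opposite_homozygote(g1: str, g2: str) -> bool:
    """True when both genotypes are homozygous and share no allele (AA vs GG)."""
    return g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]


def candidate_segments(
    genos_a: List[str],
    genos_b: List[str],
    cm_positions: List[float],
    min_snps: int = 700,   # SNP threshold quoted in the FAQ
    min_cm: float = 5.0,   # FAQ threshold for additional segments
) -> List[Segment]:
    """Find runs of consecutive SNPs on one chromosome with no opposite homozygotes.

    genos_a and genos_b are genotypes for two people at the same ordered SNPs;
    cm_positions gives each SNP's genetic-map position in centiMorgans.
    """
    segments: List[Segment] = []
    start = None
    n = len(genos_a)
    for i in range(n + 1):
        # A run ends at an opposite homozygote or at the end of the data.
        run_breaks = i == n or is_opposite_homozygote(genos_a[i], genos_b[i])
        if not run_breaks:
            if start is None:
                start = i
            continue
        if start is not None:
            end = i - 1
            snp_count = end - start + 1
            length_cm = cm_positions[end] - cm_positions[start]
            if snp_count >= min_snps and length_cm >= min_cm:
                segments.append((start, end, snp_count, length_cm))
            start = None
    return segments


def reportable_match(segments: List[Segment], longest_min_cm: float = 7.0) -> bool:
    """Apply the FAQ's rule that the longest segment must reach 7 cM."""
    return bool(segments) and max(seg[3] for seg in segments) >= longest_min_cm
```

With toy thresholds, for example candidate_segments(a, b, positions, min_snps=3, min_cm=2.0), a single AA-versus-GG site in the middle of the data splits one long run into two shorter candidates. A real implementation would also have to deal with genotyping errors and missing calls, which this sketch ignores.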

You know, a LOT of 19th and early 20th Century Jewish vital records did survive for Vitebsk: census records, birth records, marriage records, death records, school records, police/KGB records, voter records, etc.

Here is the latest list of every Jewish record that is known to have survived in the archives for Vitebsk city, Vitebsk Guberniya (an administrative subdivision), and Vitebsk Uezd (also an administrative subdivision), respectively:

A small amount of this information is already online at JewishGen -- try their "All-Belarus Database" here, which is totally free and has 150,000+ records online:
http://www.jewishgen.org/databases/belarus/

And the "Belarus-SIG" (Special Interest Group) has even more information:
http://www.jewishgen.org/belarus/

But a lot of Vitebsk's vital records information is still in the stacks of the various Belarus archives. So if you (or your newfound cousins) would like to do some genealogical digging, or hire a genealogist to do the digging for you, you all might be happily surprised at just how much family information you would be able to find.

But did not have a great deal of luck. Their surnames got anglicized on arrival in England. My g-grandmother was Rebecca Pozin but may have used Sarah or Sura as a Hebrew name, and the spellings probably changed. Same for the g-grandfather.

The large number of cousins among Ashkenazis also applies to those with colonial American, especially Virginia, ancestry, where a limited pool existed for a long period of time without significant entry of new immigrants. I have 1056 matches and most are British with Carolina and Virginia ancestry. Being adopted, I do not know the details, and the percentage Ashkenazi is highly variable based on the testing company but very low, at < 1%, with 23andMe.