People get their DNA tests back only to discover what seem to be small fractions of ancestry from a region they did not expect.

When should you take these seriously?

There is no magic guideline, but tiny amounts that appear neither in multiple tests for one person, nor in several fairly closely related family members, are likely to be phantom.

This is especially true if they do not match the paper trail, though it’s worth noting that paper trails can be wrong, or you could have inherited an ancestral component by sheer luck that goes back farther than your records.

Conversely, small amounts of a population that appear regularly among related people, or across multiple analyses for one person, may be real.

As an example, my LivingDNA results give me nearly 3% from the Caucasus.

This seems high compared to my paper trail, but consider that Ancestry’s DNA service also gives me 1% Caucasus. Two independent tests agreeing does tend to suggest real ancestry from the region, almost certainly stemming from a line of my father’s ancestors who were in Turkey/Syria in the late 18th/early 19th century.

As distinct from strict paper genealogy, full family history often uncovers rumours, tidbits of gossip, controversy, cover-ups, and other things you can’t find in records.

Sometimes a dead infant was hastily forgotten. Just as often, the child of an unwed teenage girl was raised by the mother’s parents, the grandparents acting in appearance and on paper as the biological parents.

This is the sort of thing DNA testing or rigorous research can sometimes clarify, but there are a lot of other things DNA can’t point to. It’s always useful to interview as many relatives as you can: even if they’re slightly off about some things, they may be your only possible source for some information.

Take, for example, my great-granduncle, Major George Malcolm Smith, M.C. The first Rhodes Scholar from Western Canada, Uncle George, as history records, led a parade of the Princess Patricia’s Canadian Light Infantry at Steenvoorde, France, 100 years ago, on the 18th of June 1916.

In the autumn of that bloody year, as history also records, George was in the Battle of the Somme, at Courcelette, where he ably covered the retreat of an entire regiment with a Lewis gun, earning him the Military Cross.

Later, he was Dean of Arts and Sciences at the University of Alberta, and served as a Major in the Second World War. Part of this time he spent in Ottawa on a project which I don’t believe has ever been declassified.

So, history does not record everything. Besides the mystery of what he was involved in while in Ottawa – a slight clue is offered in that later in the First World War, he worked in the British intelligence corps – George held another secret, which was that he was gay. Unfortunately, some of his relatives cut him out of their lives for this, and this fact would not have made it this far in history without human witnesses.

So please, when you’re putting a family tree together, consider that there are many nuances and details that do not appear in any official record, but which still help give a fuller account of our deceased ancestors’ lives.

In my first post, I introduced the concept of Bermuda Triangles or black holes of genealogy: areas where information was harder to come by. I already covered Ireland, so I won’t dwell on that.

One area I have had trouble with is northern Vermont in the later 18th century, because this was an area contested between the British and the Americans, and because I can so rarely get there from where I am to do research.

Absence of records to begin with is a problem there. In other areas, though, records may have existed but have been destroyed in fires or war. This is quite common in areas where US Civil War battles took place.

In still other areas, records may still exist but privacy laws, poor organization and cataloguing, language barriers, lack of political will to make records available, or cost may be factors in making genealogically valuable information hard to get.

Specifically in Canada, Ontario and Quebec have fairly strict laws covering vital statistics that make some aspects of recent genealogy a little tougher. On the other hand, Quebec has some of the best digitized church records in the form of the Drouin Collection, while Ontario has an immensely dedicated team of cemetery and vital record transcribers, with many records now available at Familysearch.org.

My own province of Manitoba has all genealogically available records indexed by the Vital Statistics Agency, and available genealogical registrations are a mere $12 Canadian. In contrast, Alberta makes it tougher to order vital records. First, Alberta offers no index of these records, so you have to know exactly what you are searching for to start with. The charges are high for those from out-of-province, too.

British Columbia used to have a similarly awkward and expensive system, until they moved the Archives to the control of the Royal BC Museum. Many interesting and valuable records are now digitized and freely available, as I wrote about last time. Maybe it’s just me, but donating money to an institution that makes records free feels better than paying the same amount in fees.

Don’t ignore small local museums and archives, either, whether in BC or elsewhere. I found a treasure trove on one of my family lines from the Summerland Museum and Heritage Society, for instance.

I was also pleasantly surprised when I requested the death registrations of my grandparents from the Wyoming State Archives. Not only were they provided free by a prompt, polite, and helpful volunteer, but she also included a PDF of a newspaper clipping with coverage of the accident that killed them. That’s the sort of service that I think merits a donation.

Great news! The British Columbia Archives, since the Royal BC Museum took up the mantle, have digitized a whole bunch of vital records that formerly required a $50 fee to order. Go to town – I know I have, and will.

Earlier, I linked to an article showing that because DNA is not handed down in even chunks from all your ancestors, you won’t match most of the people you’re related to on paper beyond about fourth cousins.
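To put rough numbers on this, here is a minimal sketch of the average share of autosomal DNA that full cousins of increasing degree would be expected to have in common. It rests on the standard halving-per-generation rule. The function name and the total of roughly 6,800 cM are my own assumptions for illustration; they come from no testing company's documentation.

```python
# Average (expected) autosomal sharing between full nth cousins.
# These are averages only: real inheritance comes in uneven chunks,
# so actual sharing varies and can be zero for distant cousins.

TOTAL_CM = 6800.0  # assumed approximate total across both chromosome copies


def expected_shared_fraction(degree: int) -> float:
    """Average fraction of autosomal DNA shared by full cousins of this degree.

    degree=1 is first cousins, degree=2 is second cousins, and so on.
    Each step back to the shared ancestors halves the expected share on
    both sides, giving (1/2) ** (2 * degree + 1).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return 0.5 ** (2 * degree + 1)


if __name__ == "__main__":
    for degree in range(1, 7):
        fraction = expected_shared_fraction(degree)
        print(f"cousin degree {degree}: {fraction:.4%} on average, "
              f"about {fraction * TOTAL_CM:.1f} cM")
```

By fourth cousins the average is a fraction of one percent, and because DNA is passed down in chunks rather than evenly, many real cousins at that distance end up sharing nothing detectable at all.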

False Matches

There’s another aspect to DNA testing which makes it not entirely reliable, which is that you almost certainly have a number of false matches in the database: people who seem to share DNA with you but who are not related to you in any genealogical sense.

They may be related to you in terms of coming from a common population, or they may not be related to you at all. Worse still, they may be related to you on paper, but the genetic link that appears to correspond to your common ancestors may be no link at all.

I don’t think I’m able to express this better than Dr. Ann Turner did in her wonderful post in the Journal of Genetic Genealogy, so I’ll send you there. But what’s worth noting, in brief, is that segment lengths of autosomal DNA, the DNA you inherit from all your ancestors, cannot be set with any considerable confidence without phasing, which is working out which of the two letters at each tested position comes from which parent.

The Math

John Walden and Dr. Tim Janzen have provided an analysis of a good-sized sample set of 9,000 haplotypes, which gives probabilities that a match in a database survives phasing on both sides and therefore is likely IBD, identical by descent (meaning you share the match through a common ancestor), as opposed to IBS (identical by state, as Ann described in the article I linked to above).

There is always a trade-off in statistics between sensitivity – including all relevant examples – and confidence, that is, being sure the samples we have caught are relevant.

Now testing companies have a problem. They all have to set a threshold for genetic matches somewhere. Should they set the threshold high and leave out real cousins, or set it low, and include many non-cousins?

{EDIT: Corrected values and graph added April 11, 2016.}
I’ve done a very simple logistic regression analysis on the data. What this S-curve in fact shows is that there is a steep climb from about 5 centimorgans (cM; the centimorgan is a unit of genetic distance, defined by how likely recombination is to occur between two points on a chromosome), where the probability of a match surviving phasing is just over 13%, to 9 cM, where the probability is around 78%. For stats wonks, this regression was done in R.
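For anyone who wants to play with the shape of that curve, here is a minimal sketch in Python. To be clear, this is not my R regression and it does not use the Walden and Janzen data. It simply draws the one logistic curve that passes through the two rounded figures quoted above (13% at 5 cM, 78% at 9 cM), so treat everything in between as a rough interpolation. The function names are made up for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_two_point_logistic(x1: float, p1: float, x2: float, p2: float):
    """Return (intercept, slope) of the logistic curve through two points.

    On the log-odds scale a logistic curve is a straight line, so two
    points are enough to pin it down. This is an interpolation between
    two quoted values, not a regression on the underlying sample.
    """
    slope = (logit(p2) - logit(p1)) / (x2 - x1)
    intercept = logit(p1) - slope * x1
    return intercept, slope


def prob_survives_phasing(cm: float, intercept: float, slope: float) -> float:
    """Approximate probability that a match of this length survives phasing."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * cm)))


# Assumed inputs: the two rounded values quoted in the text.
INTERCEPT, SLOPE = fit_two_point_logistic(5.0, 0.13, 9.0, 0.78)

if __name__ == "__main__":
    for cm in (5, 6, 7, 8, 9, 10):
        p = prob_survives_phasing(cm, INTERCEPT, SLOPE)
        print(f"{cm:>2} cM: {p:.2f}")
```

On this rough curve a 7 cM segment comes out at a little over 40%, which is worse than a coin toss.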

In brief, what this means is the testing companies, if they are using unphased data, have to include a whole bunch of matches in that steep climb. Many and perhaps most of your matches will in fact fall into the “grey zone.”

Unfortunately, with endogamous populations or those that underwent a founder effect (and the 23andMe database has strong representation from at least two such populations: people with early New England descent, and Ashkenazi Jews), the majority of one’s matches are likely to be false, because a 7.0 cM segment fits with a less-than-even probability of being IBD. Add to this that 23andMe caps Relative Finder matches at 1,500 and populates one’s Relative Finder list by total centimorgans shared rather than by long blocks, and 23andMe becomes less desirable as a genealogical tool for those with significant early American, Ashkenazi, or otherwise bottlenecked ancestry (such as French Canadians).

Ancestry DNA is a little trickier to compare. At first I couldn’t find anything on their methodology, but on a tip, I was pointed to this.

The upshot is, they phase short segments, so one will be less likely to come up with matches by chance, but may still have many ancient population-level matches.

They still do not have a chromosome browser, so it is impossible to verify that a putative match overlaps others who share that ancestral line.

—

Conclusions

However you slice it, though, a number of your matches will be identical by state and will not represent a genealogical ancestor. For a breakdown of how IBS could mean either identical by chance, or identical by population (particularly relevant to close-knit or bottlenecked populations), I refer you to a recent blog post by Roberta Estes.

Last time, I wrote about a rather spectacular DNA success story involving me and two “new” second cousins once removed. (They’re both a couple of decades older than me but are new to the family and their identity.)

Today I was going to write a long post explaining why you won’t DNA-match most of your relations beyond about fourth cousins, but this post from DNA.Land makes the same point better than I could, so I’ll save my energy for a future blog entry on a related problem: illusory DNA matches.