LESSON 09: atDNA Matches

What are atDNA matches anyway?
When we have matching DNA, what does that really mean? Simply put, matches are individuals whose segments (stretches) of DNA match ours along a given chromosome. Each of these matches is a half match, otherwise referred to as a Half Identical Region (HIR, pronounced "her"), where for every SNP or base pair at least one of our two alleles matches one of our match's alleles for the length of the matching segment. An allele is simply the "A," "T," "C," or "G." (Review Lesson 5 if you are confused.)

Here is a very short part of a segment on Chromosome 16 showing the pair of alleles for each SNP:

In this example John and Mary share a HIR and John and Pete share a HIR, but Mary and Pete do not, because at the highlighted SNP they are not even half identical. (In reality a shared segment would include hundreds if not thousands of SNPs.) In this simplified example John shares a common ancestor with Mary and a common ancestor with Pete, but it won't be the same common ancestor, because Mary and Pete do not match each other. We have essentially phased the data (determined the two sides: mom and dad). John and Mary are cousins who share an ancestor on John's maternal side, while John and Pete share a common ancestor on John's paternal side. Keeping track of who matches whom on which segments is a process called chromosome mapping. (See resources in Lesson 11 for more information on chromosome mapping.)
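The half-identical test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any company's matching algorithm: the function names are hypothetical, and the genotypes are invented for this sketch (they are not the values from the Chromosome 16 illustration).

```python
from typing import List, Tuple

# One SNP is represented by its unordered pair of alleles, e.g. ("A", "G").
Genotype = Tuple[str, str]


def half_identical(a: Genotype, b: Genotype) -> bool:
    """True if at least one allele of `a` also appears in `b`."""
    return bool(set(a) & set(b))


def is_hir(seg_a: List[Genotype], seg_b: List[Genotype]) -> bool:
    """True if every SNP in the two runs is at least half identical."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both runs must cover the same SNPs")
    return all(half_identical(a, b) for a, b in zip(seg_a, seg_b))


# Invented genotypes for four consecutive SNPs (illustration only).
john = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
mary = [("A", "A"), ("C", "T"), ("A", "C"), ("G", "G")]
pete = [("G", "G"), ("C", "T"), ("A", "C"), ("G", "T")]

print(is_hir(john, mary))  # True
print(is_hir(john, pete))  # True
print(is_hir(mary, pete))  # False: at the first SNP, A/A and G/G share no allele
```

A single SNP where the two people are opposite homozygotes (A/A versus G/G) is enough to break the half-identical run, which is why Mary and Pete fail here even though both match John.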

A segment is considered "Identical By Descent" or "IBD" when its length, in either cM or base pairs, is long enough to indicate that two individuals are "likely" to be descended from a common ancestor from whom they each inherited the same segment. When comparing two individuals who share a common grandparent (first cousins), the matching segments will only represent one side of the segment (one allele in each pair). The shared grandparent (let's say maternal) accounts for the shared side, and the other half of the segment is inherited from one of the paternal grandparents (and is not shared).

Each of the three companies (Ancestry, FTDNA and 23andMe) has different criteria for determining matches. This is why you may have a match at one company and not have a match with the same person at another. Each company tries to include as many true matches as possible while excluding as many false matches as possible. This is based on a statistical analysis and something of a cost-benefit trade-off (the cost of including false matches versus the risk of losing real matches).

The following are the match criteria by company:

FTDNA states in their FAQ:

"The Family Finder program declares a DNA segment to be Identical by Descent (IBD) if it contains at least 500 matching SNPs (Single Nucleotide Polymorphism) in series. For the program to consider two people a potential match, the largest matching DNA segment between two people must be at least 5.5 centiMorgans (cM) long. The program then uses additional matching segments to confirm the relationship and to calculate the degree of relatedness. Based on the extensive Family Finder database, it is rare for two genuine genealogical cousins to have a largest shared segment of less than 7 cM and one less than 6 cM is exceptional."

ANCESTRY, as reported by CeCe Moore:

"The minimum threshold for matching is 5 megabase pairs. There is no minimum SNP requirement." These results are pseudo-phased, which allows smaller segments to be relevant. (They do not include matches on the X.)

23andMe, as reported in the ISOGG Wiki, requires the following thresholds:

Autosomal: 700 SNPs, 5 cM

X (male vs male): 200 SNPs, 1 cM

X (male vs female): 600 SNPs, 6 cM

X (female vs female): 1200 SNPs, 6 cM
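To make the thresholds concrete, here is a minimal Python sketch that checks one segment against the 23andMe figures listed above. The numbers come from the list; everything else is an assumption made for illustration: the dictionary keys, the `Threshold` class, the `passes` function, and the rule that a segment must meet both the SNP count and the cM length (inclusive). It is not 23andMe's actual matching code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    min_snps: int   # minimum number of matching SNPs in the segment
    min_cm: float   # minimum segment length in centiMorgans


# Figures as quoted above from the ISOGG Wiki; the keys are hypothetical labels.
THRESHOLDS_23ANDME = {
    "autosomal": Threshold(700, 5.0),
    "x_male_male": Threshold(200, 1.0),
    "x_male_female": Threshold(600, 6.0),
    "x_female_female": Threshold(1200, 6.0),
}


def passes(kind: str, snps: int, cm: float) -> bool:
    """Assumed rule: a segment qualifies only if it meets BOTH minimums."""
    t = THRESHOLDS_23ANDME[kind]
    return snps >= t.min_snps and cm >= t.min_cm


print(passes("autosomal", 900, 5.0))    # True: meets both minimums
print(passes("autosomal", 650, 7.2))    # False: long enough, but too few SNPs
print(passes("x_male_male", 250, 1.5))  # True: the male-to-male X bar is much lower
```

Running the same segment through different rule sets like this is one way to see why a person can appear as a match at one company and not at another.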

In spite of these criteria, matches at the lowest levels are still subject to an estimated 40-60% false match rate. These false matches are sometimes referred to as pseudo matches or "Identical by State" (IBS): segments that are NOT from a common ancestor but just happen to be a HIR by random chance, or that reflect a very old shared geographical ancestry. The smaller the segment, the more likely this is to be true. The corollary is still worth noting: 40-60% of these small matches will be "Identical by Descent" (IBD) and thereby TRUE matches. If you already have a known match with someone, it is often helpful to look at the small segments as well (this can be done down to 1 cM at GEDMATCH and FTDNA). These small segments may give you clues as to how you might connect with other matches or, if they represent minority admixture, clues to ancient ancestry.

In reality, the vast majority of genetic genealogy is based on statistics, and statistics is a major source of misunderstanding and frustration for beginning genetic genealogists. When you are dealing with thousands of ancestors or thousands of matches, some of them are going to fall into the tail ends of the familiar "bell curve." So although the vast majority of, say, brother-to-sister matches will fall near the average of 50% shared DNA, there will be some that share 40% and some that share nearly 60%: unusual but not impossible. So whenever you look at DNA, remember my "unusual but not impossible" maxim. Sometimes a match is a match and sometimes it isn't. Statistical predictions are the roadways on which most of us travel, but every now and then people go off-road and take a detour. In spite of the science, that's as precise as we get. The more transactions (DNA exchanges) or generations we look at, the more extreme the ranges become.

Paddy Waldron states "On average and in theory, an unbiased coin toss produces a head 50% of the time and a tail 50% of the time. In a single trial just now, my coin produced a tail (i.e. 100% of my sample of size 1 resulted in tails). Similarly, on average and in theory, a pair of first cousins share 12.5% of their DNA. In your trial (also a sample of size 1), an observation of exactly 12.50000% is about as likely as the coin landing balanced on its edge. The single observation will be somewhere close to the average. A lot of those commenting above seem not to be distinguishing between the average outcome of many random experiments and the actual outcome of one single random experiment. Inheritance of autosomal DNA is just a random experiment, as recombination occurs randomly along the 22 autosomal chromosomes. A large sample (say two families of 10 children all testing, giving 10x10=100 first cousin matches) will produce a sample average much closer to the theoretical average than the samples of size 1 which are typical in genetic genealogy."
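The coin-toss analogy is easy to try out. The short Python simulation below is only a sketch of the statistical point (small samples wander far from the average, large samples stay close); it tosses a fair coin and does not model recombination or real cousin data. The function names, the number of trials and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def fraction_of_heads(n: int, rng: random.Random) -> float:
    """Fraction of heads in n tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


def typical_error(n: int, trials: int = 2000, seed: int = 1) -> float:
    """Average distance between the observed fraction and the true 0.5,
    over many repeated experiments of n tosses each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = [abs(fraction_of_heads(n, rng) - 0.5) for _ in range(trials)]
    return statistics.fmean(errors)


for n in (1, 10, 100):
    print(n, round(typical_error(n), 3))
```

With a sample of size 1 the result is always all heads or all tails, so the error is exactly 0.5 every time. As the sample grows to 10 and then 100 tosses, the typical error shrinks, which is the same reason a single pair of first cousins rarely lands on exactly 12.5%.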

In that vein, note:

Steve Mount states: "The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM. Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM)."
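The "next step" in that quote is a small piece of arithmetic that can be written out. The Python sketch below uses only the two figures from the quote (50% and 90%). The function names are hypothetical, and the multi-generation function rests on a loud assumption that is not in the quote: that every transmission is independent and has the same odds.

```python
P_TRANSMITTED = 0.5             # from the quote: 50% chance the segment is passed on
P_INTACT_IF_TRANSMITTED = 0.9   # from the quote: 90% chance it is passed on whole


def one_generation() -> dict:
    """Split one parent-to-child transmission into its three outcomes."""
    return {
        "not transmitted": 1 - P_TRANSMITTED,
        "transmitted at full length": P_TRANSMITTED * P_INTACT_IF_TRANSMITTED,
        "transmitted, not at full length": P_TRANSMITTED * (1 - P_INTACT_IF_TRANSMITTED),
    }


def p_full_length_after(generations: int) -> float:
    """ASSUMPTION (not in the quote): each generation is independent
    and has the same odds as the first."""
    return (P_TRANSMITTED * P_INTACT_IF_TRANSMITTED) ** generations


for outcome, p in one_generation().items():
    print(f"{outcome}: {p:.0%}")

print(f"full length after 2 generations: {p_full_length_after(2):.2%}")
```

Under these assumptions, a given 10 cM segment reaches a child at full length 45% of the time, and a grandchild only about 20% of the time, which is why shared segments thin out so quickly with each added generation.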

Here's a look at each company's match reporting.

ANCESTRY does not show us the segments or the number of matching base pairs; it simply gives us a list of matches, each with a predicted relationship, a possible relationship range and a confidence level. (Note: names have been scrubbed.) The user name is listed, and contact can be made via Ancestry's message system.

Family Tree DNA (FTDNA) gives a nice summary including relationship range, suggested relationship, shared cM and longest block (which is the length of the longest segment). Matches are identified by name, and a contact email is given if available. Links are given to on-line trees and lists of surnames (not shown).

23andMe shows the relationship, percentage shared and number of segments, as well as haplogroups and other information shared by your matches, in an easy-to-read format. They pack lots of info into a small space. Most matches remain private unless they accept a sharing request, although a small portion are public.

As I have said before, each company has its advantages and disadvantages, and in many circumstances a combination of all three is the best choice for serious-minded genetic genealogists who want to take full advantage of what each has to offer. Ancestry's simplicity, pseudo-phasing and attached trees probably give it the best chance of making genealogical connections. Power users may insist on "seeing" the matching segments and being able to do chromosome mapping. Some will like the value at 23andMe and its discount pricing for multiple kits ordered at the same time. Still others will like the chromosome matching tool at FTDNA, which lets you see matching segments down to 1 cM, or being able to combine atDNA with YDNA or mtDNA in one place.