First Cousin Match Simulations

Have you ever wondered if your match with your first cousin is “normal,” or what the range of normal is for a first cousin match? How would we know? And if your result doesn’t fall into the expected range, does that mean it’s wrong? Does gender make a difference?

If you haven’t asked some version of these questions yet, don’t worry, you will eventually. Yep, these are the things that keep genetic genealogists awake at night…

Philip Gammon, our statistician friend who wrote the Match-Maker-Breaker tool for parental match phasing, has continued his research. In his latest endeavor, he has created a tool that simulates the matching between individuals of a given relationship. Philip is planning to submit a paper describing the tool and its underlying model for academic publication, but he has agreed to give us a sneak peek. Thanks, Philip!

In this example, Philip simulated matching between first cousins.

The data presented here is the result of 80,000 simulations:

Philip was interested in this particular outcome because he wanted to understand why his father shared 1206 cM with a first cousin, and whether that was an outlier, since it is not near the average produced by the Shared cM Project (2017 revision) coordinated by Blaine Bettinger.

Academically calculated expectations suggest first cousins should share 850 cM. The data Blaine collected from 1512 respondents showed an actual average of 874 cM, with a 99th percentile range of 553 to 1225 cM. You can view the expected values for relationships in the article Concepts – Relationship Predictions, and in a second article, Shared cM Project 2017 Update Combined Chart, which includes a new chart incorporating the values from the 2016 Shared cM Project, the 2017 update and the DNA Detectives chart reflecting relationships as well.
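The 12.5% expectation behind that figure can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: it assumes each meiosis passes on half of a parent's DNA, and the genome total it prints is simply what 850 cM at 12.5% implies, not a figure stated by any vendor. The function name is made up for the example.

```python
def expected_shared_fraction(meioses_per_path, common_ancestors):
    """Expected fraction of DNA shared through the given ancestral paths.

    Each meiosis on a path halves the expected contribution, and the
    contributions of the separate common ancestors add up.
    """
    return common_ancestors * 0.5 ** meioses_per_path


# First cousins: two common grandparents, and four meioses on each path
# (grandparent -> parent -> cousin on one side, and the same on the other).
first_cousin = expected_shared_fraction(meioses_per_path=4, common_ancestors=2)

# Genome total implied by "850 cM is 12.5%" (an inference, not a quoted value).
implied_total_cm = 850 / first_cousin

print(f"expected share: {first_cousin:.1%}")
print(f"implied total: {implied_total_cm:.0f} cM")
```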

Philip grouped the results into the same bins as used in the 2017 Shared cM Project:

I’d say that they look very similar. The spread is just about right. The Shared cM data is a little higher, but this is consistent with vendor results typically containing around 20 cM of short IBC segments. My sample size is about 50 times greater, so this gives more opportunity to observe extreme values. I observed 3 events exceeding 1410 cM, with a maximum of 1461 cM. At the lower end I have 246 events (about 0.3%) with fewer than 510 shared cM and a minimum of 338 cM.

I thought that the gender of the related parents of the 1st cousins would have quite an impact on the spread of the amounts shared between their children. Fewer crossovers for males means that the respective children of two brothers would receive, on average, larger segments of DNA, so there is greater opportunity for either more sharing or less. Conversely, the respective children of two sisters, with more crossovers and smaller segments, would be more tightly clustered around the average of 12.5% (854 cM in my model). There is a difference, but it’s not nearly as pronounced as I was expecting:

The most noticeable difference is in the tails. First cousins whose fathers were brothers are twice as likely as first cousins whose mothers were sisters to share either less than 8% or more than 17%. And of course, if the cousins are connected through parents who were brother and sister to each other, the spread of shared cM is somewhere in between.
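For readers who want to experiment, here is a rough Python sketch of this kind of simulation. It is not Philip's model, whose details are not described here. Everything in it is an assumption made for illustration: 22 autosomes of equal length, crossovers placed at random with no interference, averages of about 27 crossovers per male meiosis and 42 per female meiosis (the figures quoted in the comments below), unrelated grandparents, and the 854 cM = 12.5% equivalence to convert fractions to cM. All function and constant names are invented.

```python
import random
import statistics

N_CHROM = 22                       # autosomes only, treated as equal length (assumption)
BINS = 100                         # positions tracked per chromosome
CROSSOVERS = {"M": 27, "F": 42}    # assumed average crossovers per meiosis
TOTAL_CM = 854 / 0.125             # total implied by "12.5% is 854 cM"


def meiosis(hap_a, hap_b, sex, rng):
    """Recombine a parent's two copies of one chromosome into one gamete."""
    rate = CROSSOVERS[sex] / N_CHROM          # expected crossovers on this chromosome
    points = []
    pos = rng.expovariate(rate)               # random gaps, no interference
    while pos < 1.0:
        points.append(pos)
        pos += rng.expovariate(rate)
    use_a = rng.random() < 0.5                # which copy to start from
    gamete, k = [], 0
    for i in range(BINS):
        x = (i + 0.5) / BINS
        while k < len(points) and points[k] < x:
            use_a = not use_a                 # switch copies at each crossover
            k += 1
        gamete.append(hap_a[i] if use_a else hap_b[i])
    return gamete


def simulate_pair(parent_sexes, rng):
    """Shared fraction for one first cousin pair; parent_sexes is e.g. "MM"."""
    matched = 0
    for _ in range(N_CHROM):
        grandfather = ([0] * BINS, [1] * BINS)   # four distinct grandparental copies
        grandmother = ([2] * BINS, [3] * BINS)
        cousins = []
        for sex in parent_sexes:
            from_gf = meiosis(*grandfather, "M", rng)
            from_gm = meiosis(*grandmother, "F", rng)
            # The related parent passes one recombined copy on to the cousin.
            cousins.append(meiosis(from_gf, from_gm, sex, rng))
        matched += sum(a == b for a, b in zip(*cousins))
    # Cousins match on one copy only, so halve the matched length.
    return matched / (N_CHROM * BINS) / 2


def run(parent_sexes, n, seed=1):
    """Simulate n cousin pairs and return their shared fractions."""
    rng = random.Random(seed)
    return [simulate_pair(parent_sexes, rng) for _ in range(n)]


if __name__ == "__main__":
    groups = [("brothers", "MM"), ("sisters", "FF"), ("brother and sister", "MF")]
    for label, sexes in groups:
        shares = run(sexes, n=300)
        mean, sd = statistics.mean(shares), statistics.stdev(shares)
        print(f"{label}: mean {mean * TOTAL_CM:.0f} cM, sd {sd * TOTAL_CM:.0f} cM")
```

With only a few hundred runs per group the spreads are noisy, and the difference between the groups is small enough that it may not show up clearly. It takes something closer to the 80,000 runs described above to see the tails. Treat the output as an illustration of the mechanism, not as a replacement for the results shown here.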

16 thoughts on “First Cousin Match Simulations”

Hi Roberta, interesting. I want to develop this into the children of a 1st cousin marriage. My paternal grandfather’s parents were 1st cousins. One female, the other male. His father and her mother were siblings. What effect does this have on the inheritance of DNA in view of what you have previously written that “you can inherit an entire segment of an ancestor’s DNA, or none at all”? None of the succeeding generations seem to have resembled the great-grandfather very much. Does this inbreeding cause mutations in DNA which are significant?

The only company that uses the X is 23andMe. My least favorite company, but the only one that utilizes the X properly. The FTDNA and AncestryDNA matching algorithms do not use the X to find matches.

There is less crossover between brothers, but sisters share more DNA due to two Xs. Each sister carries the full X from the father, and thus a complete Half Identical X, plus some Half Identical X from the mother, making some Full Identical X segments between the two sisters.

This would obviously not include the X because of the inheritance pattern. FTDNA does include the X, but only if the threshold is exceeded with other matches first. That’s probably a good thing, because in most cases, you need about twice as many cM for the X to be as reliable or informative as the other autosomes. Many false X matches. So X alone, no they don’t.

The X chromosome was included in the simulations but the article is just showing results for the 22 autosomal chromosomes. The X is quite a bit more complicated as there are quite a few first cousin pairs that cannot share X segments because one or both of them inherited a Y chromosome from the common grandparents.
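The rule described here can be enumerated directly. The sketch below assumes only the standard inheritance pattern (a son receives his X from his mother and a Y from his father, while a daughter receives an X from each parent) and lists which first cousin configurations can carry X DNA from the common grandparents on both sides. The function and variable names are invented for the example.

```python
from itertools import product


def carries_grandparent_x(parent_sex, cousin_sex):
    """True if the cousin received an X from the parent who links them to
    the common grandparents. A son of a linking father receives that
    father's Y instead of his X."""
    return not (parent_sex == "M" and cousin_sex == "M")


can_share, cannot_share = [], []
# Each configuration: (sex of linking parent 1, sex of cousin 1,
#                      sex of linking parent 2, sex of cousin 2)
for p1, c1, p2, c2 in product("MF", repeat=4):
    both = carries_grandparent_x(p1, c1) and carries_grandparent_x(p2, c2)
    (can_share if both else cannot_share).append((p1, c1, p2, c2))

print(len(can_share), "of 16 configurations can share X segments")
print(len(cannot_share), "cannot")
```

Counting ordered configurations this way, 9 of the 16 allow X sharing through the common grandparents and 7 do not. "Can share" only means it is possible; whether a given pair actually shares X segments still depends on recombination.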

The more extreme results with children of brothers; I can’t remember where I read it, but apparently recombination occurs less frequently in men than in women (about 20 times vs. 30 times), so that would fit; two children of brothers then would inherit larger unrecombined segments from their father, which then have both a larger chance to overlap, and to not overlap.

Precisely! The difference in the recombination rates between the genders is what drives the variation. There have been many scientific studies and they all point to an average of around 27 crossovers in male meiosis and an average of around 42 crossovers in female meiosis.
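Those averages translate into segment counts with a little arithmetic. A quick sketch, assuming 22 autosomes and the averages quoted above, and using the definition that one expected crossover per meiosis corresponds to 100 cM of genetic map:

```python
AUTOSOMES = 22
avg_crossovers = {"male": 27, "female": 42}   # averages quoted above

# One expected crossover per meiosis corresponds to 100 cM (1 Morgan),
# so a sex-averaged map length follows from the mean crossover count.
sex_averaged_cm = 100 * sum(avg_crossovers.values()) / len(avg_crossovers)

mean_segment_cm = {}
for sex, crossovers in avg_crossovers.items():
    # Each crossover splits one chromosome into one more piece.
    segments = AUTOSOMES + crossovers
    mean_segment_cm[sex] = sex_averaged_cm / segments
    print(f"{sex}: about {segments} segments passed on, "
          f"averaging {mean_segment_cm[sex]:.0f} cM on the sex-averaged map")
```

On these assumptions a father passes on about 49 segments averaging roughly 70 cM, and a mother about 64 segments averaging roughly 54 cM, which is the "larger segments from brothers" effect in numbers.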

Hi Roberta
My first cousin shares 1513 centimorgans with me. Our fathers were brothers. I’m male, my cousin is female. This is well outside the range of the above graphs. FTDNA predicted we are half-siblings but we are not.
Fortunately we know the answer from other sources. Our grandparents were rather more closely related than was legal, so we got a double helping of the same DNA from them.
Which makes us, I suppose, sort of double cousins. I wonder how many other incestuous relationships there are at the margins of these ranges.

I should point out that in the simulations my assumption is that there is no genetic relationship between the two grandparents that are common to the first cousins. And also no genetic relationship between the two spouses of the cousins’ parents, either to each other or to their respective spouses. So all of the results presented describe only the shared DNA that first cousins receive from their common grandparents. Any additional shared ancestral paths could add to the amount of matching DNA that is shared between cousins.

I was wondering if you ever submitted a paper describing the tool and its underlying model for academic publication. If so, is there a link to the paper online? Also, I am curious how long your simulator took to run 80000 experiments as described in this blog post and what language you used for programming the simulation.

I still haven’t written the paper. There have been too many interesting distractions in my own family tree research! But a New Year’s resolution is to submit a paper this year.

The simulator only took about 10 or 15 minutes to run 80,000 simulations. It’s a little slower now as I have extended it out to 4th-cousins.

The model is an Excel spreadsheet. Triggering the spreadsheet to recalculate performs another simulation. There are no macros and no VB code, which is why it is so fast. The results of the simulations are simply stored in a Data Table.

I am trying to trace my birth father who was an American Serviceman in the 2nd World War. I have found a supposed 1st cousin with the shared centimorgans of 1121 – 45 segments. Would she definitely be my 1st cousin as we are roughly the same age?
Regards,
Sandra.