Monthly Archives: August 2012

I was going to call this “Fun with Genealogy, Math, and Data” but then I’d have even fewer readers. (Darn, I said it! There go the readers!)

Lots of Boxes in the Family Tree

The odd thing about family trees is that the number of people gets larger as you go back through the generations, but the population of the world is smaller as you go back through the generations. If my family tree had no repetition, meaning that every spot in the tree held a different person, I’d have 1.6 novemdecillion ancestors by the time you go back to about 4000 BCE. That’s 16 followed by 59 zeros, which is vastly, hugely, astronomically more than the world population 6000 years ago of around 7 million people. It’s vastly more than the estimated number of stars in the universe.
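To see where that number comes from: every step back doubles the number of spots in the tree, so generation n has 2^n spots. Here is a minimal Python sketch of the arithmetic. It assumes 30 years per generation (the rule of thumb used in this post) and a 6000-year lookback; the function name is just for illustration.

```python
# Each generation back doubles the number of ancestor spots:
# 2 parents, 4 grandparents, 8 great-grandparents, and so on.
YEARS_PER_GENERATION = 30   # assumption: the rule of thumb used in this post
YEARS_BACK = 6000           # roughly back to 4000 BCE


def ancestor_slots(generations: int) -> int:
    """Spots in a single generation of the tree, ignoring any repetition."""
    return 2 ** generations


generations = YEARS_BACK // YEARS_PER_GENERATION   # 200 generations
slots = ancestor_slots(generations)

print(f"{generations} generations back: about {slots:.1e} spots")
print(f"That number has {len(str(slots))} digits")
```

Python integers have no size limit, so the full 61-digit value is computed exactly rather than rounded.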

The explanation, of course, is that a family tree must have tons of repetition when you go that far back. Somewhere back in ancient Europe, where DNA testing places my heritage, there are couples who must show up trillions of trillions of times in my family tree.

The Crossover Point

The next question is: Where’s the crossover point between the size of my family tree and the population of the world? At which generation in the family tree does the size of that generation exceed the world population? It turns out to fall somewhere around the year 1100. I’m estimating 30 years per generation. Look back 28 generations before my year of birth and we hit the year 1118. That 28th generation of the family tree has more than a quarter of a billion people in it (2^28 is about 268 million). The world population back then was a little more than that, somewhere around 320 million. Look back 29 generations to about 1088, and we’ve got over half a billion people in the family tree, but the world population was smaller than that. That’s the crossover, then. Somewhere around the turn of the 12th century, my family tree is larger than the population of the world. There are more spots to fill than people to fill them.
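The crossover search can be sketched in a few lines of Python. This is only an illustration: it uses the birth year and the 30-year estimate from the text, and it holds the world population fixed at the 320 million figure quoted above, which is a simplification since the real population changed from decade to decade.

```python
BIRTH_YEAR = 1958                 # from the text
YEARS_PER_GENERATION = 30         # the post's rule of thumb
WORLD_POPULATION = 320_000_000    # the post's figure for roughly 1100,
                                  # held constant here as a simplification


def find_crossover(population: int, max_generations: int = 100) -> int:
    """First generation whose number of spots (2**n) exceeds the population."""
    for n in range(1, max_generations + 1):
        if 2 ** n > population:
            return n
    raise ValueError("no crossover within max_generations")


crossover = find_crossover(WORLD_POPULATION)
crossover_year = BIRTH_YEAR - crossover * YEARS_PER_GENERATION

# Show the generation just before the crossover and the crossover itself.
for g in (crossover - 1, crossover):
    year = BIRTH_YEAR - g * YEARS_PER_GENERATION
    print(f"generation {g}: year {year}, {2 ** g:,} spots")
print(f"crossover at generation {crossover}, around the year {crossover_year}")
```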

What does that mean? Although anyone could have repetition in the family tree more recently than that, it’s guaranteed to happen by the time you reach back to the Middle Ages. It also means that if your ancestors and a friend’s ancestors were from the same general region back then, there’s a very real possibility that you’re very distant cousins. If two people today have an ancestor in common from 29 generations ago, they’re 28th cousins.
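The cousin arithmetic follows a simple rule: grandchildren of the same couple are 1st cousins, so people who share an ancestor n generations back are (n - 1)th cousins. A small sketch, with a hypothetical function name, that also handles the "times removed" case when the two people are a different number of generations from the shared ancestor:

```python
def cousin_degree(gens_a: int, gens_b: int) -> tuple[int, int]:
    """Return (degree, times_removed) for two descendants of one ancestor.

    gens_a and gens_b count generations from the shared ancestor down to
    each person: 1 = child, 2 = grandchild, and so on.
    """
    if min(gens_a, gens_b) < 2:
        raise ValueError("both people must be at least grandchildren")
    degree = min(gens_a, gens_b) - 1     # grandchildren (2) are 1st cousins
    removed = abs(gens_a - gens_b)       # generation gap between the two
    return degree, removed


print(cousin_degree(2, 2))    # shared grandparents
print(cousin_degree(29, 29))  # shared ancestor 29 generations back
```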

That crossover point is where your family tree must have repetitions. Most likely, you’ve got repetitions that are much more recent, because you’re not descended from everyone who was alive back then. Some of those people didn’t have kids, or didn’t have family lines that survived until the present day, and some simply aren’t your ancestors.

Repetition in Our Family Trees

Both my wife and I have known repetitions in our family trees. Phillip Harmon (1803-1853) and Nancy Jackson (1801-1885) are my 4th great-grandparents in two different places. Their daughter’s son married their son’s daughter – cousin married cousin – back in southern Indiana. In my wife’s family tree, Pierre Georges Riffaud (1834-1890) and Marie Elisabeth Zélie (1833-1893) are her great-great-grandparents twice over (life on a small island, Martinique). Two of their descendants got married and became my wife’s ancestors.

We also found potential common ancestors in medieval Europe (because European nobility and royalty kept careful track of their lineage). Our evidence isn’t rock solid every step of the way, but it’s mostly pretty good, so we might well be distant cousins through some medieval ancestor. The math above makes this a rather unsurprising result. Just about all of European royalty was descended from Charlemagne, and there’s a decent chance that if you have European heritage, you’re descended from some European royal too, and therefore also from Charlemagne. If you have French heritage in particular, you’re probably descended from Charlemagne. Roughly 30% of today’s African-Americans also have European ancestry, so if you’re descended from slaves in the US, you too could be one of Charlemagne’s descendants.

If you are indeed descended from Charlemagne, you’ve got lots of repetition in your family tree. If he’s there at all, he’s probably there in multiple places.

30 Years per Generation?

Earlier, I estimated 30 years per generation. That’s a common genealogical estimate, but can we test it? Sure, with more math and more data! Yay! The average generation interval between an ancestor and a descendant is: (descendant’s birth year – ancestor’s birth year) / (number of generations between them). My 3rd great-grandfather Mathias Becker was born in 1814. I was born in 1958. That’s (1958 – 1814)/5 = 28.8 years. That’s one line. When I average out the 4 generations behind our kids, I get 30.4 years. Our family trees have complete birth info on everyone for those 4 generations. Our info gets more sparse as you go back. When I average what we have for 5 generations, I get 30.6 years. When I throw in the 6th generation, where the birth dates fall in the late 18th century, I get an average of 29.8 years. Those averages are all in the neighborhood of 30 years, so the estimate seems like a decent one, at least for the last few centuries of European heritage.
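The formula above is easy to check in Python. This sketch uses only the Mathias Becker numbers from the text; the function names are mine. The one step worth spelling out is that an nth great-grandparent sits n + 2 generations back (parent is 1, grandparent is 2, great-grandparent is 3).

```python
def generations_back(great_count: int) -> int:
    """Generations between a person and an nth great-grandparent."""
    return great_count + 2


def generation_interval(descendant_birth: int, ancestor_birth: int,
                        generations: int) -> float:
    """Average years per generation along one line of descent."""
    return (descendant_birth - ancestor_birth) / generations


gens = generations_back(3)                       # 3rd great-grandfather
interval = generation_interval(1958, 1814, gens)
print(f"{gens} generations, {interval:.1f} years per generation")
```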

To go back farther in time, I looked at the oldest line I could trace with any kind of data, to Pepin of Landen, Charlemagne’s 3rd great-grandfather. (Once your family tree ties into European royalty, the family tree grows a lot.) For the sake of the exercise, let’s accept the path leading to dear old Pepin without pointing out where the weak links are, and see what this does to the average generation interval. He was born in about the year 580. In the family tree data we’ve accumulated, he shows up 24 times as an ancestor of my kids: 4 times as their 41st great-grandfather, 15 times as 42nd, and 5 times as 43rd. He’s 43-45 generations behind my kids. Five of those 24 ancestral spots are on my wife’s side, 19 on mine. He’s probably there in a lot more places that we don’t know about.

For my youngest, the average generation interval between her and Pepin of Landen is 31.4, 32.1, or 32.8 years, depending on the path you take. A rule of thumb of 30 years per generation still seems about right, all the way back to the early Middle Ages.
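The same calculation can be run over all three path lengths to Pepin of Landen. In this sketch, the birth year of 580 and the appearance counts come from the text, but my daughter's birth year is not stated in this post: the 1992 used below is a placeholder, chosen only because it reproduces the three figures quoted above.

```python
ANCESTOR_BIRTH_YEAR = 580        # Pepin of Landen, approximate (from the text)
HYPOTHETICAL_BIRTH_YEAR = 1992   # assumption: placeholder, not from the post

# Times he appears as an nth great-grandfather: {n: number of appearances}
appearances = {41: 4, 42: 15, 43: 5}

total_paths = sum(appearances.values())
# An nth great-grandfather is n + 2 generations back.
mean_depth = sum((n + 2) * count for n, count in appearances.items()) / total_paths

intervals = []
for n in sorted(appearances):
    depth = n + 2
    interval = (HYPOTHETICAL_BIRTH_YEAR - ANCESTOR_BIRTH_YEAR) / depth
    intervals.append(round(interval, 1))
    print(f"{depth} generations: {interval:.1f} years per generation")

print(f"{total_paths} known paths, average depth {mean_depth:.1f} generations")
```

Shorter paths give longer intervals, because the same span of years is divided among fewer generations.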

A lot of quotes get attributed to Albert Einstein. It seems he didn’t say any of the following:

“If a man is kissing a pretty girl while driving safely, he is simply not giving the kiss the attention it deserves.”

“If the bee disappeared off the face of the globe, then man would have only four years of life left.”

“Evil is the absence of God.”

“Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination.”

“The definition of insanity is doing the same thing over and over and expecting different results.”

“The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift.”

“I wish I was as smart as Jim.”

These quotes get passed around a lot, but “Frequent repetition doesn’t prove anything.” — Abraham Lincoln (who also posts a lot on Facebook)

“No, Lincoln didn’t say that. I did.” — Benjamin Franklin

Enter the Genealogical Proof Standard (GPS) as a way to weigh evidence. I do lots of genealogical digging, and I find that lots of family tree info posted on the Internet is either unsourced or clearly rubbish (a child born before his grandparents???). The difference between the good info and the bad info is whether the person posting it followed the GPS. Quoting from the official GPS description: “The GPS consists of five elements:

a reasonably exhaustive search;

complete and accurate source citations;

analysis and correlation of the collected information;

resolution of any conflicting evidence; and

a soundly reasoned, coherently written conclusion.”

In short, the GPS means you build up enough evidence to say that this is probably true, and the alternatives probably aren’t. It’s not proof beyond all reasonable doubt, but it’s stronger than saying something is merely plausible or that you hope it’s true or you think it’s true.

The problem with the quotes that keep getting attributed to Albert Einstein, Abraham Lincoln, and others is that they miss on all counts, just like the lesser genealogical contributions posted online.

Take the insanity quote. I’ve been unable to find any verifiable source for the quote. I haven’t found any original sources. I haven’t found anyone who said, “It’s in this book/paper he wrote, which you can look up; see page x.” Nobody has said, “He said it during an interview held on mm/dd/yy, and a transcript/recording is available.” Nobody has said, “It’s in this letter he wrote, which has been verified, and here’s where you can find the letter.” Whether or not my search for a source counts as reasonably exhaustive, I haven’t uncovered a single source that someone could look up. So far, assigning the insanity quote to Einstein falls short of the mark for the first three of the GPS guidelines.

How about conflicting evidence? Sometimes, the insanity quote gets attributed to Ben Franklin, and sometimes to more recent figures. Take a look at the discussion on the Benjamin Franklin Wikiquote page. For the insanity quote, conflicting evidence about who said it is at best unresolved, and maybe even tilted away from Einstein. This falls short of the mark for the fourth GPS guideline.

Without hitting the first four guidelines, I can’t offer up the fifth: a “soundly reasoned, coherently written conclusion” that claims Einstein is the source of that definition of insanity.

An additional element of genealogical research that’s useful outside of genealogy is evaluating the quality of a source. The merest rumor or vague recollection can be a source, but neither is a very good one. A low-quality source gives you something to check out if it seems plausible, but it’s not strong evidence.

The best genealogical sources were created at the time of the event by someone who was present and well-informed, like a marriage record. The worst genealogical sources were created long after, by someone who wasn’t there, who got the information from someone else who wasn’t there; the focus is often more on what sounds cool than on what’s accurate. Lots of the Einstein quotes getting passed around online are like the worst genealogical sources.

Why do I care? “Does it matter who said what, if it’s a good quote?” — Dalai Lama

I don’t want to add to the flood of misinformation on the Internet. I don’t like passing around rumor as truth. The ability to draw a sound conclusion is terribly important in the world today, so I’m disappointed when I see a disregard for accuracy, even on something as mundane as a good one-liner. Or look at it this way: if you were playing a trivia game, it’s the difference between right and wrong … unless the trivia game itself did a sloppy job of verifying its answers.