
When I first started this blog, my goal was to provide explanations and examples of genetic genealogy topics so that there would be fewer questions and easier answers.

That sounded like a great idea, but the reality of the situation is that the consumer market for autosomal DNA testing has exploded – meaning more and more consumers with more and more questions. Compounding that situation, the consumers who purchase these tests today, especially on impulse, and mostly I’m referring to Ancestry.com here, often have absolutely no idea what to expect or even what they want except that Ancestry will find their ancestors for them. That’s because that’s what Ancestry tells them in their advertising.

So, in the big picture, the questions and inquiries that experienced people are currently receiving are becoming less specific and more general and often exhibit a lack of understanding of what DNA testing can do. It’s frustrating to parties on both sides of the fence, but I’m glad people are asking because it means they are interested and willing to learn.

Rather than approach this topic from a technical perspective of how to work with autosomal DNA, I’d like to talk about what can be done with autosomal DNA testing from a newbie perspective: the person who just got their results back and is saying to themselves, “OK, now what can I do with this?”

However, there is lots of “how to” information in this article for everyone if you click on the links. If nothing else, this gives you a tool to send to those overly excited newbies who are starry eyed but have no clue how to proceed. Remember, you were once new too!

This is part 1 of a two-part series. The second part will focus on how to make contact with your matches successfully. But now, let’s pretend it’s day 1 and you just got your autosomal test results back.

Why Did You Test?

The first question to ask yourself is why did you test in the first place? If your answer is “because Ancestry had a sale,” that’s fine, but then you’ll need to read all four options to know what you can do with autosomal DNA.

1. I want to meet other people I’m related to.

Ok, but the first thing here you’re going to have to define is the word “related.” You are likely related to everyone on your match list. I said likely, because there may be some people there whose DNA simply matches yours by chance. For the most part, and especially for those people who are your closest matches, you’re related somehow. The challenge, of course, is to figure out how – meaning through which ancestor. This is the genealogy jigsaw puzzle of you!

All three of the major vendors, Family Tree DNA, Ancestry and 23andMe, show you your closest matches first on your match list.

Do you want to meet your DNA cousins only if you can identify a common ancestor? Do you want to work with them on genealogy? The answers to these questions will help sort through the rest of what to do and how.

If your goal is to contact your matches, then Family Tree DNA is the easiest, as they provide you with the e-mail addresses of your matches by clicking on the little envelope for each match on your match page, shown above.

Ancestry is second easiest, but forces you to use their internal message system which often doesn’t deliver the messages. (Do not send more than 30 in one day or Ancestry will blacklist your messages and block your communications, thinking you are a spammer.)

23andMe is the most difficult, as you have to request permission both to communicate with each match and to share DNA. Only if your match authorizes communication can you then communicate through 23andMe’s message system. Sound cumbersome? It is, and the response rate is low.

Confirming Genealogy

Let’s look at another reason for testing.

2. I want to confirm my genealogy is correct – meaning that my great-grandfather really is my great-grandfather and so forth on up the line.

Well, you’re in luck, especially if some of your cousins, known or otherwise, have tested. Confirming your genealogy is easier done in closer generations than more distant ones and the more cousins from various lines that have tested, the better. That’s because you will share more of your DNA with relatives when you have a close common ancestor.

Autosomal DNA is divided approximately in half in each generation, when the child receives half of their DNA from each parent – so the closer your cousin, the more likely you are to share more DNA with them. The more DNA you share, the more likely you are to be able to identify which ancestor it comes from. And if a match matches you and your proven cousin both on the same segment, that identifies positively which line that match comes from. That three way matching is called triangulation.
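To put rough numbers on that halving, here is a minimal Python sketch. The function names are made up for this illustration, and the results are averages only: because of the way DNA is shuffled when it is passed down, the amount you actually share with any one relative can be noticeably higher or lower.

```python
def ancestor_fraction(generations_back: int) -> float:
    """Average fraction of your autosomal DNA that comes from ONE ancestor.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent...
    """
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or more")
    return 0.5 ** generations_back


def full_cousin_fraction(degree: int) -> float:
    """Average fraction of autosomal DNA shared by full cousins.

    degree: 1 = first cousins, 2 = second cousins, and so on.
    Assumes both cousins are the same number of generations from the
    common ancestral couple (no "removed" relationships).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or more")
    generations_to_couple = degree + 1
    # Two common ancestors (the couple); each one links the cousins by a
    # path of 2 * generations_to_couple parent-child steps.
    return 2 * 0.5 ** (2 * generations_to_couple)


if __name__ == "__main__":
    for g in range(1, 7):
        print(f"Ancestor {g} generations back: {ancestor_fraction(g):.2%}")
    for d in range(1, 5):
        print(f"Full cousins, degree {d}: {full_cousin_fraction(d):.3%}")
```

For first cousins this works out to 12.5%, and the figure drops by a factor of four with each additional degree of cousinship, which is why close cousins are so much easier to work with.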

Let’s talk about the word “confirm.” Herein lies a challenge, because DNA does have the absolute ability to confirm ancestors, as noted above. DNA also has the ability to give you hints that go towards a “preponderance of evidence.” DNA can also lead you astray if you draw erroneous conclusions – and one vendor provides a tool (or tools) that encourages overstepping conclusions. Let’s look at each circumstance.

Proof Positive through Triangulation

Just what it says – absolutely unquestionable proof that a particular ancestor is your ancestor. If you match two other people who also descend from your common ancestors, Joe and Jane Doe, on the same segment of DNA, that is confirmation that you share that ancestor and that segment of your DNA is considered proven to that ancestral line. This requires two things. First, that your DNA matches on the same segment AND that you have identified the same ancestors, Joe and Jane Doe, genealogically in your trees.

Now, you probably can’t tell which side of the couple, Jane or Joe, the DNA is from unless you also match two people on just Jane’s side of the family or just Joe’s on that same segment.

One caveat here – counting you and your parent as two of the three people doesn’t work because you and your parents are too close in the tree. By three people, that would preferably be three people who descend from that couple through three different children.

Here’s an example.

It would also ideally be more than three people, but three is the minimum to form a triangulation group. In the real world, these matches might not start and end at the same locations as in the example above, but the overlapping portion should be significant.

The example above is proof positive, because the three people descend from the same ancestor, through different children, and match on the same chromosome in the same locations.
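For readers who work with downloaded segment data, the "same chromosome, same locations" test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` type, the function names, the positions and the `min_length` threshold (in base pairs) are all made up here, and no vendor's actual file format is implied. The sketch also assumes you have all three pairwise comparisons available (you vs. cousin 1, you vs. cousin 2, and cousin 1 vs. cousin 2).

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position of the matching segment
    end: int    # end position of the matching segment


def common_overlap(segments: List[Segment]) -> Optional[Segment]:
    """Return the region covered by ALL segments, or None if there is none."""
    if not segments:
        return None
    chromosome = segments[0].chromosome
    if any(s.chromosome != chromosome for s in segments):
        return None  # segments on different chromosomes never overlap
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    return Segment(chromosome, start, end) if start < end else None


def triangulates(pairwise: List[Segment], min_length: int = 0) -> bool:
    """True if every pairwise comparison shares one common region.

    min_length is a hypothetical cut-off for a "significant" overlap.
    """
    region = common_overlap(pairwise)
    return region is not None and (region.end - region.start) >= min_length


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    me_vs_cousin1 = Segment("1", 10_000_000, 30_000_000)
    me_vs_cousin2 = Segment("1", 15_000_000, 35_000_000)
    cousin1_vs_cousin2 = Segment("1", 12_000_000, 28_000_000)
    trio = [me_vs_cousin1, me_vs_cousin2, cousin1_vs_cousin2]
    print(common_overlap(trio))
    print(triangulates(trio, min_length=5_000_000))
```

Remember that the segment overlap is only half of the proof. The other half is the genealogy: all three people must have the same ancestors identified in their trees.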

Now for the bad news – you can’t do this at Ancestry.com, because they don’t provide you with any of the segment information in the last 5 columns. Ancestry has no chromosome browser, which is the tool that shows you where on your DNA you match your cousins.

Family Tree DNA’s chromosome display tool that is part of their chromosome browser is shown below.

On the example above, you can see that Barbara Jean Long, the black background person on the chromosome graphic, is being compared to her two first cousins, the blue and orange on the chromosome graphic.

You can download the information from Family Tree DNA or 23andMe in spreadsheet format, or you can display the information graphically, like in the example above. You can see the “stacked” locations where both the cousins match the black background person they are being compared to. You can also see that there are some locations where only one of the cousins matches the background person, like on chromosome 20. And of course, some locations where neither cousin matches the background person, like on chromosome 21.

If you download that data, the information gives you the locations where the people being compared match the person they are being compared against.

The chart above is the download of part of chromosome 1 for Barbara, Cheryl and Donald, siblings who are Barbara’s first cousins.

The areas where the 3 people overlap, or triangulate, are colored in green on the spreadsheet, while the rows entirely in pink or blue do not triangulate – meaning Barbara matches either one cousin or the other, but not both. Keep in mind that this example only proves their common ancestral couple, which in this case are common grandparents – but the technique is the same no matter which common ancestor you are trying to prove.

This brings us to our next topic, that of close relatives.

Close Relative Matches

I previously said that you can’t use you and a close relative to prove a distant ancestor. But that’s not necessarily true when the relationship you are trying to prove is closer in time. The chart below shows the relationships of the example above.

In the case shown above, two first cousins who are siblings, Cheryl and Don, are being compared to their common first cousin, Barbara. Their fathers were siblings and their common ancestors were their grandparents. This is not 6 generations up a tree where matching is iffy. You can be expected to match closely with your first cousins where you may not match with more distant cousins, because you simply didn’t inherit any of the same DNA from your distant common ancestor. You should be sharing about 12.5% of your DNA with first cousins, and if you have first cousins that you’re not matching, that might signal that an undocumented adoption has occurred in one line or the other.

In a case like this, if you and a first cousin match, that suffices to prove a close connection. If you don’t match, it suffices to raise questions. A lot of questions. Big ugly questions. The next thing to do is to see if any other known cousins have tested and who they match – or don’t match.

For example, if Barbara Ferverda were not the child of John Ferverda, she would not match either Cheryl or Don, and we’d know there was a problem. If Cheryl and Don matched other Ferverda or Miller relatives and Barbara didn’t, then we’d know the genetic break in the line was on Barbara’s side and not on Cheryl/Don’s side.

This same technique is also how we know which “side” matches are on. If an unknown match matches both Barbara and Cheryl, for example, it’s a good bet that their common ancestor is someplace in the Miller/Ferverda line. If they also match another Miller on the same segment, then the common ancestor has been narrowed to the Miller side of the Miller/Ferverda couple.
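If you keep your match lists in a spreadsheet, finding the people who match both of two known cousins is a simple set intersection. Here is a minimal Python sketch; the function name and the match names ("Unknown A" and so on) are made up for illustration. A shared match found this way is only a hint about the line. Narrowing it to one side of a couple still requires matching on the same segment, as described above.

```python
from typing import Dict, List, Set


def shared_matches(match_lists: Dict[str, Set[str]], testers: List[str]) -> Set[str]:
    """Return the people who appear on the match list of EVERY listed tester."""
    if not testers:
        return set()
    result = set(match_lists[testers[0]])
    for tester in testers[1:]:
        result &= match_lists[tester]
    return result


if __name__ == "__main__":
    # Hypothetical match lists for two known first cousins.
    match_lists = {
        "Barbara": {"Cheryl", "Don", "Unknown A", "Unknown B"},
        "Cheryl": {"Barbara", "Don", "Unknown A", "Unknown C"},
    }
    # Unknown A matches both cousins, so the shared line is a good place to look.
    print(sorted(shared_matches(match_lists, ["Barbara", "Cheryl"])))
```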

Unfortunately, not all DNA results are as definitive or easy to prove as these. Let’s look at some of the more “squishy” results.

Preponderance of Evidence through Aggregated Data

In regular genealogy, there is a range of proofs. There is direct evidence that someone is the child of an ancestor. That would be a will, for example, that names a daughter and her husband and maybe even tells where they moved to. This would be your lucky day!

Think of that will as equivalent to triangulated proof of a common ancestor. There is just no arguing with the evidence.

If you’re not that lucky, you have to piece the shreds of indirect evidence together to make a story. In the genealogy world, this is called preponderance of evidence, and I am always, always much less comfortable with this type of evidence than I am with solid proof.

There are various flavors of pieces of evidence in the DNA world. Sometimes we have hints of relationships without proof.

The most common is when you have matches with a group of people who share the same surname, but you can’t get back far enough to find a common ancestor. Is this a probable match? Yes. Guaranteed? No. Have I seen them fall apart and the actual match be on another entirely unrelated line? Yes. See why I call these squishy?

Ancestry takes this one step further with their DNA Circles. For a DNA Circle to be created, you must match DNA with someone in the Circle AND everyone in the Circle must match DNA with someone else in the Circle AND everyone in the Circle must have the common ancestor in their tree. Circles begin with a minimum of three people. Generally, the more people who match AND have the same ancestor, the stronger the likelihood that you would be able to confirm the common ancestor of the group as your ancestor too – if you had a chromosome browser type of tool. Still, Circles alone are not, and never will be, proof. Circles are great hints and, along with other research, can confirm genealogical research. For example, my paper genealogy says I descend from Henry Bolton, and I find myself in Henry Bolton’s tree, matching several other Bolton descendants through Henry’s other children. Those multiple connections pretty well confirm the paper trail is accurate and no undocumented adoptions have occurred in my line.
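To make those three conditions concrete, here is a small Python sketch of the rules exactly as described in the paragraph above. It is not Ancestry's actual algorithm, which is not public in this form; the function name, the data structures and the member names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def qualifies_as_circle(
    members: List[str],
    dna_matches: Set[FrozenSet[str]],
    has_ancestor_in_tree: Dict[str, bool],
    min_members: int = 3,
) -> bool:
    """Check the Circle conditions described in the text.

    dna_matches holds unordered pairs of people who share DNA.
    has_ancestor_in_tree says whether each person's tree shows the ancestor.
    """
    if len(members) < min_members:
        return False
    for member in members:
        # Everyone must show the common ancestor in their tree.
        if not has_ancestor_in_tree.get(member, False):
            return False
        # Everyone must match at least one OTHER member of the group.
        if not any(
            frozenset((member, other)) in dna_matches
            for other in members
            if other != member
        ):
            return False
    return True


if __name__ == "__main__":
    members = ["Me", "Cousin 1", "Cousin 2"]
    # Note: "Me" and "Cousin 2" do not match each other directly.
    dna_matches = {
        frozenset(("Me", "Cousin 1")),
        frozenset(("Cousin 1", "Cousin 2")),
    }
    trees = {"Me": True, "Cousin 1": True, "Cousin 2": True}
    print(qualifies_as_circle(members, dna_matches, trees))
```

Notice what the check never looks at: which segment anyone matches on. That is the reason a Circle is a hint and not a triangulation group.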

Now, the bad news: Circles are predicated upon matching of trees. If there is a common misconception out there that is replicated in these trees, then people who match will be shown in a Circle predicated on bad information. And there is no way to know. However, people often interpret the existence of a DNA Circle as proof positive and assume that it confirms the tree. Membership in a DNA Circle is absolutely NOT proof of any kind, let alone proof positive – except that your DNA matches the people who you are connected to by lines and their DNA matches the people they are connected to by lines. You can see my connections in orange below, and the background connections in light grey.

This is an example of my Henry Bolton Circle. I match 5 different people’s DNA (the orange lines) who also show Henry Bolton as their ancestor. This does NOT mean the match is on the same segment, so it is NOT triangulated. This is a grouping of data where multiple people match each other, not a genetic triangulation group where everyone matches on the same segment. In fact there are cases that I have found where the person I match in a circle is through a different line entirely, so in that case, the presumption of which common ancestor our common DNA is from is incorrect.

I want to be very clear, there is nothing wrong with DNA Circles, so far as they go. The consumer needs to understand what Circles are really saying – and what they can’t and don’t say. DNA Circles are another important tool in our arsenal. We just have to be careful not to assume, or presume, more than is there. Presuming that we match someone in the Circle because we share Henry Bolton’s DNA may in fact be inaccurate. We may match on a completely unrelated line – but because we do match and share a common ancestor in our tree – we both find ourselves in the Henry Bolton Circle.

Are you reading those squishy words? Presume – it’s related to the word assume…right??? And keep in mind that Circles are created based in part on those wonderfully accurate Ancestry trees. Are you feeling good about this preponderance of evidence yet?

However, in my case, I’ve done due diligence with the genealogy and I have all of my proof ducks in a row. The fact that I do match so many Bolton descendants confirms my work, along with the fact that at the other vendors and at GedMatch, I have triangulated my matches and proven the Bolton DNA. So, this Circle is valid, but my proof does not come from Ancestry or from being a Circle member; it comes from triangulation and aggregated data using other vendors’ tools.

This next screen shot is of an exact triangulated match using GedMatch’s triangulation tool. Each line shows me matching two cousins, along with the start and stop segments. This just happens to be the Ferverda example. So, I match six people, all on the same segment, all with a known common ancestor. This is proof positive. Not all “matching” is nearly so definitive.

Sometimes the matches aren’t so neat and tidy. That’s when we move to using aggregated data.

Aggregated Data – What’s That?

Aggregated data is a term I’ve come up with because there isn’t any term to fit in today’s genetic genealogy vocabulary. In essence, aggregated data is when a group of people (who may or may not know who their common ancestor is) match on common segments of data, but not necessarily on the same segments, or not all of the same segments. When you have an entire group of these people, they form a stair step “right shift” kind of graph.

The interesting part of this is that by utilizing aggregated data and looking not only at who we match, but who our matches match that share a common ancestor, we can gain insight and hints. Finding a common ancestor is of course a huge benefit in this type of situation because then you’ve identified at least a DNA “line” for the entire group.

If we were to utilize the triangulation tools at Gedmatch and look at my closest triangulated matches, they would look something like this, where the segments that I match with each person (or in this case, two people) shift some to the right. What you are seeing is the start and stop match locations, with graphing. Therefore, I match all of these people that have a common ancestor.

Each match overlaps the one above and below to some extent – and often by a lot. These are known as triangulation groups (TG).

However, the top match and the bottom match do not overlap, so they don’t triangulate with each other. They are still valid triangulated matches to me and you can expect to see this kind of matching when using aggregated data.

Understand that when you see your triangulation groups at GedMatch, your mother’s side and your father’s side will be intermixed. In this case, I know the common ancestor and I know many of these testers, so I’m positive that this is a valid grouping (plus, they all match my Mom too – the best test of all.)

Here’s another example only showing three matches. All three are triangulated to me through the same ancestor, but the locations of the top and bottom matches don’t overlap with each other. Both overlap the one in the middle in part.
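That stair step pattern, where each match overlaps its neighbor but the top and bottom matches never touch, can be sketched in Python by chaining overlapping segments together. The names, positions and function names below are made up for illustration, and the sketch assumes every segment is on the same chromosome and has already been shown to triangulate with you.

```python
from typing import List, Tuple

# (match name, start position, end position), all on one chromosome
Match = Tuple[str, int, int]


def overlaps(a: Match, b: Match) -> bool:
    """True if the two segments share any stretch of the chromosome."""
    return a[1] < b[2] and b[1] < a[2]


def chain_groups(matches: List[Match]) -> List[List[Match]]:
    """Group segments into chains where each overlaps an earlier one."""
    groups: List[List[Match]] = []
    current: List[Match] = []
    current_end = 0
    for match in sorted(matches, key=lambda m: m[1]):
        if current and match[1] < current_end:
            # Starts before the chain so far has ended: same stair step.
            current.append(match)
            current_end = max(current_end, match[2])
        else:
            if current:
                groups.append(current)
            current = [match]
            current_end = match[2]
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    top = ("Match A", 10, 40)
    middle = ("Match B", 30, 60)
    bottom = ("Match C", 50, 80)
    for group in chain_groups([bottom, top, middle]):
        print([name for name, _, _ in group])
    # The top and bottom matches are in one chain without overlapping directly.
    print(overlaps(top, bottom))
```

The chain itself is only a grouping aid. Which ancestor the group belongs to still has to come from the genealogy of the people in it.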

The second way to potentially discover a new ancestor is Ancestry’s New Ancestor Discoveries, NADs, which is really a somewhat misleading name. What Ancestry has determined is that you match a group of people who share a common ancestor – and Ancestry’s leap of faith is that you share that ancestor too. While that may not be correct, what IS very relevant is that you do match this group of people who DO share a common lineage and there is an important hint there for you someplace! But don’t just accept Ancestry’s discovery as your new ancestor – because there is a good chance it isn’t. Let’s take a look.

Ancestral Lines Through Triangulation

Let’s go back to the John Doe example.

Let’s take the worst case scenario. You’re adopted and have no information. But you match an entire group of people in a triangulated group who DO know the identity of their common ancestor.

Does this mean that John Doe is your ancestor? No. John Doe could be your ancestor, or he could be the brother of your ancestor, or the uncle of your ancestor. What this does tell you is that either John Doe is your ancestor, some of John Doe’s ancestors are your ancestors, or you are extremely unlucky and you are matching this entire group by chance. The larger the segment, the less likely your match will be by chance. Over 10 cM you’re pretty safe on an individual match and I think you’re safe with triangulated groups well below 10 cM.

Ancestry’s New Ancestor Discoveries

You can make this same type of discovery at Ancestry, but it’s not nearly as easy as Ancestry implies in their ads and you have no segment data to work with, just their match, shown below.

“Just take the test and we’ll find your ancestors,” the ad says. Well, yes and no and “it depends.”

Ancestry went out on a limb a few months ago, right about April Fools’ Day, and frankly, they fell off the end of the branch by claiming that New Ancestor Discoveries are your missing ancestors found. While that is clearly an overly optimistic marketing statement, the concept of matching you with people you match who all share a common ancestor is sound – it was the implementation and hyper-marketing that was flawed.

The premise here is that if you match people in a Circle that have a common ancestor, that you too might, please note the word might, share that ancestor – even if that person is not in your tree. In other words, even if you don’t know who they are. Just like the John Doe triangulation example above.

Here is my connection to the Larimer DNA Circle, even though I don’t know of a Larimer ancestor.

Now, the problem is that you might be related to an ancestor on one side upstream several generations, but it’s manifesting itself as a match to that particular couple because several of that couple’s descendants have tested. I’ve shown an example of how this might work below.

In this example, you can see that your true common ancestor is unknown to both groups of people, but it’s not Mary Johnson and John Jones, or in my case, not John and Jane Larimer.

However, three descendants of Mary Johnson and John Jones tested, and you match all three. If you also showed Mary Johnson and John Jones in your tree, then you’d be in a Circle with them at Ancestry. However, since Mary Johnson and John Jones are NOT your ancestors, they are not in your tree. Since you match three of their descendants, Ancestry concludes that indeed, Mary Johnson and John Jones must also be your ancestors.

While NADs are inaccurate about half the time, the fact that you do share DNA with the people in this group is important, because someplace, upstream, it’s likely that you share a common ancestor. It’s also possible that you match these three people through unconnected ancestors upstream and it’s a fluke that they all three also descend from this couple. And yes, that does happen, especially when all of the people involved have ancestors from the same region.
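The reasoning described above, and the reason it overreaches, can be sketched in Python. This is a guess at the general logic based only on the description in this article, not Ancestry's real implementation. The function name, the data layout, the descendant names and the three-match threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def candidate_hints(
    my_matches: Set[str],
    circles: Dict[str, Set[str]],
    my_tree_ancestors: Set[str],
    min_members: int = 3,
) -> List[Tuple[str, List[str]]]:
    """List Circle ancestors NOT in my tree whose members I match.

    Each result is a HINT that a connection exists somewhere upstream.
    It is not evidence that the named ancestor is mine.
    """
    hints = []
    for ancestor, members in circles.items():
        if ancestor in my_tree_ancestors:
            continue  # already in my tree, so this would be a Circle, not a NAD
        matched = sorted(my_matches & members)
        if len(matched) >= min_members:
            hints.append((ancestor, matched))
    return hints


if __name__ == "__main__":
    circles = {
        "Mary Johnson and John Jones": {"Descendant 1", "Descendant 2", "Descendant 3"},
        "Henry Bolton": {"Bolton Cousin 1", "Bolton Cousin 2", "Bolton Cousin 3"},
    }
    my_matches = {"Descendant 1", "Descendant 2", "Descendant 3", "Bolton Cousin 1"}
    my_tree = {"Henry Bolton"}
    for ancestor, matched in candidate_hints(my_matches, circles, my_tree):
        print(f"Hint: possible upstream connection via {ancestor}: {matched}")
```

Nothing in this logic can tell the difference between descending from the couple and sharing an older, unknown ancestor with their descendants, which is exactly the trap in the example above.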

The first day that Ancestry rolled out the New Ancestor Discoveries, I was assigned a couple that could not possibly be my ancestors. I called them Bad NADs.

In my experience, there are more erroneous NADs out there than good ones. I knew my original one was bad, as I had proof positive because I have triangulated my other lines. Then, one day, my bad NAD was gone and now, a few weeks later, I have another assigned NAD couple that I have not been able to prove or disprove – the Larimers. Truthfully, after the bad NAD fiasco, I haven’t spent a lot of time or effort because without tools, there is no place to go with this unless the people I match will download their results to GedMatch. I’m hoping that a new tool to be released soon will help.

Here’s how NADs could be useful. Let’s say that my Larimer matches download to GedMatch and I discover that they also match a triangulated group from my McDowell line. Well, guess what – my Michael McDowell’s wife is unknown. Might she be a Larimer? Michael’s mother is also unknown. Might she be a Larimer? It gives me a line and a place to begin to work, especially if they share any common geography with my ancestors.

Even if the NADs aren’t my direct ancestors, this is still useful information, because somehow, I probably do connect to these people, even though my hands are somewhat tied. However, labeling them New Ancestor Discoveries encourages people to jump to highly incorrect conclusions. This isn’t even in the preponderance of evidence category, let alone proof. It’s information that you can potentially use with other DNA tools (at GedMatch) and old fashioned genealogy to work on proving a connection to this line. Nothing more.

So what is the net-net of this? Circles can count in the preponderance of evidence, especially in conjunction with other evidence, but NADs don’t. Neither is proof. If we were able to work with the segment data and compare it, we might very well be able to determine more, but Ancestry does not provide a chromosome browser, so we can’t.

If this is your DNA testing goal, you certainly did not start by testing with Ancestry.com, because they don’t have any tools to help you do this. This tends to be a goal that people develop after they really understand what autosomal DNA testing can do for them. In order to map your genome, you have to have access to segment information and you have to triangulate, or prove, the segments to each ancestor. So count Ancestry out unless you can talk your matches into downloading their raw data files to either GedMatch or Family Tree DNA. You’ll be testing with both Family Tree DNA and 23andMe and downloading your match information to a spreadsheet and utilizing the tools at www.gedmatch.com and www.dnagedcom.com.

Just so you get an idea of how much fun this can be, here’s my genome mapped to ancestors a few months ago. I have more mapped now, but haven’t redone my map utilizing Kitty Cooper’s Tools.

Tips and Tricks for Contact Success

Regardless of which of these goals you had when you tested, or have since developed, now that you know what you can do – most of the options are going to require you to do something – often contacting your matches.

One thing that doesn’t happen is your new genealogy being delivered to you gift wrapped, so that all you have to do is open the box, untie the bow around the scroll, and roll it down the hallway. That only happens on the genealogy TV shows :)

So join me in a few days for part two of Autosomal DNA Testing 101 – Tips and Tricks for Contact Success.