
AutoClustering by Genetic Affairs

The company Genetic Affairs launched a few weeks ago with an offer to regularly visit your vendor accounts at Family Tree DNA, Ancestry and 23andMe, and compile a spreadsheet of your matches, download it, and send it to you in an e-mail. They then update your match list at regular intervals of your choosing.

I didn’t take advantage of this, mostly because Ancestry doesn’t provide me with segment information and while 23andMe and Family Tree DNA both do, I maintain a master spreadsheet that the new matches wouldn’t integrate with. Granted, I could sort by match date and add only the new ones to my master spreadsheet, but it was never a priority. That was yesterday.

AutoClustering

That changed this week. Genetic Affairs introduced a new AutoClustering tool that provides users with clustered matches. I’m salivating and couldn’t get signed up quickly enough.

Please note that I’ve cropped the names for this article – the Genetic Affairs display shows you the entire name.

In short, each tiny square node represents a three-way match, between you and both of the people in the intersection of the grid. This does NOT mean they are triangulated, but it does mean there’s a really good chance they would triangulate. Think of this as the Family Tree DNA matrix on steroids and automated.

By also using my mother’s test, this tool allows me to actually triangulate my matches. If two people are on my mother’s side of the tree, match both me and my mother, and appear together in the match matrix, they must triangulate on my mother’s side of my tree if they both match me on the same segment.

With this information, I can check the chromosome browser, comparing my chromosomes to those other two individuals in the matrix to see if we share a common segment – or I can simply sort the spreadsheet provided with the AutoCluster results. Suddenly that delivery service is extremely convenient!
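The chromosome browser check described above boils down to looking for overlapping segments. Here is a minimal Python sketch of that idea, assuming you have each match's segments (chromosome, start, end) from a download that includes segment data. The Segment type, the function name and the sample positions are all hypothetical, and an overlap found this way is only a candidate: the two matches still have to be confirmed as matching each other there, or be confirmed through a parent's kit as described above.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def shared_overlaps(a, b, min_length=1):
    """Return the regions where a segment in `a` overlaps a segment in `b`."""
    overlaps = []
    for seg_a in a:
        for seg_b in b:
            if seg_a.chromosome != seg_b.chromosome:
                continue
            start = max(seg_a.start, seg_b.start)
            end = min(seg_a.end, seg_b.end)
            if end - start >= min_length:
                overlaps.append(Segment(seg_a.chromosome, start, end))
    return overlaps


# Hypothetical segments on which two cluster members each match the tester.
match_one = [Segment("9", 10_000_000, 35_000_000), Segment("11", 5_000_000, 20_000_000)]
match_two = [Segment("9", 22_000_000, 40_000_000), Segment("2", 1_000_000, 9_000_000)]

candidates = shared_overlaps(match_one, match_two)
print(candidates)
```

With the sample data, only the chromosome 9 region comes back as a candidate for triangulation.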

No, this service is not free, but it’s quite reasonable. I’m going to step through the process. Note that at times, the website seemed to be unresponsive especially when moving from one step to another. Refreshing the page remedied the problem.

Add websites where you have accounts. All of your own profiles plus the other people’s that you manage at both Ancestry and 23andMe are included when you register that site in your profile.

You’ll need your signon information and password for each site.

At Family Tree DNA, you’ll need to add a new website for each account since every account has its own kit number and password.

I added my own account and my mother’s account since mother’s DNA is every bit as relevant to my genealogy as my own, AND, I only received half of her DNA which means she will have many matches that I don’t.

When you’re finished adding accounts, click on “Websites and Profiles” at the top to open the website tab of your choosing and click on the blue circular arrows AutoCluster link. You are telling the system to go out and gather your matches from the vendor and then cluster your matches together, generating an AutoCluster graphic file.

There are several more advanced options, but I’m going to run initially with Approach A, the default level. This will exclude my closest matches. Your closest matches will fall into multiple cluster groups, and the software is not set up to accommodate that – so they will wind up as a grey nonclustered square. That’s not all bad, but you’ll want to experiment to see which parameters are best for you.

If you have half-siblings, you may want to work with alternate settings because that half-sibling is important in terms of phasing your matches to maternal or paternal sides.

Asking me if “I’m sure” always causes me to really sit back and think about what I’ve done. Like, do I want to delete my account. In this case, it’s “overworry” because the system is just asking if you want to spend 25 credits, which is less than a dollar and probably less than a quarter. Right now, you’re using your free initial credits anyway.

The first time you set up an account, Genetic Affairs signs in to your account to assure that your login information is accurate.

I selected my profile and my mother’s profile at Family Tree DNA, plus one profile each at 23andMe and Ancestry. I have two profiles at both 23andMe (V3 and V4) and Ancestry (V1 and V2).

When making my selections, I wasn’t clear about the meaning of “minimum DNA match” initially, but it means fourth cousin and closer, NOT fourth and more distant.

My recommendation until you get the hang of things is to use the first default option, at least initially, then experiment.
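To make the idea of match thresholds concrete, here is a small Python sketch. The results e-mail for this run reports cM thresholds of 250 cM and 50 cM, so the sketch keeps only matches between those two values. The function name, the sample names and their cM values are made up, and whether the real tool treats the boundaries as inclusive is an assumption.

```python
MAX_CM = 250  # matches sharing more than this are left out as too close
MIN_CM = 50   # matches sharing less than this are left out as too distant


def within_thresholds(matches, min_cm=MIN_CM, max_cm=MAX_CM):
    """Keep only matches whose shared cM falls between the two thresholds."""
    return {name: cm for name, cm in matches.items() if min_cm <= cm <= max_cm}


# Hypothetical matches and shared cM totals.
matches = {"Mother": 3384.0, "First cousin": 850.0, "Debbie": 180.0, "Distant": 22.0}

kept = within_thresholds(matches)
print(kept)
```

Lowering MIN_CM or raising MAX_CM in this sketch mirrors what the more advanced options let you experiment with.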

Welcome

While I was busy ordering AutoClusters, Genetic Affairs was sending me a welcome e-mail.

Hello Roberta Estes,

Thank you for joining Genetic Affairs! We hope you will enjoy our services.

You currently have 200 credits which can be supplemented using single payments and/or monthly subscriptions. Check out our prices page for more information concerning our rates.

Please let us know if anything is unclear, we can be reached using the contact form.

The great news is that everyone begins with 200 free credits which may last you for quite some time. Or not. Consider them introductory crack from your new pusher.

Options

Genetic Affairs will sign on to your account at Ancestry, 23andMe or Family Tree DNA, or all three, periodically and provide you with match information about your new matches at each website. You select the interval when you configure your account. After each update, you can order a new AutoCluster if you wish.

Each update, and each AutoCluster request has a cost in points, sold as credits, associated with the service.

To purchase credits after you use your initial 200, you will need to enter your credit card information in the Settings Page, which is found in the dropdown (down arrow) right beside your profile photo.

You can select from and enroll in several plans.

Prices vary by how often you want updates to be performed and for how many accounts. To see the various service offerings and cost, click here.

Here’s an example calculation for weekly updates:

This is exactly what I need, so it looks like this service will cost me $2.16 per month, plus any Autoclustering which is 25 credits each time I AutoCluster. Therefore, I’ll add another 100 credits for a total of $3.16 per month.
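The arithmetic above can be written out as a quick Python sketch. It assumes one credit costs one cent, which is what adding 100 credits for one more dollar implies, and it assumes four AutoCluster runs per month to reach those 100 credits. Neither figure is taken from the Genetic Affairs price list, so check the prices page before relying on it.

```python
# Assumption: 1 credit costs $0.01, as implied by 100 credits adding $1.00.
DOLLARS_PER_CREDIT = 0.01
AUTOCLUSTER_CREDITS = 25       # cost of one AutoCluster run, from the article
WEEKLY_UPDATES_DOLLARS = 2.16  # from the example calculation for weekly updates


def monthly_cost(update_dollars, autoclusters_per_month):
    """Add the cost of AutoCluster runs to the monthly update cost."""
    credits = autoclusters_per_month * AUTOCLUSTER_CREDITS
    return round(update_dollars + credits * DOLLARS_PER_CREDIT, 2)


total = monthly_cost(WEEKLY_UPDATES_DOLLARS, autoclusters_per_month=4)
print(f"${total:.2f}")
```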

It looks like the $5 per month package will do for me. But don’t worry about that right now, because you’re enjoying your free crack, um, er, credits.

Ok, the e-mail with my results has just arrived after the longest 10 minutes on earth, so let’s take a look!

The Results E-mail

In a few minutes (or longer) after you order, an e-mail with the autoclustering results will arrive. Check your spam filter. Some of my e-mails were there, and some reports simply had to be reordered. One report never arrived after being ordered 3 times.

The e-mail when it arrives states the following:

Hello Roberta Estes,

For profile Roberta Estes: An AutoCluster analysis has been performed (access it through the attached HTML file).

As requested, cM thresholds of 250 cM and 50 cM were used. A total number of 176 matches were identified that were used for a AutoCluster analysis. There should be two CSV files attached to this email and if enough matches can be clustered, an additional HTML file. The first CSV file contains all matches that were identified. The second CSV file contains a spreadsheet version of the AutoCluster analysis. The HTML file will contain a visual representation of the AutoCluster analysis if enough matches were present for the clustering analysis. Please note that some files might be displayed incorrectly when directly opened from this email. Instead, save them to your local drive and open the files from there.

Attached I found 3 files:

Matches list

Autocluster grid csv file

Autocluster html file that shows the cluster itself

The Match Spreadsheet

The first thing that will arrive in your e-mail is a spreadsheet of your matches for the account you configured and ordered an AutoCluster for.

In the e-mail, your top 20 matches are listed, which initially confused me, because I wondered if that means they are not in the spreadsheet. They are.

At 23andMe, I initially selected 5th cousins and closer, which was the most distant match option provided. I had a total of 1233 matches.

23andMe caps your account at 2000 (unless you have communicated with people who are further than 2000 away, in which case they remain on your list), but you can’t modify the Genetic Affairs profile to include any people more distant than 5th cousins.

Note that the 23andMe download shows you information about your match, but NOT the actual matching segment information☹

At Ancestry, I selected 4th cousin and closer and I received a total of 2698 matches. I could select “distant cousin” which would result in additional matches being downloaded and a different autoclustering diagram. I may experiment with this with my V2 account and compare them side by side.

This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.

At Family Tree DNA, I selected 4th cousin, but I could have selected 5th cousins. I have a total of 1500 matches.

This report does include the segment information (Yay!) and my only wish here would be to merge the two downloads available at Family Tree DNA, meaning the segment information and the match information. I’d like to know which of these are assigned to maternal or paternal buckets, or both.

AutoClustering

The Autocluster csv file is interesting in that it shows who matches whom. It’s the raw data used to construct the colored grid.

My matches are numbered in their column. For example, person M.B. is person 1. Every person that matches person 1 is noted at left with a 1 in that column. Look at the second person under the Name column, C. W., who matches person 1 (M.B.), 2 (C.W.), 3 (T.F.), 4 (purple) and 5 (A.D.).

All of these people are in the same cluster, number 3, which you’ll see below.
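If you would rather read the grid file with a script than by eye, something like the following Python sketch could work. The sample text is a hypothetical stand-in for the AutoCluster csv: the real file's column names and layout may differ, so treat "Name", "Cluster" and the numbered columns as assumptions.

```python
import csv
import io

# Hypothetical stand-in for the AutoCluster grid csv; real column names may differ.
SAMPLE = """Name,Cluster,1,2,3,4,5
M.B.,3,1,1,1,0,1
C.W.,3,1,1,1,1,1
T.F.,3,1,1,1,1,0
"""


def shared_match_numbers(csv_text):
    """Map each person to the numbered columns that contain a 1 in their row."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        numbered = [key for key in row if key.isdigit()]
        result[row["Name"]] = [int(key) for key in numbered if row[key] == "1"]
    return result


grid = shared_match_numbers(SAMPLE)
print(grid["C.W."])
```

In this sample, C.W. comes back with persons 1 through 5, the same pattern described above.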

The AutoCluster Graph

Finally, we get to the meat of the matter, the cluster graph.

Caveat – I experienced a significant amount of difficulty with both my account and my graph. If your graph does not display correctly, save the file to your system and click to open the file from your hard drive. Try Edge or Internet Explorer if Chrome doesn’t work correctly. If it still doesn’t display accurately, notify Genetic Affairs at info@geneticaffairs.com. Consider this software release late alpha or early beta. Personally, I’m just grateful for the tool.

When you first open the html file, you’ll be able to see your matches “fly” into place. That’s pretty cool. Actually, that’s a metaphor for what I want all of my genealogy to do.

This grid shows the people who match me and each other as well, so a trio – although this does NOT mean the three of us match on the same segment.

The first person is Debbie, a known cousin on my father’s side. She and all of the other 12 people match me and each other as well and are shown in the orange cluster at the top left.

I know that my common ancestor couple with Debbie is Lazarus Estes and Elizabeth Vannoy, so it’s very likely that all of these same people share the same ancestral line, although perhaps not the same ancestral couple. For example, they could descend from anyone upstream of Lazarus and Elizabeth. Some may have known ancestors on either the Estes or Vannoy side, which will help determine who the actual oldest common ancestors are.

You’ll notice people in grey squares that aren’t in the cluster, but match me and Debbie both. This means that they would fall into two different clusters and the software can’t accommodate that. You may find your closest relatives in this grey never-never-land. Don’t ignore the grey squares because they are important too.
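One way to picture the grey squares is as shared matches that cross cluster boundaries. The Python sketch below is only my reading of the display, not how Genetic Affairs actually builds it. Debbie is from the example above, while the other names, the cluster numbers and the pairs are invented.

```python
# Hypothetical data: each match is assigned to exactly one cluster.
cluster_of = {"Debbie": 1, "Ann": 1, "Bob": 2, "Carl": 2}

# Pairs of matches who also match each other (shared matches).
shared_pairs = [("Debbie", "Ann"), ("Bob", "Carl"), ("Debbie", "Bob")]


def classify(pairs, clusters):
    """Split shared-match pairs into same-cluster (colored) and cross-cluster (grey)."""
    colored, grey = [], []
    for a, b in pairs:
        if clusters[a] == clusters[b]:
            colored.append((a, b))
        else:
            grey.append((a, b))
    return colored, grey


colored, grey = classify(shared_pairs, cluster_of)
print("colored:", colored)
print("grey:", grey)
```

Here Debbie and Bob match each other but sit in different clusters, so their square would be grey.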

The second green cluster is also on my father’s side and represents the Vannoy line. My common ancestor with several matches is Joel Vannoy and Phoebe Crumley.

Working my way through each cluster, I can discern which common ancestor I match by recognizing my cousins or people who I’ve already shared genealogy with.

The third red cluster is on my mother’s side and I know that it’s my Jacob Lentz and Fredericka Ruhle line. I can verify this by looking at my mother’s AutoCluster file to see if the same people appear in her cluster.

You can also view this grid by name, # of shared matches and the # of shared cMs with the tester. Those displays are nice but not nearly as informative as the AutoClusters.

Scroll for More Match Information

Be sure to scroll down below the grid (yes, there is something below the grid!) and read the text where you’re provided a list of people who qualify to be included in the clusters, but don’t match anyone else at the criteria selection level you chose – so they aren’t included in the grid. This too is informative. For example, my cousin Christine is there which tells me that our mutual line may not be represented by a cluster. This isn’t surprising, since our common ancestor immigrated in the 1850s – so not a lot of descendants today.

You’re also provided with AutoCluster match information, including whether or not your match has a tree. I do have notes on my matches at Family Tree DNA for several of these people, but unfortunately, the file download did not pick those notes up.

However, the fact that these matches are displayed “by cluster” is invaluable.

You can bet your socks that I’m clicking on the “tree” hotlink and signing on to FTDNA right now to see if any of these people have recognizable ancestors (or surnames) of either Elizabeth Vannoy or Lazarus Estes, or upstream. Some DO! Glory be!

Better yet, their DNA may descend from one of my dead-ends in this line, so I’ll be carefully recording any genealogical information that I can obtain to either confirm the known ancestors or break through those stubborn walls.

Dead ends would become evident by multiple people in the cluster sharing a different ancestor than one you’re already familiar with. Look carefully for patterns. Could this be the key to solving the mystery of who the mother of Nancy Ann Moore is? Or several other brick walls that I’d love to fall, just in time for Christmas. Who doesn’t have brick walls?

By signing on to Family Tree DNA and looking carefully at the trees and surnames of the people in each group, I was able to quickly identify the common line and assign an ancestor to most of the matching groups.

This also means I’ll now be able to make notes on these matches at Family Tree DNA and paint these in DNAPainter! (I’ve written several articles about using DNAPainter which you can read by entering DNAPainter into the search box on this blog.)

Mom’s Acadian Cluster

Endogamy is always tough and this tool isn’t any different. Lots of grey squares which mean people would fit into multiple clusters. That’s the hallmark of endogamy.

My Mom’s largest clustered group is Acadian, which is endogamous, and her orange cluster has a very interesting subgroup structure.

If you look, the larger loosely connected orange group extends quite some way down the page, but within that group, there seems to be a large, almost solid orange group in the lower right. I’m betting that almost solid group to the right lower part of the orange region represents a particular ancestral line within the endogamous Acadian grouping.

Also of interest, my Mom’s green cluster is the same as my red Jacob Lentz/Fredericka Ruhle cluster group, with many of the same individuals. This confirms that these people match me and that other person on Mom’s side, so whoever in this group matches me and any other person on the same segment is triangulated to my Mom’s side of my genealogy.
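Spotting which of Mom's clusters lines up with one of mine can also be done by counting shared names. The Python sketch below assumes you have pulled the member names of each cluster out of the two csv files. All of the initials and the cluster contents are hypothetical, and a name alone is a weak identifier, so treat a hit as a hint to check by hand.

```python
def best_overlap(my_cluster, other_clusters):
    """Return (label, shared_names) for the other cluster sharing the most members."""
    label = max(other_clusters, key=lambda k: len(my_cluster & other_clusters[k]))
    return label, my_cluster & other_clusters[label]


# Hypothetical cluster memberships taken from two separate AutoCluster runs.
my_red = {"C.W.", "M.B.", "T.F.", "A.D."}
moms = {
    "orange": {"P.L.", "R.S."},
    "green": {"C.W.", "M.B.", "T.F.", "J.K."},
}

label, shared = best_overlap(my_red, moms)
print(label, sorted(shared))
```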

You can also use this information in conjunction with your parental bucketing at Family Tree DNA.

In Summary

I’m still learning about this tool, its limitations and possibilities. The software is new and not bug-free, but the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.

I haven’t yet experimented with changing the parameters to see who is included and who isn’t in various runs. I’ll be doing that over the next several days, and I’ll be applying the confirmed ancestral segments I discover in DNAPainter!

This is going to be a lot of fun. I may not surface again until 2019😊

______________________________________________________________

Disclosure

I receive a small contribution when you click on the link to one of the vendors in my articles. This does NOT increase the price you pay, but helps me to keep the lights on and this informational blog free for everyone. Please click on the links in the articles or to the vendors below if you are purchasing products or DNA testing.

“This Ancestry information provides an important clue for me, because the matches I work with are generally only my Shared Ancestor Hints matches. If the Viewed field equals false, this tells me immediately that I didn’t have a shared ancestor hint – but now because of the clustering, I know where they might fit.”
?? Not sure this is correct? As with the other mass data download methods I’m used to this typically means that you have not viewed this match’s page (little blue dot is ‘on’). Perhaps you meant the hints column, which for me is not populated (but should be). I suspect there are little bits and pieces that will be fixed after this initial roll-out.

Feel free to simply correct the typo (if it is) and delete this comment.

That is awesome!! I am looking for one unknown parent. I have read through this, and feel I am in over my head. I have a good tree built and manage my mother’s DNA. Do you think a Newbie could tackle this?
Thanks for any advice you can give. 🙂

Absolutely! And it’s SO much easier than the spreadsheet matrices I am used to building.

In fact, it makes the detective work fun because you can concentrate on families rather than tech stuff once you learn the basics. Search YouTube for Blaine Bettinger’s Intro video, grab the pdf document from the GA site, and .. enjoy!!

You are right to be cautious. The owner is Evert-Jan Blom and he is a contributing member of the Genetic Genealogy Tips and Techniques group and has been for some time. I can’t imagine anyone who wasn’t interested in the topic at hand would invest the huge amount of time in understanding and developing a complex tool where the target audience is small compared to other possible scam targets with a much larger audience and that would be a lot easier to focus on. I watch my credit card closely and can contest any charge. Everyone has to do what they are comfortable with.

That’s good to know that he’s a known entity and highly conversant in DNA genealogy. However, I am not aware of his business management skills or technical knowledge of securing credit card data and passwords. The community rants about incorporated entities like Ancestry, 23andMe having access to our data, but don’t question a newly created website with a cool tool. Caveat emptor.

@Bonnie B, I think I understand what your concern is, and now, even though I am SO excited about jumping into this, I am also concerned. @Roberta Estes, while I would trust Evert-Jan himself, if his site is not secure, then I think it means it is vulnerable to anyone who decides to hack it for its client information, no?

You’re completely right that the website is still accessible through http. However, the site can also be reached under https but I haven’t found a way to force the https. Rest assured, this is only the case for the front page, which only hosts some static html files. The members section is https only. In addition I deal with some security related questions on my faq (https://www.geneticaffairs.com/faq.html), for example the credit card data which I don’t store myself but with another reputable organization (strip.com). If you have any additional questions, don’t hesitate to mail me at info@geneticaffairs.com.

ABOUT SECURITY: To E.J. Blom and anyone else who is interested. I also have several http sites. Chrome decided to begin advising their browser users that http was not secure and could therefore be hacked. While true, this requires http web designers to add a lot of extra code that’s sometimes difficult.

My provider has suggestions on what to add to force my domains to be https but, as Dr. Blom also discovered, I haven’t found a way to make it work yet. Those of us who want to get info out there but who aren’t full-time web designers may have a challenge with this.

That being said, most people usually choose 3rd party web services to collect credit card payments so your credit purchases may be from a secure site where the original site was not. You can verify this by looking in the address bar to see if it says “http” (not secure) or “https” (secure).

I love that when I run the Auto-Cluster on one of my AncestryDNA tests that the table beneath the Cluster map includes my notes. HUGE time-saver for those building genetic networks. I do a shabby job of notes on my matches at FTDNA. I guess I rely on the Paternal/Maternal Sort Icon but I definitely have room for improvement in that area at FTDNA.

btw, there is also an option to make single payments sporadically rather than signing up for a subscription, if that works better for some people. I was not able to update the site with my credit card using Chrome, but it worked on Firefox. There’s an option to change to a different credit card, but I haven’t (yet) found an option to delete my credit card info off the site.

I’ve got 7100+ matches on FTDNA, but the strongest matches just reach 175 total cM or 35 cM longest block. No cousins at all and just a few third cousins tested. Do you think this tool is worth a try in this case? The threshold values are so much higher.

“the developer is working to get things straightened out. I don’t think he expected such a deluge of desperate genealogists right away and we’ve probably swamped his servers and his inbox.”

Yes! And hopefully, more entrepreneur programmers will realize genealogy is the world’s biggest hobby and “there’s gold in them thar [genealogy] hills” and will start designing apps to help all the millions of hungry genealogists organize and make sense of all their massive amounts of data. Most genealogists will pay good money for a clever app that works and does the job.

I have a question about vendor accounts. My brother tested at Ancestry and transferred to FTDNA (and other places genetic affairs doesn’t connect ATM). I tested at FTDNA and transferred to other places, too (gedmatch, myheritage). I haven’t tested at Ancestry, but probably should – to improve chances of finding people.

We are full brothers, but also have slightly different match lists as expected due to the uniqueness of inheritance (i.e. – we didn’t get the same 50% from each parent).

Is there any reason I shouldn’t use genetic affairs to check my brother’s account(s) and my account(s)? I’m interested in relatives my brother matches even if I don’t match, because we still share a common ancestor, right? By synthesizing a new dataset of dna matches that consists of both of our matches, am I creating something inaccurate? If I wanted, I could run reports for just my brother, just for me, and then a combined one for both of us?

Let me know if I’m missing something or getting something wrong. There’s this huge group of people my brother matches on one chromosome, that I don’t match, but the dna block is rather large (20-30 cM) on average, and I’d love to see if any of my matches match them or not.

You can see which people match both of you and that will identify the common groups between you – and identify the ancestral group if you know how those people are related. So no, no reason not to utilize this tool.
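For anyone who wants to line up two siblings' match lists outside the tool, a short Python sketch like this labels each match by whose list it appears on. The function name and the sample initials are hypothetical, and it assumes both lists come from the same vendor so that a name refers to the same person in each.

```python
def combine(mine, brothers):
    """Label every match by which sibling's list it appears on."""
    combined = {}
    for name in sorted(mine | brothers):
        if name in mine and name in brothers:
            combined[name] = "both"
        elif name in mine:
            combined[name] = "me only"
        else:
            combined[name] = "brother only"
    return combined


# Hypothetical match lists for two full siblings at the same vendor.
mine = {"A.B.", "C.D."}
brothers = {"C.D.", "E.F."}

combined = combine(mine, brothers)
for name, source in combined.items():
    print(name, source)
```

The "both" group is the one that identifies the common groups between the two of you.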

DNA Painter has added a tool that will take the cluster information from the html page and create a csv file with it. You can do an import for all you matches at FTDNA then using the csv file find where one of the matches in a cluster matches you (using the chromosome browser). Find that match on DNA painter and edit the match to change the group to that cluster (create a new group). DNA painter will give you a list of people that the match also matches. You can click on one of the other cluster members listed and change their group to the cluster. This way you can have each cluster painted. When you know which line the cluster belongs to you can edit the group information to show that.

I have just started this process so not sure how well it will work but it seems a good way to start painting chromosomes. You can slowly narrow in on the MRCA.

Thank you so much for the detailed instructions, Roberta. I had trouble opening the cluster files with Chrome but tried IE as you suggested and it worked! I had been reading about this on GGT&T but kept putting it off. Your guidance got me through it. 🙂

Clusters are either by cM totals or longest segment. Ancestry doesn’t tell you where that match occurs, but they do tell you how big it is. Then Genetic Affairs looks at who you and your match match in common to form clusters of people who match each other.

It helps to understand how inferred and generalized it actually is. We use shared match reports now and know that it is a hint but can be misleading if one doesn’t explore on what chromosome segment and/or family branch it may be. It’s great having another tool to group things differently though.

I finally got around to checking this out. I ran analysis for me (FTDNA only) and my brother (Ancestry & FTDNA). What I found extremely interesting is how one of my dna clusters got split into two groups. I know for a fact both clusters detected by genetic affairs trace to the same common ancestor. The 2nd cluster (same ancestor) represents all the descendants who stayed and married and had children in Virginia to modern day. In fact, I think they are being clustered together because they have another unique common ancestor separate from group 1. Or more simply, one cluster are the descendants outside the state, and the other cluster are the descendants that are still in Virginia (maybe even the same couple of counties).

I ran my initial reports using the “A” method analysis for auto-clustering. I’m using “C” next time and have customized the reporting parameters. This will be very helpful for my brother’s results on Ancestry, because there is a huge group of people that share ancestry and the matching cMs are between 20-50 cM. I’ve done the research and verified we all trace to the same couple. What I’m really interested in is who else might that cluster include that I haven’t discovered yet.

I also have two matches missing from a suspected cluster(grey square, not colored). Actually, the suspected cluster doesn’t show up well (or at all) on FTDNA because there aren’t enough people from that ancestor at FTDNA that match in a certain spot on CHR 11. The few that do trace from the common ancestor, don’t form a cluster. My two persons also show up in a well known cluster I noticed two years ago on a different chromosome but I have no idea how I match them (well – I know what side of the family they are on and I know who they *don’t* match!). I call them my CHR 9 cluster.

What I would really like is a synthesis of auto-clusters based on me and my full brother. We each have clusters the other doesn’t have because of unequal dna inheritance.

I can see where this is leading. There is something I haven’t done in my research and that is the full download of all our matches and the matching segments and I’m going to have to synthesize my own set of clusters trying to use mine and my brother’s DNA results. Spreadsheets and DNA painter?

The odd thing about this one dna cluster on chr 9, is the number of people that seem to trace back to Halifax, VA and surrounding regions. And the only clues I have aren’t enough.

I reduced the minimum cM matching threshold and expanded it to look as far as 5th cousins.

The result was something I thought I saw two years ago. In my previous post I said that using the default option “A” split people up into separate groups that I thought were really 2, or even 1 group.

The people that got split into two different groups on CHR 11, as well as people on CHR 9 (Halifax Cluster), show up as a single cluster.

I can see where biases could creep into one’s analysis, especially when a tool like this seems to be telling you what you hope is true.

Bottom line, is that I think the people that match on the different chromosomes are related to one or both of a specific Mr. and Mrs. 3x great grandparent.

Has anyone encountered the scenario where running option A breaks up a group you thought you saw, but when widening the matching parameters under option C it pulls them back into a single group? Bias Alert: I really hope(want) this group to correspond to relatives/cousins of this one couple.

WARNING: when I expanded my parameters under option C I got back 1500+ matches. I crashed my browser a few times trying to look at a 1500×1500 matrix. But I also get back a csv file that labels people against their respective cluster, which is easier to manage.

I went back and read this thread, and I don’t see what’s confusing you. Could you be more specific? Have you tried the tool? That might help if you haven’t. You have to select the segment sizes to use, which is something everyone is messing around with and isn’t straightforward. I just use the default although I could probably get better results if I spent time figuring this out. People may be going back to look at matching segment. I do that. I use this tool as a hint as to where to look for more details. So I’m not sure where the confusion is. I hope this helped at least a little.

Mmmm! GeneticAffairs may be all right – if it ever works for me. I set up my trial account all right so as to access all three vendors. Went easily but then:

I cannot access Members Login, access members front page, contact their support, or anything else. I either get a HTTP error 504, or else a blank screen – and I have to wait a couple of minutes just for that doozy.

Tried four different browsers – Safari, Chrome, Opera, Firefox – to no avail. Perhaps GA just doesn’t like Macs.

Not your problem of course, Roberta, but is anyone else having this or similar issues?

Would this Genetic Affairs site work for me? This is what I am trying to discover:

1.) My surname is Hale, but my Y-DNA test says I am an Aker not a Hale. A Hale cousin I found through ancestry research said her mother told her children they were not Hale, they were Aker. It seems an Aker was taken in by a Hale family in the 1800’s according to her mother. Will this Genetic Affairs help me find my Aker connection?

2.) According to my family tree research and DNA matches at FTDNA I have a Native American ancestor. Will this Genetic Affairs confirm my family tree Native American ancestor?

3.) According to National Geographic’s Ancestry, Geno 2.0 I have a DNA match with Queen Victoria and Richard III, I have plotted this in my family tree and it seems correct. Will this Genetic Affairs confirm this?

How much am I looking at as far as cost to get Genetic Affairs to confirm these 3 areas of concern? Thanks for any insight and help.

What do the gray squares to the right of the clustering area mean? Are they people who match the test taker and no one else? I’m referring to the names on the axis of the chart who are not represented (named) in the vertical axis of the chart.