Shelley Crawford first documented the methods to create a genetic network graph using Excel and NodeXL. See her method beginning at Visualising Ancestry DNA matches–Part 1–Getting ready. Recently, I discovered Shelley's company ConnectedDNA selling charts she creates using custom code, Gephi, and data the customer supplies (or, optionally, AncestryDNA data that Shelley downloads using ‘viewer’ permission).

This is exciting! I can justify paying someone else to do the hard work for me while I finish my other tasks. I can make time to play with the graphs created by someone else. Thank you, Shelley.

The first part of this post (Product 1) is similar to what others have posted on their graph analysis. The second part of this post (Product 2) is something you may not have seen yet—graphs created using half sibling matches.

Product 1. Single profile graph using AncestryDNA data

My first purchase was a "Single profile graph" using one person's AncestryDNA data.

I used the DNAGedcom Client to download my matches and ICW files from AncestryDNA using the options recommended by ConnectedDNA on the order page. After placing my order I received instructions on how to provide the files to ConnectedDNA.

After processing by ConnectedDNA, I received PDFs and images of two graphs (a Network Map and a Group Map) and a Match File spreadsheet (XLSX). The spreadsheet contains information added by ConnectedDNA beyond what was in the DNAGedcom Client file.

Network Map Graph

At a shared match size threshold of 30 cM, the Network Graph includes 1,006 people (each represented by a colored dot on the graph), with 13,556 lines between them. An image of the full graph is shown here with no names displayed.

Names are visible when zoomed in on the PDF. The name labels are blurred here on this small section of the graph extracted from the group of brown dots in the upper left of the image above.

The PDF is searchable, which allows me to locate matching test takers by name. The size of each colored dot is based on the amount of shared DNA. Bigger dots are likely more closely related. Each line represents a relationship (shared DNA) between two people.

I can label the ancestral lines containing the names of matches where the common ancestor has been identified. It is likely that I share that same line with others in the same group (same color). This is a clue as to where to focus research for a common ancestor for matches in this group. I can annotate the graph with the ancestral surnames associated with each colored group as shown in the image below.

The groupings can lead to genealogical discoveries. The "Anderson-McSpadden" cluster is purple. Some purple dots are outside of the main cluster and have lines linking them to the "Johnson" and "Richards" clusters. These outlier circles are marked as "Johnson-Richards and Anderson-McSpadden" and represent people that I share two ancestral lines with. A pair of Richards siblings married two Anderson first cousins. There is cross-over here in shared DNA between my maternal grandfather’s line (Anderson-McSpadden) and maternal grandmother’s line (Johnson-Richards) that is clearly reflected in the graph by the "Johnson-Richards and Anderson-McSpadden" outliers. If I did not know of this cross-over this would be a great clue for me.

The connecting lines are hard to see on this reduced image, but are clear when viewed in the PDF and can clearly be seen in the Group Map below.

Group Map Graph

The Group Map is a summary version of the Network Map chart. It serves as a finding aid for the numbered groups and illustrates the relationships and strength of connection between the groups. I added surnames and boxes to this version of the graph. The groups with a named common ancestor clearly split into maternal (green box) and paternal (blue box) lines. This is a clue as to whether the unnamed groups are more likely on the maternal or paternal side.

The general locations of groups and the assigned colors are the same as in the Network Map Graph. The connecting lines are simpler in this Group Map graph. The lines represent links between clusters or circles. Thicker connecting lines represent more individuals in one circle with connections in the second circle. For example, a Parker ancestor (blue circle, group 3) married a Rogers ancestor (red circle, group 2). There is a thick line between these two circles indicating many matches in the two circles share DNA. This is expected; many test takers share both of these ancestral lines with me—they are also descendants of this Parker-Rogers ancestral couple.
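The idea of thicker lines for more connections between two circles can be sketched in a few lines of Python. This is only my illustration, not ConnectedDNA's method: the function name, the match IDs, and the simple link count used as the "thickness" are all assumptions, and the actual graphs may weight lines differently.

```python
from collections import Counter

def group_link_counts(group_of, links):
    """Count match-to-match links that cross from one group to another.

    group_of: dict mapping a match ID to its group number (hypothetical data).
    links:    iterable of (match_a, match_b) shared-match pairs.
    Returns a Counter keyed by a sorted (group, group) tuple.
    """
    counts = Counter()
    for a, b in links:
        ga, gb = group_of[a], group_of[b]
        if ga != gb:  # links inside one group do not connect two circles
            counts[tuple(sorted((ga, gb)))] += 1
    return counts

# Made-up example: groups 2 and 3 are heavily linked, group 18 only lightly.
group_of = {"m1": 2, "m2": 2, "m3": 3, "m4": 3, "m5": 18}
links = [("m1", "m3"), ("m2", "m3"), ("m2", "m4"), ("m1", "m2"), ("m4", "m5")]
counts = group_link_counts(group_of, links)
```

A larger count for a pair of groups would correspond to a thicker line between those two circles.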

Match File

The Match File is a modified version of the file provided to ConnectedDNA. The addition of the group allocations (with colors as seen on the graphs) gives a quick visual clue of the likely relationship. It is easy to find surnames of interest.

Clues also lurk in this file. My Maples line is one that is not traced past the first ancestor with that surname. My Maples ancestor married a Parker. The match here who has Parker and Maples in their surname list would be a good place for me to begin investigating links to identify Maples ancestors. Another great clue as to where to look for the common ancestor is seen in Group 18 which is my Johnson line. The Group 18 match in the image below does not have Johnson in the surname list, but Parrott ancestors are further back in my Johnson line. This is likely how I am related to this match. Jarvis is another surname in my family tree. I can investigate if we also share Jarvis ancestors or if this person has a different Jarvis line from mine.

The spreadsheet file can be searched, sorted, or filtered by data found in each column using the drop down box in the column header. This allows me to focus on a subset of matches of interest, eliminating clutter from the other lines. For example, filtering the Group column for 3 displays only the matches that are in group 3 (blue circle). Clicking the drop-down funnel allows the filter to be cleared.

The image below shows the Match List filtered for Group 3 which is my Parker paternal line. Henry Parker married Nancy Black and I see Parker and Black in the surname lists of most of these matches. Haynes is a surname a few generations further back in this Parker line. Those other surnames listed for each match may be their lines not related to me or could be clues for the spots in my lineage where I have not yet identified an ancestor.

My ethnicity is always 95-98% European so the ethnicity estimates are not generally much help to me. However, anyone with recent ancestral origins in a specific biogeographical region might find this information helpful. For example, if one grandfather’s lineage is African American or Native American then any such ethnicity prediction for a match is a clue the relationship might be on that grandfather’s line.

Even if you do not prefer visual representations of data, the Excel file with the groups added to allow filtering the included matches may be worth the cost. Having the links to go directly to a match's tree or profile on Ancestry is also a time saver.

As with all non-exact searches, be aware that filtering the Surnames column for Ryan displays all of the matches who have a surname that contains the letters ryan. This includes surnames Ryan, Bryant, O'Bryan, and so on.
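For anyone who exports the spreadsheet and filters it in code instead, the difference between a "contains" filter and a whole-surname filter can be sketched as below. The surname lists are made up for illustration and both function names are my own; this is a minimal sketch, not a feature of the ConnectedDNA file.

```python
import re

def substring_filter(surname_lists, term):
    """Mimic a spreadsheet 'contains' filter: case-insensitive substring match."""
    term = term.lower()
    return [s for s in surname_lists if term in s.lower()]

def whole_surname_filter(surname_lists, term):
    """Keep only entries where the term stands alone as a complete surname."""
    # The term must not be preceded or followed by a letter or an apostrophe.
    pattern = re.compile(
        r"(?<![A-Za-z'])" + re.escape(term) + r"(?![A-Za-z'])",
        re.IGNORECASE,
    )
    return [s for s in surname_lists if pattern.search(s)]

# Made-up surname lists, one string per match.
matches = ["Ryan, Parker", "Bryant, Black", "O'Bryan, Haynes", "Maples, Jarvis"]
loose = substring_filter(matches, "Ryan")
strict = whole_surname_filter(matches, "Ryan")
```

Here `loose` keeps the Ryan, Bryant, and O'Bryan rows, while `strict` keeps only the Ryan row.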

Product 2. Close family graph (Family Tree DNA)

My second purchase was a graph using FamilyTreeDNA data for five siblings. The standard offering at the time was for full siblings (that is the price I paid). I asked Shelley if she would consider an option to include a half sibling. She agreed to use me as a guinea pig and now offers products with siblings, close family including half siblings, and extended family. See her website for details (https://www.connecteddna.com/). Again, I used DNAGedcom Client to collect data for the siblings. I then supplied the match list, ICW file, and chromosome browser file for four full siblings and one half sibling to ConnectedDNA.

After processing by ConnectedDNA, I received PDFs and images of two graphs (a Profile Graph and a Group Graph) and a Match File spreadsheet with information added by ConnectedDNA beyond what was in the DNAGedcom Client file.

Profile Graph

These are the Summary statistics for the unfiltered combined FamilyTreeDNA files sent to ConnectedDNA:

When the data is graphed unfiltered it is too much of a blob to be meaningfully interpreted.

Shelley uses her expertise to filter the data so the graph becomes meaningful. The thresholds used for each data set vary; your thresholds may differ from the ones used for my data. Shelley used the chromosome data to identify shared-match pairs who also have at least one overlapping segment of DNA with the focus person. She then filtered the matches to those who share between 50 cM and 1,300 cM with at least one of the five profiles. The closest matches are excluded (people I asked to test and who are related to me on almost all lines). She filtered the connections between matches so that, for matches who share at least 130 cM with at least one kit, all connections are shown.

For pairs of matches below 130 cM, connections are only shown if there is also an overlapping segment of about 12 cM or more between the shared matches and one of the kits. Connections where there is an 'overlap' are darker than connections where there is not. As Shelley states clearly, "This is not triangulation, but works as a reasonable proxy since true triangulation data is not available." True triangulation is not available from the data supplied by FamilyTreeDNA, but may be when using third-party tools.
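The filtering rules described above can be sketched roughly as follows. This is my own reading of the rules, not Shelley's code. The thresholds are the ones used for my data, the function and constant names are hypothetical, and I am assuming a connection is always kept when either match in the pair reaches 130 cM.

```python
MIN_CM, MAX_CM = 50, 1300   # match inclusion range used for my data
CLOSE_CM = 130              # at or above this, all connections are shown
MIN_OVERLAP_CM = 12         # approximate overlap threshold ("about 12 cM")

def keep_match(shared_cm_by_kit):
    """Keep a match that shares 50 to 1,300 cM with at least one kit.

    shared_cm_by_kit: dict of kit name -> total shared cM (hypothetical layout).
    """
    return any(MIN_CM <= cm <= MAX_CM for cm in shared_cm_by_kit.values())

def keep_connection(max_cm_a, max_cm_b, overlap_cm):
    """Decide whether to draw the line between two shared matches.

    max_cm_a, max_cm_b: each match's largest total shared cM with any kit.
    overlap_cm: length of the overlapping segment with one of the kits
                (0 if there is none).
    """
    # Assumption: one match at or above 130 cM is enough to keep the line.
    if max_cm_a >= CLOSE_CM or max_cm_b >= CLOSE_CM:
        return True
    # Below 130 cM, require an overlapping segment of about 12 cM or more.
    return overlap_cm >= MIN_OVERLAP_CM
```

For example, `keep_connection(80, 60, 0)` drops the line, while `keep_connection(80, 60, 14.5)` keeps it.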

These siblings all share a mother. The half sibling has a different father. Three groups of matches emerge:

Half sibling only - people likely related on the half sibling's paternal line (red color)

Half and full siblings - people likely related on the shared maternal line (green color)

Full siblings only - people likely related on the full siblings' paternal line (blue color)
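The three-way split can be expressed as a small Python function. The kit names are placeholders and the function is only a sketch of the logic as I understand it, not the code ConnectedDNA uses.

```python
def classify_match(matched_kits, half_sibling, full_siblings):
    """Assign a color based on which sibling kits a match shares DNA with.

    matched_kits: kit names this match appears on (hypothetical names).
    Returns "red", "green", "blue", or None if no sibling kit is matched.
    """
    matched = set(matched_kits)
    hits_half = half_sibling in matched
    hits_full = bool(matched & set(full_siblings))
    if hits_half and hits_full:
        return "green"   # likely the shared maternal line
    if hits_half:
        return "red"     # likely the half sibling's paternal line
    if hits_full:
        return "blue"    # likely the full siblings' paternal line
    return None

FULL = ["full1", "full2", "full3", "full4"]  # placeholder kit names
HALF = "half1"
```

As the guidance later in this post notes, the color is a clue and not proof, because random recombination can place a match in an unexpected group.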

This reveals likely paternal or maternal links and some matches that need investigation. Several clusters contain circles of primarily one color but with a small number of red circles that represent matches to the half sibling only. The completely red clusters are likely the paternal line of the half sibling. A mix of red with green-blue circles may indicate the half sibling inherited a segment that the full siblings did not. The mix could also indicate the half sibling’s paternal line has ancestors shared with the full siblings’ line. Investigation is required.

Match Group Overview Graph

The Match Group Overview graph shows how the groups link together and the numbers assigned to each group. This is similar to the AncestryDNA Group Map described above. The colors and group numbers match those used in the FamilyTreeDNA Match List spreadsheet file.

Group Network Graph

The Group Network Graph includes one circle for each match in a group. This details the matches included in the circles of the Match Group Overview Graph. Just as with the AncestryDNA product, the PDF file has searchable names attached to each circle. Identification of the shared ancestor with one person in a group provides a clue for the likely shared ancestor with others in the group.

Match List

The spreadsheet Match List for this product includes match ID; full name; relationship range and predicted relationship; a list of close ICW matches; whether the match is full sibling only, half sibling only, or both; group number; number of shared matches; longest DNA segment shared; total shared cM; total shared cM with each sibling; ancestral surnames; Y-DNA and mtDNA haplogroups (if included in the DNAGedcom Client data files); notes; and email address for each match. Each column can be filtered to focus on the group under investigation.

Be aware that group numbers are assigned as each ConnectedDNA product is created. This FamilyTreeDNA product is completely separate from the AncestryDNA product discussed above. Therefore, group numbers assigned are different in the two products. The matches in Group 3 of my AncestryDNA product are in Group 21 in my FamilyTreeDNA product. I can easily determine both of these are my Parker paternal line based on matches to people who tested at both AncestryDNA and FamilyTreeDNA.

Some General Guidance on Interpretation

The graphs and spreadsheets provide clues that require investigation before conclusions can be reached. Researchers must realize that random recombination may result in some people who sort into one group genetically when they may be in another group genealogically. This is especially true when a half sibling inherited a segment of DNA from the shared parent that the other siblings did not inherit. Take care not to misinterpret these cases based on erroneous assumptions. In general, a known relative in a group is a strong clue to which part of your family tree that group represents. Random recombination may cause some of these clues to be misleading.

Shelley Crawford gives this good advice on how to use these files:

Some groups may be a mystery to you. Make your way through the matches in those groups, reviewing their trees and looking for common elements. Is there a surname or a place that appears in several of the trees? With a little research effort, you may be able to expand upon the information your matches have provided and find a common ancestor.

I am excited to explore these files more fully and make discoveries to add to my family tree.

Edited 13 January 2019: Modified phrasing to correctly reflect permissions used when the customer asks ConnectedDNA to download the data.

All statements made in this blog are the opinion of the post author. This blog is not sponsored by any entity other than Debbie Parker Wayne nor is it supported through free or reduced price access to items discussed unless so indicated in the blog post. Hot links to other sites are provided as a courtesy to the reader and are not an endorsement of the other entities except as clearly stated in the narrative.
To cite this blog post:
Debbie Parker Wayne, "ConnectedDNA Graphs and Clues," Deb's Delvings, 12 January 2019 (http://debsdelvings.blogspot.com/ : accessed [date]).

30 July 2018

Texas State Genealogical Society (TxSGS) has announced the conference schedule for San Antonio on 2-4 November 2018. This year celebrates 300 years of history since the founding of San Antonio with three days (and over 3,000 minutes) of great genealogy education!

This year I will be presenting a two-hour workshop on autosomal DNA analysis from 1:30 to 3:30 Friday afternoon. This workshop has limited seating available and an add-on cost of $30.

I will be presenting "Organizing Genetic Genealogy" at 11:00 on Saturday and "Documenting DNA Analysis" at 2:00 on Saturday. I am scheduled before and after lunch; it will be a busy mid-day on Saturday.

Our plan is to unveil our new Early Texas DNA Project website at this conference. I will be answering questions and featuring the website at a TxSGS booth when I am not speaking.

There is something for every researcher at every knowledge level. I hope to see you there.

Debbie Parker Wayne will receive remuneration as a speaker for this conference and is a board member as the DNA Project Chair.

To cite this blog post:
Debbie Parker Wayne, "Texas State GS 2018 Annual Conference Schedule," Deb's Delvings, 30 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

21 July 2018

More people are jumping into DNA testing and genetic genealogy with no prior experience in either DNA or genealogy. Joining a social media group, mail list, or forum exposes a newcomer to so many programs, tools, terms, and techniques that it can feel like a fire hose aimed at you at full blast.

It is great to jump in. It is great to ask questions to learn. But you never know how much the person answering you knows. And they may not even know they are giving you information that is not completely accurate because they misunderstood your question.

Below are some places to (1) learn more about DNA and (2) get better help when one of the DNA tools does not work as you expected.

Start small when learning something new and build up to higher levels. This applies to studying DNA using the recommendations above and to learning new tools.

When learning a new tool or process, test first with a small dataset. For example, when I first downloaded the version of Progeny Charting Companion that creates DNA analysis charts, I created a small RootsMagic database with only four DNA test takers and the direct lines back to their shared ancestors (as shown in the chart above). I created a dummy CSV file with the minimum amount of data needed for those test takers' DNA data (as defined in the Charting Companion's help files). I used this small dataset to play with the charts offered by Charting Companion until I understood how the options worked to get the output I desired. Once I was comfortable using the tool, I accessed my full RootsMagic database after adding the new facts needed for DNA charts to work properly (like DNA kit numbers for each test taker).

After you begin using a new tool, it may not always work as expected and you may need help. To get better help when one of the DNA analysis tools does not work as expected (most of this applies to any program or app):

read the instructions (built-in help files, a user's guide, how-to instructions on the program's website)

really read the instructions—do not just scan them—and be sure you followed every step carefully, including the steps that are linked into or referenced from the first help page you access (most problems are due to not following instructions; trust me on this, I worked tech support and trained computer users for much of my "life before genealogy")

if you followed the instructions carefully and still have problems, make note of any error messages displayed (or failure mode) and step-by-step what you did just before the failure or error

use Google or another engine to search for the error message or failure mode (if the program uses Facebook to offer technical support, use Facebook's "Search this group" feature)

if potential solutions are found try them

if no solution is found by searching or the solutions found do not work for you, then post a message asking for help; include

the tool name and version of the tool you are using (also indicate if you recently updated the tool)

the error message received or exactly what you saw that was not "right"

the step-by-step list of what was done before the error message was received or the program failed

whether you are using a Windows, Mac, Android, iOS, or other device and the version of that operating system

whether this is something that worked in the past or this is your first time to try this procedure

These recommendations should help you get better technical support and help you learn new programs and DNA analysis more productively.

Update 23 July 2018: Fixed minor typo, added NGS online training courses, and added to disclaimer royalties for courses and books.

Debbie Parker Wayne receives royalties for the NGS course she authored on autosomal DNA analysis and books for which she is an author or editor.
To cite this blog post:
Debbie Parker Wayne, "Learning DNA and Getting Help with Analysis Tools," Deb's Delvings, 21 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

Progeny just released Charting Companion version 7 with a major addition to help during DNA analysis, as described in the announcement below (URLs were changed to go directly to the Progeny website rather than to the advertising site in the email sent to me, so as not to skew the statistics for email accesses).

Charting Companion 7 features a new technology to help place adoptees and orphans in a family tree: the DNA Simulation. Based on the DNA Matrix, the DNA Simulation will construct a Descendant tree, then will systematically try to link the "orphan" to every person in the tree, one at a time. Charting Companion will validate the tree by calculating the expected centiMorgan (cM) implied by the hypothetical relationship, and comparing it to the actual laboratory DNA test results. Each iteration is called a "scenario". If the DNA test results are outside the cM range, the scenario is bad, will be discarded, and Charting Companion will advance to the next possible position of the orphan in the tree. If the DNA results are consistent, the good scenario will be recorded. All possible scenarios can then be reviewed for further investigation. (see video [at https://youtu.be/yBe6Pd8g5no]).

In addition to linking to existing persons, Charting Companion will also insert hypothetical or placeholder spouses and children, and attempt to link the orphan to these additional people. The added persons represent potential extramarital relationships, previous unknown marriages, unknown children, children given up to adoption, non-paternal events, etc. They are meant to suggest possible connections that would otherwise be very time-consuming to evaluate manually.
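The scenario loop described in the announcement can be pictured with a short sketch. This is only my illustration in Python, not Progeny's code. The function name and data layout are assumptions, and the cM ranges in the example are made-up placeholders, not real relationship statistics.

```python
def evaluate_scenarios(expected_ranges, observed_cm):
    """Return the tree positions whose expected cM ranges fit the test results.

    expected_ranges: dict of position -> {tester: (low_cM, high_cM)}, where each
                     position is one hypothetical placement of the orphan.
    observed_cm:     dict of tester -> cM actually shared with the orphan.
    """
    good = []
    for position, ranges in expected_ranges.items():
        # A scenario is good only if every tester falls inside the range.
        consistent = all(
            low <= observed_cm[tester] <= high
            for tester, (low, high) in ranges.items()
        )
        if consistent:
            good.append(position)
    return good

# Placeholder numbers for illustration only.
expected_ranges = {
    "position_a": {"tester1": (500, 1300), "tester2": (30, 400)},
    "position_b": {"tester1": (30, 400), "tester2": (30, 400)},
}
observed_cm = {"tester1": 850, "tester2": 120}
good_scenarios = evaluate_scenarios(expected_ranges, observed_cm)
```

With these placeholder values only `position_a` survives, because the 850 cM shared with tester1 falls outside the range expected for `position_b`.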

To cite this blog post:
Debbie Parker Wayne, "DNA Simulation added to DNA Matrix in Progeny Charting Companion," Deb's Delvings, 21 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

23 May 2018

20 June 2018: Updated to link to new page for the survey on BCG website.

This is part of a series on Genealogy Standards for using DNA. This series represents the opinions and interpretation of the proposed standards by this author and does not necessarily reflect BCG’s official position. The proposed standards are not being addressed in numerical order, but all articles will be linked. For other parts in the series see

You can participate in a survey and provide your opinion on the Proposed DNA Standards through a Google Docs survey linked from https://bcgcertification.org/proposed-dna-standards-for-public-comment/. Please leave comments by 23 July 2018 explaining your agreement or disagreement with the proposed standards. Comments will be used to modify the standards as needed before acceptance and publication.

What does this all mean? Standards are written formally and most of us understand informal language better. Breaking down each segment makes the meaning more clear.

The first thing many researchers who share DNA do is compare pedigrees (family trees) or surname lists looking for a common ancestor (or an ancestral couple; the term common ancestor will be used for simplicity even when an ancestral couple may be the source of a DNA segment). The first common surname, person, or couple found is often assumed to be the source of the shared DNA segments. That assumption may be right or wrong. More evidence is needed to determine which is more likely.

Genealogists must also consider that two test takers who share DNA may not have inherited all of that DNA from one common ancestral couple. If two test takers are related in more than one way (such as through pedigree collapse or endogamy) this can be difficult to determine except with thorough research and correlation of the DNA and documentary evidence.

Three concepts are critical when analyzing pedigrees: accuracy, depth, and gaps. Some common things to review for each are listed here.

Accuracy of the pedigree: a pedigree either has the correct ancestors linked for each generation or it does not. If the pedigree of any DNA test-taker under analysis is inaccurate then the common ancestors may never be identified.

Accurate pedigrees are the result of research that meets the Genealogical Proof Standard (see "Useful References" below). Summarized and paraphrased, the GPS requires a focused research question; thorough research; correctly cited sources; thorough and competent analysis and correlation of all evidence pertinent to the question; resolution of any conflicting evidence; and a sound written conclusion.

Researchers can analyze the accuracy of pedigrees by confirming the consistency of assertions (no children born when a parent would be too young, too old, deceased for more than nine months, in a different location at the time of conception, etc.) and that the most credible sources support each assertion.
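Consistency checks like these are easy to automate for a first pass through a tree. The sketch below is a minimal example under stated assumptions: the age limits and the 280-day figure for "nine months" are my own placeholder thresholds, and the function name is hypothetical. A flagged item is a reason to review the sources, not proof of an error.

```python
from datetime import date

MIN_PARENT_AGE = 12       # assumption: younger than this is implausible
MAX_MOTHER_AGE = 55       # assumption: older than this is implausible
GESTATION_DAYS = 280      # rough stand-in for "nine months"

def check_parent_child(child_birth, parent_birth, parent_death=None,
                       parent_is_mother=True):
    """Return a list of consistency problems for one parent-child link."""
    problems = []
    age_at_birth = (child_birth - parent_birth).days / 365.25
    if age_at_birth < MIN_PARENT_AGE:
        problems.append("parent too young")
    if parent_is_mother and age_at_birth > MAX_MOTHER_AGE:
        problems.append("mother too old")
    if parent_death is not None:
        days_after_death = (child_birth - parent_death).days
        if parent_is_mother and days_after_death > 0:
            problems.append("mother deceased before the birth")
        elif not parent_is_mother and days_after_death > GESTATION_DAYS:
            problems.append("father deceased more than nine months before the birth")
    return problems

# Made-up dates for illustration.
too_young = check_parent_child(date(1858, 6, 1), date(1850, 1, 1))
late_father = check_parent_child(date(1861, 3, 1), date(1820, 1, 1),
                                 parent_death=date(1860, 1, 1),
                                 parent_is_mother=False)
plausible = check_parent_child(date(1880, 5, 1), date(1855, 2, 1))
```

Location conflicts at the time of conception and source credibility still need to be judged by the researcher.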

See “Accuracy” at the bottom left of the pedigree image.

Depth of the pedigree: ideally, each DNA test taker’s pedigree chart should be complete back to the level of the hypothesized common ancestor, and preferably a few generations further back. If two DNA test takers are predicted to be third cousins, then both pedigrees should be complete at least back to the second-great-grandparents (the hypothesized common ancestral level). An extra generation or three in each tree helps if the test takers inherited more than the statistical average amount of DNA; in that case they may actually be fourth or fifth cousins instead of the predicted third cousins.
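The arithmetic behind pedigree depth is simple enough to sketch. Counting parents as generation 1, cousins of degree n share ancestors n + 1 generations back, and each generation doubles the number of ancestor slots. The function names below are my own, and the completeness measure is just one rough way to quantify depth. It ignores pedigree collapse, where the same ancestor fills more than one slot.

```python
def generations_to_common_ancestor(cousin_degree):
    """Third cousins (degree 3) share ancestors 4 generations back."""
    return cousin_degree + 1

def ancestors_at_generation(generation):
    """Number of ancestor slots at a generation (parents = generation 1)."""
    return 2 ** generation

def pedigree_completeness(known_counts, depth):
    """Fraction of ancestor slots filled through the given depth.

    known_counts: dict of generation -> number of identified ancestors.
    """
    expected = sum(2 ** g for g in range(1, depth + 1))
    known = sum(min(known_counts.get(g, 0), 2 ** g) for g in range(1, depth + 1))
    return known / expected

depth_needed = generations_to_common_ancestor(3)   # third cousins
slots = ancestors_at_generation(depth_needed)      # second-great-grandparents
# Made-up tree: complete through great-grandparents, one name beyond that.
completeness = pedigree_completeness({1: 2, 2: 4, 3: 8, 4: 1}, depth_needed)
```

In this made-up tree only 1 of the 16 second-great-grandparent slots is filled, so the pedigree is half complete overall at the depth needed to evaluate a predicted third cousin.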

See “Depth” at the top left of the image. In this example, all names are complete up to the great-grandparent level that would be shared with second cousins. However, all of the missing information on the birth, marriage, and death of many of these ancestors indicates this tree is not deep enough or verifiably accurate enough even at this level.

Gaps in the pedigree: ideally, each pedigree will be complete with no gaps. In the real world many researchers have brick walls on some lines or just have not had time to research every possible line yet. Add to that the fact that every time a new ancestor is identified the next step is to identify that ancestor’s parents making genealogy truly a never-ending search.

See “Gaps” at the top right of the image. Those gaps in the tree may be hiding the common ancestor or perhaps a second (and third, fourth, and so on) common ancestral line shared by two DNA test takers. Our conclusion may be easily overturned if we do not consider those other possible shared ancestors. We can address the gaps by one or more of the following:

Doing further documentary research to fill in the gaps—we would want to do this eventually as we work on our pedigree, but a specific DNA match may focus our research on a specific line now

Target test more cousins, or find more test takers in our match list who share the same ancestor, to gain more DNA evidence to support the conclusion—in some cases (like burned counties) there may be little to no documentary evidence to be found. DNA evidence may help answer the question, but more than two or three DNA test takers will be needed to credibly support most conclusions

Clear explanations may justify a conclusion that a gap is irrelevant to the research question—perhaps the pedigree gap is in a line that originated or resided in a locale that is irrelevant to the focus question, or it is a line with a different biogeographical origin, or the gap is so far back in the pedigree it is not relevant based on the DNA evidence, and there are other possibilities

Segment triangulation does not work in every situation, but when it exists it can be strong evidence—all cousins will not share every triangulated segment, but groups of cousins may share one triangulated segment, while some of those cousins may also share segments with cousins in a different group—showing how each of the groups overlaps may support a conclusion

Clustering and genetic networks work in a similar way to triangulated segment groups. Many names are used for clusters or networks: shared matches, in common with groups, DNA circles, matches who share DNA with both of two kits, and more—for example, a group of cousins share DNA with each other, a second group of cousins share DNA, and there may be some cousins who are in both groups providing a link to the common ancestor

I have held Certified Genealogist® credentials from BCG since September 2010. I helped form the BCG Genetic Genealogy Committee to discuss DNA standards. I resigned from the committee due to personal commitments, but have continued to participate as an adviser, reviewer, and in other ways.

I support the adoption of standards to be used when incorporating DNA analysis into a genealogical conclusion. I support BCG seeking input on the proposed standards from the greater genealogical community using DNA. I see this as a positive step to ensure newly adopted standards will meet the needs of the entire research community. No matter what is adopted, updates will certainly be needed just as research methodology and documentary research standards have evolved over the decades.

To cite this blog post:
Debbie Parker Wayne, "DNA Standards - Pedigree Analysis (Tree Analysis)," Deb's Delvings, 23 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

This is part of a series on Genealogy Standards for using DNA. This series represents the opinions and interpretation of the proposed standards by this author and does not necessarily reflect BCG’s official position. The proposed standards are not being addressed in numerical order, but all articles will be linked. For other parts in the series see

Way back in 2013 an ad hoc committee formed to develop genetic genealogy standards. Those standards were released in January 2015 and are available at http://geneticgenealogystandards.com/.1 These standards are recommended by many organizations and by most speakers covering DNA topics.

Those original standards primarily deal with ethical issues. The plan was to eventually add technical standards with more details on depth of testing, resolution of tests, and many other critical elements of using DNA test results to answer genealogical questions. As with so many other things, life got in the way and the additional work was never completed.

In the intervening years, we have learned a lot more about using DNA test results effectively and how varied and "random" the results can be from one family to another. Real life results do not always match the statistical average predictions. By definition, an "average" is the typical result in a data set, but that means there are real results on either side of that average. This leads to many questions. How many men need to be tested in a Y-DNA line to prove or disprove a theory? How many markers should be tested? How many markers can differ? How big should an X-DNA segment be before you spend time searching for the common ancestor who passed it down to the people living today? There is no definitive answer to these questions. Many variables will affect the answer for a specific family under investigation although there are some general guidelines to consider.

Many of us think we need defined standards for using DNA evidence to reach a genealogical conclusion even though there is no "magic number" answer to many questions. What should a thorough researcher do when incorporating DNA evidence into a genealogical conclusion? What do you look for other than the name of the same ancestor when analyzing another person's family tree? How do you document the analysis?

Years ago researchers had similar questions related to documentary research. The community responded with books to provide guidance to researchers. A selected list includes Genealogy as Pastime and Profession in 1930 and revised in 1968,2 Genealogical Research: Methods and Sources in 1960 and revised in 1980,3 Genealogical Evidence in 1979,4 Genealogical Standards of Evidence in 2010,5 and Elements of Genealogical Analysis in 2014.6

The Board for Certification of Genealogists (BCG) published The BCG Genealogical Standards Manual in 2000.7 This was reorganized, updated, and published as Genealogy Standards in 2014.8 These standards reflect best practices for the genealogical research community, not just those applying for BCG credentials. Some genealogists think these standards are all we need—that we do not need more specifics for DNA.

My colleague, Harold Henderson, CG, makes an excellent point as to why DNA standards should also be spelled out (paraphrased and used with permission): A highly competent genealogist would be able to formulate standards based only on the elements of the Genealogical Proof Standard (GPS).9 By expanding the concepts of the GPS into the Genealogy Standards, BCG saved time for us all. Each researcher can understand the fine points of performing quality documentary research without having to recreate the standards. Defined DNA Standards provide the same service for those seeking to incorporate DNA analysis.

DNA standards will help members of the general community:

Researchers adding DNA analysis to their skill set

Authors incorporating DNA evidence

DNA test takers and those requesting others to take tests

Instructors teaching others to analyze DNA test results

DNA standards will also provide benefits for BCG:

Applicants and those renewing credentials will know what is expected when incorporating DNA

BCG judges will all be judging to the same published standards for DNA

Updated Genealogy Standards will reflect the current state of research (we have been using DNA for genealogy for over twenty years now and testing has increased exponentially in recent years)

The BCG Genetic Genealogy Committee has drafted a set of DNA Standards that reflect the practices of some of the most experienced genealogists using DNA today. BCG is surveying the community for input on these proposed standards. Some current Genealogy Standards are modified and expanded to more clearly define the needs when using DNA. New DNA Standards address DNA testing, interpreting DNA test results, identifying shared ancestry, accessing test results, and integrating DNA and documentary evidence. These standards are focused enough to provide specific guidance yet broad enough to allow for differing family composition and random factors encountered with DNA.

You can participate in the survey and provide your opinion through a Google Docs survey linked from https://bcgcertification.org/proposed-dna-standards-for-public-comment/. Please leave comments by 23 July 2018 explaining your agreement or disagreement with the proposed standards. Comments will be used to modify the standards as needed before acceptance and publication. There is also a link from which you can download a PDF file with the proposed standards.

Feel free to leave comments here, but only comments submitted through the official portal above will be considered by the committee.

I have held Certified Genealogist® credentials from BCG since September 2010. I helped form the BCG Genetic Genealogy Committee to discuss DNA standards. I resigned from the committee due to personal commitments, but have continued to participate as an adviser, reviewer, and in other ways. I support the adoption of standards to be used when incorporating DNA analysis into a genealogical conclusion.

I support BCG seeking input on the proposed standards from the greater genealogical community using DNA. I see this as a positive step to ensure newly adopted standards will meet the needs of the entire research community. No matter what is adopted, updates will certainly be needed just as research methodology and documentary research standards have evolved over the decades.

All statements made in this blog are the opinion of the post author. This blog is not sponsored by any entity other than Debbie Parker Wayne nor is it supported through free or reduced price access to items discussed unless so indicated in the blog post. Hot links to other sites are provided as a courtesy to the reader and are not an endorsement of the other entities except as clearly stated in the narrative.
To cite this blog post:
Debbie Parker Wayne, "DNA Analysis Standards," Deb's Delvings, 23 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

15 May 2018

I am investigating bioinformatics1 tools to analyze Whole Genome Sequence (WGS) data. I have access to a WGS for someone who has also tested at several genealogy testing companies. I want to do some comparisons between the raw data from the genealogy testing companies and the WGS, checking for accuracy of the reads. To satisfy my curiosity, I plan to investigate some of the medical implications and traits discussed in scientific papers.
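A comparison of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by any particular tool: the function names, the SNP identifiers, and the genotype values are all hypothetical, and it assumes both data sets have already been loaded into dictionaries keyed by SNP ID, report alleles on the same strand, and use the same genome build. Real raw data files would need parsing and strand checks first.

```python
def normalize(genotype):
    """Return an order-independent allele pair, or None for a no-call."""
    alleles = genotype.upper().replace("/", "").replace("|", "")
    if len(alleles) != 2 or any(a not in "ACGT" for a in alleles):
        return None  # treat anything that is not two called bases as a no-call
    return "".join(sorted(alleles))


def concordance(chip_calls, wgs_calls):
    """Compare genotypes for SNPs present and called in both data sets.

    Returns (number compared, list of discordant SNP IDs, concordance rate).
    The rate is None when no SNP could be compared.
    """
    compared = 0
    mismatches = []
    for snp_id, chip_gt in chip_calls.items():
        chip = normalize(chip_gt)
        wgs = normalize(wgs_calls.get(snp_id, "--"))
        if chip is None or wgs is None:
            continue  # skip no-calls and SNPs missing from the WGS data
        compared += 1
        if chip != wgs:
            mismatches.append(snp_id)
    rate = (compared - len(mismatches)) / compared if compared else None
    return compared, mismatches, rate


# Hypothetical example values, invented purely for illustration.
chip_example = {"rs0001": "AG", "rs0002": "CC", "rs0003": "--", "rs0004": "TT"}
wgs_example = {"rs0001": "GA", "rs0002": "CT", "rs0003": "AA", "rs0004": "TT"}
print(concordance(chip_example, wgs_example))
```

Sorting the two alleles makes "AG" and "GA" compare as equal, since unphased genotypes carry no allele order. The list of discordant SNP IDs is returned so each disagreement can be inspected individually rather than hidden inside a single percentage.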

Once I have multiple WGSs from relatives, I plan to do some comparisons as to whether segments that the testing companies indicate match really do match completely with the higher resolution data. I am interested in how closely the statistical predictions on linkage disequilibrium and crossovers mirror what is seen in real family multi-generational studies. For example, in the shared segments marked below, not every SNP is tested. A number of SNPs in a segment are tested and we assume the non-tested SNPs match based on statistical predictions.
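One way to test that assumption is to check every position inside a reported segment, not just the SNPs on the testing company's chip. The Python sketch below is a simplified illustration under stated assumptions: the function names and the positions and genotypes are hypothetical, genotypes are unphased two-letter strings keyed by chromosome position, and two people are treated as "matching" at a position when they share at least one allele (a half-identical comparison). The testing companies use their own matching algorithms, which may tolerate some mismatches, so this is not a reproduction of any company's method.

```python
def shares_allele(gt_a, gt_b):
    """True when two unphased genotypes have at least one allele in common."""
    return bool(set(gt_a) & set(gt_b))


def check_segment(person_a, person_b, start, end):
    """Check a reported matching segment against higher-resolution data.

    person_a and person_b map chromosome position -> genotype (e.g. "AG").
    Returns (number of positions checked, positions where no allele is shared).
    """
    positions = sorted(
        pos for pos in person_a if start <= pos <= end and pos in person_b
    )
    conflicts = [
        pos for pos in positions
        if not shares_allele(person_a[pos], person_b[pos])
    ]
    return len(positions), conflicts


# Hypothetical positions and genotypes, invented purely for illustration.
person_a = {100: "AA", 150: "AG", 200: "CC", 250: "TT", 400: "GG"}
person_b = {100: "AG", 150: "GG", 200: "TT", 250: "CT", 400: "AA"}
print(check_segment(person_a, person_b, 100, 300))
```

A position where the two people are opposite homozygotes (for example "CC" versus "TT") cannot come from a shared stretch of DNA unless there is a read error or a mutation, so a cluster of such conflicts inside a reported segment would be worth a closer look.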

By the way, just as with some of the best genealogy articles, the reference notes in this article led me to several additional sources I now need to consult.

As a woman, I find this sentence especially depressing: "... the proportion of female contributors decreases for high-profile repositories and with seniority level in author lists".2 I hope this changes and more women participate in bioinformatics.

I am impressed with how many databases and tools are out there for DNA analysis. I did not realize there are over 1,700 bioinformatics repositories and "23 'high profile' GitHub repositories containing source code for popular and highly respected bioinformatic tools."3 "Our analysis points to simple recommendations for selecting bioinformatic tools from among the thousands available."4 Some of these will not be useful for genealogy, but some will.

One tool aimed at the genetic genealogy community is Thomas Krahn's tool for annotating a BigY VCF file and identifying derived and novel SNPs.5 Thomas kindly shared this tool so others can do the analysis instead of having it done by his company YSEQ.net.

Some of the discussions in the scientific world parallel those we are having in the genealogy world.

"In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of software. Reproducibility requires that authors publish original data and a clear protocol to allow repetition of the analysis in a paper."6 In the genealogy world we are discussing publicly available DNA data, such as on GEDmatch.com, allowing DNA analysis to be reproduced and referenced from a publication.

"The bioinformatics field embraces a culture of sharing — for both data and source code — that supports rapid scientific and technical progress."7 In the genealogy world we are discussing privacy issues versus sharing data, especially with the recent proliferation of stories on law enforcement use of genealogy databases.

I have been musing on whether to learn Python or Ruby. A recent discussion with a young programmer had me leaning towards Python. Since the "greatest amount of code in the main dataset was in Javascript, followed by Java, Python, C++, and C,"8 maybe I will stay with JavaScript and Java, which I already know, if I develop any new tools for web usage. I have a few tools I wrote in Perl for my own use that I hope to clean up and share eventually.

In addition to DNA adding to my knowledge of my family tree, it is forcing me to upgrade my data analysis knowledge and computer tools familiarity. I hope all of this study helps keep my mind active and reduces those "senior moments" that seem to occur more frequently with the years.

1. The science of collecting and analyzing complex biological data such as genetic code.
2. Pamela H Russell, et al., "A large-scale analysis of bioinformatics code on GitHub," 15 May 2018, BioRxiv pre-publication, https://doi.org/10.1101/321919, line 35.
3. Ibid., line 27.
4. Ibid., line 148.
5. Thomas Krahn, "bigY_hg39_pipeline.sh," GitHubGist (https://gist.github.com/tkrahn/283462028c61cd213399ba7f6b773893).
6. Russell, "A large-scale analysis of bioinformatics code on GitHub," line 84.
7. Ibid., line 120.
8. Ibid., line 208.
To cite this blog post:
Debbie Parker Wayne, "Whole Genome Sequence (Part 2) - Analysis Tools," Deb's Delvings, 15 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

About Me

Debbie Parker Wayne, Certified Genealogist®

East Texas, USA

I am the owner of Wayne Research, a genealogical research service. Laws affecting family history and genetic genealogy (DNA) are areas of special interest to me. Many of my posts will be in those areas as well as topics of general interest to genealogists. Contact me through my Web site at debbiewayne.com.

Credentials

The words Certified Genealogist and letters CG are registered certification marks, and the designations CGL and Certified Genealogical Lecturer are service marks of the Board for Certification of Genealogists®, used under license by board certificants after periodic evaluation.
