Is GEDCOM Dead? Date/Place of Death, Please?

by C. Michael Eliasz-Solomon

The RootsTech Conference is living up to its name. Everywhere there was a sea of iPhones/Androids, iPads (in huge numbers), and laptops. Even the very elderly were geared up. Google, Dell, and Microsoft were at RootsTech. Why not Apple, especially since their customers were present in LARGE numbers? [Note to Tim Cook: have Apple sponsor and show up as a vendor.]

According to Ryan Heaton (FamilySearch), “GEDCOM is stale.” He went on to speak about GEDCOMX as the next standard, as if GEDCOM were old and/or dead. They were not even going to make GEDCOMX backwards compatible! In a later session with Heaton I asked the million-dollar question: “How do I get my GEDCOM into GEDCOMX?” After a moment’s pause he said they’d write some sort of tool to import or convert the existing GEDCOM files. Well, that was reassuring??? So they want GEDCOMX to be a standard, but FamilySearch are the only ones working on it, and they have not yet been able to reach out to the software vendors (I know, I asked).

My suggestion was to publish the language (like HTML, SQL, or GEDCOM). I asked for “railroad tracks”, what we used to call finite state automata and what Oracle uses to demonstrate SQL syntax: statements that are valid, with options denoted, and even APIs for embedding SQL into other programming languages. With a published grammar it is easy to write a parser or something akin to a validator (like W3C has for HTML).
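To make the validator idea concrete, here is a minimal sketch in Python. The `GRAMMAR` table is a toy I made up for illustration: it lists which tags may sit directly under which parent tag, which is the same information a railroad diagram would show. It is NOT the real GEDCOM 5.5 grammar, and the function name `validate` is my own.

```python
from typing import Dict, List, Set

# Toy grammar: which tags may appear directly under which parent tag.
# A deliberately tiny, illustrative subset, NOT the GEDCOM 5.5 grammar.
GRAMMAR: Dict[str, Set[str]] = {
    "ROOT": {"HEAD", "INDI", "FAM", "TRLR"},
    "INDI": {"NAME", "BIRT", "DEAT", "FAMS"},
    "FAM": {"HUSB", "WIFE", "CHIL", "MARR"},
    "BIRT": {"DATE", "PLAC"},
    "DEAT": {"DATE", "PLAC"},
    "MARR": {"DATE", "PLAC"},
}

def validate(gedcom_text: str) -> List[str]:
    """Return a list of human-readable errors (empty if the text passes)."""
    errors: List[str] = []
    stack = ["ROOT"]  # stack[i + 1] is the tag currently open at level i
    for number, line in enumerate(gedcom_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) < 2 or not parts[0].isdigit():
            errors.append(f"line {number}: expected 'LEVEL [@ID@] TAG [value]'")
            continue
        level = int(parts[0])
        # Skip the optional @ID@ cross-reference that may precede the tag.
        has_id = len(parts) > 2 and parts[1].startswith("@")
        tag = parts[2] if has_id else parts[1]
        if level > len(stack) - 1:
            errors.append(f"line {number}: level jumps to {level}")
            continue
        del stack[level + 1:]  # close everything at this level or deeper
        parent = stack[-1]
        if tag not in GRAMMAR.get(parent, set()):
            errors.append(f"line {number}: {tag} not allowed under {parent}")
        stack.append(tag)
    return errors

GOOD = "0 @I1@ INDI\n1 NAME Jan /Kowalski/\n1 BIRT\n2 DATE 12 MAR 1881\n0 TRLR"
BAD = "0 @I1@ INDI\n1 PLAC Pacanów\n3 DATE 1881"

print(validate(GOOD))
print(validate(BAD))
```

The point of the sketch is that the whole rule set lives in one small table. Publish that table (or its railroad-track picture) and every vendor can check files the same way.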

Dallan Quass took a better tack on GEDCOM. His approach was more evolutionary, rather than revolutionary. He collected some 7,000+ gedcoms and wrote an open source parser for the current GEDCOM standard (v5.5). He analyzed the flaws in the current standard and saw unused tags, tags like ALIA that were always used wrong, custom tags and errors in applying the standard. He also pointed out that the concept of a NAME is not fully defined in the standard and so is left to developers (i.e. vendors) to implement as they want. These were the issues making gedcoms incompatible between vendors. He said his open source parser could achieve 94% round trip from one vendor to another vendor.
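I do not know how Quass computed his 94% figure, so the following is only my own illustration of what a round-trip score could mean: the fraction of lines in the original gedcom that still appear after the file has passed through another vendor's software. The function name `round_trip_score` and the `_PHOTO` custom tag are hypothetical.

```python
from collections import Counter

def round_trip_score(original: str, reimported: str) -> float:
    """Fraction of original lines that survive in the round-tripped file.

    This is an assumed, simplistic metric for illustration; it is not
    the measure used in the talk.
    """
    before = Counter(l.strip() for l in original.splitlines() if l.strip())
    after = Counter(l.strip() for l in reimported.splitlines() if l.strip())
    kept = sum((before & after).values())  # lines present in both files
    total = sum(before.values())
    return kept / total if total else 1.0

ORIGINAL = """\
0 @I1@ INDI
1 NAME Jan /Kowalski/
1 _PHOTO jan_1905.jpg
1 BIRT
"""

# The receiving program silently dropped the custom _PHOTO line.
REIMPORTED = """\
0 @I1@ INDI
1 NAME Jan /Kowalski/
1 BIRT
"""

print(round_trip_score(ORIGINAL, REIMPORTED))  # 3 of 4 lines kept -> 0.75
```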

Now that made the GEDCOMX guys take notice — here was their possible import/conversion tool.

The users just want true portability of their own gedcoms and the ability to not have to re-enter pics, audio, and movies over and over again. RootsTech’s vision of APIs that would allow the use of “authorities” to conform names, places, and sources would also help move genealogy to the utopian future Jay Verkler spoke of at the keynote. APIs would also provide bridges into the GEDCOM for chart/output tools, utilities (merge trees), and Web 2.0 sharing across websites / search engines / databases (more utopian vision).

GEDCOM is the obvious path forward. Why not improve what is mostly working and focus on the end users and their needs?

FamilySearch, get vendors involved and, for God’s sake, get Dallan Quass involved. Publish a new GEDCOM spec with railroad tracks (aka Graphic Syntax Diagrams) and then educate vendors and users on the new gedcom/gedcomx. Create a new gedcom validator and let users run their current gedcoms against it to produce new gedcoms (which should be backward compatible with old gedcom, reaching at least the 94% compliance that Quass can already achieve)!

Ask users for new “segments” in the railroad tracks to get new features that real users and possibly vendors want in future gedcoms. Let there be an annual RootsTech keynote where all attendees can vote via the RootsTech app on the proposed new gedcom enhancements.

How about that, FamilySearch? Is that doable? What do you, my readers, think? Email me (or comment below).

P.S. Do not use UML models to communicate the standard. They are simply not accessible to genealogists. Trust me, I am a Data Architect.

5 Comments to “Is GEDCOM Dead? Date/Place of Death, Please?”

However, I am going to put my Genealogy Trust in YOU to look out for the rest of us, and give us the understandable condensed version when someone gets it all figured out! If I had attended the conference, I would have been the one holding the 3 year old Virgin-Pay as you Go phone, that has a label on the backside with my husband’s cell phone number and the phone number of the phone handwritten on it! Some of us are getting left in the Dust of technology……. I Thank God to see the old church records written with Ink on real Paper!

MaryAnne,
Forgive me and my computer jargon. RootsTech is at its heart a technology conference. Genealogy technology for sure. I have a 30 year career in Technology.

The jargon was necessary. GEDCOM is the file format that stores our family trees. The files are plain text files filled with ‘@’ symbols, numbers and our genealogy facts and family relationships. Each line also carries a short tag (usually three or four characters, like NAME or BIRT) to indicate what info is on that line.
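For the programmers reading along, here is a small sketch in Python of what those lines look like and how a program might take one apart. Each line is a level number, an optional ‘@’ identifier, a tag, and an optional value. The sample person is invented, and the names `LINE_RE`, `GedcomLine` and `parse_line` are my own, not part of any standard library for genealogy.

```python
import re
from typing import NamedTuple, Optional

# One GEDCOM 5.5 style line: level, optional @XREF@ id, tag, optional value.
# A simplified pattern for illustration; it does not cover every rule in the spec.
LINE_RE = re.compile(
    r"^\s*(?P<level>\d{1,2})"
    r"(?: (?P<xref>@[^@]+@))?"
    r" (?P<tag>[A-Za-z0-9_]+)"
    r"(?: (?P<value>.*))?$"
)

class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]

def parse_line(text: str) -> GedcomLine:
    """Split one line of a gedcom file into its four parts."""
    m = LINE_RE.match(text.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    return GedcomLine(int(m["level"]), m["xref"], m["tag"], m["value"])

# An invented person, just to show the shape of the file.
SAMPLE = """\
0 @I1@ INDI
1 NAME Jan /Kowalski/
1 BIRT
2 DATE 12 MAR 1881
2 PLAC Pacanów, Poland
1 FAMS @F1@
"""

records = [parse_line(line) for line in SAMPLE.splitlines()]
for record in records:
    print(record)
```

The level numbers (0, 1, 2) are what give the file its outline shape: the DATE and PLAC lines belong to the BIRT line above them, which in turn belongs to the person.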

It is an old idea that has not progressed. So software vendors had to invent their own custom tags in order to add features to their software. However, because those tags are non-standard, we lose data if we switch family tree software. Usually we lose notes/sources, pictures, etc.
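Here is a rough sketch in Python of how that loss happens. The GEDCOM 5.5 convention is that vendor-invented tags start with an underscore. The `_PHOTO` tag, the `KNOWN_TAGS` list and the function names below are made up for the example; a real program would know the full standard tag list.

```python
from typing import List, Optional

# Tags the (hypothetical) importing program understands. A real program
# would know the whole GEDCOM 5.5 tag list; this is only a small subset.
KNOWN_TAGS = {"HEAD", "INDI", "NAME", "BIRT", "DATE", "PLAC", "NOTE", "TRLR"}

def tag_of(line: str) -> str:
    """Return the tag of a GEDCOM line, skipping the level and any @ID@."""
    parts = line.split()
    return parts[2] if parts[1].startswith("@") else parts[1]

def lines_at_risk(gedcom_text: str) -> List[str]:
    """Lines whose tag the importer does not know, plus their sub-lines."""
    at_risk: List[str] = []
    skip_below: Optional[int] = None  # level of the unknown tag we are inside
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue
        level = int(line.split()[0])
        if skip_below is not None and level > skip_below:
            at_risk.append(line)  # child of an unknown tag goes with it
            continue
        skip_below = None
        if tag_of(line) not in KNOWN_TAGS:
            at_risk.append(line)
            skip_below = level
    return at_risk

SAMPLE = """\
0 @I1@ INDI
1 NAME Jan /Kowalski/
1 _PHOTO
2 FILE jan_1905.jpg
1 BIRT
2 DATE 12 MAR 1881
"""

lost = lines_at_risk(SAMPLE)
print(lost)
```

In this example the photo line and the file name under it are the lines the second program would not understand, which is exactly the kind of data that goes missing when we switch software.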

At last year’s RootsTech they introduced the idea of tackling a standard, but they have not made much progress, and it is not open, since only FamilySearch is making design choices without any outside input. Now they want input. They are using advanced technology and it clouds their design. I want “Railroad Tracks”, which are a graphic picture used to show which “words” (tags) are allowed in which order to form standard sentences in a language (gedcom/x) that can be unambiguously understood by all.

That is the heart of my argument. Plus I want outside buy-in by the software vendors and the users, including on future changes. I also want to make sure that all prior genealogy research is preserved (not lost).

It is my opinion that this is the bottleneck that prevents the free flow of all current research from being shared/utilized via new tools to tackle the hard problem of merging trees and smartly matching people in other trees. It is why APIs cannot be developed to let all software interface and allow for future special add-ons like the vision pitched at the conference.

I hope that helps, except for APIs, which is an acronym for Application Program Interface. This is a technique that lets programs that would otherwise not be able to communicate share info.

Mr Heaton,
Welcome to my blog. You presented your position effectively at RootsTech and, as I said, I went to two of your presentations (both on GEDCOMX). Both were high quality.

I have been mulling over your GitHub suggestion since I went to RootsTech. I will make an attempt (I hope it will not be wasted). My concern arises because I think the direction your team is proceeding in is NOT towards an OPEN standard.

Here is my rationale …

If your team’s goal is an open standard, then what are you standardizing? I had thought it was GEDCOM (or its more advanced cousin GEDCOMX). Now I think GEDCOMX is a computer language (like HTML or Java, for example). Therefore, in my opinion, you need to define the language.

A language is defined by a grammar. Grammars are written down in a notation such as BNF and recognized by automata (FSAs for the simple cases), so the language parses quickly. The grammar should be detailed enough that people do not have to interpret important facets, because interpretation leads to non-standardization.

Let me add that nothing your team is doing precludes that. But there is NO need for a model (UML or relational) to explain the language. Hence my suggestion for the “railroad tracks”/Graphic Syntax Diagram — which graphically depicts a language. I propose that as the means to communicate GEDCOM or GEDCOMX.

A data model (or object model in your UML case) is just an implementation detail. A detail of what? It is the detail of how you (FamilySearch) intend to store the data from language stored in the GEDCOM/GEDCOMX file. Once you have data in your model you will use a program to manipulate the data (like to display & manage a family tree) or to output reports/charts or to perform smart matching or to conform Names/Events/Places/Citations or anything else someone might dream up.

Also once the data is in a model then interoperability via APIs is now possible. That interoperability could be program-to-program (back-endian) or website-to-website or webapp-to-webapp or service-to-service or any other medium of sharing/interoperating. This is the vision I believe that Jay Verkler was articulating.

Let everyone implement their own model and use GEDCOM or GEDCOMX as their standard language and APIs as the standard for interchange. I do not think you will ever be able to get anyone to standardize on an object (which object-oriented language’s object definition?), much less the object operations, much less the set of all objects necessary to cover the GEDCOM. You should also be aware that there is no way to completely map from GEDCOM or GEDCOMX onto a Model (object or relational). You will be facing Gödel’s Incompleteness Theorem if you try.

So my concern is that your team is focused on an implementation detail (with an eye to your own software) when it should be concerned with establishing the Language Standard of Genealogy (GEDCOM or GEDCOMX) including the Standard API (for interchange) and getting enough broad consensus by all parties that we can achieve at least Dallan Quass’ 94% and then moving the 94% compatibility towards 99.9999%. Also openness means we need some way to gain consensus on future enhancements to the language as well — hence my idea to vote at RootsTech every year at a keynote presentation via the RootsTech app on GEDCOM/GEDCOMX enhancements.

Let your model be the implementation of GEDCOM/GEDCOMX and the Standard API be how programs/apps/services/etc. interoperate.

So you should be standardizing GEDCOM/GEDCOMX and its API (not a model).