Thursday, August 1, 2013

Sharing Data Files or What Happened to GEDCOM?

Let's suppose that after a lifetime of research, your Aunt Madge finally agrees to share all of her files with you. From a brief discussion, you know that she is using an antiquated computer running Windows 98, and you have had nightmares about the computer crashing and losing all of her data. But you are grateful that she has at least added all of her sources into her version of Personal Ancestral File (PAF).

After some long telephone conversations, you convince her to send you a GEDCOM file. The fact that the file arrives at your home on a series of floppy disks is a minor problem, but you get online and find a USB floppy disk drive and are finally able to copy the GEDCOM file to your own computer. Using the most current version of a popular genealogical database, you can easily import the entire GEDCOM file and, magically, all of the 23,987 names appear in a new file you created to receive the information. As you begin to examine the file, you are mortified to see that all of your Aunt's sources are in the notes. The few sources that do come across are all combined on one line and have a lot of strange characters. You begin to realize the huge task you now face to "clean up" the old PAF file.

In desperation, you look for further help online and discover that some of the current programs will read PAF files directly. Thinking this might be a solution, you purchase Ancestral Quest, the program claiming the most PAF compatibility. After another round of telephone calls to your Aunt, you ask for a copy of her PAF file. Unfortunately, the PAF file itself is too large to copy to floppy disks, so you have to buy your Aunt a hard drive and figure out how to get the information onto the drive when her computer has no USB port. After a great many technical difficulties, you finally obtain a copy of her PAF file and open it in Ancestral Quest.

All of the weird character problems are solved, but the sources in the notes are still in the notes. You wonder if there is a way to convert the sources in the notes into sources in your program. You would like to move the Ancestral Quest file over to your current genealogy program, but when you try to do so, you find you have to export the file as a GEDCOM and then import that file into your own program. Putting the file through the GEDCOM conversion brings back all the weird characters.

Ultimately, you resign yourself to perhaps years of cleaning up all of the data, not to mention the time it will take to check your Aunt's sources and move them all to the source fields in your own program.

This common example of problems in sharing data between computers with different operating systems and different genealogy programs illustrates the issue of file sharing in the modern genealogical computer world. If you think your troubles are over when you finally "clean up" the file, you are just beginning your trouble. Suppose you have taken the time to add media, i.e., video, audio, and images, to your file. When you try to share your file with others, you find out that none of this media comes across exactly right. In fact, your biggest challenge is that all of the media needs to be linked to the people in your file, and when you move the data to another computer, it all has to be relinked because the program you are using lost the links in the transfer. You further realize that the amount of information you have attached to your file is huge and you have no practical way, other than purchasing a very large hard drive, to transfer all of your attached information to your family members.

Sharing files, especially very large files, is a major challenge, and the current status of GEDCOM and the almost non-existent file sharing capabilities of the most current versions of the genealogy programs do not make life any easier. There are some file sharing schemes out there in the form of utility programs that claim to transfer files between programs, but if your program and the program you are transferring the file to are not among the programs supported, you are still out of luck.

Here is what needs to happen for a file to be transferred from one program to another. I call this the "Fat Man's Pass" problem. The data fields in one program have to correspond exactly with the same field descriptions in the target program, or the data goes into la-la land and is shaved off of the transfer. So data is lost in each transfer. GEDCOM is like the narrow Fat Man's Pass: only a certain amount of data will transfer properly from one program to another.
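The "Fat Man's Pass" idea can be sketched in a few lines of Python. This is only an illustration: the field names, the sample record, and the two field sets are all made up for the example and do not describe any real program or the actual GEDCOM tag set. The point is simply that a field survives only if both the exchange format and the target program know about it.

```python
# Hypothetical record in the source program (illustrative values only).
SOURCE_RECORD = {
    "name": "Madge Example",
    "birth_date": "12 MAR 1921",
    "source_citation": "Parish register, vol. 3",
    "media_link": "photos/madge.jpg",
}

# Fields the exchange format can carry (assumed for illustration).
EXCHANGE_FIELDS = {"name", "birth_date", "source_citation"}
# Fields the target program can store (assumed for illustration).
TARGET_FIELDS = {"name", "birth_date", "media_link"}


def transfer(record, exchange_fields, target_fields):
    """Return (kept, lost): only fields known to both sides survive."""
    passable = exchange_fields & target_fields
    kept = {k: v for k, v in record.items() if k in passable}
    lost = sorted(k for k in record if k not in passable)
    return kept, lost


kept, lost = transfer(SOURCE_RECORD, EXCHANGE_FIELDS, TARGET_FIELDS)
print("kept:", kept)
print("lost:", lost)
```

In this made-up case the media link is dropped because the exchange format cannot carry it, and the source citation is dropped because the target program has nowhere to put it, even though each field is supported on one side of the pass.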

This is a real and very serious issue in the genealogical community. One reason I work with several popular programs at the same time on my computers is to make sure I know the difficulties and quirks of moving files from one program to another. One solution, of course, is to have everyone in your family use the same program. But that entails a whole new set of challenges and problems. For example, Aunt Madge's computer and operating system will not support any of the newer programs and she certainly does not have the storage capacity to include media with a file.

Some time ago, two separate organizations began an effort to address a few of these issues. With an almost universally recognized need to upgrade GEDCOM, there are currently three separate organizations working on the problem. That is the good news. The bad news is the apparent lack of progress toward a consensus on how and what needs to be done to implement some workable way to share and transfer files.

Just for the record, without a lot of comment, the three organizations are as follows:

From time to time, these organizations surface at various conferences and discuss their mutual problems and goals, but very little progress has been made to resolve the perceived differences between the different proposals. If you watch carefully as new versions of the popular genealogy programs are introduced, you will notice a complete lack of any reference to GEDCOM issues. What these organizations have mostly created is a huge number of online posts and social networking discussions.

So what do most genealogists think about this major issue of file compatibility and file transfers? Until they are faced with the problem in a very personal way, most genealogists are not even aware that there is an issue.

Do I have a solution? Yes, but no one really wants to listen to specifications for image transfer and technical problems with text file formatting. That is interesting in a world where stories and photos are taking such an important part in genealogy. One small example will suffice: most online family tree programs allow uploading images. Nearly all of these images, with a few exceptions, are in the lossy JPEG file format. Are we creating a dead-end file sharing environment by allowing JPEGs to become the de facto standard file sharing format?

The first is that the genealogical data model described by a GEDCOM file is not adequate for many uses. It is acknowledged to be biased towards biological lineage and has rather limited concepts regarding personal relationships. If you want to represent generalised family history, or put greater emphasis on places, then it cannot cope. A better model may get resistance, though, from vendors who feel that it will require more investment, or from ones who feel that it gives users more freedom to move away from their own product. GEDCOM, as it stands, is a "throttled" exchange mechanism.

The second is the possibility that the existing GEDCOM standard could be "fixed" - meaning that the portability issues are addressed by getting vendors around a table to iron them out, and interpretational issues are addressed by having a proper written and supported standard for it, both of these without unduly increasing the scope of the representation. I personally believe this would be popular with both vendors and end-users. GEDCOM isn't great but it's currently the best we have. Unfortunately, it doesn't quite work - as you rightly point out. There is no written standard, and the format was abandoned a long time ago. The obstacle to progress here would be the proprietary IP attached to the name.

It appears that the emphasis of GEDCOM X is the APIs rather than fixing a data export program. This brings up another point as to whether the local computer-based programs will survive competition with the online family tree/local programs.

A complex dilemma, to be sure. I've worked in developing formal technical standards for years, and if the FHISO could take the lead and work to create an American National Standard, that might help. Unfortunately, standards development is usually voluntary, as it has to be done cooperatively and the final standards are public, which leads corporations to avoid them.

You're picking on the way PAF exports GEDCOM incorrectly. If FamilySearch hadn't stopped developing PAF so many years ago, they likely would have fixed those problems by now.

You are correct about your "Fat Man's Pass" problem. If one program doesn't have a field that another has, then there is no way the data can be loaded correctly. The solution to this in a new GEDCOM standard is to require that a program must retain any data it does not use in its original form. Then, on export, it can pass the data back out in standard format. This way, programs can work with just the data they understand, and no data is lost.
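The retain-and-pass-through idea described above might look roughly like the Python sketch below. It is an assumption-laden illustration, not how any existing program or GEDCOM proposal works: `KNOWN_TAGS`, the `_FAVCOLOR` custom tag, and the sample record are all hypothetical, and the line pattern covers only the basic "level, optional id, tag, optional value" shape of a GEDCOM line. The importing program marks which lines it understands, keeps every other line (and everything nested under it) verbatim, and writes all of it back out on export.

```python
import re

# Tags this hypothetical program understands; everything else is kept verbatim.
KNOWN_TAGS = {"INDI", "NAME", "BIRT", "DATE"}

# Basic GEDCOM line shape: level, optional @id@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def parse_line(raw):
    match = LINE_RE.match(raw)
    if match is None:
        raise ValueError(f"not a GEDCOM-style line: {raw!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def import_lines(text):
    """Parse lines, flagging unknown tags and their subtrees as not understood."""
    records = []
    opaque_level = None  # level of the unknown line whose subtree we are inside
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level, _xref, tag, value = parse_line(raw)
        if opaque_level is not None and level > opaque_level:
            understood = False  # nested under a line we did not understand
        else:
            opaque_level = None
            understood = tag in KNOWN_TAGS
            if not understood:
                opaque_level = level
        records.append({"raw": raw, "tag": tag, "value": value,
                        "understood": understood})
    return records


def export_lines(records):
    """Write every line back out, including the ones the program never used."""
    return "\n".join(r["raw"] for r in records)


SAMPLE = "\n".join([
    "0 @I1@ INDI",
    "1 NAME Madge /Example/",
    "1 BIRT",
    "2 DATE 12 MAR 1921",
    "1 _FAVCOLOR Blue",       # hypothetical custom tag
    "2 DATE 1 JAN 2000",      # nested under the unknown tag, so also retained as-is
])

records = import_lines(SAMPLE)
print([r["tag"] for r in records if not r["understood"]])
print(export_lines(records) == SAMPLE)
```

A real program would also have to decide what to do with retained data when the user edits or deletes the record it hangs from, which this sketch does not attempt.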

What we need is some middleware to sit between two non-standard databases to do a two-way data transfer. This is done in large institutions (I know, as I used to work for one), both live day to day and also as a migration exercise.

Possibly a better way would be for individual FH applications to share their proprietary database details, and then someone could develop a suitable migration engine. This could be updated as necessary when new versions of the FH applications appear and could even cope with GEDCOMs and broken PAF exports.

Thank you and your commenters for stating this problem so succinctly. I can stop repeating to myself "maybe it's just me but..."

So are there no programs that will import the links between facts, sources etc and media? Or does it also depend on the program that created the GEDCOM? Family Tree Maker for Mac 2 does not include media links at all. FTM2012 does, but they do not link to any particular event in the other software I have tried, so you have to relink them manually, as you described in your scenario. I had grumbled that they probably made it this way on purpose, so that it's too difficult for you to leave. After reading the comments I think I was closer to the truth than I realized. This really is another kind of genealogy brick wall.