In the 1980s, the Church of Jesus Christ of Latter-day Saints created
computer software to help individuals keep track of their family
history: PAF, or Personal Ancestry File. Many commercial companies have
also created genealogy programs to do the same thing. The problem with
having all these different genealogy programs is that they store their
data in different formats. Genealogists love to share data, and
incompatible file formats are not conducive to sharing. For this very
reason, the LDS church created the GEDCOM format. GEDCOM stands for
Genealogical Data Communication. It caught on quite well, and most
current commercial and free genealogy programs can import and export
GEDCOM files. This makes it relatively easy to exchange data between
genealogists.

The GEDCOM specification has grown and developed over time, because
newer genealogy programs are able to store more and more information
about people. GEDCOM must be as flexible as possible to allow
communication between the many different types of family history
programs. The current GEDCOM specification even allows multimedia
files, so people can store videos of their kids' birthdays or sound
clips of their grandparents' anniversaries. The current production
version of GEDCOM is version 5.5. The next version is GEDCOM 6.0, also
called GEDCOM XML. It is currently in beta, and its DTD has been
published. Other organizations have proposed several other XML
vocabularies for genealogy data.

Many markup languages have been or are being developed for genealogy.
Here are a few of them:

GEDCOM XML - As mentioned above, this is also referred to as GEDCOM
6.0. It was prepared by the Family and Church History Department of
The Church of Jesus Christ of Latter-day Saints.

GedML - Genealogical Data in XML. Encodes genealogical data sets in
XML, combining the well-established GEDCOM data model with the XML
standard for encoding complex information.

GeniML - Genealogical Information Markup Language. An XML vocabulary
for recording and exchanging genealogical data.

GenXML - A file format for exchange of data between genealogy
programs. It is an alternative to GEDCOM 5.5.

Since GEDCOM XML is the next version of the GEDCOM standard, I believe
it will be more popular than the others. Therefore, I have chosen to
work with GEDCOM XML instead of the others.

To get GEDCOM data, I found many GEDCOM files on the website
www.genealogy.com/famousfolks. I have also been working on my own
genealogy for several years and have several files of my own that I
can experiment with. Another place to get GEDCOM files is
www.familysearch.org.

The first number on each line shows the nesting level. A '0' marks the
beginning of a new record, as in "0 @I12@ INDI". The characters
between the '@' symbols are the unique identifier for the individual,
and "INDI" shows that this record is an individual. The second line
above starts with a '1', which means that it gives more detail about
the individual. A couple of lines down, a line begins with a '2'.
Again, it gives more detail about the line above it, in this case the
date of the birth. The tags "FAMS" and "FAMC" refer to the families in
which the individual is a spouse and a child, respectively.
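
The line layout described above (level, optional '@'-delimited identifier, tag, optional value) can be sketched in Java. This is only an illustration and not the converter discussed later; the class name GedcomLine, its fields, the date value, and the family id "F1" are made up for the example.

```java
/** One GEDCOM 5.5 line split into its parts (hypothetical helper for illustration). */
final class GedcomLine {
    final int level;      // nesting depth; 0 starts a new record
    final String xrefId;  // e.g. "I12" taken from "@I12@", or null when the line has no id
    final String tag;     // e.g. "INDI", "BIRT", "DATE", "FAMS"
    final String value;   // remainder of the line, or "" when there is none

    private GedcomLine(int level, String xrefId, String tag, String value) {
        this.level = level;
        this.xrefId = xrefId;
        this.tag = tag;
        this.value = value;
    }

    /** Parses a single line such as "0 @I12@ INDI" or "2 DATE 1 JAN 1900". */
    static GedcomLine parse(String rawLine) {
        String[] parts = rawLine.trim().split("\\s+", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a GEDCOM line: " + rawLine);
        }
        int level = Integer.parseInt(parts[0]);
        String xrefId = null;
        String tag;
        String value = "";
        boolean secondTokenIsId = parts[1].length() > 2
                && parts[1].startsWith("@") && parts[1].endsWith("@");
        if (secondTokenIsId) {
            // Record header: the id comes before the tag.
            if (parts.length < 3) {
                throw new IllegalArgumentException("Missing tag: " + rawLine);
            }
            xrefId = parts[1].substring(1, parts[1].length() - 1);
            String[] rest = parts[2].split("\\s+", 2);
            tag = rest[0];
            if (rest.length > 1) {
                value = rest[1];
            }
        } else {
            // Ordinary line: tag first, then an optional value (which may be a pointer).
            tag = parts[1];
            if (parts.length > 2) {
                value = parts[2];
            }
        }
        return new GedcomLine(level, xrefId, tag, value);
    }

    public static void main(String[] args) {
        // Sample lines; the date and the family id are invented for the demo.
        String[] sample = {"0 @I12@ INDI", "1 BIRT", "2 DATE 1 JAN 1900", "1 FAMS @F1@"};
        for (String s : sample) {
            GedcomLine line = parse(s);
            System.out.println(line.level + " | " + line.xrefId + " | " + line.tag + " | " + line.value);
        }
    }
}
```

Note that a pointer such as "@F1@" after FAMS or FAMC is kept as the line's value, because only an identifier that appears before the tag names the record itself.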

The second section of the GEDCOM file is a list of all the
relationships. An example is as follows:

The tags that are important for my purposes are <FamilyRec>,
<IndividualRec>, <EventRec>, and <GroupRec>. FamilyRec is for families
and IndividualRec is for individuals; the equivalent tags in GEDCOM
5.5 are FAM and INDI. EventRec stands for events such as births,
deaths, and marriages. GEDCOM 5.5 does not have a separate event
record. It does, however, have tags for specific events such as birth
(BIRT), marriage (MARR), and death (DEAT). GroupRec can store
information about a group such as a household, a neighborhood, an
orphanage, or a group of homes. GEDCOM 5.5 does not appear to have
such a tag.
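
The correspondence described above can be written down as a small lookup table. The following Java sketch is only an illustration: the class name RecordNameMapper is hypothetical, and routing BIRT, MARR, and DEAT to EventRec follows the loose analogy drawn in the text rather than a rule quoted from either specification.

```java
import java.util.HashMap;
import java.util.Map;

/** Maps a few GEDCOM 5.5 tags to GEDCOM XML record names (illustrative subset only). */
final class RecordNameMapper {
    private static final Map<String, String> XML_NAME_BY_TAG = new HashMap<>();

    static {
        XML_NAME_BY_TAG.put("FAM", "FamilyRec");
        XML_NAME_BY_TAG.put("INDI", "IndividualRec");
        // Specific event tags all land in the generic event record (assumption of this sketch).
        XML_NAME_BY_TAG.put("BIRT", "EventRec");
        XML_NAME_BY_TAG.put("MARR", "EventRec");
        XML_NAME_BY_TAG.put("DEAT", "EventRec");
        // GroupRec has no entry: the text found no GEDCOM 5.5 counterpart for it.
    }

    /** Returns the GEDCOM XML record name, or null when this sketch has no mapping. */
    static String xmlRecordFor(String gedcomTag) {
        return XML_NAME_BY_TAG.get(gedcomTag);
    }

    public static void main(String[] args) {
        System.out.println(xmlRecordFor("INDI"));
        System.out.println(xmlRecordFor("MARR"));
    }
}
```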

  <!-- This is the ID used by the system that produced this GEDCOM file.
       It can be used to communicate changes, differing opinions, and so
       on, to the file submitter. -->
  <Submitter>. . .</Submitter>
  <Note>. . .</Note>
  <Evidence>
    <Citation>
      <!-- Normally a family is based on (see above) events, and the
           evidence citations are contained in the events. Evidence is
           allowed in family records for those cases where a family is
           documented, such as in a family history, but no specific
           events are known. -->
    </Citation>
  </Evidence>
  <Enrichment>
    <Citation>
      <Link Target="SourceRec" Ref="SR002"/>
      <Caption>We Attend the Kunzle Family Reunion</Caption>
      <WhereInSource>
        5 min, 15 sec into the video, to 10 min, 30 sec.
      </WhereInSource>
      <Note>Our family is featured about 5 minutes into the video.</Note>
    </Citation>
  </Enrichment>
  <Changed Date="23 APR 1976" Time="13:25:12">
    <Note>Record created</Note>
  </Changed>
  <Changed Date=". . ." Time=". . .">
    <!-- The Contact here is the person responsible for the change. -->
    <Contact>
      <Link Target="ContactRec" Ref=". . ."/>
    </Contact>
    <Note>Adopted child added</Note>
  </Changed>
</FamilyRec>

I wrote a Java program to transform data in the GEDCOM 5.5 format to
basic XML. I also wrote many unit tests for the program to make sure
it was working the way I expected it to. To run the Java program, type
the following command in the folder which contains the jar file:

java -jar GedComConverter.jar inputFile.ged outputFile.xml

where "inputFile.ged" is the input file and "outputFile.xml" is the
name of the new file you wish to create.
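
To give a feel for what a conversion to "basic XML" can look like, here is a minimal Java sketch. It is not the source of GedComConverter.jar; the class name GedcomToXmlSketch, the <gedcom> root element, and the "id" and "value" attributes are assumptions made for this illustration. It uses the level numbers to decide when to close elements.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Turns GEDCOM 5.5 lines into nested XML, one element per line (illustrative sketch). */
final class GedcomToXmlSketch {

    static String toXml(List<String> lines) {
        StringBuilder out = new StringBuilder("<gedcom>\n");
        Deque<String> open = new ArrayDeque<>(); // tags still open; size equals current depth
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+", 3);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Not a GEDCOM line: " + raw);
            }
            int level = Integer.parseInt(parts[0]);
            // A line at level N ends every element that is open at depth N or deeper.
            while (open.size() > level) {
                closeTop(out, open);
            }
            String id = null;
            String tag;
            String value = "";
            if (parts[1].startsWith("@") && parts.length == 3) {
                id = parts[1].replace("@", "");
                String[] rest = parts[2].split("\\s+", 2);
                tag = rest[0];
                if (rest.length > 1) {
                    value = rest[1];
                }
            } else {
                tag = parts[1];
                if (parts.length > 2) {
                    value = parts[2];
                }
            }
            indent(out, open.size() + 1);
            out.append('<').append(tag);
            if (id != null) {
                out.append(" id=\"").append(escape(id)).append('"');
            }
            if (!value.isEmpty()) {
                out.append(" value=\"").append(escape(value)).append('"');
            }
            out.append(">\n");
            open.push(tag);
        }
        while (!open.isEmpty()) {
            closeTop(out, open);
        }
        return out.append("</gedcom>\n").toString();
    }

    private static void closeTop(StringBuilder out, Deque<String> open) {
        String tag = open.pop();
        indent(out, open.size() + 1);
        out.append("</").append(tag).append(">\n");
    }

    private static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    /** Escapes the characters that are unsafe inside an XML attribute value. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Sample data; the date and the family id are invented for the demo.
        System.out.print(toXml(Arrays.asList(
                "0 @I12@ INDI", "1 BIRT", "2 DATE 1 JAN 1900", "0 @F1@ FAM")));
    }
}
```

The sketch reuses each GEDCOM tag as the element name, which is the simplest choice but assumes every tag is also a legal XML name. A real converter would also need to handle continuation lines and character encodings.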

I ran some statistics on my Java code using Maven. Of particular
interest are the unit test reports, the JCoverage reports, the
Javadocs, and the source and test xref (cross reference). They are all
in the Project Reports menu.

RDF has several advantages over other formats. First, it has a
relatively flat and simple structure. Because of this, it should be
much easier to write a stylesheet that converts RDF to another XML
file format than to write one that converts from one arbitrary XML
format to another. With an ordinary XML document one may need to
search deep in the tree, but RDF does not nest that deeply.

Another nice advantage is that it would be fairly easy to combine two
different vocabularies for the same domain. DAML has tags such as
daml:intersectionOf, daml:unionOf, daml:complementOf, daml:inverseOf,
daml:equivalentTo, daml:sameClassAs, and daml:sameIndividualAs. With
these tags, one could easily combine two different vocabularies and
declare that an Individual in one vocabulary is the same as a Person
in another vocabulary.
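
The effect of such an equivalence declaration can be imitated in a few lines of Java. This is a toy sketch and not a DAML or RDF processor: the class ClassEquivalence and the prefixed names "a:Individual" and "b:Person" are hypothetical, and the only idea shown is that a sameClassAs-style statement lets two names be treated as one class.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy registry of "same class as" declarations between two vocabularies. */
final class ClassEquivalence {
    private final Map<String, String> canonical = new HashMap<>();

    /** Records that 'alias' denotes the same class as 'target'. */
    void declareSameClassAs(String alias, String target) {
        canonical.put(alias, target);
    }

    /** Follows declarations until a name with no further declaration is reached. */
    String resolve(String className) {
        String current = className;
        // The step limit keeps accidental circular declarations from looping forever.
        for (int i = 0; i <= canonical.size() && canonical.containsKey(current); i++) {
            current = canonical.get(current);
        }
        return current;
    }

    boolean sameClass(String first, String second) {
        return resolve(first).equals(resolve(second));
    }

    public static void main(String[] args) {
        ClassEquivalence eq = new ClassEquivalence();
        eq.declareSameClassAs("a:Individual", "b:Person"); // hypothetical vocabularies
        System.out.println(eq.sameClass("a:Individual", "b:Person"));
    }
}
```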

RDF was designed around semantics, so it should be easier to make
inferences, especially with the help of an RDF processor.

Nice graphs can be drawn of the relations between objects, such as
this graph.

The elements of an RDF vocabulary are identified by URIs, so each one
can be specified unambiguously.

There's nothing to stop people from developing multiple RDF schemas
for any given domain. Scouring the web a little bit, I found four for
genealogy. XML or any other file format suffers from the same problem.

RDF is relatively new, and tools for RDF are relatively scarce
compared to those for more seasoned technologies.

Anyone who wants to can write RDF, including RDF which contains false
or destructive data. Someone could write RDF in which a person has a
son who is also his father. This cycle is impossible in real life and
is thus bad data, and it could also potentially break an RDF
processor. This problem is not new, nor is it unique to RDF; the same
could be said of any format. Data received from other sources should
be verified before putting a lot of faith in it.
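
One such verification is checking that nobody ends up as their own ancestor. The following Java sketch shows one way to do that with a depth-first search over child-to-parent links; the class AncestryCheck and the person names are made up for the example, and it works on plain strings rather than on real RDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Detects impossible ancestry loops, e.g. a son who is also his own father's father. */
final class AncestryCheck {
    private final Map<String, List<String>> parentsOf = new HashMap<>();

    /** Records that 'parent' is a parent of 'child'. */
    void addParent(String child, String parent) {
        parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    /** Returns true when some person is reachable from themselves via parent links. */
    boolean hasCycle() {
        Set<String> finished = new HashSet<>(); // people already proven loop-free
        Set<String> onPath = new HashSet<>();   // people on the chain currently being walked
        for (String person : parentsOf.keySet()) {
            if (reachesItself(person, onPath, finished)) {
                return true;
            }
        }
        return false;
    }

    private boolean reachesItself(String person, Set<String> onPath, Set<String> finished) {
        if (onPath.contains(person)) {
            return true;  // we came back to someone already on this chain
        }
        if (finished.contains(person)) {
            return false; // already explored without finding a loop
        }
        onPath.add(person);
        List<String> parents = parentsOf.getOrDefault(person, Collections.<String>emptyList());
        for (String parent : parents) {
            if (reachesItself(parent, onPath, finished)) {
                return true;
            }
        }
        onPath.remove(person);
        finished.add(person);
        return false;
    }

    public static void main(String[] args) {
        AncestryCheck data = new AncestryCheck();
        data.addParent("son", "father");
        data.addParent("father", "son"); // the impossible case from the text
        System.out.println("Cycle found: " + data.hasCycle());
    }
}
```

A check like this would run before the data is handed to any tool that walks the family tree recursively.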