Wednesday, June 11, 2014

How will pushing data affect genealogy and genealogists?

There is a rapidly growing trend in online genealogy to “push” data to genealogists by means of online databases and family tree programs. Personally, I am an active proponent of such “pushed data,” essentially because it automates and accelerates “routine” genealogical research. But after nearly 40 years of trial experience as an attorney, I am well practiced in examining all sides of any issue, particularly if that issue tends to be controversial. For this reason alone, I have analyzed some of the negative consequences of “pushed data.”

One side note. Controversy in genealogy is about as mild and inoffensive as anything that might be called controversy. Sometimes I hit on an issue that raises the average hits on my blog somewhat, but I use the word "controversy" advisedly. The reason I use the word here in reference to pushing data is that there is substantial confusion among the genealogists I talk to concerning both the need for such a system and the effects that the proffered "sources" have on researchers.

Before I go much further, I need to answer the question: what is pushed data?

Many genealogists use online database programs to research collections of digitized source documents and indexes of source documents. There are likely thousands of such websites scattered around the world, from small collections of a few specialized documents to web-based mega-data providers that have millions and even billions of searchable records. These websites are either fee-based or free, depending on the provider's motivation. As long as those records remained passively supplied, researching the records was an almost exact analog of researching traditional paper-based or microfilmed records, only much more convenient. However, at some point the purveyors of these online collections began proactively pushing their records to the users. This was first done in the realm of connecting different nodes on online family trees. For example, if two users shared the same remote ancestor, the programs began telling the two users of the potential connection.

Suggested family tree connections rapidly evolved into the databases suggesting potential data sources. I have noted in other blog posts that the technology and programming behind these systems of automated source suggestion are extremely sophisticated and complex. It may seem trivial to match your great-grandfather or great-grandmother to a corresponding record in a database, particularly if that database has been transcribed and indexed, but in fact, this is an almost monumental task. The reason for this difficulty should be clear to any genealogist who has spent a significant amount of time researching original records. It is one thing to find the record; it is quite another thing to read and interpret the record accurately. Original records are sometimes vague, indistinct, inaccurate and illegible. Writing computer programs that can compensate for these limitations and perform consistently with positive results is a monumental challenge.

Notwithstanding this difficulty, many of the online database companies have begun the process of matching users’ ancestors to source records, either through automated programs or with user intervention. Some of the online programs do a phenomenally good job of finding sources. This source-finding ability appears to be a natural outgrowth of the user-oriented search engine technology used by the websites to help users find indexed records. Actually, it is quite a different technology. It is not just a better search engine; it is a revolutionary way of looking at genealogical records and matching those records to the right person through a set of in-depth algorithms that consider entire pedigree segments rather than the name, date and place of some individual. Some of the previously available "Advanced Search" capabilities previewed this type of technology, where the user could add the names of some selected relatives to assist in identifying the target ancestor.
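To make the difference between an individual search and a pedigree-based match concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions and does not reflect how any particular website works: the Person fields, the two-year date tolerance, the equal 50/50 weighting, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A tree entry or an indexed record; every field name here is hypothetical."""
    name: str
    birth_year: int
    place: str
    # Names of known parents, spouse, children, and so on.
    relatives: frozenset = field(default_factory=frozenset)


def individual_score(a: Person, b: Person) -> float:
    """Compare only name, date and place, as a plain search would."""
    score = 0.0
    if a.name.lower() == b.name.lower():
        score += 1.0
    if abs(a.birth_year - b.birth_year) <= 2:  # assumed tolerance for date errors
        score += 1.0
    if a.place.lower() == b.place.lower():
        score += 1.0
    return score / 3.0


def pedigree_score(a: Person, b: Person) -> float:
    """Blend the individual score with the overlap between relatives' names."""
    rel_a = {r.lower() for r in a.relatives}
    rel_b = {r.lower() for r in b.relatives}
    union = rel_a | rel_b
    overlap = len(rel_a & rel_b) / len(union) if union else 0.0
    return 0.5 * individual_score(a, b) + 0.5 * overlap  # assumed equal weights


def best_match(ancestor: Person, records: list) -> Person:
    """Return the record with the highest pedigree score."""
    return max(records, key=lambda r: pedigree_score(ancestor, r))


# Invented sample data: two records that look identical on name, date and place.
ancestor = Person("John Smith", 1850, "Ohio",
                  frozenset({"Mary Smith", "William Smith"}))
record_1 = Person("John Smith", 1850, "Ohio",
                  frozenset({"Anna Smith", "Peter Smith"}))
record_2 = Person("John Smith", 1851, "Ohio",
                  frozenset({"Mary Smith", "William Smith", "Sarah Smith"}))

chosen = best_match(ancestor, [record_1, record_2])
print(chosen.relatives)
```

In this invented example both records receive the same individual score, so a search on name, date and place alone cannot tell them apart. Only the overlap in relatives points to the second record, which is the kind of help the "Advanced Search" relatives fields offered in a manual form.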

In the competitive online world of commercial genealogy programs, why not develop technology that will not only differentiate similar products but also become a value-added attractor to new users? One challenge for the companies developing this new technology is the difficulty in communicating the benefits of the matching technology to naive users. This difficulty is dramatically illustrated by how few of those who put their family trees online are further motivated to take advantage of the automatic or semi-automatic source and tree matching functions. As I stated, I believe there is a substantial benefit to the genealogical community, but that benefit does not seem to be obvious to most of the people who have family trees online.

So what are the drawbacks of pushed sources? I would suggest that the first and most serious drawback is that the typical user has no idea what to do with the suggested source or why such a source might even be useful or necessary. So far, none of the online programs suggesting sources have provided either motivation or support for the process. What I mean by this lack of support is not that the process of adding sources is inadequately explained, but that the rationale for adding sources is all but ignored. It is not the mechanics of adding the sources that is the challenge. The real challenge is convincing those who post their family trees online that sources supporting the information in the family tree are necessary. To a seasoned genealogical researcher, the idea that any fact or event recorded in a family tree should be supported by a source citation is elementary. But in the real world of online family trees, this is far from reality. I am purposely avoiding using any particular online program as an example, but it is clear from an examination of even a few user-submitted online family trees that source citations are not a high priority.

Even if a family tree submitter is convinced, for whatever reason, that sources are important, the ease of obtaining those sources is a trap. If I were to upload my family tree to one of these websites that push sources and immediately got a suggested U.S. Census record, how would I know what to do with the record? Does the average user of a family tree program even know what a census record is? Why would I think that even more sources might be helpful or necessary? In other words, these programs are providing the end product of researching a record without having a researcher to evaluate the information contained in the record and integrate that information into a coherent ancestral record. So I have a source, so what? That is the question that needs to be answered. If the users of these programs do not see the need for sources in the first instance, why would they spend the time to look at sources and examine them critically? How would they know that they needed to look elsewhere for additional information?

My concern is that by offering what are essentially "fast food" sources provided with little or no effort on the part of the user, the programs are becoming a disincentive to further useful research. If you got dessert all day, why would you want meat and potatoes?

From another standpoint there is also an issue with the sources themselves. Even granting the online websites great technical skill and accuracy in connecting the source to the right person, the issue becomes the source. What if the information in the source is inaccurate or incomplete? How is the naive user supposed to know what to do with this misleading source?

I think that adding automatic source matching to the online programs makes them immensely more useful to seasoned, well-grounded genealogical researchers. But there is one last issue. Many experienced genealogists are also unimpressed with the matching technology because they don't know how to use it. They are overwhelmed by the number of sources offered and resent the programs for distracting them from their research goals.

It looks to me as though the matching programs have a long way to go before they are generally beneficial to the genealogical community. As the number of online sources aggregated to these programs increases, they become that much more valuable to prepared researchers, but at the same time, they become a stumbling block to those who are not well grounded in research principles.

By the way, from my contacts with the large genealogy companies, I am reasonably confident that those developing these technologies are aware of the problems and challenges and are working to overcome them. I am very positive about the future and see this technology becoming a tremendously time-saving tool.