Wednesday, February 1, 2012

How many records? A dilemma

Online digitized record repositories' claims regarding their
collections are hopelessly muddled and create a dilemma for genealogists
trying to compare the websites. At the root of
the problem is the total lack of consistency in the use of such terms as
record, file, document, name, individual, and collection. Unfortunately,
users of the various sites sometimes judge
the relative usefulness of the information by the way the size of
the database is expressed. This is especially true of family tree or
user-contributed sites. Large is equated with good and useful even if
this is not necessarily so.

Let's look at one site for an introductory example.

WeRelate.org
is an extremely useful and focused site for displaying individual and
family information. The wiki format allows for intensive sourcing and
inclusion of media. I highly recommend the site. Now, what about their
claims regarding records? Here is a statement from the WeRelate.org
startup page, where WeRelate claims to be "the world's largest genealogy
wiki with pages for over 2,153,200 people and growing." This is quite an
impressive number until you look at the WeRelate.org Special:Statistics
page of the wiki. Here is the quote from the page:

There are 6,111,311 total pages in the database.
This includes "talk" pages, pages about WeRelate, minimal "stub"
pages, redirects, and others that probably don't qualify as content pages.
Excluding those, there are 2,876 pages that are probably legitimate
content pages. (emphasis in the original).

What is a "legitimate content page" and how does that differ from the
claim to "over 2,153,200 people"? Nowhere is the seeming discrepancy
explained. Of what value are the "people pages" if they contain no content?

This issue of content is not at all unique to WeRelate.org (and I
am not picking on that website at all, merely using it as an example).
Take, for example, a comparison between two online genealogy giants: FamilySearch.org and Ancestry.com.
A superficial look at the two sites turns up these two
claims: FamilySearch.org claims 1,033 "collections" and Ancestry.com
claims 30,554 collections. Are the two claims accurate and, if they are,
do they reflect the relative sizes of the two databases?

It turns out that the two databases apply the term "collection" in
substantially different ways to the records they contain. Both websites
use the term so ambiguously that it gives the user little information
about the amount of information on the website.

FamilySearch.org uses the term "collection" in a loose way to
designate geographically related records created in a certain way. The
term collection is used to refer to original source records as well as
extracted records and indexes. So in one instance, the 1855 Alabama
State Census is said to contain 34,978 records, but this
collection is an index, so it contains names, not records. It is
certainly not clear what is meant by the term record when each index
entry is counted as a record. In another example, the 1869 Argentina
National Census is said to contain 1,799,773 records on 157,426 images.
Apparently, the number of records refers to the entries on the census
pages. But for another collection, the Argentina, Salta,
Catholic Church Records, 1634 - 1972, there is no number for the records,
just a reference to 144,293 images. So the total number of "collections"
is arbitrary and meaningless. If you drill down into the records,
especially those that have images only, you will find that some
individual collections consist of dozens of rolls of microfilm.

Again, I am not criticizing FamilySearch or anyone else, merely
commenting on the vague and ambiguous nature of the designations. Why
give a number if the number is meaningless?

Ancestry.com has the same issues as FamilySearch.org, but in most
cases it is harder to penetrate the confusion. Collections in
Ancestry.com are listed with a number of "records." But the number of
records is not further defined as pages or individuals or whatever. One
number sticks out: the number of records in Member Family Trees is
claimed to be 1,838,295,985. Hmm. That is a really big number. How many
unique individuals are represented by that number? For example, if I
search for one of my ancestors in the Public Member Family Trees, take
Henry Tanner for example, I find 57,210 instances of his name.
Speculating, if I divide the total number of records claimed by
Ancestry.com by the number of duplicate records for Henry Tanner, I get
about 32,000 entries, a far smaller number, but what is the real
number? How many duplicates are there? Isn't this the same problem I
started out with on WeRelate.org? Only Ancestry.com does not bother to
tell us how many records have content.
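
For anyone who wants to check that back-of-the-envelope division, here is a minimal sketch in Python. The function and variable names are made up for illustration, and the underlying assumption, that every person in the trees is duplicated as often as Henry Tanner, is almost certainly wrong, so treat the result as a rough illustration and not a real count.

```python
def estimate_unique_entries(total_records: int, duplicates_per_person: int) -> float:
    """Divide a claimed record total by an assumed duplication rate.

    Assumption (hypothetical, for illustration only): every individual is
    duplicated exactly as many times as the one person we searched for.
    """
    return total_records / duplicates_per_person


# Figures quoted in the post.
claimed_tree_records = 1_838_295_985  # "records" claimed for Member Family Trees
henry_tanner_hits = 57_210            # instances found for the name Henry Tanner

estimate = estimate_unique_entries(claimed_tree_records, henry_tanner_hits)
print(f"{estimate:,.0f}")  # prints 32,132
```

With the two figures quoted above, the quotient comes to roughly 32 thousand. The real number of unique individuals could be far higher or lower, because the duplication rate for one heavily researched ancestor tells us nothing about anyone else.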

The numbers of records claimed by FamilySearch.org and Ancestry.com do
not give us any idea of how many duplicate records there are for an
individual. For example, my ancestor might appear in multiple family
trees, but he may also appear in multiple records that all carry exactly
the same information, such as a death certificate and an index of deaths.

The confusion between the terms "record" and "document" is even more
dramatic. Fold3.com is an example of a site that uses the terms
interchangeably. For example, Fold3.com has a list of "collections" and
claims to have 86,022,535 images and 100,232,144 memorial pages.
Fold3.com collections include American Milestone Documents and Matthew
Brady Photographs, among others. How do we compare those numbers to
either Ancestry.com or FamilySearch.org? The simple answer is we can't.
Numbers don't lie, but they don't say much either.

Rather than take these numbers, no matter where they originate, with a grain of salt, perhaps we need a whole salt shaker.