The unofficial, unauthorized view of Ancestry.com and FamilySearch.org. The Ancestry Insider reports on, defends, and constructively criticizes these two websites and associated topics. The author attempts to fairly and evenly support both.

Friday, July 27, 2012

Ancestry.com Laps FamilySearch in Indexing Horse Race

Amid growing reports that the Ancestry.com index contains large numbers of errors, Ancestry’s release of 12 additional states on Thursday vaulted it to twice FamilySearch’s position in the horse race (as of 11:00 AM, Thursday). Ancestry has published indexes for about 70% of the 1940 U.S. Census, compared to FamilySearch’s 35%.

Since my last update, Ancestry has published indexes for these states: Alaska, Arkansas, Idaho, Massachusetts, Minnesota, Missouri, New Mexico, North Dakota, Oklahoma, Rhode Island, South Dakota, and Utah. During the same time period, FamilySearch did not publish any, but did finish indexing Connecticut, Illinois, Kentucky, New York, Pennsylvania, Texas, West Virginia, and Wisconsin.

Measured by number of states, Ancestry has published only six more than FamilySearch (38 to 32). Ancestry’s larger lead comes from publishing bigger states. Of the ten biggest states, Ancestry has published seven. FamilySearch has finished indexing six of the top ten but has published only one.

How quickly FamilySearch can clear its backlog of indexed but unpublished states will decide the winner of this horse race.

Quality

Meanwhile, the question of quality looms large in users’ minds. Many users are reporting problems in the Ancestry index. I contacted Ancestry for comment and got answers to some of your questions.

“We are confident that our index, delivered in record time and optimized as it is to work with our proprietary system, provides the best and most powerful 1940 experience on the market,” said Todd Jensen. Jensen is senior director of document preservation services at Ancestry.com.

Several of you asked where Ancestry’s keying vendors are located. “We used four vendors to key the 1940 Census,” said Jensen. “Two were located in China and have been involved in Family History record transcription for many years. Another was located in Bangladesh and the fourth in the Philippines.”

While Ancestry doesn’t share details about its quality and audit methods, Jensen called them “rigorous” and explained the process in general terms. If a batch does not meet the quality tolerance, it is sent back to the vendor for rework, followed by another, separately sampled audit. “We can say that throughout this process we have taken every effort to ensure accuracy by holding our keying partners to high quality thresholds and implementing new and advanced quality assurance processes.”

Ancestry’s search system takes indexing errors into account. “Once batches are passed,” said Jensen, “there is extensive post production work which occurs. Index data is further augmented to maximize its chances of being ‘found’ in a search or through hints. Even names which have difficult handwriting have a chance of being found with our proprietary systems.”
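Ancestry hasn’t said how its proprietary system augments index data, so the following is only a hypothetical illustration of the general idea, not Ancestry’s method. One common way a search can tolerate keying errors is to accept indexed names within a small edit (Levenshtein) distance of the query. The function names `edit_distance` and `fuzzy_matches` and the one-edit threshold are assumptions made for this sketch.

```python
# Hypothetical sketch of error-tolerant name search; NOT Ancestry's proprietary system.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, indexed_names: list[str], max_distance: int = 1) -> list[str]:
    """Return indexed names within max_distance edits of the query (case-insensitive)."""
    q = query.lower()
    return [name for name in indexed_names
            if edit_distance(q, name.lower()) <= max_distance]


print(fuzzy_matches("Robert", ["Rabert", "Roberta", "Rupert"]))  # → ['Rabert', 'Roberta']
```

A one-edit threshold catches a mis-keyed “Rabert” for “Robert,” but it also returns “Roberta,” a different name, so any such system has to trade missed records against extra false hits.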

Jensen acknowledged the comparisons being made between Ancestry’s and FamilySearch’s indexes. As reader AnnieB has pointed out, Randy Seaver has done one such comparison. I plan to do my own as soon as time permits. “Whilst we don’t discount such reviews,” said Jensen, “evaluation of indexes of this size is problematic with even large samples being statistically unrepresentative of overall quality.”

Jensen remembers statistics a little differently than I do. Large samples can be quite representative of overall quality. The problem with most reviews—including the one I will do—is that the samples are not random. That, not the size of the 1940 Census, makes it unwise to generalize results to the entire index.
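The point about sample size can be checked with the standard margin-of-error formula for a proportion. For a simple random sample, the margin depends almost entirely on how many entries are sampled, not on how big the index is. This Python sketch assumes a hypothetical 10% error rate found in 1,000 randomly chosen entries and a population of roughly 132 million (approximately the 1940 U.S. population); the function name and all numbers are illustrative, not results from any actual review.

```python
import math
from typing import Optional


def margin_of_error(p_hat: float, n: int, population: Optional[int] = None,
                    z: float = 1.96) -> float:
    """Approximate 95% margin of error for an error rate p_hat estimated
    from a simple random sample of n index entries (normal approximation)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite population correction: shrinks the error for small
        # populations and approaches 1 as the population grows.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Hypothetical: 10% of 1,000 randomly chosen entries contain an error.
whole_census = margin_of_error(0.10, 1_000, population=132_000_000)
one_county = margin_of_error(0.10, 1_000, population=10_000)
print(f"{whole_census:.4f} {one_county:.4f}")  # → 0.0186 0.0176
```

Both margins come out under two percentage points, so 1,000 random entries drawn from the whole census estimate the error rate nearly as precisely as 1,000 drawn from a county-sized index. The formula says nothing about bias, though: a sample drawn only from hard-to-read pages, or from one’s own families, can be off by far more than this margin.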

Still, such reviews, as well as your individual experiences, have meaning and value in their own sphere. Leave a comment and tell us what you’ve found. Is anyone having positive experiences with Ancestry’s index? Or dare I ask, negative ones with FamilySearch?

As I have been doing for years, I go to the "Search Census and Voter List Records" screen and select 1940. When that screen pops up, I use the drop-down list to select a state. I do not find any of the new states listed.

Not sure what happened to my last post... probably forgot to click on Publish. Anyway, problem solved: it was my browser cache. And after getting that solved, the first family I searched for and found was transcribed incorrectly. The given names of three of the four family members were wrong. The surname was transcribed correctly.

I'm not buying Ancestry's claims. I love Ancestry and have been a paid member for more than 2 years. No axes to grind here. But I've been searching and looking at more names in the indexes than just the ones I'm searching for, and I'm seeing a lot of abominable misspellings, even when the names are clearly written and easy to read. It's really obvious that the people who did their indexing weren't familiar with American names and places.

I did a lot of 1940 indexing for FamilySearch; I'm tremendously impressed with their system and accuracy rate. I would have gladly indexed for free for Ancestry if given the opportunity. Don't know what they paid the foreign companies, but with all the people in this country who would love to earn some money at home, it's a shame they didn't get a chance to do the work.

Anna, you're so right. Ancestry is a U.S. based company with revenue of over 300 million dollars. One would think that they would keep the money in their own back yard by employing some of the many out of work Americans. But sadly, it comes down to the almighty dollar as always. It's clear to me that Ancestry cared not so much about putting out an accurate product but instead one that cost them the least to produce.

Ancestry claims that the company in China has years of experience with transcription but when it comes to the actual individuals doing the keying, I have serious doubts that English is their first language. The same with the other countries Ancestry chose to employ. I suppose you don't have to be fluent in a language in order to transcribe it, but then, we are seeing the results of that in this mess of an index Ancestry is putting out. I have transcribed for many years and never work on a project that is not in a language I am fluent in. It appears that in their "race" to get the 40 Census up and running they put accuracy on the back burner, probably thinking they'll let researchers make the corrections for them over the coming years. Shame on them.

I feel sort of slapped in the face. I have been an Ancestry subscriber for many years. I am sort of thinking this might be my last year, as much as I have loved Ancestry. Why would any American genealogy company not give the Americans an opportunity to get paid to do their indexing? I am pretty upset by this.

Ancestry didn't need to pay anyone to do the transcription. They just needed to play ball with FamilySearch and the other companies that got together to do the 1940 Census. Instead, as usual, they did their own thing, probably put millions into the effort, and now have an inferior product; both offerings (from Ancestry and the others) were finished more slowly, and for their investment, Ancestry now gets to compete with a freely available 1940 Census. Their desire to be a monopoly seems to have deprived them of business sense.

For that matter, why can't they provide an open API to their product so that it's possible to purchase a genealogy product that actually works with both Ancestry and FamilySearch? Or, at a minimum, leverage FamilySearch's API in their own products like FTM? As long as they fail to do so, they ensure there's always going to be a market out there for products that compete with their own. It hurts the genealogy market as a whole, and it hurts their bottom line.

I appreciate that you were able to get some specific answers from Mr. Jensen. These same questions have been asked time and again on the ACOM message boards and have either been ignored by staff or answered evasively.

Insider, thanks for asking Ancestry the right questions, but their answers are not particularly enlightening. Mr. Jensen would have to say something like he did, i.e. that their system "provides the best and most powerful 1940 experience on the market". What else could he say? What I would like to see is his response, if he would be willing to give it (which I suspect he won't), to the direct evidence of the poor job they have done in indexing. The direct evidence I refer to is the detailed report posted by Randy Seaver and the several smaller reports posted by me and others on your blog.

If, as he says, they "ensure accuracy by holding our keying partners to high quality thresholds and implementing new and advanced quality assurance processes" then how did the low quality results being reported get past such high quality thresholds? And I had to laugh to myself when I read his statements, "Index data is further augmented to maximize its chances of being ‘found’ in a search or through hints. Even names which have difficult handwriting have a chance of being found with our proprietary systems." Their proprietary system couldn't find "Francis" when it was spelled "Frances". I agree that this is just one incident, but it does suggest that their search engine has at least some significant flaws.

I'm going to make some comparisons along the lines suggested by Randy Seaver and I'm glad to see that you are going to do a similar comparison yourself. I would be interested to see such a comparison made by someone at Ancestry. If any such comparison, using a significant sample size, finds Ancestry has the better index I'll be surprised.

OK, since Ancestry has published Utah, I did a search for my parents and found them easily. The page they were on was indexed flawlessly, a first in my experience with Ancestry so far. There were only 11 lines of data on that page, but they got them perfectly. Also, I love the way Ancestry displays the data with the page image above and the indexed information below. The way their system highlights the data is very helpful also. I figure since I have been critical of their indexing of the 1940 census when I found it wanting, I should give praise when they get it right. I hope to find more positive experiences with the Ancestry version of the 1940 census, but so far this has been the only one.

1940 Census Index Report for July 25, 2012, by dgreen: “FamilySearch is excited to announce that we have indexed almost 90% of the entire collection with 31 states fully indexed and available for searching at FamilySearch.org.”

The discrepancy is between indexed and published. On their map, orange states are published and available for searching, while dark brown states are 100% indexed but not yet searchable. When a state's color changes to orange, it becomes searchable.

Incidentally, as of the date of this report, FamilySearch's indexing completion was 95%.

It's nice to know my membership fees are supporting the wages of those in another country who will spend it helping their economy. No wonder our country continues to see unemployment go up and fewer jobs for people to pick from. The medical transcriptionists are seeing the same happen with their jobs. Hospitals previously used American transcriptionists. However, all of that work is being sent overseas since today's high speed internet makes it possible.

It makes you wonder about the quality of our chart notes from visits to our hospitals as well. Those records are mostly transcribed by foreigners now too. :(

If someone wants to have fun with Ancestry and their wonderful outsourced index, try this link on for size: http://search.ancestry.com/Browse/view.aspx?dbid=1222&iid=kyvr_7007126-0082

If you open the index at the bottom of the page and compare the names on the page, which are mostly fairly readable if not totally clear, with the names as indexed, it really makes you wonder why they even bothered.

It's not just this page, either. It's the vast majority of the pages from 1852 to somewhere in the 1890s in this data collection.

I had been a long-time user of Ancestry, but running across this really made me question my subscription, so I cancelled it.

It's a shame that again the almighty dollar has gotten in the way of quality.

I have browsed EDs page by page at FamilySearch, Ancestry and the NARA site, and I've searched for folks as indexes have come online at Ancestry and FamilySearch. Until yesterday I had submitted few corrections. The corrections I submitted to Ancestry yesterday were head-scratchers. I suppose the fact that the transcriptions were farmed out overseas explains them, because I'd have a hard time believing someone in the US could have made some of the mistakes. From what I've seen, the 1940 census is by far the easiest to read.

As for this race to completion, whether it exists between the companies involved or only in the minds of some bloggers: give it up.

I can hardly wait to see the 1940 census indexes for New Jersey! I have indexed many batches of the 1940 NJ census on FamilySearch. The images have been blurry, but (since I lived in NJ all my life) I can usually make sense out of the names and places. Yesterday I worked on some batches for Carteret in Middlesex County. The images were terrible, the enumerator didn't follow the instructions, the handwriting was awful, and all the family names were Polish. I can't even guess what the Chinese would do with those pages.

Sharon, I, too, am anxiously awaiting the NJ pages. I found my core NJ family members with relative ease, before indexing, because they had all lived in the same 4-block radius for a century. A few cousins moved a little farther away, so I'm waiting for an index to help find them. Fingers crossed.

Well, that explains a lot. I have done survey work in South America with my US students and our international partner schools. My US students helped with the data entry (the surveys were in Spanish; the US students were all anglophones who speak Spanish as a second or third language). The problems of transcribing were immense. Even handwriting is different. I can imagine what problems Ancestry's transcribers have had.

I'm sure that Ancestry's transcribers were paid for speed, too. That won't help matters. As a volunteer for FamilySearch, I sometimes took half an hour to get a name right. I doubt Ancestry's indexers had that luxury.

In short, I think Mr. Jensen's comments were, at best, "spin". I've been a paid subscriber to Ancestry for many years and I've had some really great breakthroughs as a result of the material they have published. I just wonder what else I might be missing?

Oh, I'm not saying DBs have all the answers. I just wonder what I might be missing on Ancestry's DBs, given their indexing problems. I have no problem roadtripping to a town hall or archive if I need to find a source that has not been filmed, scanned, or otherwise reproduced.

Spent several hours today doing some initial 1940 census searches on Ancestry - several different names in MA and ME, different towns. The indexing was atrocious - "Rabert" for "Robert", "Prisvilla" for "Priscilla", and "Anthur" for "Arthur"...and these were the ones that were clearly written (one of them was actually PRINTED instead of illegible handwriting). It took me way longer than it should have to find these folks.

Why do all the 1940 records download with a 1 Apr 1940 date on Ancestry? I always have to look at the record to get the actual date it was written. Indexing the date cannot be that hard. Did they forget that dates do matter when building a timeline?

The 1940 census was taken AS OF 1 Apr 1940. If done correctly, a sheet dated April 10th WOULD enumerate a person who had died on April 5th because he lived there on the 1st. Conversely, a person born on the 5th would NOT be enumerated on the 10th.

I would have thought, for both Ancestry and FamilySearch, accuracy would have been by far the most important attribute, but it's not.

I don't care if the indexed census comes available this week or next month, like everyone who will actually use this data, I want it correct.

I did some FamilySearch indexing, and it was eye-opening. I was extremely careful to get my data correct. I had a 99% rating, which I would have thought should have been awful. But I don't blame myself, I blame their process. On a particularly difficult page, I was struggling, so I opened another window and pulled up the 1930 census of the same place. It was completely clear to read, and except for a couple of new kids, the families were all still in the same place. Given that, I knew my transcription was perfect. My transcription, of course, was completely rejected. I challenged it, explaining my method. No go. I was disheartened by this, as I realized this same lack of attention to detail must be happening all across the indexing effort.

I have since checked ancestry.com's transcription of that same page: it is even worse.

Given that almost all my searches are for ethnic Eastern European names, where spelling was questionable even when you could read it, I never expected more than about a 50% hit rate on census searches. I'd estimate that the 1930 census (Ancestry or FS) is about that, 1900 and 1920 slightly better, and 1910 slightly worse. I haven't done much with the 1940 census yet, but so far I would estimate a very low hit rate.

I also find the offshoring of the transcription of American census records very sad and ironic.

I've found errors in FamilySearch's index involving some of my ancestors' neighbors. But they were logical errors. I think the FamilySearch arbitrators should have a go at comparing the indexes, as was done with another census, in order to get an even higher quality index for both companies!

I have a fairly large tree on ancestry.com, about 4000 people, and am slowly adding the 1940 census information. I have found the quality of indexing for ancestry.com to be poor compared to familysearch.org, and the search algorithm for ancestry.com is also poor in comparison. No indexing is perfect, but familysearch.org has a much better QC system. Of the approximately 300 family units I have searched for and found 1940 census records for, about 1 in 7 could not be found using ancestry.com but were easily found using familysearch.org. Indexing is not easy, and it can often be difficult to distinguish between letters like "a" and "o". I have found numerous cases where the ancestry.com indexer had a choice to make where one option yielded a common surname and the other did not, but the indexer picked the letter that did not yield a common name. This tells me that the indexer was not familiar with common English-language surnames. Why is ancestry.com getting Chinese companies to do this kind of indexing? I have also found a number of cases where the indexed surname was off by only one letter from the actual surname, yet the ancestry.com search algorithm could not find the record, while it was easily found using familysearch.org.

Gary

I have been a member of ancestry.com for many years, but I use other sources as well, including familysearch.org. In the past year especially, I've noticed many, maNY, MANY errors in the transcribing and indexing on ancestry.com. Recently I was researching a Joseph Hutcheson/Hutchison/Hutchinson who was born in Kentucky, moved to Missouri, and then moved to Enid, OK. I thought I'd access the City Directories for Enid, OK to track the family between censuses, and wow, what I found was astoundingly egregious. All you have to do to verify what I'm saying is start with the 1905 Enid, OK City Directory and search for Joseph Hutcheson. You'll go mad trying to find a decent index entry for him, so you'll end up just browsing the City Directory for Enid, OK and going up through the years. You can just put in pg 50, 75, whatever, until you find the record. On the way, please do enjoy such genealogical tidbits as "Walnut Hyde" (taken from the street name and last name....don't ask me) and "Baking Soda Tea Bisquits..." taken from a store advertisement. Then there's also "telephone 7 days a week," and "Ill" instead of an "H," or "II" (as in "The Second") instead of an "H," even after women's names. Some of the most egregious errors I've ever seen! It is quite apparent that whoever is doing the transcribing is unfamiliar not only with the English language and common names, but with the English alphabet as well. Beyond that, there is apparently NO quality control, no checking for veracity. These butchered transcriptions and indexes are sent to Ancestry.com, where they're uploaded immediately. If they do random checking, then they're using aliens or people who don't speak English ...or children who can't spell...or read, to do it. When you take a look at these city directories (and please, don't stop at Enid, OK), browse through them randomly. I've looked at several in Kentucky, Missouri, Oklahoma, Ohio and elsewhere, but I'd bet that many, many more are in the same sorry state.
I tried to correct as many records as I could, but in some cases there were not only erroneous entries, but many individuals who were missed entirely. On one page I counted 81 entries that SHOULD have been there, and there were fewer than 45 actually there...and that included "non person" entries, like part of an advertisement, street address or building name. That, and the way some letters don't seem to be understood by the transcriber, tells me that whoever is transcribing doesn't speak English well and probably is sitting in front of some kind of chart. But since these people are probably paid by the record, their managers are probably telling them to hurry, hurry, hurry. That's their incentive. And that, sadly, is the reality of what's happening to the records at Ancestry.com. The same types of errors occur in the 1940 Census, in many of the birth, marriage and death records. And when I see two men married, as I did yesterday, I KNOW that transcriber isn't familiar with English male and female names! Plus, I feel, as an American, that these huge companies that outsource work like this are unpatriotic and are contributing to the economic woes of this country. I would be ashamed to admit that I outsourced work of this kind to other countries where they pay less, but also receive less in quality of the work. When Todd Jensen says these documents are checked for quality control, I don't believe him. Either that, or he isn't doing HIS job, either. Eventually the records on Ancestry.com will be so poor that people will stop paying for subscriptions, especially at the rising rates. If these records have to be sent back and redone, that's a waste of time and money.

I too felt those foreign transcribers must be sitting in front of charts and choosing what looks most right. I would do no better if I were sitting in front of a Chinese chart, for example, and trying to choose what most looks like what I am working on. This isn't right. It just isn't. Thanks, Gary.

I think we should accept the fact that this job is and has always been hectic. First of all, paleography is not a one-time thing, and most of the keyers try their best to key what is right. Just give them a score; keying over one million records is not an easy task, trust me.


The Ancestry Insider

The Ancestry Insider is consistently a top ten and readers’ choice award winner. He has been an insider at both of the two big genealogy organizations, FamilySearch and Ancestry.com. He was Time Magazine Man of the Year in both 1966 and 2006. And he really is descended from an Indian princess.


Biography

The Ancestry Insider was a readers’ choice for the top four genealogy news and resources blogs, part of Family Tree Magazine’s “40 Best Genealogy Blogs” for 2010. He reports on the two big genealogy organizations, Ancestry.com and FamilySearch. He was named one of the “Most Popular Genealogy Blogs” by ProGenealogists, and has received Family Tree Magazine’s “101 Best Web Sites” award every year since 2008. A genealogical technologist, the Insider has a post-graduate technology degree and holds a dozen technology patents in the United States and abroad. He has done genealogy since 1972 and has worked in the computer industry since 1978.

Legal Notices

The Ancestry Insider is written independently of Ancestry.com and FamilySearch. The opinions expressed herein are those of the author, and do not necessarily reflect those of Ancestry.com or FamilySearch.

E-mails and posted messages may be republished and may be edited for content, length, and editorial style.

The Ancestry Insider may be biased by the following factors: 1) The Ancestry Insider accepts products and services free of charge for review purposes. 2) The author of the Ancestry Insider is employed by the Corporation of the President of the Church of Jesus Christ of Latter-day Saints, owner and sponsor of FamilySearch. 3) The author is a believing, practicing member of the same Church. 4) The author is a former stock-holder and employee of the business now known as Ancestry.com and maintains many friendships established while employed there. 5) It is the editorial policy of this column to be generally supportive of Ancestry.com and FamilySearch. 6) The author is an active volunteer for the National Genealogical Society.

"Ancestry Insider" does not refer to Ancestry.com. Trademarks used herein are trademarks or registered trademarks of their respective owners. The Ancestry Insider is solely responsible for any silly, comical, or satirical trademark parodies presented as such herein.

All content is copyrighted by the Ancestry Insider unless designated otherwise. For content copyrighted by the Ancestry Insider, permission is granted for non-commercial republication as long as you give credit and you link back to the original.