Monday, 21 September 2009

On Sunday, in the final session of the annual British Isles Family History Society of Greater Ottawa conference, I had the privilege of being a panelist, with Bryan Cook, Colleen Fitzpatrick and Glenn Wright, in a session on the future of genealogy. Here are my opening remarks:

Most of us begin our family history research with the information we already have: what we know ourselves, what we hear from family members, and what we find in documents and family artifacts. It surprises me that some people go little further than that and still manage to develop interesting family stories. They use their family network, increasingly with the help of Internet cousins.

It doesn't take long for most researchers to move beyond that to government records. In recent years this has become much easier, thanks to tremendous strides in access to these resources through indexes and transcriptions on the Internet, often linked to images of the original records. Now you can get online access to almost all civil registration and census information not subject to privacy restrictions, either as the complete record or as an index. It's wonderful progress.

In addition, you can now access records online that were only talked about in somewhat hushed tones when I started. They were unindexed, difficult sources to deal with. I'm thinking of the US St Albans border crossing records and ships' passenger lists. Plenty more are being tackled or remain to be tackled: land records, probate, military, criminal, poor law, burial. Then there are religious records. There's plenty of work for indexing volunteers, and for indexing companies in China, India and Sri Lanka, to be getting on with.

But human indexing is 19th-century technology, and online availability is 20th-century technology. What excites me these days is the machine transcription being used in book and newspaper digitization projects.

Perhaps you saw my article in the latest Anglo-Celtic Roots, in which I described finding that my great-grandfather had been convicted in 1879 of embezzling 30 pounds from his employer, a bank on London's Oxford Street where he'd worked for 10 years. The newspaper article said his salary was 155 pounds a year, that he was recently married, and that he had a first child on the way. Despite his lapse, his employer asked that, in view of his good employment record, he be dealt with leniently.

My information came from a digitized newspaper in the 19th Century British Newspaper Library project. I'd never have found it if the newspaper hadn't been digitized and made word-searchable. Although it's not exactly the information I would have chosen to find about one of my ancestors, it turned a man I previously knew only from a series of government records into someone I could feel a more visceral connection with. That's the stuff of family history, and not the sort of thing you're likely to learn from carefully preserved family-held records: that type of episode is usually deliberately forgotten!

If online civil registration and census records were wonderful progress, digitized newspapers are, or will be, a revolution. No human intervention was involved in making that newspaper searchable; that's a 21st-century revolution.

The magnitude of the achievement these newspaper digitization projects represent isn't widely appreciated. The 19th Century British Newspaper Library project comprises two million pages, containing 10 billion words, or 50 billion letters.

Compare that with the rightly much-heralded Human Genome Project. The human genome contains three billion letters, drawn from only four distinct ones: A, C, G and T. The British Library newspaper project is much richer: 50 billion letters, perhaps 94 of them distinct if you count upper and lower case letters, numerals and symbols, and they appear in a variety of fonts.

True, much of a newspaper is junk for family history purposes, but the same goes for DNA, most of which is not informative about ancestry.

With such richness, I feel confident that extending the digitization of information sources, especially newspapers, through machine interpretation of text will provide the biggest breakthrough for family history in the next few years. Perhaps we will also see progress on machine transcription of handwritten documents. Anyone who doubts it need only look at how much the quality of machine interpretation of newspaper text has improved during this decade.

Here in Ottawa we're particularly backward: none of our major newspapers have been digitized. As perhaps the biggest beneficiaries of newspaper digitization, we in the family history community should be vocal and active in promoting it.