The GENES Blog (GEnealogy News and EventS): Top stories concerning ancestral research in Britain, Ireland, and their diasporas, from Irish born Scottish based professional family historian, author and tutor Chris Paton. Feel free to quote from this blog, but please credit The GENES Blog if you do so. To contact me please email chrismpaton @ outlook.com.

I like the fact that the latest additions give you a breakdown, year by year, of what is being added. This is not the case for the main listing of titles, however, which is beginning to irritate, as I am unable to clarify exactly what it is that I am searching through without a lot of hard work.

For example, I was excited to learn that the Inverness Courier was going online, but I am having a hell of a time trying to work out what is online and what is not. The Newspaper Titles page suggests that at present there are issues of the Courier available from 1817-1930. When I click on the title I get taken to a page that allows me to search this title alone. At this point, the site breaks down the holdings a bit further to identify the coverage, as follows, with year range and number of issues:

1800-1849 (528)

1850-1899 (1,150)

1900-1949 (313)

What this does not do, though, is tell me what the gaps might be in this coverage. If I look at this second tier of detail, for example at 1800-1849, I am told there are 528 issues. But missing from this 1800-1849 collection are any issues at all from the following periods:

1818-1823

1827-1843

There may well be a good reason for this - they may not exist, they may not yet be digitised, they may exist but not be intended for online presentation. Either way, such information would be extremely useful to know. It can be very easy to do a search for a person or event and, when no result is returned, be tempted to think that the newspaper never covered that person or event - when in fact, they may well be flooding the headlines, but not in an issue available online. And if the whole collection does exist, why not work on it in one go and place each collection online one at a time, instead of via a fragmented delivery schedule?
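
To illustrate the kind of gap listing that would help, here is a minimal Python sketch. It assumes that a list of the years with at least one issue online could be obtained for a title. The `years_online` values are purely illustrative, chosen to match the gaps noted above rather than taken from the site, and `find_gaps` is a hypothetical helper name.

```python
def find_gaps(years_present, start, end):
    """Return (first, last) year pairs within start..end that have no issues."""
    present = set(years_present)
    gaps = []
    gap_start = None
    for year in range(start, end + 1):
        if year not in present:
            # Open a new gap if we are not already inside one.
            if gap_start is None:
                gap_start = year
        elif gap_start is not None:
            # A year with issues closes the current gap.
            gaps.append((gap_start, year - 1))
            gap_start = None
    if gap_start is not None:
        # The gap runs to the end of the advertised range.
        gaps.append((gap_start, end))
    return gaps


# Illustrative data only (an assumption, not taken from the site).
years_online = [1817, 1824, 1825, 1826] + list(range(1844, 1850))

for first, last in find_gaps(years_online, 1817, 1849):
    print(f"{first}-{last}")
```

A listing like this against each title would show at a glance which years a search is not actually covering.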

I am flagging this up merely because this is one title I am particularly interested in - I don't know how widespread the issue may be for other titles. I'd be interested to know from readers if they are experiencing similar problems. (It may well just be an Inverness issue!)

5 comments:

Can't say I'm having identical problems because I don't use the British Newspaper Archive - I just stick to FMP, despite its poor enquiry system and inability to browse.

However, what you describe is typical of the poor way that stuff is described generally. It seems that the suppliers of genealogical data think that finding stuff is all we are interested in. It never seems to occur to them that we might want to understand the scope of the data searched. For instance, if I search for a baptism of person X in a county in the 1820s (say) - because that's what the census gives me - and get one hit - is that person the only one? Or is only 25% of the county covered, in which case there could be several others not yet indexed?

The really silly thing is that to an IT organisation - which most of these are - this stuff should be utterly standard, day-to-day stuff. It's called configuration management and they will know exactly what software has been loaded, what version of the software etc. But try asking them what data is there....!

Chris, I'm sorry to hear that you have had issues (no pun intended!) with the date range feature on http://www.britishnewspaperarchive.co.uk

Once a title enters the digitisation process it is near impossible to tell when exactly each individual issue will appear on the site. This is why we have the http://www.britishnewspaperarchive.co.uk/home/NewspaperTitles page which shows the new issues available on the site within the past 30 days. The 'Recently Added Issues' and 'All Titles' sections and the browse pages show a year range against each title. As you have noticed, this is an all encompassing date range and there may be gaps whilst the title is being scanned, processed and then added to the website as this is not always (actually very rarely!) date sequential. I hope this helps.

Thanks - but I think it basically just confirms what I have flagged up!

There appears to be no logical sequence for the order in which records are digitised and made available - i.e. the years selected for issues within a particular title - and no information as to when the gaps will be plugged, with those gaps themselves not clearly identified and made apparent. Without such detailed information, I cannot trust that a search I am performing is for a complete set of issues within a clearly advertised range on the site. The recent additions detail is useful, if accurate (if a title says 1927 has been added, does that mean all of 1927?), but why can't that information then be transferred to the main listing page? Within a month of it going online, that news update disappears, to be replaced with the latest update.

We know this is a ten year project, and I doubt anybody would have an issue with gaps if they are known about - Rome was not built in a day - but not to identify those gaps diminishes the credibility of the source as advertised, at least until the point when the coverage is finally complete.

Re "it is near impossible to tell when exactly each individual issue will appear on the site" - we don't want to know when stuff WILL appear. Only when it HAS appeared. Since the project clearly has configuration management in place (otherwise they are at risk of filming the same issues three times and missing others), then all we want is that the definition of the batch be uploaded somewhere visible to us, after the batch has been completed. And kept visible. That should just be a copy and paste from the config mgt system into the publicly visible progress pages as a last step in the data release process.

Two assumptions there:

1. The batch definition is meaningful to the external user. (E.g. if it's "Colindale Volume 21569", we might have problems).

2. The volume of text to be loaded might be an issue. Cheap and nasty will do. Plain HTML pages? Google Docs? Do NOT try to summarise it - if you do, we lose the information we need and you have extra work for no benefit. That's a lose-lose.

I share your frustration, Chris. Trying to figure out if the online archive will ever have your title for the year you want is bad enough (yes, I've been wrestling with NEWSPLAN again), but trying to find out if it has it now... It's a great service for those who don't always search systematically, but I've now hit problems with the Nottinghamshire coverage and have resigned myself to having to go to the Central Library there and look at the microfilms. A round trip of 140 miles.