…from Ireland with love!



I’ve been giving lots of time recently to thinking about the preservation of retrocomputer-related print media, such as books, manuals, etc.

These thoughts have primarily revolved around “destructive” vs. “non-destructive” digitisation of these items, and how those digitisation methods fit into the broader sphere of “preservation” in retrocomputing.

Bound items such as books introduce physical complexities to the digitisation effort, as they are not readily scannable on a flatbed or sheet-fed scanner. One way to speed the process is to remove the spine, most often with a guillotine, leaving loose sheets which can be quickly scanned in a sheet-fed scanner. This method is used with saddle-stitched (aka “staple-bound”) publications (including magazines) as well as perfect-bound or case-bound books.

This, of course, irreversibly alters the physical nature of the item, and is accordingly described as “destructive” scanning. Its opposite, “non-destructive” scanning, leaves the physical item intact during the scanning process.

(As a side note, it is possible with some “mechanically-bound” and saddle-stitched items to remove the binding to allow sheet-fed scanning – the binding is then replaced, restoring the item to its former state. I consider such re-binding as a non-destructive process if the item is, for all intents and purposes, returned to its original condition. It can be difficult, however, to re-create the binding as originally applied without the right equipment for the method used.)

Several years ago I destructively digitised the three editions of Lon Poole’s original Apple II User’s Guide. Once scanned, I intended to recreate the books in InDesign, replicating fonts, layout, images, etc. – a true re-creation.

Guillotining the spines off and sheet-feeding seemed the quickest and easiest way to get undistorted scans of all the pages (to be used as page templates during replication), and I used the worst-condition copy I owned of each of the editions (I had bought multiple copies of the editions for just this purpose).

As seems to invariably happen around the retrocomputing hobby, however, real life got in the way and the scans are sitting on my computer pretty much untouched, and not much re-creating has happened.

I now deeply regret guillotining even these extra, not-the-best-condition copies and believe destructive digitisation should be avoided in all but the most extreme of circumstances. If there’s no need to remove the spine, it shouldn’t be removed.

So, what’s changed in the four and a half years since I guillotined those books? Basically, scanner technology has changed, and there are now viable alternatives which allow undistorted digitising of bound print items without spine removal.

These viable alternatives do not in my view include flat-bed scanning systems such as the Zeutschel zeta scanner system. I know of a local Apple ][ enthusiast/preservationist who has had extensive experience with that system and he reports that the software deskewing/distortion removal never lived up to the promise his then employer had been sold on by Zeutschel representatives.

Those disappointing results really don’t surprise me: although such distortion removal is “only” a mathematical problem, real life is rarely as neat as mathematics would suggest. But that sort of flat-bed system isn’t the only non-destructive book scanning technology available, and I’d suggest it will never work as well as the sort of system I’m thinking of.

What has changed my mind forever on destructive digitisation is exemplified by the Scribe book scanner from the Internet Archive.

Systems such as the Scribe non-destructively scan bound books while avoiding any skewing or distortion in the captured image. They do this by sitting the books in a V-shaped bed, having clear perspex or glass sheets press gently down on the pages to flatten them, and taking photos of the pages using two cameras, each mounted perpendicularly to the page they’re capturing.
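One practical consequence of a two-camera setup is that the captures arrive as two separate sequences, one per camera, and need to be merged back into reading order. The following is a minimal sketch of that step only, not a description of how the Scribe software works. The function name and the file names are hypothetical, and it assumes each camera produced exactly one image per page turn.

```python
from itertools import chain


def interleave_pages(left_pages, right_pages):
    """Merge two per-camera capture lists into reading order.

    Assumption: left_pages[i] and right_pages[i] were taken at the same
    page turn, so the two lists must be the same length.
    """
    if len(left_pages) != len(right_pages):
        raise ValueError("each page turn should yield one left and one right image")
    # zip pairs up the captures from each turn; chain flattens the pairs.
    return list(chain.from_iterable(zip(left_pages, right_pages)))


# Hypothetical file names, purely for illustration.
left = ["left_000.jpg", "left_001.jpg"]
right = ["right_000.jpg", "right_001.jpg"]
ordered = interleave_pages(left, right)
print(ordered)
```

Raising an error on mismatched lengths is a deliberate choice: a missed shutter press on one camera would otherwise silently shift every later page out of order.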

The zeta system the local enthusiast had experience with cost AU$15,000, and before I saw the pricing for the Scribe I thought it would be similar. At US$13,000 (AU$17,000 at today’s exchange rates), it’s currently a little more expensive.

However, DIY systems based on this concept are already becoming available via makerspaces (such as Robos and Dinos here in Sydney, of which I’m a member), and hobbyist versions are already available in kit form, much as kit-form 3D printers can be purchased.

At US$1,620 (including cameras), such a kit seems like a relatively inexpensive way to go down the non-destructive digitisation path. I do appreciate, however, that “relatively inexpensive” does not automatically mean “affordable”: I know I can’t afford to buy one of these scanner kits at the moment, much as I’d like to.
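To put the three prices on a common footing, here is a rough sketch of the arithmetic. The only inputs are the figures quoted above. The exchange rate is not a real quoted rate: it is simply the one implied by the US$13,000 to AU$17,000 conversion, and applying it to the kit price is an assumption.

```python
# Figures quoted in the post.
SCRIBE_USD = 13_000
SCRIBE_AUD = 17_000  # the post's own conversion of the Scribe price
ZETA_AUD = 15_000
KIT_USD = 1_620

# Assumption: the rate implied by the Scribe conversion also applies to the kit.
aud_per_usd = SCRIBE_AUD / SCRIBE_USD


def usd_to_aud(usd, rate=aud_per_usd):
    """Convert a US dollar amount to Australian dollars at the given rate."""
    return usd * rate


kit_aud = usd_to_aud(KIT_USD)
scribe_premium_aud = SCRIBE_AUD - ZETA_AUD  # how far the Scribe is above the zeta

print(f"Implied rate: {aud_per_usd:.2f} AUD per USD")
print(f"Kit price: about AU${kit_aud:,.0f}")
print(f"Scribe costs AU${scribe_premium_aud:,} more than the zeta")
```

On these assumptions the kit comes out at a little over AU$2,100, a small fraction of either commercial system.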

I’ll be demoing the Robos and Dinos book scanner at WOzFest PR#6 – my aim is to choose a title on the day (not too large, maybe 100-200 pages) and scan and post-process it throughout the event. I’m hoping to have the resulting digitised book uploaded to the Internet Archive by the time everyone leaves.

A major disadvantage of these book scanners is limited availability, which is likely to be true for some time to come. However, although these scanners are not yet as readily available as sheet-fed scanners such as the Fujitsu ScanSnap, I believe print material preservation has less urgency than software preservation, as books don’t suffer bit-rot as disks inevitably will.

We can afford to wait for an Internet Archive partner centre to open up here in Australia, or for a local makerspace to get such a scanner, or for a community member to make one themselves, or for a community member to be in a position to scan items in this way on behalf of the community.

A disadvantage of these scanners is the need to turn the pages manually, which increases the time to scan an item. The Robos and Dinos scanner has a counter-weighted system to hold the perspex down. This is easily lifted to turn the page, which reduces the time between scans, but this system is still not as fast as an automatic sheet-feeding scanner.

Post-processing is another area where the kit and DIY book scanners currently fall behind commercial sheet-fed scanners. They often rely on open source software not only for capturing the page scans, but also for cropping and making the other adjustments needed to turn them into easily distributable, good quality PDFs.

But, as with most areas of computing, progress is swift, and I believe there is no longer any need to remove the spines of items being digitised – they can be digitised and physically preserved, which is surely a win-win.

With the removal of the need to destructively digitise print items, I believe physical preservation of items being digitised should be as high a priority as the digitisation itself.

The strength of my belief does vary (very slightly) according to the nature of the item:

I think one-off or rare items should be physically preserved during digitisation;

I think books which are known to have several or many surviving copies are potential candidates for destructive digitisation, but I still strongly prefer all copies remain physically preserved;

More widely disseminated items such as user group magazines are the ones I feel least strongly about, as long as there are confirmed multiple extant copies (or they can be dismantled and re-bound as mentioned above);

I think there are some items which cannot be easily digitised either way – books with large fold out leaves, for example: a per-item judgement call would need to be made by the owner of such items and/or the community the digitised copy is meant for (NB: the Scribe system does have a large image capture accessory which I think can cater to at least some of these edge cases).

The actual condition of the item itself does not enter the equation for me – while I sacrificed the “worst condition” copies of Lon Poole’s books I owned, I still deeply regret even this “lesser” sacrifice. If I only had one copy of an item which was in poor, but still bound, condition, I would only non-destructively scan it, rather than having its spine removed just to make digitisation easier.

The Internet Archive is taking the time and spending the money to digitise and physically preserve print items – that fact alone was what got me started adjusting my attitude. Seeing the non-destructive book scanner at Robos and Dinos cemented this form of digitisation as the preferred default in my mind.

When researching others’ views for this post, I found a blog post written by Internet Archive preservationist Jason Scott (who was one of the Skype video callers during WOzFest 5¼″). Jason makes the case that something is lost when an item is physically altered for the sake of digitisation, and that really struck home with me.

After reading that post and giving it more thought, I came to realise how much binding can tell you about an object or its producers – it’s a form of physical metadata:

Did a user group skimp on production costs and only use one staple?

Did user groups or software publishers who produced staple-bound print items guillotine them after stapling to avoid pages extending past the cover (which would speak to having extra money to spend on appearances)?

Did publishers or software houses change their binding methods as their business performance or prospects changed? An example would be small software or book publishing houses moving from staple-bound to perfect-bound titles as their business grew.

Did page elements extend into the inner margin, and, if so, how carefully were the elements on facing pages made to line up (which speaks to paying printers more for such alignment and “quality assurance”)?

Of course, much of this information is secondary to the goal of digitisation and dissemination of the content of these books – but we don’t know today what will interest researchers or enthusiasts in the future.

While dissemination of information is important to a vibrant retrocomputer community, I strongly believe physical preservation of items is equally important for historical context – physical preservation along with digitisation gives the widest view of the past to future enthusiasts and researchers, and, I believe, should be a goal we all strive for.

This sort of “physical metadata” is potentially lost to future researchers if “only copies” of items have had their spines removed – and it’s sometimes hard for an owner to know if a particular edition or print run survives in only one copy, while other editions may have several surviving copies.

Is my recently acquired (and prized) early copy of the First Edition Apple II User’s Guide with apples of layered colours (as opposed to other Editions having single colour apples on the cover, see below) the only extant copy with that design? It may or may not be, but I’d not seen it before, despite a 15-year interest in that title and its variants. I wouldn’t want to damage it physically only to subsequently find out it was!

Additionally, what might pass as an acceptable scan today may be found wanting in 1, 2, 5…maybe even 10 or more years. Having undamaged physical items available for rescanning with better technology in the future allows that better technology to be utilised to its fullest extent.

On this point, I’ve noticed several preservationists have revisited their earlier scanning efforts to re-scan items at higher resolution and/or to post-process them with newer tools – it will always be better to have an unaltered original for such rescanning efforts.

Another important consideration is that while scanning technologies for non-destructive digitisation will only improve, they will also continue to get cheaper. Since Jason Scott wrote the above-linked post, the Scribe system has dropped in price from US$25,000 to US$13,000, just shy of a 50% price drop in three years!
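The “just shy of 50%” figure is easy to check. This is only a quick sketch of the percentage calculation using the two Scribe prices quoted above; the function name is my own.

```python
def percent_drop(old_price, new_price):
    """Return the price reduction as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100


# Scribe prices quoted in the post, in US dollars.
drop = percent_drop(25_000, 13_000)
print(f"{drop:.0f}% price drop")
```

A US$12,000 reduction on US$25,000 works out to 48%.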

Reduced cost and improved post-processing will see non-destructive digitisation be within the reach of more and more retrocomputing enthusiasts as time goes by, and I’m hoping that destructive scanning will fall by the wayside. As far as I’m concerned, this can’t happen fast enough!