After the initial research with scanning last week, I’ve concluded that trying to scan the back issues of Northwest Runner with my home scanners is probably a job I would never finish. At 1 minute per page plus a non-trivial amount of post-processing (orienting all pages properly, assembling the PDFs), the time to scan is just more of my life than I’m willing to dedicate to this project. I found that my home printer/scanners do offer a document feed feature, though, and it works pretty well. I can put a stack of documents in the feed tray, start the scanner function, indicate that the input documents are duplex, turn the moire suppression option on, click “go” and come back a half hour later. Then I flip the scanned stack to get the other side and it’s pretty much done (and the post-processing effort is slightly lower, too).

The problem with that is (for my printer) it requires an ~8 1/2 x 11″ input. I tried this with one of my own back issues of the magazine after taking the staples out and cutting it down the spine and the results were great! Except for the original which I had cut in half. This is no big deal to me, but apparently the guys who actually own these magazines I offered to scan and who have been involved in this sport for about as long as I’ve been alive are not exactly thrilled at the idea that I’ll destroy all their original issues. Time for plan C…

This involves the scanners at work. The printers at my work are Ricoh Aficio MP 5000s, and with *these* I can scan ledger-sized (11×17) inputs using the document feed feature. They automatically do duplex scanning (no flipping required) and they are very fast: it takes about 6 minutes to scan an entire issue, front and back. This leaves me with my last problem – how to get the scanned files off the printer.

It seems the Ricoh offers two functions – both of which present some problems.

Scan and send to email – this is kind of OK. It will be inconvenient to need to pull the attachments out of hundreds of emails, but I could deal with it. The larger problem is that the generated bulk scan from an entire issue is apparently larger than my mailserver will allow. So I scan the entire issue over 6 minutes only to then have the printer tell me “sorry – couldn’t deliver your document” at which point those scans seem to be lost and I just wasted that time.

Scan and store on network share – this would be great except that the interface for getting the printer to talk to a Windows network share is maddeningly hard to use, might just not work, and might need some administrative rights on the printer that I don’t have. After much trial and error with this, I think that this option is closed to me.

So my likely path forward will be to scan half an issue at a time (or so – if that’s possible), so that each emailed batch stays under my mailserver’s size limit, and then do post-processing on those. To do this, I will need to remove the staples from the back issues and feed in half an issue at a time, but I think it will work and go pretty quickly. One thing I didn’t mention is that even with this approach, it *seems* that several settings need to be re-entered every single time I start a scan job (select input as color, set DPI, original orientation, other settings). Each of these is slow and tedious to input on the Ricoh touch screen and I’m hoping I can simplify it, but it might be tolerable and this will still be dramatically faster than working with my home scanner.

So – here are my next steps:

Go back with a couple of my own copies of the magazine and do some trial and error to find the maximum number of pages that can be scanned and emailed in one batch without my mailserver rejecting it. I also want to get more confident that the document feed will work smoothly / flawlessly before I send any of the originals through the feeder. This will include testing color inputs as well as greyscale (the oldest issues are greyscale, then a single color is added on some covers, then there are full color, glossy covers over greyscale pages, and current issues are glossy and full color from front to back).

Start scanning the actual issues, probably working from the most current to the oldest. This way, if there are any problems with the process or I damage some issues before I’m certain this is going perfectly, I have some time to correct the process before I reach the oldest issues.

Start post-processing. This may take a while.

Probably use ImageMagick, since I know some of its functionality

Cut the scanned images in half – I’ll have ledger sized scans.

Do some math to figure out page numbering. If the cover is page 1 and an issue is 60 numbered pages long (back cover is page 60), I should have 15 input sheets (four pages per sheet) and 30 scanned images (front + back). I think the page pairs on those images will be: 1+60, 2+59, 3+58, etc. Also, if I have to do this in two batches, the scanned images in each batch will cover only part of that sequence, so a cutting and renaming script will need to take that into account, e.g. postprocess [yyyy-mm] [first_page] [last_page].

Probably also do some image rotation magic

The final output of this will be perfectly named, oriented, and numbered scans (e.g. 1998-12-p01, 1998-12-p02, 1998-12-p03, etc.)
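
The page math and naming steps above can be sketched in Python. This is only a rough sketch under some assumptions that still need checking against real scans: the scans arrive outermost sheet first, each sheet's front side (the one carrying the odd low-numbered page) comes before its back side, and the left half of each scanned image is the left-hand page as read. The function names (`spread_pages`, `page_name`, `batch_names`, `split_command`) are made up for illustration, batches are expressed as sheet indexes rather than page numbers to keep the math simple, and `split_command` only builds an ImageMagick `convert` command line (it assumes one image file per scanned side and does not run anything).

```python
from typing import Iterator, List, Tuple


def spread_pages(total_pages: int, sheet: int, side: str) -> Tuple[int, int]:
    """Return (left_page, right_page) for one side of one folded sheet.

    sheet counts from 0 at the outermost sheet (the one with the covers).
    side is "front" (carries the low odd page) or "back".
    """
    if total_pages % 4 != 0:
        raise ValueError("expected a page count that is a multiple of 4")
    if not 0 <= sheet < total_pages // 4:
        raise ValueError("sheet index out of range")
    if side == "front":
        return total_pages - 2 * sheet, 2 * sheet + 1
    if side == "back":
        return 2 * sheet + 2, total_pages - 2 * sheet - 1
    raise ValueError("side must be 'front' or 'back'")


def page_name(issue: str, page: int) -> str:
    """Build a name like 1998-12-p01 (two-digit page numbers assumed)."""
    return f"{issue}-p{page:02d}"


def batch_names(issue: str, total_pages: int,
                first_sheet: int, last_sheet: int) -> Iterator[Tuple[str, str]]:
    """Yield (left_name, right_name) for each scanned image in a batch.

    Assumed scan order: sheets first_sheet..last_sheet, front then back.
    """
    for sheet in range(first_sheet, last_sheet + 1):
        for side in ("front", "back"):
            left, right = spread_pages(total_pages, sheet, side)
            yield page_name(issue, left), page_name(issue, right)


def split_command(scan_path: str, out_pattern: str) -> List[str]:
    """Build (but do not run) an ImageMagick command that cuts a scan
    into left and right halves; out_pattern should contain %d."""
    return ["convert", scan_path, "-crop", "50%x100%", "+repage", out_pattern]


if __name__ == "__main__":
    # First two sheets of a hypothetical 60-page December 1998 issue.
    for left, right in batch_names("1998-12", 60, 0, 1):
        print(left, right)
```

One thing this makes visible: the pairs are 1+60, 2+59, 3+58 and so on, but which page lands on the left alternates from one side to the next (60|1, then 2|59, then 58|3), so the renaming step has to track left versus right and not just the pair.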

I could deliver those back to Northwest Runner (this is what I had volunteered to do) or I might do some additional post-processing to attempt to assemble them into searchable PDFs.