Vox Pop: Workflow for the Fujitsu ScanSnap?

A couple of months ago, on an MBW episode, Merlin, you recommended some scanner/PDF solutions and said you would elaborate on them on 43f at some point. I thought this was related to reducing your reliance on paper. How did your scanning experiment go?

Adam remembers correctly that I purchased and preliminarily fiddled with the Fujitsu ScanSnap S500M for OS X (Info, Amazon). It's a small-footprint, high-speed document scanner that a lot of people have been talking about lately. I'd read so many reviews and blog posts about how easy it is to use that I was intoxicated by the dream of a life -- if not without paper storage -- where I could at least try to minimize my unnecessary paper clutter and start making document archiving easier and more searchable.

Given the not inconsiderable cost of the unit, I'm embarrassed to say that I got busy with other stuff and haven't yet returned to using the ScanSnap in any automated way.

My initial experiences, while tentative in terms of time commitment and true workflow integration, have been very positive so far. It's easy and fast to set up the S500M and then start scanning one- or two-sided documents. The beauty part is that the included "ScanSnap Manager" app not only stores your document preferences, but directs the USB input from the ScanSnap right into the destination app of your choosing (which can, of course, be an OCR app -- that's where it gets powerful).

Initial experiments scanning directly to image-only PDFs were very positive, while scanning into "Yep" and "DevonThink Pro Office" (which has on-board OCR) seems to point even closer to the direction I eventually hope to go.

I know at least a few of you are ScanSnap studs who have come up with workflows that are really happening for you (hint: looking at you for a blog post here, Mr. Norbauer). In the absence of a more detailed report from me, I'm hoping a few of you can chime in here.

The Question to You

How are you integrating the ScanSnap (or another OS X-friendly document scanner) into your workflow? What are you using for OCR? Having particular success with ReadIris, Acrobat, DevonThink, or Yep? Any sexy Automator workflows to share?

Because of my ongoing dissertation (humanities), my workflow is similar in many respects: widescreen monitor, scanning from books, Bookends. I use two scanners: a CanoScan 3200F (fast, USB 2.0), plus a ScanSnap on my desk, always waiting for input. The ScanSnap eats a lot of copied articles from the library - it is not possible to borrow the majority of books here. My workflow:

1) Scanning while clearing the inbox (with the ScanSnap app, and VueScan for my flatbed scanner).
All the files go into the folder "InScan".

2) Import the scans into Devon and rename them. I use two databases ("Dissertation", and "Archive" for everything else).

Why Devon? The "fuzzy search" is incredibly helpful for locating any file. OCR is never 100% accurate - not a good basis for a Spotlight search. When writing a paper (in Scrivener), I often paste portions of my text into Devon and look for similar entries in the database. Results are impressive so far.

3 a) (In the evening:) Batch OCR all the scanned PDFs in Devon (OCR needs a lot of memory). If you have many PDF files, search for ".pdf" in Devon and change the view of the results to display the file type, so that you can select only the files that are not yet "PDF+text". Convert them, and delete the unconverted files afterwards (the next morning).

3 b) Sorting the documents in DT.

4) (If a Bookends entry exists:) Export the PDF and attach the file to the Bookends entry. For annotating PDFs, I use PDFpen - it is cheaper than Acrobat (and a bit faster?). For note-taking, I use Bookends - the annotations from PDFpen are visible in BE (annotations from Skim are not).

5) BACKUP

Since I use the ScanSnap a lot, regular backups have become a top priority. I back up to an external drive and to mozy.com. I also save the DevonThink databases to a DVD-RW every weekend: due to their size, off-site backup is too slow for these files.
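
The weekend database copy in step 5 could be scripted. What follows is only a minimal sketch in Python, assuming each database lives in a folder (or package) on disk. The function name `archive_database` and all paths are hypothetical, and the script only writes a dated zip copy; it does not burn a DVD or talk to mozy.com.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_database(db_path: Path, backup_dir: Path,
                     today: Optional[date] = None) -> Path:
    """Zip the folder db_path into backup_dir as '<name>-YYYY-MM-DD.zip'.

    Hypothetical helper: nothing here is specific to DevonThink.
    """
    db_path = db_path.resolve()
    backup_dir = backup_dir.resolve()
    if not db_path.is_dir():
        raise NotADirectoryError(f"not a folder: {db_path}")

    stamp = (today or date.today()).isoformat()
    backup_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends the '.zip' extension to base_name on its own.
    base_name = backup_dir / f"{db_path.name}-{stamp}"
    archive = shutil.make_archive(
        str(base_name), "zip",
        root_dir=str(db_path.parent),  # paths inside the zip are relative to this
        base_dir=db_path.name,         # only this folder is archived
    )
    return Path(archive)
```

Calling `archive_database(Path("Dissertation"), Path("Backups"))` would leave a file such as `Dissertation-<today's date>.zip` in `Backups`, ready to be copied to the external drive. Close the database before running anything like this so the copy is consistent.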

With stacks and faster preview in Leopard, organizing my semi-paperless office should become easier.
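
For anyone who would rather name files before importing them, the "rename" part of step 2 can be sketched in Python as well. This is an assumption-laden example, not what I actually run: the naming scheme (`YYYY-MM-DD label.pdf`) and the helpers `dated_name` and `rename_scan` are made up for illustration.

```python
from datetime import date
from pathlib import Path


def dated_name(original: str, label: str, day: date) -> str:
    """Build 'YYYY-MM-DD label.ext', keeping the original extension in lower case."""
    suffix = Path(original).suffix.lower()
    return f"{day.isoformat()} {label}{suffix}"


def rename_scan(scan: Path, label: str, day: date) -> Path:
    """Rename one scanned file in place and return its new path.

    Refuses to overwrite an existing file with the same target name.
    """
    target = scan.with_name(dated_name(scan.name, label, day))
    if target.exists():
        raise FileExistsError(f"already exists: {target}")
    scan.rename(target)
    return target
```

A call like `rename_scan(Path("InScan/Scan001.pdf"), "Smith article", date.today())` would turn an anonymous scan into something that sorts by date in the "InScan" folder.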
