ScanSnap and Hazel Are A Match Made In Paperless Heaven

There are a lot of tricks out there for keeping your documents organized based on their location or filename, but the holy grail is to be able to keep them organized based on the actual contents of the documents themselves.

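The ScanSnap can help here: it can read text that you mark with a highlighter and store it as keywords in the PDF. However, once you have those keywords assigned, how does that help you?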

If you’re on Windows, you can use the Distribute By Keyword feature of the included ScanSnap Organizer to move the files to a cabinet, but Mac users are out of luck there.

I humbly submit that using a highlighter, OCR, and the awesomeness that is Hazel, Mac users can one-up even the mighty ScanSnap Organizer.

What Is Hazel?

For years now, I have been engaged in a torrid love affair with a Mac application known as Hazel from Noodlesoft. At a very high level, it lets you create rules to automatically keep your files organized.

I wanted to do something that would marry the searchable goodness of the ScanSnap with the ninja skills of Hazel.

Set Up The ScanSnap For Keyword Highlighting

The first thing you’ll need to do is set up a ScanSnap Manager profile to read highlighted text and make keywords out of it.

First, on the Scanning tab, I have had the best luck setting the Image quality to “Best” (300dpi). At anything lower, the ScanSnap wasn’t picking up the keywords consistently.

Then on the File Option tab, make sure that “Set the marked text as a keyword for the PDF file” is checked. That will tell it to look for any highlighted text and turn it into a keyword in the PDF.

You will, of course, want to choose a folder to save the PDF to. Make a note of this folder because we will need it when we switch to Hazel. In my case it is called ToMove.

Get Out Your Highlighter

Is it Hi-liter or Highlighter? I never know. Anyway, take your highlighter and mark the word or phrase that you want to use to move the file.

Essentially what we will be doing is saying “if the PDF contains this keyword, do something with it”.

All I have handy are grocery receipts, so you can see I highlighted “EXTRA FOODS”.

Scan And Check Keywords

Now scan your document using your shiny new ScanSnap Manager profile. When it is done, open up your new PDF in Preview, go to Tools > Inspector (or hit Cmd-I), and click on the magnifying glass. If everything worked properly, you should see the text that you highlighted.
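If you would rather check the keywords from a script than from Preview, here is a minimal Python sketch. It assumes the keywords end up in the Spotlight attribute kMDItemKeywords and that the macOS mdls command prints array values as a parenthesised list, one item per line. The function names are made up for this example.

```python
import subprocess


def parse_mdls_list(raw):
    """Parse the parenthesised list that `mdls -raw` is assumed to print
    for array attributes, e.g. '(\n    "EXTRA FOODS"\n)'."""
    items = []
    for line in raw.splitlines():
        item = line.strip().rstrip(",").strip()
        # Skip the list delimiters and the "(null)" marker for a missing attribute.
        if item in ("", "(", ")", "(null)"):
            continue
        items.append(item.strip('"'))
    return items


def pdf_keywords(path):
    """Ask Spotlight (macOS only) for the keywords stored on a file."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", "kMDItemKeywords", path],
        capture_output=True, text=True, check=True,
    )
    return parse_mdls_list(result.stdout)
```

On a Mac, calling pdf_keywords() with the path to the scanned PDF should return a list containing the text you highlighted, if the scan picked it up.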

Set Hazel To Move Based On Keyword

Let’s say we want to move any PDF with the keyword “EXTRA FOODS” to a folder called Filed Documents (we’d probably want to move it to a grocery-specific folder, but let’s just pretend).

Open up Hazel and on the left side, click the Plus to add a new folder. Add your ToMove folder that you used as a scan destination in ScanSnap Manager.

Now in the right pane, click the plus to add a new rule. Give it a name.

You can set a number of criteria and rules here, but to keep it simple we will leave it as “all conditions”, then set:

Kind is PDF

Keywords contain EXTRA FOODS

Next, set it to Move the file to folder Filed Documents

Hit OK to save it. If you want to see what your rule will catch, you can click on the little Gear icon near the bottom and choose “Preview Rule Matches”. If everything is set up properly, your newly-scanned document should show there.

If it doesn’t show, check the PDF to make sure that it really has keywords and re-check your rule setup.

If your document shows in the preview, either wait for Hazel to do its thing, or click on the Hazel icon in the Menu bar, choose Run Rules, and choose the rule that you just created.
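To make the logic of that rule concrete, here is a rough Python sketch of what “Kind is PDF” plus “Keywords contain EXTRA FOODS” plus “Move” amounts to. It is not how Hazel is implemented. The function names are hypothetical, and the case-insensitive match is an assumption made for this example.

```python
import shutil
from pathlib import Path


def matches(path, keywords, wanted):
    """True when the file is a PDF and one of its keywords contains `wanted`.
    Case-insensitive matching is an assumption of this sketch."""
    is_pdf = path.suffix.lower() == ".pdf"
    has_keyword = any(wanted.lower() in kw.lower() for kw in keywords)
    return is_pdf and has_keyword


def file_if_match(path, keywords, wanted, dest_dir):
    """Move the file into dest_dir when the rule matches; return the new path or None."""
    if not matches(path, keywords, wanted):
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(dest_dir / path.name)))
```

For example, file_if_match(Path("ToMove/receipt.pdf"), ["EXTRA FOODS"], "EXTRA FOODS", Path("Filed Documents")) would move the receipt, while a file with different keywords would stay put.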

Set Hazel To Rename Based On Keyword

Let’s say that instead of moving a file based on a certain keyword, we want to give our files a name based on the highlighted text. Is this possible? Why yes, yes it is. Let’s use our new Hazel Ninja powers and do it.

Create a new Hazel rule as we did before, but this time for the criteria, set this:

Kind is PDF

Keywords is not blank

Next, in the “Do the following” section, choose “Move file” to folder “Filed Documents” (if you like), and then set up the following:

Choose Rename file

In the with pattern section it will say “name” and then “extension”. Click on “name” and hit the delete key. We want to get rid of that.

Let’s give the filename a date. Drag “date created” up before extension. If you prefer, click the little down arrow in “date created” and choose Edit Date Pattern and change to whatever pattern you choose.

Drag “other” up between “date created” and “extension”. It will ask you to select a Spotlight Attribute. Scroll down to find Keywords and hit Select.

If you prefer, click on the little down arrow in “keywords” and change which keywords are selected and how they are formatted.

You might want to click between “date created” and “keywords” and put a dash, but that is up to you.

Your final rule should end up with a pattern along the lines of “date created”, an optional dash, “keywords”, and then “extension”.

Now when we scan that same Extra Foods receipt, our Hazel rule will move the file to Filed Documents and rename it with the creation date followed by the keyword.
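As a rough illustration of the rename pattern (date created, a dash, then keywords, then extension), here is a small Python sketch. The ISO date format, the " - " separator, and the function name are assumptions for the example. In Hazel you pick the date pattern and separator yourself.

```python
from datetime import date


def renamed(created, keywords, extension=".pdf", separator=" - "):
    """Build '<date created><separator><keywords><extension>'.
    Mirrors the 'Keywords is not blank' condition by refusing an empty list."""
    if not keywords:
        raise ValueError("rule only applies when the PDF has keywords")
    return f"{created.isoformat()}{separator}{' '.join(keywords)}{extension}"
```

With a made-up creation date of 5 March 2012 and the keyword EXTRA FOODS, renamed(date(2012, 3, 5), ["EXTRA FOODS"]) gives "2012-03-05 - EXTRA FOODS.pdf".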

Forget Keywords, Use Hazel To Move Based On Searchable Text

Let’s say you want to forget about this whole highlighter/keyword thing. You already have scanned and searchable PDFs. Can’t you just move based on the OCR’ed text in the documents? Let’s find out.

So you really, really like the vegetable kale and you want to move any scanned receipt that has the word Kale in it (can you tell all I had around for this demo is grocery receipts?).


Next, we obviously need to be using a ScanSnap Manager profile that has “Convert to searchable PDF” checked on the File Options tab. Again you will have better results if you use 300dpi for Image quality.

Now we set up another Hazel rule, this time using the following criteria:

Kind is PDF

Contents contain Kale

Then do something with it such as move it to Filed Documents.

Now when you scan a document that has the word “Kale” in it, Hazel will move it.
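If a contents rule is not matching, one thing you could check is whether Spotlight itself can find the word in your scan folder. The Python sketch below shells out to the macOS mdfind command with a kMDItemTextContent query. Treat it as a diagnostic idea only: it assumes Hazel's contents match and Spotlight's index see the same text, which this article does not confirm, and the function names are hypothetical.

```python
import subprocess


def build_query(term):
    """Spotlight query for a case- and diacritic-insensitive substring match.
    Assumes `term` contains no quotes or backslashes (no escaping is done)."""
    return f'kMDItemTextContent == "*{term}*"cd'


def find_pdfs_containing(folder, term):
    """Return PDFs under `folder` whose indexed text contains `term` (macOS only)."""
    result = subprocess.run(
        ["mdfind", "-onlyin", folder, build_query(term)],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines()
            if line.lower().endswith(".pdf")]
```

If find_pdfs_containing("ToMove", "Kale") comes back empty for a receipt you know mentions kale, the OCR text is the first thing to look at.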

Bonus: You can even have Hazel read the dates from the text of the PDF and use them in your filename. Here is how to do that.

(By the way, if you’re a Windows user, there is a similar tool called File Juggler.)

There Is A Lot You Can Do With Hazel

These were a few examples of things you can do in Hazel to be a document management ninja. Hopefully it will give you some ideas.

Remember that OCR is never 100% perfect, and the effectiveness of these rules will be dependent on the quality of the scan and OCR.

Do you have other Hazel-eriffic document tricks? Drop a comment and let us know.


About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

36 comments

I blog quite often and I truly thank you for your content.
Your article has really piqued my interest. I’m going to bookmark your site and keep checking for new information about once a week.
I subscribed to your Feed too.


Well, I have tried this for almost 6 months. I can’t get this damn thing to find any contents or keywords. The keyword is there because I can see it in the Inspector, but Hazel refuses to look inside the contents and find anything. Is there a special setting to make Hazel look into a PDF and read the contents? This is so infuriating and frustrating. I have tried everything on this page and none of it works.


Hm. I know you can see the keywords, but are you sure the PDF itself is searchable? In Preview, select all the text (Edit > Select All), then go to Edit > Copy, and paste the text into a text editor or word processor.

Is the text you see there what you’re expecting?

If the text looks right then something weird must be going on. I’d go to Noodlesoft Support or the Noodlesoft Forum http://www.noodlesoft.com/forums/ and the author should be able to figure out what is going on.


Is there any way to do something similar in Windows? There must be a more efficient way of renaming the scanned images besides opening them, noting the contents, closing them, finding them again, and manually renaming them.


Brooks,
Heard you on MPU and have been dying to get myself into the Paperless fast lane ever since. I have a Mac and now have Hazel and an iX500 but I am having some issues with Hazel executing my rules. I’ve gotten over the Hazel rules learning curve (thanks to Noodlesoft’s awesome support) but I am having a problem with OCR that is causing my rules to execute inconsistently and am wondering if you have any experience that might help me out.

I am using Hazel to look for key phrases in scanned PDFs so I can file the scanned materials, but the result of the OCR does not match the written words. For example, a page that contains:
“Transaction Confirmation” when read instead has
“Transact i on Con\Ufb01rmat i on” when I look at the file using mdimport -d2 (from the terminal)

When I open the PDF in PDFPen Pro and search for Transaction Confirmation the phrase is found correctly.

FYI, I let the ScanSnap iX500 perform the OCR, although I am considering changing the workflow to let PDFPen Pro perform the OCR if the results will be more consistent. I could also simply let PDFPen Pro perform the search instead of Hazel, although I am concerned it will slow down the execution of the workflow.

Any thoughts you have would be appreciated!


Hmm interesting Chip. If you select all the text in the document and then copy and paste it into, say, TextEdit, does it show up as “Transaction Confirmation”, or the messed up version? That’d be the first step.

Also make sure you are scanning at a decent resolution. I do 300dpi. Anything lower than that and you might not get good OCR results.
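For what it is worth, the “\Ufb01” in the output above is the single-character “fi” ligature (U+FB01). If you end up checking OCR text in a script of your own, one possible workaround is to fold ligatures with Unicode NFKC normalization and ignore whitespace before comparing. This Python sketch shows the idea. It is not something Hazel does, and the function names are made up.

```python
import unicodedata


def squash(text):
    """Fold compatibility characters such as the 'fi' ligature (NFKC),
    drop all whitespace, and casefold for a forgiving comparison."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(folded.split()).casefold()


def contains_phrase(ocr_text, phrase):
    """True when `phrase` appears in `ocr_text` once both are squashed."""
    return squash(phrase) in squash(ocr_text)
```

The trade-off is that throwing away whitespace can produce false matches across word boundaries, so this suits long, distinctive phrases better than short words.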


I have a profile set up to automatically add my scan to Evernote. Evernote uses the file name as the name of the new note. I want to combine this profile with Hazel and a highlighter to change my file name before it gets to Evernote so my notes’ names make more sense. Is this possible?


I loved this write up. I'm a little behind as obviously some of the last posts were about a year ago. I just got the s1300i. I have a Mac. Is Hazel still the best option for helping to manage my filed papers?


Unfortunately, my investigations turned up pretty much what you said – lots of OCR options, but none that can extract single pieces of marked text.

With something like a bank statement that has many dates of different transactions, what I want to be able to do is to extract just the date of the statement itself, which will be difficult without something like the process that you described in this article.


Brooks, I, like other commenters above, have a ScanSnap S300M, which lacks this functionality. Are you, or any of your other readers, familiar with any other OCR software for the Mac that can extract keywords based on highlighted text, as you have described above?

Extensive googling has turned up nothing for me.


Sorry, I am not aware of any other software that will do the highlighter part. Any OCR software (like PDFPen) will let you OCR the document, which Hazel can then act on, and you can manually give PDFs keywords using Preview, which Hazel can also act on, but I do not know of anything else that recognizes highlighted text.


If I'm LOVING Evernote ScanSnap integration, how would Hazel fit into the picture? I actually use my old (1 year old) HP Windows laptop that's now my dedicated ScanSnap machine. My family workflow is:
1. Get document to scan (letter, receipt, instruction manual, etc.)
2. Flip open scanner (Windows laptop is always on)
3. Load document and hit blue button
4. Throw document in trash.
Check on my Mac or phone later and see that document got scanned OK. If not, dig the document out of the trash or more likely find out that the document scanned but not to Evernote.

To find some document, just try to search Evernote for some relatively unique word. The web interface is a bit better as you don't have to wait for synchronization on the desktop app.


I highlight the month, year and account name before scanning and have ScanSnap create keywords. Then using Hazel, Automator and some AppleScript I parse these keywords and rename the pdf file based on them — e.g., 2012-03 BofA Checking.pdf. After renaming, I have another Hazel rule move the file into the subdirectory corresponding to the account name. Works really well!
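The commenter's actual AppleScript is not shown here, but the parsing idea might look something like this Python sketch. It assumes the keywords arrive as a list containing a four-digit year, an English month name or three-letter abbreviation, and the account name. All names in the code are hypothetical.

```python
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def month_number(word):
    """Return 1-12 for an English month name or 3-letter abbreviation, else None."""
    w = word.lower().rstrip(".")
    for number, name in enumerate(MONTH_NAMES, start=1):
        if w == name or (len(w) == 3 and name.startswith(w)):
            return number
    return None


def statement_name(keywords):
    """Build 'YYYY-MM Account Name.pdf' from highlighted keywords."""
    year = month = None
    account = []
    for kw in keywords:
        word = kw.strip()
        if year is None and len(word) == 4 and word.isdigit():
            year = int(word)
        elif month is None and month_number(word) is not None:
            month = month_number(word)
        else:
            account.append(word)
    if year is None or month is None or not account:
        raise ValueError("need a year, a month, and an account name")
    return f"{year:04d}-{month:02d} {' '.join(account)}.pdf"
```

For example, statement_name(["March", "2012", "BofA Checking"]) returns "2012-03 BofA Checking.pdf", matching the naming scheme described in the comment.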


In the "kale" example you describe above, in which you are using Hazel's ability to find the presence of a text string in the file's contents, the Hazel rule's action is to move the file to a "Filed Documents" folder. Is there a way for Hazel to pass the value of the "contents" variable entered manually in Hazel, to the Hazel command Rename File? Or alternatively pass it to "Sort file into subfolder"? The idea is to use that content text string to rename either the file or the subfolder, respectively. After repeated experimentation, as best I can determine, Hazel seems unable to do this. I have also tried using the value manually inputted with the "Text Content" variable (selected from Spotlight using Hazel's "Other" variable) in both the "conditions" and the "action" parts of the Hazel rule. Still no soap. Any ideas? Is this the hard reality of how these things (i.e., Spotlight metadata and Hazel) "play" with each other? If so, then as an old TV comedian used to say, "Whadda ravoltin' duhvelupmunt!"


You can use keywords that you selected via a highlighter in the file name with "other"; its name is keywords. But I don't believe what you describe can be done in the way you want it to. It most likely can be done with AppleScript or Automator if you think outside of the box.

I got Hazel to tag my PDF on import to Evernote using said highlighter method with both AppleScript and Automator. I have a post on the Hazel forums about it right now; it might give you some ideas.


I have an S1300 with OS X 10.6.4 and the most recent ScanSnap software. Often, when using "Set the marked text as a keyword for the PDF file", the keywords appear when the PDF is viewed with Acrobat, but are not shown with Preview. Any idea why this is happening?


Running Version 3.0 L20 and the "Set the marked text as a keyword for the PDF file" is greyed out. I have been using Profiles for quite some time now and do not have the Quick Menu turned on. Interestingly, the ONLY way I can get the "Set the marked text" option to not be greyed out is to turn Quick Menu on. I have the profile set to black and white and still no joy. Any ideas?


Hi Finis, I think it might have to do with your profile color setting. Here's what the help says:

This checkbox can be selected only under the following conditions:
[PDF(*.pdf)] is selected in the [File format] pop-up menu.
[Auto Color Detection] or [Color] is specified for [Color mode] in the [Scanning] tab.
ScanSnap S1500 / S1500M / S1300 / S510M is connected.

So, I think you need to be using Auto or Color for your Color mode. Give that a try?


Hi. I'm trying to get this to work and having no luck populating the keywords when scanning. I'm using the ScanSnap S1500M, a yellow highlighter, and the ScanSnap settings you suggest, and it's not picking up the highlighted text. I wondered if you had any ideas where I'm tripping up.
Many thanks


Hi Mark, this might sound strange but have you tried a different highlighter? I have heard that green ones work best. If not, the ScanSnap Manager help has a pretty comprehensive list of things to try. I copied the relevant parts to a PDF if you prefer: http://cache.documentsnap.com/docs/keywordhelp.pd….


Appreciate your reply BrooksD, but I've tried a couple of highlighters, including a green one that was recommended. Still no luck. It's as if it just doesn't have the option to recognize highlighted text ticked, even though it does. Running out of ideas as to why.
