Reader Story: Jump Starting The Paperless Transition (Part 1)

The other day I woke up to a pretty great email from awesome DocumentSnap reader Mike in California. Here’s what he said:

Just wanted to let you know that I took two moving boxes (roughly 40–50 pounds) of paper to the local shredding center on Monday. Thank you very much for the wonderful information on your site, which made going paperless a smooth process!

Scanned most of those papers with my ScanSnap (apart from the ones that I could download from the providers), used Hazel to file them, and I also used Arq to ensure that those files are backed up offsite (to Google Nearline in my case). Woo-hoo!

Obviously I couldn’t leave it there, so I had to ask Mike more about what he did. He agreed to share his story with you. We’ve broken the story into three parts, and this is Part 1, in Mike’s own words.

My motivation for going paperless was one of desperation. When I remodeled my house, I built a home office with a customized cabinet with file drawers. That cabinet has the equivalent of two 4-drawer filing cabinets of storage, but when the remodel was completed and I put all of my files into it, I realized that there wasn’t a lot of room left over, and the volume of paper coming in was expanding, not shrinking.

My kitties, helping with the filing:

I also had fallen behind on filing, and had several foot-tall stacks of papers waiting to be dealt with. It seemed clear that sticking with paper documents was going to be unsustainable, but the initial startup for going paperless seemed incredibly daunting, especially with those big piles of un-filed paper.

Planning The Attack

I decided that what I needed to do was to focus my early paperless efforts on two fronts. I needed to start scanning and filing incoming documents (including those piles) so that the volume of stored paper would not continue to grow out of control. And, I needed to strategically scan already-filed documents to free up space in the cabinets so that I would have space for things that weren’t so easy to scan. I also realized that I needed help, so I called in the cavalry!

I made an appointment with a personal organizer with whom I had worked in the past. She came over, and the first thing we did was to conduct a purge of the filing cabinets. We identified some items that were beyond their useful life, like statements from closed accounts and so on. Those went right into the bin to be shredded.

Next, we went through those stacks of un-filed paper and sorted them first by source (water bill, cable bill, etc.) and then by date. Once we had everything sorted in that fashion, we used one of Brooks’ tips and sorted those groups according to whether they were single-sided documents, double-sided documents, or multi-page documents, so that they could be scanned using one of the ScanSnap presets that Brooks teaches[1].

Time to scan, right? Well, even with this initial organization, there was still a lot of scanning to do. Was there any way to cut it down? It turns out that there was.

Enter FileThis

Shortly before the organizer came, I had stumbled across a service called “FileThis”. They set out to solve one of the biggest problems with receiving documents and statements electronically, namely, the fact that most of these companies will not deliver those documents to you in any sort of efficient fashion.

They all want to send you an email letting you know that your statement is ready, but then you have to go log in to their (usually poorly designed) website to download a document. It’s a rare company that will just e-mail you a PDF statement. My local water utility and my gardener are the only ones so far, actually.

What FileThis does instead is to log into your accounts for you and download the statements. When you first set it up, it gets as many documents as that provider has available, up to a maximum of three years’ history. From that point on, it checks periodically, so it will get each new statement as it becomes available. (Honestly, I wish it would pull the full archive, because my bank had 7 years of history, but the FileThis folks say that they had to draw the line somewhere. I fetched the other four years from the bank manually.)

I had some trepidation about giving login credentials to FileThis, especially for things like bank statements. However, all of my banks provide the ability to create a separate, read-only login that is intended for use by accountants. I created such logins wherever possible, so that if FileThis has a security breach, at least the hackers will not be able to make changes to my accounts. There’s still some risk, though.

Using FileThis solved two big problems for me in jump-starting the paperless process. The first is that it allowed me to convert large swaths of documents to digital without scanning them! For example, after I downloaded all seven years of bank statements, the paper versions went straight from my filing cabinet and into a box to go to the shredding center. That was a stack of paper several inches thick all by itself.

The same pattern repeated with utility bills and other accounts. Some of those un-filed documents that the organizer and I sorted out were also able to go straight into the bin without scanning them, because I was able to download digital copies instead.

This was a huge boon — not only did it clear out a solid foot of space in my filing cabinet, but it also allowed me to focus my scanning efforts on documents that I couldn’t get digitally.

This pile was all destined for the scanner, but it turns out that almost all of these could be downloaded:

The second problem that FileThis solves is alluded to above: making the ongoing delivery of statements seamless, once again cutting down the amount of ongoing scanning I have to do.

FileThis has several options for what to do with the documents that they fetch for you, including storing them in their own “cloud” service. I opted instead to install their app on my Mac, which fetches the documents and saves them to my local computer. In fact, it saves them to my paperless Inbox, where Hazel promptly renames and re-files them into the appropriate places within the same structure that I use to store my scanned documents. This is basically document nirvana — completely hands-free delivery and filing of paperless documents! One thing that’s nice about getting them this way, as opposed to scanning them, is that the computer-generated PDFs don’t suffer from OCR mistakes, so Hazel’s rules get solid data to work with.

Making Progress

After the weekend spent with the organizer, I spent probably another two weeks of weeknights on my own (off and on) processing those sorted piles through the ScanSnap and writing (and debugging!) various Hazel rules. Once I had a solid Hazel rule for, say, my water bill and I had scanned all of the to-be-filed documents, I also pulled the folder from the filing cabinet and scanned all of the old ones that I hadn’t been able to download digitally. Thanks to the Hazel automation, this only takes a few extra minutes, and you can free up all of that space! A couple of months after starting the paperless quest, I made it through! No more piles of paper waiting to be filed! No more bursting file drawers.

The final big batch of scanning:

Papers ready to go to the shredding center:

Initial Setup = Huge Payoff

That was great already, but the really wonderful aspect didn’t become obvious for a few weeks after — how easy it is now to stay on top of the incoming stream of information. Most of it is totally automatic, delivered by FileThis and filed by Hazel. Most items that arrive on paper can just be run through the ScanSnap, and Hazel files them immediately, so the paper goes right into the shredder by my desk. For the couple of bills I get by email, I can save the PDF attachment right into my paperless inbox, and Hazel processes them as well.

It was a lot of work to set up initially, but the payoff is huge. Having such a high degree of automation in the process saves an enormous amount of time, and helps keep those piles of paper from growing again! Of course, every week I get a few documents that aren’t covered by a Hazel rule. If it’s something that I’ll be receiving on a regular basis, I’ll first look to see if it can be delivered electronically. If not, I’ll write a Hazel rule for it so I can scan them for automatic processing. But most of them are just one-offs. With all the other stuff handled by automation, scanning those and filing them manually is easy.

I should note that I still have tons of paper files. But they’re things like brochures, instruction manuals, one-off receipts, old documents from employers and so on. Sure, they could be scanned, but since they’re not suitable for automatic processing with Hazel, the effort required to name and file them would be too high to be worth it. The maximum payoff came from getting through the reams of monthly statements that the automation can handle so well. Now I have gobs of free space in my filing cabinets for the things that are less suited to scanning.

Thanks Mike! What a great writeup, and I love your point about how spending some time up front can have a big payoff later.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

Leave a Reply:

Dear Mike,
Could you share a few of the rules that you use in Hazel?
Maybe not the “easy” ones, like “if from the bank, recognize the date, rename with BANK-yyyy-mm-dd, move to folder BANK”.
But the rules that were a bit more tricky to configure.
It would be very useful.
Thanks.

Leave a Reply:

Honestly, most of my Hazel rules are about as simple as what you’ve spelled out there.

The trickiest parts to getting them to work right were figuring out what text to use to identify the document as being, say, the water bill, and also identifying which date was the document’s date.

A lot of bills have the company name represented as a fancy logo at the top that doesn’t OCR well, so that doesn’t work. And also, something like the company name will also appear in your bank statements in the transaction history, so you can’t just use the name by itself, or that rule will also fire on your bank statements.

I usually end up using a couple of distinct strings so that the recognition is reliable. If the documents are scanned, I try to keep these strings short to minimize the chance of a bad letter from the OCR. I also choose them from areas of the document where the text is large and easy to OCR. Smaller text or text that’s near or inside of a box as you see on some statements doesn’t OCR as reliably.

A typical rule of mine has two “contain” conditions on the document contents, and the number in the second “contain” is the account number. I then rename with a filename that includes a date, so I get something like “2015-09-01 ABC Water Company.pdf”, and it gets filed.
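The matching idea described above (a couple of short, distinct strings, one of them the account number, followed by a dated filename) can be sketched outside of Hazel in a few lines of Python. This is only an illustration of the logic, not how Hazel works internally. The rule data, the account number 12345678, and the assumption that dates appear as MM/DD/YYYY are all made up for the example.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical rule: every string must appear in the OCR text.
# "12345678" stands in for an account number and is invented for this example.
RULES = [
    {"name": "ABC Water Company", "must_contain": ["ABC Water", "12345678"]},
]

# Assumes US-style MM/DD/YYYY dates; real documents may use other formats.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def first_date(text: str) -> Optional[str]:
    """Return the first MM/DD/YYYY date in the text as YYYY-MM-DD, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:  # e.g. an OCR error produced month 13
        return None


def suggest_filename(text: str) -> Optional[str]:
    """Return 'YYYY-MM-DD Name.pdf' for the first rule whose strings all match."""
    for rule in RULES:
        if all(needle in text for needle in rule["must_contain"]):
            date = first_date(text)
            if date is not None:
                return f"{date} {rule['name']}.pdf"
    return None


water_bill = "ABC Water Company\nAccount 12345678\nStatement Date 09/01/2015"
bank_statement = "Checking statement\n09/03/2015 ABC Water payment 45.00"

print(suggest_filename(water_bill))      # 2015-09-01 ABC Water Company.pdf
print(suggest_filename(bank_statement))  # None
```

The bank statement mentions the water company in its transaction history, but it does not contain the account number, so the rule does not fire on it.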

A good trick for this is to take a scanned PDF, do a “select all”, copy, and then paste into a text-only editor like TextWrangler. Then you can see what text is actually there from the OCR, and in what order the various dates appear in the OCR text. (This isn’t always easy to guess if the document has multi-column formatting or whatnot.)
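If you would rather not eyeball the pasted text, a small script can list the dates in the order they occur, which is the order a "first date" or "second date" match would see them. This is a rough sketch that works on text you have already copied out of the PDF. The three date formats it looks for are assumptions, so adjust the pattern to whatever your documents actually use.

```python
import re

# Assumed date formats; extend or change these for your own documents.
DATE_PATTERN = re.compile(
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"           # 9/1/2015 or 09/01/15
    r"|\b\d{4}-\d{2}-\d{2}\b"                 # 2015-09-01
    r"|\b[A-Z][a-z]+\.? \d{1,2}, \d{4}\b"     # September 1, 2015 or Sep. 1, 2015
)


def dates_in_order(text):
    """Return (occurrence number, date string) pairs in order of appearance."""
    matches = DATE_PATTERN.finditer(text)
    return [(index, match.group(0)) for index, match in enumerate(matches, start=1)]


pasted = (
    "Statement Date: 09/01/2015\n"
    "Due Date: 09/25/2015\n"
    "Service period Aug 1, 2015 to Aug 31, 2015\n"
)

for number, date in dates_in_order(pasted):
    print(number, date)
```

With the sample text, the statement date is occurrence 1 and the due date is occurrence 2, so a rule should pick the first occurrence for the filename.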

One nice thing about using FileThis is that the files that are automatically downloaded to the Mac have names in a very regular format, so I can actually base rules on those if needed. FileThis even identifies the “relevant date” of the documents it downloads (i.e. the statement date) and puts those in the filename, so my rules for those documents can often pull the date right from the filename.

For example:
Kind is PDF
Full Name matches Bank of America CHECKING XXXXXX1234
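To show the idea of pulling the date straight from a downloaded file's name, here is a minimal Python sketch. The exact filename layout is an assumption: I have invented a name that starts with the account label from the rule above and contains a YYYY-MM-DD date, so check the names your own downloads actually use before relying on it.

```python
import re
from typing import Optional

# Assumes the date is embedded in the filename as YYYY-MM-DD (hypothetical layout).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def date_from_filename(filename: str, required_prefix: str) -> Optional[str]:
    """Return the date in the filename if it is a PDF with the expected prefix."""
    if not filename.lower().endswith(".pdf"):
        return None
    if not filename.startswith(required_prefix):
        return None
    match = ISO_DATE.search(filename)
    return match.group(0) if match else None


# Hypothetical downloaded filename, made up for illustration.
example = "Bank of America CHECKING XXXXXX1234 2015-09-01.pdf"
print(date_from_filename(example, "Bank of America CHECKING XXXXXX1234"))  # 2015-09-01
```

Because the name is generated by software and not by OCR, a match like this does not need to look inside the PDF at all.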

Leave a Reply:

Thanks Mike for the tips and explanations.
I will try to do more rules. I was thinking of complex rules, to do all in one.

And are you leaving Hazel “ON” all the time? Or just after doing some scanning?
I was wondering about the impact on the mac resources, especially if there are 10 “simple” rules, that are constantly checking the folder “inbox from scanner”.

I tried with a few rules, but I can’t tell whether it’s using many resources. So for now, I start Hazel only after scanning.

Leave a Reply:

I can’t speak for Mike, but I leave Hazel on all the time and don’t have a problem with resources. The more you use it the more uses you’ll find for it beyond scanning, and it just makes sense to leave it on.

Leave a Reply:

Same here. I leave it on all the time. Since I have it set to process files in my Paperless Inbox folder, and the processing moves the files out of that folder, it has nothing to do after the processing has finished. It just seems to keep an eye on the folder (hopefully using the macOS mechanism that allows it to receive a notification from the filesystem when the folder changes), and it doesn’t do anything until there’s something in there.
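The "nothing to do once the inbox is empty" point can be illustrated with a small Python sketch. This is not how Hazel is implemented, and it does not use filesystem notifications. It is just a single processing pass over a folder, with hypothetical folder names, to show that once files have been moved out, a second pass finds no work.

```python
import shutil
from pathlib import Path


def process_inbox(inbox: Path, archive: Path) -> int:
    """Move every PDF out of the inbox and return how many files were moved."""
    archive.mkdir(parents=True, exist_ok=True)
    moved = 0
    for pdf in sorted(inbox.glob("*.pdf")):
        # A real setup would rename and route each file here; this sketch just moves it.
        shutil.move(str(pdf), str(archive / pdf.name))
        moved += 1
    return moved
```

Calling process_inbox twice in a row on the same folders moves the files on the first call and returns 0 on the second, which is why an always-on watcher on an empty inbox has so little to do.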

Leave a Reply:

Yes, that’s my experience as well. I can’t think of a downloaded statement that wasn’t a machine-generated searchable PDF. These are great, because you get no OCR errors since the text is actually put into the PDF by the software that makes the PDF, not recognized optically.
