What's on Jason's Hard Drive

Several years ago I noticed something funny about my habits as a technologist. My hard drive was always immaculately organized, while my office looked like a three-year-old had spent the day locked inside. To help organize my papers I tried a few physical-world organization ideas--like using cubby holes to store documents (quicker inserting than vertical files)--but no matter what I tried, eventually all my documents ended up in a big pile. I honestly don't think techies are by nature motivated to keep the real world as organized as they keep their virtual world.

That led to a solution I've used now for several years: virtualize my document management! It's a system that worked great for me and I think will work for any tech-savvy person, so I'll share it here in the hope it might help you. It involves a Perforce revision control system, a document scanner, and several hard drive organizational conventions and file editing habits. If you're a techie and want your physical life more organized using virtual tools, read on.

Step One: Perforce

To start with, every important document in my personal, professional, or business life goes into a Perforce repository--everything from source code to scanned legal contracts to vacation snapshot photos. Perforce is a commercial, very high-quality revision control system. Its price is a bit high for basic personal organization, so I use and recommend the free version they provide, which supports a maximum of two clients. That works out perfectly for me: I install the server on my Mac, place one client on my laptop, and place another client on a separate hard drive on the same Mac that runs the server. This setup means every file gets replicated on check-in across two machines and three hard drives.

In 1997, when I first started my virtual organization, the best free alternative to Perforce was CVS, a much more awkward revision control system. These days there's Subversion, an excellent CVS follow-on, and I might recommend that someone starting fresh go with Subversion. It's open source, supports any number of clients, and has better disconnected-from-the-world behavior than Perforce. Its main downsides are a more complicated server setup and a tendency to consume double the disk space on the client side (annoying when storing binaries).

Step Two: Scanner

For many years I used Perforce to organize all my assets that started out electronic. Then in 2002 I bought a document scanner, and it's forever changed the way I manage paper assets. Every tax deductible receipt, every contract I've signed, every loan agreement, and basically every important document in my life has been scanned and stored as a multi-page TIFF file under my Perforce repository.

A digital library of scanned documents makes everything immediately available, even when I'm on the road, with no cabinet required--all for the price of maybe 15 minutes a week scanning. I put documents to scan on top of the scanner itself and work through them when it's late, I don't feel like working, and I don't feel like vegging. It's fun the same way ripping CDs is fun: a mindless accomplishment. After scanning, the documents get piled unceremoniously into an 8.5"x11" box (just like before but this time without any guilt!). Each box represents one calendar year. I find the 10-ream printer paper boxes work marvelously. The papers need no extra organization in the real world because they're organized online. I keep the paper copies in case of audit, at which point I'll have the motivation to pore through the stack looking for the physical document that matches the digital file.

Some advice when buying a scanner: get one with a document feeder in addition to the flat glass plate. Come April 15th when you have a long tax return to scan, you want to just push a button and let things go without intervention. If you can afford it, get one with a duplex scanning feature. Nothing's better than loading a long dual-sided contract and letting it auto-run, or so I imagine. I haven't yet splurged!

When you're scanning personal or financial documents, make sure to secure your machine. To start with, protect your account with a strong password--letters and numbers, and using "3" for "e" doesn't count. Of course that alone isn't enough, as someone with physical access to your machine can boot from other media, so set up a BIOS password as well. Even that doesn't help if the person with your machine can remove your hard drive to read your data, so I like to set up a hard drive controller password too. Taken together, these passwords should make it sufficiently tricky to read your data even if your laptop is stolen. Encryption is always a good idea too, although the Windows built-in drive encryption trusts the Administrator account overly much.

Step Three: Organize the Scanned Files

Where do I keep the scanned files? Under the Perforce repository of course! Scanned documents go under /perforce/scans, under which I keep many subdirectories.

For example, /perforce/scans/financial stores bank statements, credit card statements, investment summaries, those annual FICA mailings, tax returns, etc. These days most of these documents come electronically or can be printed to PDF from a website, so they don't even have to be scanned. Each financial institution or concept gets its own subdirectory, like fidelity, vanguard, loans, and taxes. Within each specific subdirectory I place files with a date prefix and subject suffix. For example, 20050630-statement.tif is the June 30th statement and 20030519-form5498.tif is the Form 5498 sent in back in 2003. The year-first date makes file listings naturally appear in chronological order, and lets me reuse a suffix such as statement across many different days. The scans support multiple pages within the same file, so there's no convention needed for different pages.
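To make the naming convention concrete, here is a minimal Python sketch. The helper name scan_name is my own invention for illustration and the third file name is made up; the point is only that a year-first date prefix makes a plain alphabetical sort come out chronological.

```python
from datetime import date


def scan_name(day: date, subject: str, ext: str = "tif") -> str:
    """Build a name such as 20050630-statement.tif (date prefix, subject suffix).

    scan_name is a hypothetical helper, not part of any tool mentioned here.
    """
    return f"{day:%Y%m%d}-{subject}.{ext}"


names = [
    scan_name(date(2005, 6, 30), "statement"),
    scan_name(date(2003, 5, 19), "form5498"),
    scan_name(date(2005, 3, 31), "statement"),  # made-up extra statement
]

# Because the year comes first, sorting the names as plain strings
# also sorts them by date.
chronological = sorted(names)
print(chronological)
```

The same suffix (statement) shows up on different days without any clash, since the date prefix keeps each name unique within its directory.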

Having well-organized digitized financial records proved amazingly helpful when applying for a home mortgage. Whenever the loan officer asked for a document (several years' worth of tax returns, investment records, etc.), I could print a quick copy. Sometimes, with less sensitive documents, I just fired off an email. My organization proved itself again when I applied for a home equity line of credit. The HELOC loan officer asked for everything the first mortgage company did, plus details about the first mortgage and assessment records on the house. Happy me, I had those scanned and didn't have to waste an hour digging.

My receipts, in /perforce/scans/receipts, are similar to financial records, but they get their own turf. Under receipts I keep various subdirectories: personal for personal receipts (good for tracking warranties), selfempl for tax deductible self-employment expenses (ready to be zipped up into an email to my accountant come January), donations for donated goods, and reimbursed for copies of expensed items (lest anyone lose my invoice). A full file path might be /perforce/scans/receipts/selfempl/2005/20050224-cell.tif. Notice how under each major receipt category there's a year-by-year subdirectory. I've found that keeps the directories from growing too long. The paper receipts go into the same 8.5"x11" box as everything else, piled right on top.
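The receipt layout (category, then year, then dated file) can be sketched in a few lines of Python. This is only an illustration: receipt_path and SCANS_ROOT are hypothetical names of mine, and the sketch just assembles the path string without touching the disk or Perforce.

```python
from datetime import date
from pathlib import PurePosixPath

# Root of the scanned documents, as described in the text.
SCANS_ROOT = PurePosixPath("/perforce/scans")


def receipt_path(category: str, day: date, subject: str) -> PurePosixPath:
    """Return receipts/<category>/<year>/<yyyymmdd>-<subject>.tif under the scans root.

    receipt_path is a hypothetical helper used only for illustration.
    """
    year_dir = f"{day:%Y}"                      # year-by-year subdirectory
    file_name = f"{day:%Y%m%d}-{subject}.tif"   # date prefix, subject suffix
    return SCANS_ROOT / "receipts" / category / year_dir / file_name


path = receipt_path("selfempl", date(2005, 2, 24), "cell")
print(path)
```

Deriving the year directory from the same date as the file name means the two can never disagree.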

Under /perforce/scans/autos I place my auto purchase, insurance, and maintenance records, broken out by car and organized with the same date-oriented naming convention. For example, scans/autos/2004-tl/20050511-15k.tif contains the 15,000 mile service report. I've found this particularly useful at resale time as I have proof of a good maintenance record (and people think anyone with such great records must do great maintenance!). Plus it helps track when I need service and has saved me money by reminding me that I did in fact change the timing belt last year! There's no way I'd search through the paperwork to find out, but when I can PgDn through documents, I will. If you see someone in the auto shop with a laptop out paging through documents, that's probably me.
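Since the date and subject live in the file name, a question like "when did I last change the timing belt?" can also be answered without paging through anything. Here is a rough Python sketch under stated assumptions: the helper names are mine, and apart from 20050511-15k.tif the record names are invented for the example.

```python
from datetime import date, datetime
from typing import Iterable, Optional, Tuple


def parse_scan_name(name: str) -> Tuple[date, str]:
    """Split a name like 20050511-15k.tif into (date, subject). Hypothetical helper."""
    stem = name.rsplit(".", 1)[0]               # drop the file extension
    prefix, _, subject = stem.partition("-")    # date prefix, subject suffix
    return datetime.strptime(prefix, "%Y%m%d").date(), subject


def latest(names: Iterable[str], subject: str) -> Optional[date]:
    """Return the most recent date on which the given subject appears, if any."""
    days = [d for d, s in map(parse_scan_name, names) if s == subject]
    return max(days) if days else None


# Only the first name comes from the text; the others are made up.
records = [
    "20050511-15k.tif",
    "20041102-oilchange.tif",
    "20040902-timingbelt.tif",
]

last_belt = latest(records, "timingbelt")
print(last_belt)
```

In practice the list of names would come from a directory listing of the car's folder rather than being typed in by hand.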

/perforce/scans/house is a directory for house-related items. A tax assessment mailer might get stored as 20040815-assessment.tif. It's a good idea to have subdirectories for different properties so when you move you can ignore the records of the past location. Of course, because the organization is all virtual, if you forget to do that initially it's trivial to shuffle things around later. Reorganize without paper cuts!

Every taxonomy needs an odds and ends folder, and /perforce/scans/misc-legal is mine, geared toward legal items I want to keep but that don't fit the previous categories. It holds, for example, a nice color scan of my passport (just in case), my driver's license, various group membership materials, and random legal contracts.

The recent addition of /perforce/scans/fun is a grab bag of stuff I want to keep for sentimental reasons, such as a scan of my wedding program. I'm sure I'll have the file long after I've lost the paper copy.

What don't I have? A way to text search my TIFF scans. I look forward to when that will be feasible (meaning low effort and cheap).