Category: Computing

My dream cell phone plan has come true. Virgin Mobile, a provider targeted at trendy youth of all things, has the best plan for mostly-data usage: $25/month for 300 minutes, unlimited texts, and unlimited data. A few days ago Virgin Mobile launched a new Android phone, the LG Optimus V (Optimus One). LG’s strategy with this phone is to sell a lot of them at a price low enough that people can afford them. Sure, Verizon Wireless has a slightly better network (Virgin Mobile uses Sprint’s network) and offers better phones, but it’s hard to ignore a $1,170 price difference after 2 years. Not to mention the value of not being locked into a contract.

For my entire life my filing system has been to throw papers in a “Need to file” box. My thinking is that someday I will organize and file everything away. That day has not come. So I add more boxes. Finding a specific paper isn’t efficient. I dump one box at a time on the floor and throw things back in until I find what I’m looking for. Last week I decided it’s time to try going paperless…

I bought the ScanSnap S1300 model, which is portable, so I can set it in the living room and use it from a comfy armchair. ScanSnap scans the documents, uses ABBYY FineReader to OCR them, and then uploads them to Evernote, where I organize all the files.

Here’s what my workflow looked like:

Scan 10-15 pages in seconds, wait 5 minutes for OCR to finish.

Scan 10-15 pages in seconds, wait 5 minutes for OCR to finish.

Scan 10-15 pages in seconds, wait 5 minutes for OCR to finish… this is going to take forever.

The problem is that ScanSnap won’t let you scan a new batch until ABBYY FineReader finishes OCRing the last batch. On newer computers that’s not a problem, but on my computer it can take 20-30 seconds to OCR a page.
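As a rough check on those numbers, here is a small Python sketch that multiplies pages per batch by seconds per page. The function name is made up for illustration; the page counts and per-page times are the ones quoted above.

```python
def ocr_wait_minutes(pages, seconds_per_page):
    """Minutes spent waiting for one batch to finish OCR."""
    return pages * seconds_per_page / 60

# Best case: 10 pages at 20 s each. Worst case: 15 pages at 30 s each.
low = ocr_wait_minutes(10, 20)
high = ocr_wait_minutes(15, 30)
print(f"{low:.1f} to {high:.1f} minutes of waiting per batch")
```

That range brackets the roughly 5 minutes of waiting I saw after each batch.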

What’s the point of a fast scanner if my computer is old and slow and takes forever to OCR? I debated skipping OCR on my computer and letting Evernote do it, but Evernote only makes the documents searchable. It doesn’t let you copy and paste from OCRed documents, so I much prefer to have ScanSnap/ABBYY FineReader perform the OCR.

Fortunately, with AppleScript there’s a way to run ABBYY FineReader in the background so that you can continue to scan uninterrupted while queuing up documents for OCR. Tad Harrison wrote an AppleScript to automate the ScanSnap OCR process so that it could run in the background. It works great, except that it prevented ScanSnap from automatically uploading to Evernote: scans had to be saved to a folder where OS X’s folder actions would pick them up for OCR, instead of being sent to Evernote.
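The idea behind the background approach is a simple queue: scanning drops batches into a queue and a separate worker OCRs them one at a time. Below is a minimal Python sketch of that pattern only. It is not Tad's AppleScript, and `fake_ocr` is a hypothetical stand-in for the real FineReader step.

```python
import queue
import threading

def run_ocr_queue(batches, ocr):
    """Queue scanned batches and OCR them in a background worker, in order."""
    pending = queue.Queue()
    finished = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:      # sentinel: no more scans are coming
                break
            finished.append(ocr(batch))

    thread = threading.Thread(target=worker)
    thread.start()
    for batch in batches:          # "scanning" never waits on OCR
        pending.put(batch)
    pending.put(None)
    thread.join()
    return finished

def fake_ocr(batch):
    # Hypothetical placeholder for the slow OCR step.
    return batch + ".ocr.pdf"

results = run_ocr_queue(["batch1", "batch2", "batch3"], fake_ocr)
print(results)
```

With a single worker the batches come out in the same order they were scanned, which is all this workflow needs.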

Teaching Elijah to write code

Since I was up at 3:00am holding my son anyway, I modified the script to upload files to Evernote after they’re scanned. If you want to do the same here’s the code changes to make to Tad’s AppleScript:

-- bwb001 >>>
-- code should be inserted in the ocrFile function after this line:
-- logEvent("OCR file generated.")
tell me to set bwbName to getSpotlightInfo for "kMDItemFSName" from posixFilePath
tell application "Evernote"
    set note1 to create note title bwbName from file posixOcrFilePath
    open note window with note1
end tell
-- bwb001 <<<

This works. Saturday morning I scanned in documents non-stop for about 30 minutes, then I enjoyed the weather outside while my computer spent the next several hours OCRing all of the documents and uploading them to Evernote.

Evernote works well as a document management system. It automatically OCRs any PDF or JPG file that hasn't been OCRed already (it even OCRs handwriting in JPG files), so everything is searchable. I don't even bother using intelligent filenames or note titles; I simply search for content within the files. For my filing system I use tags and try to just use the company name. I did upgrade my account to premium ($45/year) because of the volume I scanned. Evernote allows uploading 60MB/month for free, so if I had spread it out over a few months I could have done it for free, but I wanted to be done with this project. I scanned, OCRed, and uploaded 749 pages (most double-sided), 374MB in total, to Evernote.
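For the curious, here is how long "a few months" would have been. This is a quick Python sketch using the 374MB total and the 60MB/month free allowance mentioned above; the function name is just for illustration.

```python
import math

def months_on_free_tier(total_mb, monthly_quota_mb):
    """Whole months needed to upload total_mb under a monthly upload quota."""
    return math.ceil(total_mb / monthly_quota_mb)

months = months_on_free_tier(374, 60)
print(months)  # 7
```

So the free route would have stretched the project over about seven months.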

Now all of our files are available from all our devices. iPhone, Android, Mac, and Windows.

Evernote Security, Backup, and Versioning

At some point you have to trust cloud services with your data if you're going to embrace the future of computing. My rule is that security doesn't have to be perfect, but that it should cost more for identity thieves or whoever else may hack into my accounts to obtain useful information than the information is worth. One thing to keep in mind is data is not stored encrypted on Evernote so that it can index everything. For sensitive documents and notes I encrypt them before uploading them on Evernote and make sure the note title contains a few key words.

I trust cloud backups for the most part. Most cloud providers can provide better backups and redundancy than I can myself, and when they lose data it's big negative publicity. Evernote handles about 90% of what you would want in a backup. They of course have redundancy and maintain their own backups. If Evernote goes dark I could still use the local cache on my computer. Evernote also automatically versions files, so if I overwrite something important I can revert. If my computer crashes, everything is backed up in the cloud anyway.

Evernote backup weakness

When notes are deleted from Evernote, the versioning goes with them. Once something is deleted from the trash and all of your devices have synchronized, there is no way to get it back. Could this happen? It's unlikely, but yes. A malicious script could break into either your computer or your Evernote account, delete all your files, and empty the trash. Or you, or someone using your computer, could select all notes, delete them, empty the trash, and ignore the warning that the notes will be gone forever. So as one last step it's a good idea to back up the Evernote library data and database files (on Mac this is in ~/Library/Application Support/Evernote).

Any decent backup tool will create versioned backups that you can store offsite. I use JungleDisk to back up to Amazon S3, so I just made sure the Evernote folder was in my daily backup list. If for some reason you did something stupid, you could always recover from your JungleDisk backup... so long as Evernote is not the only place you store your S3 encryption keys. |:-)
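If you don't have a backup tool handy, even a dated zip of the Evernote folder is better than nothing. Here is a minimal Python sketch of that idea. It is not what JungleDisk does, and the destination folder in the usage comment is a made-up example.

```python
import shutil
import time
from pathlib import Path

def snapshot_folder(source, dest_dir, stamp=None):
    """Write a dated .zip of `source` into `dest_dir` and return its path."""
    source = Path(source).expanduser()
    dest_dir = Path(dest_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    base = dest_dir / f"{source.name}-{stamp}"
    # make_archive appends ".zip" and returns the archive's full path.
    return Path(shutil.make_archive(str(base), "zip", root_dir=str(source)))

# Example call (destination path is hypothetical):
# snapshot_folder("~/Library/Application Support/Evernote", "~/Backups/evernote")
```

Each run produces a new timestamped archive, so older versions stay around until you prune them. Keep the destination outside the folder being archived.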

Ben & Kris' new filing system: everything searchable in Evernote

Cost to go paperless

ScanSnap S1300: $246 at Amazon

Evernote Premium: $45

Shredding: ??

Now all that's left is shredding. I have a great little shredder that Bob gave me, but it can only handle a handful of papers before it has to rest for a day. Either it will take a long time to catch up, or I'll look for a shredding service or find out how much a heavy-duty shredder costs.

After reading the chapter on taxonomy governance in Microsoft Office SharePoint Server 2007 Best Practices by Ben Curry and Bill English, I started thinking about my own personal taxonomy. Taxonomy is a division into groups or categories. A good example is a library’s Dewey Decimal Classification. Tagging has become a popular means of classification in blogging and photo sharing communities. But what about the documents on my computer?

Most people keep their documents in folders. They will have a folder for each client, or a folder for each class at school. Some people keep a hierarchy: an English paper is under Education -> CSUSB -> 2001 -> English -> 301. This hierarchy is how I stored my files, and it is also what got me in trouble once I realized that some files belong in several places at once. If my brother wrote a paper on Conditional Election, should I file it under Family -> Essays -> Jon, or Religion -> Christianity -> Essays, or Religion -> Essays -> Christianity? You can see that Essays doesn’t belong under Family or Christianity, and the hierarchy (in this case) is meaningless.

Metadata filesystem tagging is supposed to solve this problem. Instead of placing the files in folders, the theory of metadata correctly realizes that we aren’t storing hierarchical information, but descriptive data. We’re simply trying to describe the contents of a file. So I could tag the paper “Essay, Christianity, Religion, Family.” This does two things: 1) It doesn’t matter how many “tags” are given to an object (or document); the object is not duplicated. And 2) searching for files by category is easy. This is how I’ve been organizing files the last few years.

This is a great theory, but in practice, it has failed me. First, in order for filesystem metadata to work one has to be disciplined to do it to every file. This takes time. Lots of time. Second, there are quite a few flaws in the implementation: The first I blame on Apple’s implementation because the metadata isn’t stored in the objects themselves so most backup solutions don’t back up the metadata. The second problem is tag creep. I have too many tags and I forget which ones I’ve used so I have a “money” and a “finance” tag, a “car” and an “auto” tag. If I had spent the time to develop a taxonomy this wouldn’t be an issue, but I didn’t. The thought didn’t cross my mind. Now I have a mess of inconsistent tags, half my files aren’t tagged because I don’t have time to tag them, and I’m not motivated to do so because I know when my hard drive crashes and I have to restore from backups I’ve lost all my metadata and I would have to start over.

So what do I do now? Well, now I just keep everything in the documents folder, store any “tags” in the filename, and have a workflow that prepends the date to the filename. I do have some high level folders that pertain more to how I got the file (or the content type) than to the logical content. For example, if I scan a statement from my Schwab account it would go under the Documents -> Scan folder as “20080928 schwab statement”. I know it’s a mess, but if I have to restore from backup I’ve retained the metadata in the filename.
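The naming scheme itself is easy to script. Here is a small Python sketch of it; this is not my actual workflow, and the function name and the optional extension argument are just for illustration.

```python
from datetime import date

def tagged_filename(day, tags, extension=""):
    """Build a 'YYYYMMDD tag tag ...' filename so the tags survive a restore."""
    return f"{day:%Y%m%d} " + " ".join(tags) + extension

name = tagged_filename(date(2008, 9, 28), ["schwab", "statement"])
print(name)  # 20080928 schwab statement
```

Because the date comes first in year-month-day order, a plain alphabetical sort of the folder is also a chronological sort.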

I think tagging would be more maintainable if I developed a personal high level taxonomy (sort of a micro level taxonomy governance) instead of allowing arbitrary tag names. To see what I mean, this is how I’ve been classifying files (low level): Christianity, Automobile, Schwab, Lisp, Car, Essay, Recipe. But this is how I should classify files (high level): I would create a list of no more than 10 high level tags like “Religion, Transportation, Finance, Knowledge” that would have enough foresight to cover all topics and areas in the future (much like Dewey). But who has time to do that?

I don’t. So my files are a mess, and they will stay that way. But the important thing is I know how they should be organized.
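For what it's worth, the governed version could be as small as one lookup table. This Python sketch maps some of the low level tags above onto the high level ones. The specific pairings are my guesses for illustration, and anything without a mapping (Essay and Recipe, for instance) is reported rather than silently accepted, which is what would stop tag creep.

```python
# Assumed pairings of low level tags to high level tags (illustrative only).
HIGH_LEVEL = {
    "christianity": "Religion",
    "automobile": "Transportation",
    "car": "Transportation",
    "schwab": "Finance",
    "lisp": "Knowledge",
}

def classify(tags):
    """Return (high level tags, tags with no mapping yet)."""
    known = sorted({HIGH_LEVEL[t.lower()] for t in tags if t.lower() in HIGH_LEVEL})
    unknown = sorted(t for t in tags if t.lower() not in HIGH_LEVEL)
    return known, unknown

print(classify(["Car", "Automobile", "Schwab", "Recipe"]))
```

Here “Car” and “Automobile” collapse into a single Transportation tag, and “Recipe” comes back as unmapped.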

Backup Requirement

Daily full backup, transaction log backups every 15 minutes. Files are backed up to the local filesystem. A remote backup server copies changes from the server every 15 minutes. Although not an efficient use of space, it is useful to back up to the local hard drive in case a point-in-time restore is needed, or for restoring test databases to other SQL Server instances.

Recoverability

If an unrecoverable failure occurred on the database system, the data will be recoverable to a state within 30 minutes of the failure by restoring the full backup and rolling the transaction log backups forward. (30 minutes is the worst case, when the remote backups aren’t synchronized: 15 minutes since the last transaction log backup plus 15 minutes since the last offsite copy.)

SQL Server backups

Create a Job to do a FULL backup daily. Overwrite the backup if it already exists, and it’s always a good idea to do a checksum and verify the backup:
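A minimal sketch of what that job step could run, written as a small Python helper that prints the T-SQL. The database name MyDb and the backup path are placeholders made up for this example. INIT overwrites the existing backup set, CHECKSUM adds a backup checksum, and RESTORE VERIFYONLY checks that the backup file is readable.

```python
def full_backup_sql(database, path):
    """T-SQL for a full backup that overwrites the old one and verifies it."""
    return (
        f"BACKUP DATABASE [{database}] TO DISK = N'{path}' "
        "WITH INIT, CHECKSUM;\n"
        f"RESTORE VERIFYONLY FROM DISK = N'{path}' WITH CHECKSUM;"
    )

# Placeholder database name and path.
print(full_backup_sql("MyDb", r"D:\Backup\MyDb_full.bak"))
```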

To prevent the transaction log backup file from growing infinitely large, add a second step to the first Job so that a transaction log backup runs immediately after the FULL backup. The only difference from the regular transaction log backup is that this step overwrites the backup set:
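Sketched the same way, the two kinds of log backup differ by a single option: INIT to overwrite the backup set (the daily step) and NOINIT to append to it (the 15-minute job). The database name and file path are again placeholders.

```python
def log_backup_sql(database, path, overwrite):
    """T-SQL for a transaction log backup that overwrites or appends."""
    mode = "INIT" if overwrite else "NOINIT"
    return f"BACKUP LOG [{database}] TO DISK = N'{path}' WITH {mode}, CHECKSUM;"

# Daily step, right after the full backup: start a fresh log backup file.
print(log_backup_sql("MyDb", r"D:\Backup\MyDb_log.trn", overwrite=True))
# Every 15 minutes: append to that same file.
print(log_backup_sql("MyDb", r"D:\Backup\MyDb_log.trn", overwrite=False))
```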

Summary

The first Job runs daily and has two steps. The first step does a full backup of the database, overwriting the previous full backup. The second step does a transaction log backup, overwriting all the previous transaction log backups. The second Job runs every 15 minutes and appends a transaction log backup to the existing transaction log backup file. While this is happening, the backup server checks every 15 minutes and archives any new or changed files from the system.

Restoring

To restore the database, first restore the full backup, then restore each transaction log. Here’s an example of restoring the first four transaction logs:
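A sketch of how that sequence could be generated, assuming the log backups were appended to one file and are therefore addressed by position with FILE = n. The database name and file names are placeholders. Every statement uses NORECOVERY except the last log restore, which uses RECOVERY to bring the database online.

```python
def restore_sql(database, full_path, log_path, log_count):
    """T-SQL statements to restore a full backup, then log backups 1..log_count."""
    statements = [
        f"RESTORE DATABASE [{database}] FROM DISK = N'{full_path}' WITH NORECOVERY;"
    ]
    for n in range(1, log_count + 1):
        # Only the final log restore may recover the database.
        option = "RECOVERY" if n == log_count else "NORECOVERY"
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{log_path}' "
            f"WITH FILE = {n}, {option};"
        )
    return statements

for statement in restore_sql("MyDb", "MyDb_full.bak", "MyDb_log.trn", 4):
    print(statement)
```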

Make sure the last log restore recovers the database by omitting the NORECOVERY option (RECOVERY is the default). Transaction logs can only be rolled forward while the database is in the NORECOVERY (restoring) state. If you mess this up by recovering the database before the last transaction log is rolled forward, you have to start the restore process all over. So do not mess it up.

RIM’s Blackberry and BES server, Apple’s iPhone and MobileMe service, and Microsoft’s ActiveSync with Exchange all offer Push Email. That is, your phone instantly gets a notification of new email. I don’t get it. I don’t want my phone to tell me when I have new email. I disable push email and set the phone to check once an hour at most, and when I feel like it I’ll read my messages to see if there’s anything urgent. I get tons of personal and work email throughout the day and barely have time to read it all, much less get interrupted every time a message comes in. Sometimes I even exit Outlook while I’m at work if I need to get something done. I think most people don’t realize how much time checking email costs.

If anything I would like a phone that notifies me when I don’t get new email, because that probably means something is wrong with the mail server.
