
Month / December 2008

Tonight was a relatively brief night out for me to the local establishments, and I found myself finishing off the evening at a bar known for its live entertainment and animated patrons. By animated I mean youthful, I think, or wanting, or both; at the very least, responsive to the rhythms and lyrics emanating from the band, even if such animation stemmed largely from liberal consumption of alcohol. Whatever their motivation or justification, the crowd was boisterous tonight.

Nearby a man vomited at his feet, the ensuing odor trumped in short time by the stench of an absorbent material meant to contain it. Some would consider such an environment the dregs of a community; I felt quite at home. Looking around me, amidst the staggering strides and slurred sentences, I saw purpose. Foreign, yet intimate, embraces presented an outward manifestation of raw emotion: a vital, human baseness we are often instructed to avoid; the desire for acceptance, affection, acknowledgment. In a world and city suffocated by self-righteousness and starved of altruism, people strain in a conflicted, desperate search for validation.

As the band fades and lights rise, there is a great bustling for the door; a din composed of shuffling feet and raucous laughter is gradually overcome by a final song played over the loudspeakers: Ave Maria. Full of grace. With all our faults, in all our frailty, it seems that at some primal level, it is absolution from each other that we seek.

I’ve recently switched to Linux (Ubuntu 8.10) as my main operating system. I find it’s a more effective workspace for most of my tasks. Check it out if you haven’t already; Linux really is growing up. I do keep Windows around for a couple of tasks, mainly gaming, but Linux is closing the gap on that, too, through the latest implementations of Wine.

One thing I’ve noticed, though, and haven’t been able to pin down a reason for, is that PDF file sizes in Linux seem high compared to those generated in Windows. I know, this is a somewhat generic statement given that, Linux or Windows, the process depends on the software doing the compression. Yet there seems to be a consistent discrepancy between the two operating systems when it comes to PDF file sizes. Looking around online, my observations seem to be somewhat validated. A popular solution on forums is to use the DjVu compression scheme, but I’d prefer sticking with the fairly universal PDF file format. To its credit, DjVu seems to match or better PDF when it comes to black-and-white documents, but it falls behind in grayscale.

So I ran a little test, scanning the front page of my offer letter for my new job. It consists of a company logo at the top and a full page of text. It is somewhat indicative of what I archive. All scans were done in black-and-white or grayscale. Results (file size in bytes):

Make note of the file extensions; there are actually three different file types in those listings. The file names lead with resolution, with the exception of the two starting with “CNN.” Those two were PDFs created by printing cnn.com’s cover page to PDF in Linux and Windows (using PDF Creator). The cover page contained slightly different content but not enough to explain the file size difference. After the resolution in the file name comes the operating system, followed by the compression algorithm where applicable. Immediately after the hyphen is the grayscale/black-and-white indicator, and in those cases where there is a second hyphen, it indicates the file was post-processed with a PDF printer at the stated resolution.
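For anyone who wants to build a comparable listing from their own scans, here is a minimal Python sketch. It is not the tool used for this test; the function names `size_listing` and `print_listing` and the optional extension filter are my own assumptions, chosen only for illustration. It collects the size in bytes of every file in one folder and sorts the results smallest first, so the formats and operating systems can be compared at a glance.

```python
import sys
from pathlib import Path


def size_listing(directory, extensions=None):
    """Return (size_in_bytes, file_name) pairs for files in `directory`, smallest first.

    `extensions` is an optional, hypothetical filter such as (".pdf", ".djvu");
    when it is None, every regular file in the folder is included.
    """
    rows = []
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue  # skip sub-folders and other non-file entries
        if extensions is not None and path.suffix.lower() not in extensions:
            continue  # skip file types we were not asked about
        rows.append((path.stat().st_size, path.name))
    return sorted(rows)


def print_listing(directory, extensions=None):
    """Print one line per file: right-aligned size in bytes, then the file name."""
    for size, name in size_listing(directory, extensions):
        print(f"{size:>12,d}  {name}")


if __name__ == "__main__":
    # Usage: python sizes.py /path/to/scans
    print_listing(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Sorting by size rather than by name keeps the smallest result for each scan setting at the top, which is the comparison that matters here.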

For Windows, where a compression algorithm is not listed, I used the software included with my Canon LiDE 50 scanner, which saves directly to PDF. In Linux, I used the popular gscan2pdf GUI. Having OCR on or off did not seem to make much of a difference in file size. For gscan2pdf, the file was also processed with Unpaper, which should optimize the file further (it also creates blockiness in the document’s whitespace that is undesirable to me, but it’s fine for archiving documents).

So there you go. The difference is significant. One would have to dig into the underpinnings of the software, I think, to expose the reason for this, but I’m definitely curious. Again, DjVu pulls close and surpasses PDF when it comes to black-and-white scanning, but even it falls short when using grayscale (which happens to be my method of choice). I’ll admit I don’t relish the idea of booting into Windows simply to archive documents.