Category Archives: Reviews

What a fun read. It’s about technology, sure, but more about culture. Neal takes a good look at operating systems, why we get emotionally involved with them, and why Windows is still so popular. He does this with a grand detour to Disneyland, and a hefty dose of humor. The above quote was from near the end of the book, where he imagines hackers creating big bangs from the command line.

He starts out the book with some anecdotes from the early 1970s, when he had his first computer class in high school. His school didn’t have a computer, but they did have a teletype (the physical kind that used paper) with a modem link to some university’s system. But time on that system was so expensive that they couldn’t just dial in and run things interactively. The teletype had a paper tape device. You’d type your commands in advance, and it would punch them out on the tape. Then when you dialed in, it would replay the tape at “high speed”.

Neal liked this because the stuff punched out of the tape was, actually, “bits” in both the literal and the mathematical sense. This, of course, led to a scene at the end of the school year where a classmate dumped the bin of bits on the teacher, and Neal witnessed megabytes falling to the floor.

Although the book was written in 1999, and needs an update in some ways, it still speaks with a strong voice today — and is now also an interesting look at what computing was like 10 years ago.

He had an analogy of car dealerships to operating systems. Microsoft had the big shiny dealership selling station wagons. Their image was all wrapped up in people feeling good about their purchase — like they got something for their money. And he said that the Linux folks were selling tanks, illustrated with this exchange:

Hacker with bullhorn: “Save your money! Accept one of our free tanks! It is invulnerable, and can drive across rocks and swamps at ninety miles an hour while getting a hundred miles to the gallon!”

Prospective station wagon buyer: “I know what you say is true…but…er…I don’t know how to maintain a tank!”

Bullhorn: “You don’t know how to maintain a station wagon either!”

Buyer: “But this dealership has mechanics on staff. If something goes wrong with my station wagon, I can take a day off work, bring it here, and pay them to work on it while I sit in the waiting room for hours, listening to elevator music.”

Bullhorn: “But if you accept one of our free tanks we will send volunteers to your house to fix it for free while you sleep!”

Buyer: “Stay away from my house, you freak!”

Bullhorn: “But…”

Buyer: “Can’t you see that everyone is buying station wagons?”

That doesn’t mean that Stephenson is just a Linux apologist. He points out that the CLI has its place, and has a true love-hate relationship with the text-based config files (remember XF86Config before the days of automatic modelines? Back when you had to get out a calculator and work some things out with pencil and paper, or else risk burning out your monitor?) He points out that some people want to just have the thing work reasonably well. They don’t want control; in fact, they would gladly give it up if offered something reasonably pretty and reasonably functional.

He speaks to running Linux at times:

Sometimes when you finish working with a program and shut it down, you find that it has left behind a series of mild warnings and low-grade error messages in the command-line interface window from which you launched it. As if the software were chatting to you about how it was doing the whole time you were working with it.

Even if the application is imploding like a damaged submarine, it can still usually eke out a little S.O.S. message.

Or about booting Linux the first time, and noticing all sorts of cryptic messages on the console:

This is slightly alarming the first time you see it, but completely harmless.

I use emacs, which might be thought of as a thermonuclear word processor. . .

Microsoft Word, were devoted to features like mail merge, and the ability to embed feature-length motion pictures in corporate memoranda, were, in the case of emacs, focused with maniacal intensity on the deceptively simple-seeming problem of editing text. If you are a professional writer–i.e., if someone else is getting paid to worry about how your words are formatted and printed–emacs outshines all other editing software in approximately the same way that the noonday sun does the stars. It is not just bigger and brighter; it simply makes everything else vanish. For page layout and printing you can use TeX: a vast corpus of typesetting lore written in C and also available on the Net for free.

I love these vivid descriptions: programs secretly chatting with us, TeX being a “corpus of typesetting lore” rather than a program. Or how about this one: “Unix. . . is not so much a product as it is a painstakingly compiled oral history of the hacker subculture. It is our Gilgamesh epic.” Yes, my operating system is an oral history project, thankyouverymuch.

The book feels like a weird (but well-executed and well-written) cross between Douglas Adams and Cory Doctorow. Which makes it so indescribably awesome that I can’t help but end this review with a few more quotes.

Because Linux is not commercial–because it is, in fact, free, as well as rather difficult to obtain, install, and operate–it does not have to maintain any pretensions as to its reliability. Consequently, it is much more reliable.

what really sold me on it [Debian] was its phenomenal bug database (http://www.debian.org/Bugs), which is a sort of interactive Doomsday Book of error, fallibility, and redemption.

It is simplicity itself. When had a problem with Debian in early January of 1997, I sent in a message describing the problem to submit@bugs.debian.org. My problem was promptly assigned a bug report number (#6518) and a severity level (the available choices being critical, grave, important, normal, fixed, and wishlist) and forwarded to mailing lists where Debian people hang out.

That should be our new slogan for bugs.debian.org: “Debian’s interactive Doomsday Book of error, fallibility, and redemption.”

Unix is hard to learn. The process of learning it is one of multiple small epiphanies. Typically you are just on the verge of inventing some necessary tool or utility when you realize that someone else has already invented it, and built it in, and this explains some odd file or directory or command that you have noticed but never really understood before.

I’ve been THERE countless times.

Note the obsessive use of abbreviations and avoidance of capital letters; this is a system invented by people to whom repetitive stress disorder is what black lung is to miners. Long names get worn down to three-letter nubbins, like stones smoothed by a river.

It is obvious, to everyone outside of the United States, that our arch-buzzwords, multiculturalism and diversity, are false fronts that are being used (in many cases unwittingly) to conceal a global trend to eradicate cultural differences. The basic tenet of multiculturalism (or “honoring diversity” or whatever you want to call it) is that people need to stop judging each other–to stop asserting (and, eventually, to stop believing) that this is right and that is wrong, this true and that false, one thing ugly and another thing beautiful, that God exists and has this or that set of qualities.

Apparently this actually works to some degree, for police in many lands are now complaining that local arrestees are insisting on having their Miranda rights read to them, just like perps in American TV cop shows. When it’s explained to them that they are in a different country, where those rights do not exist, they become outraged. Starsky and Hutch reruns, dubbed into diverse languages, may turn out, in the long run, to be a greater force for human rights than the Declaration of Independence.

Unix has always lurked provocatively in the background of the operating system wars, like the Russian Army.

This is perhaps one of the most enlightening books I’ve ever read, and yet I feel like I’ve only grasped a small bit of its meaning. It is with that warning that I attempt this review.

I should add at the outset that this is one of those books where no matter what you expect it to be, after reading it, you will find that it wasn’t what you expected.

I heartily recommend it to everyone, from the devoutly religious to the devoutly atheistic.

Science and Scientism

Smith begins with a discussion of science and scientism. He is a forceful defender of science and of the work of scientists in general. But he is careful to separate science from scientism. Paraphrased, he defines scientism as the belief that science is the only (or the best) route to truth about everything. He points out that, through no explicit fault of scientists, scientism has become so ingrained in our modern psyche that even theologians have started thinking in terms of it.

Yet there are some pretty glaring flaws in scientism, particularly where it comes to matters of philosophy, conscience, meaning, and religion. Smith argues that the foundation of science is the controlled experiment and logical inferences derived from it. He then proceeds to make a strong case that it is not possible for humans to set up a controlled experiment to either prove or disprove the existence of something “more” than our material world: a transcendence, a metaphysical reality, a spirit, a God. We, with our existence trapped in this finite world, cannot possibly hope to capture and control something so much more than us in every way: intelligence, versatility, and “finiteness”. Thus science can’t even address the question.

That hasn’t stopped people from claiming that religion is just a helpful delusion, for instance, despite not being able to prove whether it is in fact a delusion or reality.

Worldviews

Smith then asks us to indulge a moment in considering two different worldviews: one the “science-only” worldview so common these days, and the other a more traditional religious worldview with a rightful place for science. He defers supporting evidence for each for later chapters.

The science-only worldview is pretty familiar to many, and I have even heard parts of it articulated in comments left on this blog. It goes roughly like this: The universe is x billions of years old. It is, so far as we presently know, a vast expanse with mostly dead matter. Earth is the only exception, which contains some living organisms and even sentient beings, though these make up a small fraction of even the earth. This life arrived by accident through physical and biological processes, some of which are well-understood and some aren’t. In the end, the universe will again become entirely dead, as our planet will be incinerated when our sun goes nova. Or, in any case, the entire universe will eventually expire in one of various ways. This worldview suggests that it is an accident that we are here and that we have consciousness, and that our actions have no ultimate meaning because the earth will eventually be incinerated anyhow.

The traditional worldview holds the opposite: that instead of having our origins in the tiniest and simplest of building blocks, and eventually improving over time, we should more properly think of ourselves as being derived from something greater than ourselves. That greater something is part of our world, but something much bigger than it too. It does not rule out science, but neither is it something that science can ever explain. It suggests that our lives have a purpose, that our work has meaning, and that there are ultimate ends to seek.

Smith is a scholar of world religions, and draws on his considerable experience to point out that virtually all world religions, before the Enlightenment, drew essentially the same picture of our world and the “more”. He reminds us — though perhaps less effectively than Marcus Borg — that there are other ways of knowing truth besides science, and suggests that we pay attention to what the vast majority of humanity had to say about the nature of existence before a human invention started to squelch the story.

The Stories

The book is filled with personal stories (Smith spent at least a decade each researching and practicing at least four different religions), quotes, and insights. I consider it the most enlightening book on religion I have yet read. Smith has more than a passing familiarity with physics, and the physicists in the crowd will probably be delighted at his discussions of quantum mechanics and the claim that “nonlocality provides us with the first level platform since modern science arose on which scientists and theologians can continue their discussions.”

One passage reads like this:

Again I will let Henry Stapp say it: “Everything we [now] know about Nature is in accord with the idea that the fundamental process of Nature lies outside space-time, but generates events that can be located in space-time.” Stapp does not mention matter, but his phrase “space-time” implies it, for physics locks the three together.

He says that quantum theory of course can’t prove that there is a God, but that recent research seems to disprove the old notion that, given enough time, all questions will be answerable by science.

Even if you disagree with every one of Smith’s conclusions, you’ll be along for a fascinating ride through physics, biology, philosophy, and innumerable religions. One of my favorite anecdotes concerns noted physicist David Bohm (who studied under Oppenheimer and worked with Einstein, among others). He gave a lecture at one point, apparently touching on his hidden variable theories to a great extent. At its conclusion, a senior physics professor asked derisively, “What does all this philosophy have to do with physics?” Bohm replied, “I do not make that distinction.”

How’s that for something to ponder?

The Writing

The book is fun to read, and the stories make it all the more so.

However, it is not a light read. Huston Smith wrote this near the beginning, without any hint of irony:

The first of these differences is that Gass’s is an aristocratic book, written for the literary elite, whereas mine is as plebeian as I can render its not always simple arguments.

I can think of a few simpler ways to express that thought. In any case, it isn’t light reading, but it is accessible even if you, like me, have little formal training in philosophy, theology, or quantum physics.

Conclusion

I would do such a poor job trying to paraphrase Smith’s main points that I haven’t even really attempted to do so here. Get the book — you’ll be in for a treat.

Incidentally, I had been thinking of buying the book for a while. What finally made me do so was an NPR story about how he helped preserve the sound of the Gyuto Monks Tantric Choir back in 1964, when he (of course) was sleeping in a monastery in the Himalayas and awoke to investigate “something transcendent”: the “holiest sound I have ever heard.”

I finished reading David Copperfield on the Kindle a few days ago. This is a review of the novel, not the Kindle.

I’m not an English major, and so I’m not going to pretend to be one. I’m not going to discuss what themes the book touches on, what category it fits in, or generally dissect it to the point where it’s more monotonous than fun.

I read the book because I wanted to, not because I had to write a paper about it.

I must say, first of all, that this has got to be one of the best books I’ve ever read. The vivid descriptions of the characters were just fun to read. One particularly meek man was described like this: “He was so extremely conciliatory in his manner that he seemed to apologize to the very newspaper for taking the liberty of reading it.”

Some of the scenes in the novel are amazingly vivid and memorable. The hilarious and tense scene towards the end where one of the main villains is taken down was one, and of course just about every scene involving David’s aunt is too.

Dickens is a master of suspense. He does it through subtle premonitions in the book. You might not even really notice them as you’re reading. But it sure had an effect on me: I had trouble putting the book down, and stayed up later than I should have on more than one night to keep reading another chapter or three.

Like any good book, this one left me to think even after I was done reading it, and left me wanting to read it again. Right now.

There are some practical downsides to it, though. It was written in the 1850s, and some of the vocabulary and British legal, business, and monetary discussions are strange to a modern American audience. Nevertheless, with the exception of the particularly verbose Mr. Micawber, you can probably make it through without a dictionary, though one will be handy. I read it on the Kindle, which integrates a dictionary and makes it very easy to look up words. I learned that a nosegay is a bouquet of showy flowers. And that Mr. Micawber was fond of using words obsolete since the 17th century, according to the Kindle. If you remember that “pecuniary emoluments” refers to a salary, you’ll be doing OK.

The other thing that occasionally bugged me was that the narrator (David) would comment on some sort of gesture, or comment that wasn’t very direct, and then say something like, “But she didn’t need to be more explicit, because I understood the meaning perfectly.” Well, sometimes I didn’t. Though I usually figured it out after a bit. I was never quite sure if Dickens was being intentionally needling to the reader, or if an 1850s British reader would have figured out the meaning perfectly well. But that was part of the fun of it, I think.

So I am going to do something that nobody on the Internet is doing lately: post a review of the Kindle 2 after having only used it for three days.

Shocking, yes, I know.

I had never even seen a Kindle of either model before getting the Kindle 2. I had, though, thought about getting an eInk device for some time. The $359 Kindle 2 price tag caused me significant pause, though in the end I went for it due to the 30-day return policy.

On the surface, I thought that it would be weird to have a Kindle. After all, how often am I away from the computer? And there’s a small local library a few blocks from where I work. But I had a hunch it might turn out for me like my iPod did: something that didn’t sound all that useful from reading about it, but turned out to be tremendously so after having actually used it.

Turtleback Delivery

I ordered my Kindle 2 with standard shipping, which meant that it went by FedEx Smart Post. Here is my SmartPost rant.

There are two words in “Smart Post” that are misleading. I once had an item take literally a week to make it from St. Louis to Kansas. That is, I kid you not, slower than the Pony Express ran in 1860. This time, my Kindle made it from Kentucky to Kansas in a mere five days. Oh, and it spent more than 24 hours just sitting in St. Louis.

The Device

Overall, the device is physically quite nice. It is larger and thinner than I had imagined, and the screen also is a bit smaller. It is usually easier to hold than a paperback, due to not having to prevent it from closing too far at the binding edge. The buttons are easy to press, though I could occasionally wish for them to be easier, but that’s a minor nit.

The Screen

The most important consideration for me was the screen. The eInk display is both stunningly awesome and disappointing.

This is not the kind of display you get on your computer. Or, for that matter, any other device. It isn’t backlit. It reacts to light as paper does. It can be viewed from any angle. And it consumes no power to sustain an image; only to change it. Consequently, it puts up a beautiful portrait of a famous author on the screen when it is put to sleep, and consumes no power to maintain it.

The screen’s response time isn’t anywhere near as good as you’d expect from a regular LCD. It flashes to black when you turn a page, and there is no scrolling. On the other hand, this is not really a problem. I found the page turning speed to be more than adequate, and probably faster than I’d turn the page on a real book.

The resolution of the display has the feeling of being incredible. The whole thing provides a far different, and in my eyes superior, experience to reading on an LCD or CRT screen.

My nit is the level of contrast. The background is not really a pure white, but more of a light gray. This results in a contrast level that is quite clearly poorer than that of the printed page. At first I thought this would be a serious problem, though I am growing somewhat more used to it as I read more.

Reading Experience

Overall, I’ve got to say that it is a great device. You can easily get lost in a book reading it on the Kindle. I’m reading David Copperfield for the first time, and have beat a rather rapid path through the first five chapters on the Kindle already. And that, I think, is the best thing that could be said about an ebook reader. It stays out of the way and lets you immerse yourself in your reading.

The Kindle’s smartly-integrated Oxford-American Dictionary was useful too. One thing about a novel written 150 years ago is that there are some words I just haven’t ever heard. “Nosegay,” for instance. You can move a cursor to a word to see a brief pop-up definition appear, or press Enter to see the entire entry. This is nice and so easy that I’m looking up words I wouldn’t have bothered to if I were reading the book any other way.

A nosegay, by the way, is a bouquet of showy flowers.

Buying Experience

The Kindle has a wireless modem tied to the Sprint network on it. The data charges for this, whatever they may be, are absorbed by Amazon in the cost of the device and/or the books you buy for it.

This turned out to be a very smart piece of engineering. I discovered on Amazon’s Kindle Daily Post that Random House is offering five mostly highly-rated sci-fi books for free on the Kindle for a limited time. So I went over to the page for each, and made my “purchase”. It was only a click or two, and I saw a note saying it was being delivered.

A few minutes later, I picked up the Kindle off the kitchen counter. Sure enough, my purchases were there ready to read. Impressive. This level of ease of use smells an awful lot like Apple. Actually, I think it’s surpassed them.

You can delete books from the Kindle and re-download them at any time. You can initiate that operation from either the PC or the Kindle. And you can also browse Amazon’s Kindle store directly from the device itself.

I haven’t subscribed to any magazines or newspapers, but I gather that they deliver each new issue automatically the moment it’s released by the publisher, in the middle of the night.

I pre-ordered the (free to Kindle) Cook’s Illustrated How-to-Cook Library. It makes me way happier than it should to see “This item will be auto-delivered to your Kindle on March 26” in the order status.

Free Books

Amazon’s Kindle library has a number of completely free Kindle books as well. These are mostly out-of-copyright books, probably sourced from public etext places like Project Gutenberg, and converted to the Mobipocket format that is the Kindle’s native format with a minimum of human intervention. As they are free, you can see them in Amazon’s library if you sort by price. And, of course, Amazon will transfer them to the Kindle wirelessly, and maintain a copy of them in your amazon.com account.

Unfortunately, as with free etexts in general on the Internet, the quality of these varies. I was very annoyed to find that many free etexts look like they were done on a typewriter, rather than professionally printed. They don’t use smart quotes; only the straight ones. When reading on a device that normally shows you a faithful print experience, this is jarring. And I spent an inordinate amount of time trying to find a copy of Return of Sherlock Holmes that actually had the graphic figures in Dancing Men. Ah well.

Your Own Content

Amazon operates a mail server, username@kindle.com. You can email stuff to it, and it will convert it to the Kindle format and wirelessly upload it to your Kindle for a fee of $0.10. Alternatively, you can use username@free.kindle.com, which does the same thing at no charge, but emails you back a link to download the converted work to install via USB yourself.
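As a rough sketch of what such an email could look like when built from a script, the following uses Python's standard email library to assemble a message with one attachment. The sender address, subject line, file name, and the helper name build_kindle_message are all hypothetical, and the sketch only builds the message; actually sending it (for example through your own SMTP server) is left out, and nothing here is confirmed by Amazon's documentation.

```python
from email.message import EmailMessage


def build_kindle_message(sender, kindle_user, filename, payload, free=True):
    """Build (but do not send) an email carrying one document attachment.

    The two domains come from the post; everything else is illustrative.
    """
    domain = "free.kindle.com" if free else "kindle.com"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{kindle_user}@{domain}"
    msg["Subject"] = "Document for conversion"  # hypothetical subject
    msg.set_content("Document attached.")
    # Attach the raw bytes as a generic binary attachment.
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    return msg


# Hypothetical usage with placeholder addresses and content.
msg = build_kindle_message(
    "me@example.com", "username", "notes.html", b"<p>hello</p>"
)
print(msg["To"])
```

The free flag simply switches between the paid wireless address and the free one that mails back a download link.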

I tried it with a number of PDFs. It rejected a PDF containing graphic images only (about a dozen times, from only my single mail message). However, it does quite well with most text-heavy PDFs, notably doing an excellent job with Return of Sherlock Holmes from bookstacks.org, the only source I found that was both beautifully typeset and preserved the original figures. Unfortunately, the PDF converter occasionally has trouble identifying what should be a paragraph, particularly in sections of novels dealing with brief dialog.

I have also sent it some HTML files to convert, which it also does a great job with.

You can also create Mobipocket files yourself and upload them directly. There is a Mobipocket creator, or you can use mobiperl if you are Windows-impaired or prefer something scriptable on Linux.

The device presents itself as a USB mass storage device, so you can see it under any OS. There’s a documents folder to put your files in. You can back it up with your regular backup tools, too. And it charges over USB.

Web Browser

I haven’t tried it much. It usually works, but seems to be completely down on occasion. It would get by in a pinch, but is not suitable for any serious work.

The guys over at XKCD seem to love it; in fact, their blog post was what finally convinced me to try the Kindle in the first place.

Final Thoughts

I’ve ordered a clip-on light and a “leather” case for the Kindle. The light, I believe, will completely resolve my contrast complaint. The leather case is to protect it, of course.

I can’t really see myself returning the Kindle anymore. It’s way too much fun, and it’s making it easier to read more again.

And really, if Amazon manages to reach out to a whole generation of people and make it easy and fun for them to read again — and make a profit doing it, of course — they may move up a notch or two from being an “evil patent troll” company to a “positive social force” company. Wow, never thought I’d say that one.

Yesterday, I posted part 1 of how to think about compression. If you haven’t read it already, take a look now, so this post makes sense.

Introduction

In the part 1 test, I compressed a 6GB tar file with various tools. This is a good test if you are writing an entire tar file to disk, or if you are writing to tape.

For part 2, I will be compressing each file contained in that tarball individually. This is a good test if you back up to hard disk and want quick access to your files. Quite a few tools take this approach; rdiff-backup, rdup, and backuppc are among them.

We can expect performance to be worse both in terms of size and speed for this test. The compressor tool will be executed once per file, instead of once for the entire group of files. This will magnify any startup costs in the tool. It will also reduce compression ratios, because the tools won’t have as large a data set to draw on to look for redundancy.
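The loss of redundancy is easy to see on a small scale. The sketch below uses Python's zlib module (the same DEFLATE family as gzip, but not the gzip command itself) on a made-up set of small, similar files, and compares compressing each one separately with compressing them as one stream. The sample data is invented for illustration and is not the /usr data set from this test.

```python
import zlib


def compressed_sizes(files):
    """Return (per_file_total, combined) compressed sizes in bytes."""
    # Each file compressed on its own: no shared history between files.
    per_file_total = sum(len(zlib.compress(data, 6)) for data in files)
    # All files concatenated first: later files can reference earlier ones.
    combined = len(zlib.compress(b"".join(files), 6))
    return per_file_total, combined


# Hypothetical stand-in for a directory of small, similar source files.
files = [
    ("#include <stdio.h>\nint func_%d(void) { return %d; }\n" % (i, i)).encode()
    for i in range(200)
]

per_file_total, combined = compressed_sizes(files)
print("per-file total:", per_file_total, "bytes")
print("single stream: ", combined, "bytes")
```

On data like this, the single stream comes out much smaller, because each tiny file on its own gives the compressor almost nothing to work with and pays its own header overhead.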

To add to that, we have the block size of the filesystem — 4K on most Linux systems. Any file’s actual disk consumption is always rounded up to the next multiple of 4K. So a 5-byte file takes up the same amount of space as a 3000-byte file. (This behavior is not unique to Linux.) If a compressor can’t shrink enough space out of a file to cross at least one 4K barrier, it effectively doesn’t save any disk space. On the other hand, in certain situations, saving one byte of data could free 4K of disk space.
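The rounding can be written out as a small calculation. This is a simplified model that assumes a fixed 4096-byte block and ignores filesystem features such as inline storage of tiny files, tail packing, and sparse files, so real numbers may differ.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def on_disk_size(nbytes, block_size=BLOCK_SIZE):
    """Round a file's length up to the next multiple of the block size."""
    blocks = -(-nbytes // block_size)  # ceiling division
    return blocks * block_size


print(on_disk_size(5))     # 4096
print(on_disk_size(3000))  # 4096
print(on_disk_size(4097))  # 8192
print(on_disk_size(4096))  # 4096
```

The last two lines show the edge case mentioned above: shrinking a 4097-byte file by a single byte frees an entire 4K block, while shrinking a 3000-byte file to 5 bytes frees nothing.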

So, for the results below, I use du to calculate disk usage, which reflects the actual amount of space consumed by files on disk.
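For anyone who wants a similar measurement from a script, here is a rough approximation of what du reports, built on the st_blocks field of POSIX stat. It is only a sketch: unlike du, it counts regular directory entries only, does not count the directories themselves, and does not deduplicate hard links. It also assumes a POSIX system, since st_blocks is not available on Windows.

```python
import os
import tempfile


def disk_usage_bytes(root):
    """Sum the allocated space of all files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # st_blocks is counted in 512-byte units on POSIX systems.
            total += st.st_blocks * 512
    return total


# Small demonstration on a throwaway directory holding one 5-byte file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "tiny.txt"), "wb") as f:
        f.write(b"hello")
    usage = disk_usage_bytes(tmp)
    print(usage, "bytes allocated for a 5-byte file")
```

The exact figure printed depends on the filesystem, which is the point: allocated space and file length are different things.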

The Tools

Based on comments in part 1, I added tests for lzop and xz to this iteration. I attempted to test pbzip2, but it would have taken 3 days to complete, so it is not included here — more on that issue below.

As before, in the “MB saved” column, higher numbers are better; in all other columns, lower numbers are better. I’m using clock seconds here on a dual-core machine. The cost column is clock seconds per MB saved.
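The cost column is simple arithmetic, sketched here as a function. The numbers in the example call are invented purely to show the calculation and are not taken from the results.

```python
def cost_per_mb_saved(original_mb, compressed_mb, seconds):
    """Clock seconds spent per megabyte of space saved (lower is better)."""
    saved = original_mb - compressed_mb
    if saved <= 0:
        # Nothing saved: the time spent bought no space at all.
        return float("inf")
    return seconds / saved


# Made-up example: 6000 MB shrinks to 2000 MB in 800 clock seconds.
print(cost_per_mb_saved(6000, 2000, 800))  # 0.2
```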

Let’s draw some initial conclusions:

lzma -1 continues to be both faster and smaller than bzip2. lzma -2 is still smaller than bzip2, but unlike the test in part 1, is now a bit slower.

As you’ll see below, lzop ran as fast as cat. Strangely, lzop -3 produced larger output than lzop -1.

gzip -9 is probably not worth it — it saved less than 1% more space and took 42% longer.

xz -1 is not as good as lzma -1 in either way, though xz -2 is faster than lzma -2, at the cost of some storage space.

Among the tools also considered for part 1, the differences in space and time were both smaller. Across all tools, the difference in time is still far more significant than the difference in space.

The Pretty Charts

Now, let’s look at an illustration of this. As before, the sweet spot is the lower left, and the worst spot is the upper right. First, let’s look at the compression tools themselves:

At the extremely fast end, with weaker compression, is lzop. gzip is still the balanced performer, bzip2 still looks really bad, and lzma -1 is still the best high-compression performer.

Now, let’s throw cat into the mix:

Here’s something notable that this graph makes crystal clear: lzop was just as fast as cat. In other words, it is likely that lzop was faster than the disk, and using lzop compression would be essentially free in terms of time consumed.

And finally, look at the cost:

What happened to pbzip2?

I tried the parallel bzip2 implementation just like last time, but it ran extremely slowly. Interestingly, pbzip2 < notes.txt > notes.txt.bz2 took 1.002 wall seconds, but pbzip2 notes.txt finished almost instantaneously. This 1-second startup time for pbzip2 was a killer, and the test would have taken more than 3 days to complete. I killed it early and omitted it from my results. Hopefully this bug can be fixed. I didn’t expect pbzip2 to help much in this test, and perhaps even to see a slight degradation, but not like THAT.
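The three-day figure follows from the startup time alone. Here is the arithmetic, using the file count of 270,101 given in the conclusions and rounding the observed 1.002 seconds down to one second per invocation; actual compression time is ignored, so this is a lower bound.

```python
FILES = 270_101          # number of files in the test set
STARTUP_SECONDS = 1.0    # approximate startup cost per pbzip2 invocation

total_seconds = FILES * STARTUP_SECONDS
days = total_seconds / 86_400  # seconds in a day

print(f"{days:.2f} days of startup overhead")  # 3.13 days
```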

Conclusions

As before, the difference in time was far more significant than the difference in space. By compressing files individually, we lost about 400MB (about 7%) space compared to making a tar file and then compressing that. My test set contained 270,101 files.

gzip continues to be a strong all-purpose contender, posting fast compression time and respectable compression ratios. lzop is a very interesting tool, running as fast as cat and yet turning in reasonable compression — though 25% worse than gzip on its default settings. gzip -1 was almost as fast, though, and compressed better. If gzip weren’t fast enough with -6, I’d be likely to try gzip -1 before using lzop, since the gzip format is far more widely supported, and that’s important to me for backups.

These results still look troubling for bzip2. lzma -1 continued to turn in far better times and compression ratios than bzip2. Even bzip2 -1 couldn’t match the speed of lzma -1, and compressed barely better than gzip. I think bzip2 would be hard-pressed to find a comfortable niche anywhere by now.

As before, you can download my spreadsheet with all the numbers behind these charts and the table.

Compression is with us all the time. I want to talk about general-purpose lossless compression here.

There is a lot of agonizing over compression ratios: the size of output for various sizes of input. For some situations, this is of course the single most important factor. For instance, if you’re Linus Torvalds putting your code out there for millions of people to download, the benefit of saving even a few percent of file size is well worth the cost of perhaps 50% worse compression performance. He compresses a source tarball once a month maybe, and we are all downloading it thousands of times a day.

On the other hand, when you’re doing backups, the calculation is different. Your storage media costs money, but so does your CPU. If you have a large photo collection or edit digital video, you may create 50GB of new data in a day. If you use a compression algorithm that’s too slow, your backup for one day may not complete before your backup for the next day starts. This is even more significant a problem when you consider enterprises backing up terabytes of data each day.
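To make the backup-window concern concrete, here is a small Python sketch. Only the 50GB-per-day figure comes from the text above; the two throughput values are hypothetical placeholders, not measurements from this post.

```python
def hours_to_compress(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to push data_gb through a compressor at a given speed."""
    data_mb = data_gb * 1024
    return data_mb / throughput_mb_per_s / 3600


def fits_in_window(data_gb: float, throughput_mb_per_s: float,
                   window_hours: float = 24.0) -> bool:
    """True if the compression pass finishes before the next backup starts."""
    return hours_to_compress(data_gb, throughput_mb_per_s) <= window_hours


# Hypothetical speeds for a fast and a slow compressor setting.
for label, speed in [("fast (10 MB/s)", 10.0), ("slow (0.5 MB/s)", 0.5)]:
    hours = hours_to_compress(50, speed)
    print(f"{label}: {hours:.1f} h, fits in a day: {fits_in_window(50, speed)}")
```

With these made-up speeds, the fast setting finishes in under two hours while the slow one overruns the 24-hour window, which is exactly the failure described above.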

So I want to think of compression both in terms of resulting size and performance. Onward…

Starting Point

I started by looking at the practical compression test, which has some very useful charts. He has charted savings vs. runtime for a number of different compressors, and with the range of different settings for each.

If you look at his first chart, you’ll notice several interesting things:

gzip performance flattens at about -5 or -6, right where the manpage tells us it will, and in line with its defaults.

7za -2 (the LZMA algorithm used in 7-Zip and p7zip) is both faster and smaller than any possible bzip2 combination. 7za -3 gets much slower.

bzip2’s performance is more tightly clustered than the others, both in terms of speed and space. bzip2 -3 is about the same speed as -1, but gains some space.

All this was very interesting, but had one limitation: it applied only to the gimp source tree, which is something of a best-case scenario for compression tools.

A 6GB Test
I wanted to try something a bit more interesting. I made an uncompressed tar file of /usr on my workstation, which comes to 6GB of data. My /usr contains highly compressible data such as header files and source code, ELF binaries and libraries, already-compressed documentation files, small icons, and the like. It is a large, real-world mix of data.

In fact, every compression comparison I saw was using data sets less than 1GB in size — hardly representative of backup workloads.
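For anyone wanting to try a similar comparison on their own data, here is a minimal Python sketch using the standard library's gzip, bz2, and lzma modules. This is not how the numbers in this post were produced (those came from the command-line tools run against the tar file), and the library compression levels may not map exactly onto the command-line settings; the labels and the benchmark function are illustrative only.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> list:
    """Compress data with several settings; report savings, time, and cost."""
    compressors = {
        "gzip -6": lambda d: gzip.compress(d, compresslevel=6),
        "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
        # FORMAT_ALONE is the legacy .lzma container; preset 1 stands in for "-1".
        "lzma -1": lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE, preset=1),
    }
    results = []
    for name, compress in compressors.items():
        start = time.perf_counter()
        compressed = compress(data)
        seconds = time.perf_counter() - start
        mb_saved = (len(data) - len(compressed)) / (1024 * 1024)
        results.append({
            "tool": name,
            "mb_saved": mb_saved,
            "seconds": seconds,
            # Cost here means clock seconds per MB saved.
            "cost": seconds / mb_saved if mb_saved > 0 else float("inf"),
        })
    return results


if __name__ == "__main__":
    sample = b"some fairly repetitive sample text\n" * 20_000
    for row in benchmark(sample):
        print(f"{row['tool']:9} saved {row['mb_saved']:.2f} MB "
              f"in {row['seconds']:.3f} s (cost {row['cost']:.3f})")
```

The tiny repetitive sample is only there to make the script runnable; for meaningful numbers, read in a large, mixed data set such as a tar file of a real directory tree.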

Let’s start with the numbers:

Tool | MB saved | Space vs. gzip | Time vs. gzip | Cost
--- | --- | --- | --- | ---
gzip | 3398 | 100.00% | 100.00% | 0.15
bzip2 | 3590 | 92.91% | 333.05% | 0.48
pbzip2 | 3587 | 92.99% | 183.77% | 0.26
lzma -1 | 3641 | 91.01% | 195.58% | 0.28
lzma -2 | 3783 | 85.76% | 273.83% | 0.37

In the “MB saved” column, higher numbers are better; in all other columns, lower numbers are better. I’m using clock seconds here on a dual-core machine. The cost column is clock seconds per MB saved.
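For readers who want to see how these columns relate, here is a short Python sketch. It assumes the uncompressed tar file was roughly 6100 MB (the post only says 6GB; this value is inferred because it reproduces the published percentages closely) and that "Space vs. gzip" is each tool's output size divided by gzip's output size. The function names are illustrative.

```python
TOTAL_MB = 6100  # assumption: approximate size of the uncompressed tar file

# "MB saved" values copied from the table.
MB_SAVED = {
    "gzip": 3398,
    "bzip2": 3590,
    "pbzip2": 3587,
    "lzma -1": 3641,
    "lzma -2": 3783,
}


def space_vs_gzip(tool: str, total_mb: float = TOTAL_MB) -> float:
    """Output size of a tool as a percentage of gzip's output size."""
    tool_output = total_mb - MB_SAVED[tool]
    gzip_output = total_mb - MB_SAVED["gzip"]
    return 100.0 * tool_output / gzip_output


def cost(clock_seconds: float, mb_saved: float) -> float:
    """The cost column: clock seconds spent per MB saved."""
    return clock_seconds / mb_saved


for tool in MB_SAVED:
    print(f"{tool:8} {space_vs_gzip(tool):6.2f}% of gzip's output size")
```

Under that assumed total, the computed percentages land within a fraction of a point of the table, which suggests the reading of the column is right even if the exact file size differs slightly.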

What does this tell us?

bzip2 can do roughly 7% better than gzip, at a cost of a compression time more than 3 times as long.

lzma -1 compresses better than bzip2 -9 in less than twice the time of gzip. That is, it is significantly faster and marginally smaller than bzip2.

lzma -2 is significantly smaller and still somewhat faster than bzip2.

pbzip2 achieves better wall clock performance, though not better CPU time performance, than bzip2 — though even then, it is only marginally better than lzma -1 on a dual-core machine.

Some Pretty Charts

First, let’s see how the time vs. size numbers look:

Like the other charts, the best area is the lower left, and worst is upper right. It’s clear we have two outliers: gzip and bzip2. And a cluster of pretty similar performers.

This view somewhat magnifies the differences, though. Let’s add cat to the mix:

And finally, look at the cost:

Conclusions

First off, the difference in time is far larger than the difference in space. We’re talking a difference of 15% at the most in terms of space, but orders of magnitude for time.

I think this pretty definitively is a death knell for bzip2. lzma -1 can achieve better compression in significantly less time, and lzma -2 can achieve significantly better compression in a little less time.

pbzip2 can help even that out in terms of clock time on multicore machines, but 7za already has a parallel LZMA implementation, and it seems only a matter of time before /usr/bin/lzma gets it too. Also, if I were to chart CPU time, the numbers would be even less kind to pbzip2 than to bzip2.

bzip2 does have some interesting properties, such as resetting everything every 900K, which could provide marginally better safety than any other compressor here — though I don’t know if lzma provides similar properties, or could.

I think a strong argument remains that gzip is most suitable for backups in the general case. lzma -1 makes a good contender when space is at more of a premium. bzip2 doesn’t seem to make a good contender at all now that we have lzma.

I have also made my spreadsheet (OpenOffice format) containing the raw numbers and charts available for those interested.

Update

Part 2 of this story is now available, which considers more compression tools, and looks at performance compressing files individually rather than the large tar file.


Last July, I wrote about video uploading sites. Now that I’m starting to get ready to post video online, some public but a lot of it just for friends or family, I’ve taken another look. And I’m disappointed in what I see.

Youtube has made the biggest improvements since then. Now, they can handle high-definition video, an intermediate “HQ” encoding, and the standard low-bandwidth encoding. Back then, there was no HD support, and I don’t think any HQ support either.

There are two annoying things about Youtube. One is the 10 minute limit per video file, though that can be worked around. The other is the really quite terrible set of options for sharing non-public videos. In essence, the only way to do this is to, on each video, manually select which people you want to be able to see it. If suddenly a new person gets a Youtube account, you can’t just give them access to the entire back library of videos. What I want is to tell Youtube that all people in a certain GROUP should have access, and then I can add people to the group as needed. That’s a really quite terrible oversight.

Vimeo, on the other hand, has actually gotten worse. Back a year ago, they were an early adopter on the HD bandwagon. Now that they’ve introduced their pay accounts, the free accounts have gotten worse than before. With a free Vimeo account, you can only upload 1 HD video a week. You also get dumped in the “4-hour encoding” line, and get the low-quality encoding. Yes, it’s noticeable, and much worse than Youtube HQ, let alone Youtube HD. You have no time limit, but a 500MB upload limit per week.

The sharing options with Vimeo are about what I’d want.

blip.tv seems about the same, and I’m still avoiding them because you have to pay $100/yr to be able to keep videos non-public.

Then there’s viddler. I am not quite sure what to make of them. They seem to be, on the one hand, Linux fans with a clue. On the other hand, their site seems to be chock full of get-rich-quick and real estate scheme videos, despite a ToS that prohibits them. They allow you to upload HD videos but not view them. They have a limit of 500MB per video file, but no limits on how many files you can upload or the length of each one, and the sharing options seem good.

So I’m torn. On the one hand, it would be easy to say, “I’ll just dump everything to viddler.” On the other hand, are they going to do what Vimeo did, or worse, start overlaying ads on all my videos?

Any suggestions?


We recently bought a Canon Vixia HG20 camcorder. The HG20 records in AVCHD format (MPEG-4 h.264) at up to 1920×1080. To get from the camcorder to a DVD (or something we can upload to the web), I need some sort of video editing software. This lets me trim out the boring bits, encode the video for DVD, etc.

Background

In addition to DVD creation and web uploading, I want the ability to burn high-definition video discs. 1920×1080 is significantly higher resolution than you get from a DVD. There are two main ways to go: a blu-ray format disc, or an AVCHD disc. A blu-ray disc has to be burned onto BD-R media, which costs about $5 each, using a blu-ray burner, which costs about $200. AVCHD discs use the same h.264 encoding that the camcorder does, meaning they have better compression and can be burned onto regular DVD+R media, fitting about 30 minutes onto a DVD. Moreover, it is possible to move AVCHD files directly from a camcorder to an AVCHD disc without re-encoding, resulting in higher quality and lower playing time. The nicer blu-ray players, such as the PS3, can play AVCHD discs.
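As a back-of-the-envelope check on the "about 30 minutes" figure, here is a small Python sketch that computes the average bitrate that playing time implies. It assumes a single-layer 4.7 GB DVD+R (the text does not say which capacity is meant) and ignores filesystem and menu overhead, so treat the result as a rough upper bound.

```python
DVD_SINGLE_LAYER_BYTES = 4.7e9  # assumption: single-layer disc, decimal gigabytes


def implied_bitrate_mbps(capacity_bytes: float, minutes: float) -> float:
    """Average bitrate (megabits per second) that fills the disc in `minutes`."""
    bits = capacity_bytes * 8
    seconds = minutes * 60
    return bits / seconds / 1_000_000


print(f"{implied_bitrate_mbps(DVD_SINGLE_LAYER_BYTES, 30):.1f} Mbit/s")
```

Under those assumptions, 30 minutes per disc corresponds to an average of roughly 21 Mbit/s for video and audio combined.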

AVCHD seems pretty clearly the direction the industry is moving. Compared to the tape-based HDV, AVCHD has higher quality with lower bitrates, better resolution, and much greater convenience. Hard disk or SD-based AVCHD camcorders are pretty competitive in terms of price by now too, often cheaper than tape-based ones.

The downside of AVCHD is that it takes more CPU power to process. Though as video tasks are often done in batch, that wouldn’t have to be a huge downside. The bigger problem is that, though all the major video editing software claims to support AVCHD, nobody really supports it well yet.

The Contenders

Back when I got my first camcorder in about 2001 — the one that I’m replacing now — you pretty much had to have a Mac to do any sort of reasonable consumer or prosumer-level video editing. We bought our first iMac back then to work with that, and it did work well with the MiniDV camera.

Today, there’s a lot more competition out there. The Mac software stack has not really maintained its lead — some would even say that it’s regressed — and the extremely high cost of buying a Mac capable of working with AVCHD, plus Final Cut Express, makes that option completely out of the question for me. It would be roughly $2500.

Nobody really supports AVCHD well yet, even on the Mac. Although most programs advertise support of “smart rendering” (a technique that lets the software merely copy unchanged footage when outputting to the same format as the input), none of them have smart rendering that actually works with AVCHD source material. This fact is never documented, though it is discussed on forums.

Another annoyance, having used Final Cut Express in the past, is that with these programs you can’t just go to the timeline and say “delete everything between 1:35 and 3:52”; you have to go in and split up clips, then select and delete them. They seem to be way too concerned about dealing with individual clips.

I briefly used Cinelerra on Linux to do some video editing. It’s a very powerful program, but geared at people that are far more immersed in video editing than I. For my needs, it didn’t have enough automation and crashed too much — and that was with MiniDV footage. It apparently does support AVCHD, but I haven’t tried it.

I’ve tried three programs and considered trying a fourth. Here are my experiences:

Ulead/Corel VideoStudio Pro X2

Commonly referenced as the “go to” program for video editing on Windows, so I started by downloading the Free Trial of it from Corel. Corel claims all over their website that the free trial is full-featured, but I could tell almost instantly that it wasn’t. I wound up buying the full version, which came to about $65 after various discounts.

I wanted to like this program. Its output options include AVCHD disc, Blu-ray disc, DVD+R, and the like. Its input options include MiniDV, AVCHD, ripping from DVD, ripping from Bluray, and just about every other format you can think of. And it heavily advertised “proxy editing”, designed to let you edit a scaled-down version of AVCHD video with a low-CPU machine, but refer back to the original high-quality footage for the output.

It didn’t pan out that way.

The biggest problem was the constant crashing. I really do mean constant. It probably crashed on me two dozen times in an hour. If you are thinking that means that it crashes pretty much as soon as I can get it re-opened, you’d be correct. Click the Play button and it hangs. Click a clip and it hangs. Do anything and it hangs.

It did seem to work better with the parts of the source that had been converted to a low-res version with Smart Proxy, though it didn’t eliminate the hangs, just reduced them. And every time I’d have to End Task, it would forget what it had already converted via Smart Proxy — even if I had recently saved the project — and have to start over from scratch.

I spent some time trying to figure out why it always thought my project was 720×480 even when it was 1920×1080, and why the project properties box didn’t even have an option for 1920×1080. After some forum searching, it turns out that the project properties box is totally irrelevant to the program. Yay for good design, anyone?

VideoStudio Pro X2 does have good output options, allowing combination of multiple source files onto a single DVD or AVCHD disc as separate titles. Unfortunately, its DVD/AVCHD rendering process also — yes — hangs more often than not.

The documentation for VideoStudio Pro X2 is of the useless variety. It’s the sort of thing that feels like it’s saying “The trim tool is for trimming your clips” without telling you what “trimming your clips” means, or making it obvious how to remove material from the middle of a clip.

The proxy editing feature isn’t what it should be either. Instead of being something that just automatically happens and Works in the background, you have to manage its queue in the foreground — and it forgets what it was doing whenever the program hangs.

On the rare occasion when pressing Play did not cause a hang, the AVCHD footage played back at about 0.5fps — far, far worse than PowerDirector manages on the same machine. Bits that had been rendered for proxy editing did appear to play at full framerate.

I have applied for a refund for this purchase from Corel under their 30-day return policy, and have already uninstalled it from my disk. What a waste.

CyberLink PowerDirector 7 Ultra

This was the second program I tried, and the one I eventually bought. Its feature set is not quite as nice as Corel’s, especially when it comes to versatility of output options. On the other hand, it feels… done. It only crashed two or three times on me — apparently that’s GOOD on Windows? Things just worked. It appears to have proxy editing support, but it is completely transparent and plays back with a decent framerate even without it. It can output to AVCHD, Bluray, and DVD, though smart rendering doesn’t work with AVCHD source material.

Its weakness compared to the Corel package is that it doesn’t have as many options for formatting these discs. You can have only one title on a disc, though you can have many chapters. You have some, but not much, control over compression parameters. The same goes for exporting files for upload to the web or saving on your disk.

The documentation is polished and useful for the basics, though not extensive.

Overall, this package works and supports all the basics I wanted from it, so I’m using it for now.

Adobe Premiere Elements 7

I downloaded the trial of this one too. I opened it up, and up popped a dialog box asking what resolution my project would be, interlacing settings, etc. I thought “YES — now that’s the kind of program I want.” As I tried out the interface, I kept thinking the same. This was a program not just for newbies, but for people that wanted a bit more control.

Until it came to the question of output. Premiere Elements 7 was the only package I looked at that had no option to burn an AVCHD disc. DVD or Blu-ray only. That’s a deal-breaker for me. There’s no excuse for a program in this price range to not support the only affordable HD disc option out there. So I didn’t investigate much further.

Another annoying thing is that Adobe seems to treat all of their software as a commercial. I’m a user, not an audience, dammit. I do not want to buy some photoshop.net subscription when I buy a video editing program. I do not want to see ads for stuff when I’m reading PDFs. LEAVE ME ALONE, ADOBE.

I just felt sleazy even giving them my email address, let alone installing the program on my system. I think I will feel like a better person once I reboot into Windows and wipe it off my system.

Pinnacle Studio 12

Another program that comes highly rated. But I never installed it because its “minimum system requirements” state that it needs an “Intel Core 2 Quad 2.66GHz or higher” for 1920×1080 AVCHD editing. And I have only a Core 2 Duo 2.66GHz — half the computing horsepower that it wants. And since they offer no free trial, I didn’t bother even trying it, especially since PowerDirector got by fine with my CPU.

Conclusions

This seems to be a field where we can say “all video editing software sucks; some just suck a little less.” I’m using PowerDirector for now, but all of the above programs should have new versions coming out this year, and I will be keeping a close eye to see if any of them stop being so bad.


A few months ago, I asked for suggestions for magazines to subscribe to. I got a lot of helpful suggestions, and subscribed to three: The New Yorker, The Atlantic, and The Economist.

Today, I’m reviewing the only one of the three that I’m disappointed in, and it’s The Economist. This comes as something of a surprise, because so many people (with the exception of Bryan O’Sullivan) recommended it.

Let’s start with a quote from the issue that found its way to my mailbox this week:

A crowd of 2m or more is making its way to Washington, DC, to witness the inauguration of Mr Obama. Billions more will watch it on television. [link]

Every issue, I see this sort of thing all over. An estimate, or an opinion, presented as unquestioned fact, sometimes pretty clearly wrong or misleading. For weeks before Jan. 20, and even the day before, the widely-reported word from officials was that they had no idea what to expect, but if they had to guess, they’d say that attendance would be between 1-2 million. In the end, the best estimates have placed attendance at 1.8 million.

Would it have killed them to state that most estimates were more conservative, and to cite the source of their particular estimate? That’s all I want, really, when they do things like this.

I knew going into it that the magazine (to American eyes) essentially editorializes throughout, and I don’t have a problem with that. But it engages in over-generalization far too often, and that’s just when I catch it. This was just a quick example from the first article I read in this issue; it’s more blatant in other places, but quite honestly I’m too lazy to go look some more examples up at this hour. I do remember, though, them referring to people as members of Obama’s cabinet as if that were certain, back before Obama had even announced his picks, let alone before any confirmation hearings had happened.

One of my first issues of The Economist had a lengthy section on the global automobile market. I learned a lot about how western companies broke into markets in Asia and South America. Or at least I think I did. I don’t know enough about that subject to catch them if they are over-generalizing again.

The end result is that I read each issue with a mix of fascination and distrust; the topics are interesting, but I can never really tell if I’m being given an accurate story. It often feels like the inside scoop, but then when I have some bit of knowledge of what the scoop is, it’s often a much murkier shade of gray than The Economist’s ever-confident prose lets on.

Don’t get me wrong; there are things about the Economist I like. But not as much as with the New Yorker or the Atlantic, so I’ll let my subscription lapse after 6 months — but keep reading it until then.


I’ve owned two different GPSs in the past: a Garmin GPS III, and the Garmin eMap. Both are based on similar underlying firmware.

This week, I did some research on more modern GPS units and decided to buy a Garmin nuvi 500. Here’s my review.

Overview

The Garmin nuvi 500 is one of only two models in the nuvi line that are waterproof. The nuvi 500 and 550 are both designed as hybrid units: useful both in the car and outdoors. The 500 includes street-level maps (they appear to be the same quality as Google) for the entire United States, detailed topographical maps for the entire United States, and a global basemap. It also includes a microSD slot (or is it miniSD – I forget) for additional maps that you can buy — anything from marine maps to other countries.

It also includes a default POI (points of interest) database containing over 5 million points: restaurants, gas stations, parks, hospitals, you name it. Most contain an address and a phone number in addition to the coordinates. Unlike GPS units that you find built in to some cell phones, this is all stored on flash memory on the unit: no Internet connection required. The nuvi 500 is a portable yellow pages, topographical map, and incredibly detailed street atlas all in one.

Car Use

The nuvi 500 comes with a car charger and suction cup windshield mount in the box. (It doesn’t come with an AC charger, but it can charge over USB.) It also comes with — yay — a user-replaceable battery. The windshield mount is very sturdy and I am happy to have one that isn’t permanent.

In the car, the device performs admirably. I have read some other reviewers that have compared its routing to other GPSs and found that the nuvi generally picks the same route as Google, and almost always a better route than other GPSs. If you deviate from the selected route, it automatically re-calculates a new route for you. It will show you the next turn coming up, and has either 3D or flat map displays.

There’s a one-touch “where am I” feature. It displays your current coordinates, some sort of street address suitable to read to someone (“1234 S. Main” or “I-95 exit 43” type of thing), along with buttons to show you the nearest police stations and hospitals.

The unit also features multi-stop routing. You can either tell it where all you’re going in a defined order, or tell it all your stops and let it create an optimal route. This feature works, but if your stops are close by (and involve some of the same roads), it may wind up skipping some stops, thinking you’ve already made your first one.

The speaker announces upcoming maneuvers, though it doesn’t have a synthesizer for street names. It’s also supposed to work with Bluetooth headsets, though I haven’t tested that feature.

I found the map quality to be excellent. It seems to be on par with Google (in fact, I think both use data from Navteq). I was surprised with how many country dirt roads and side roads it knows about — and of course, regular city streets are all on there, with one-way indications and the whole lot. In fact, the quality of coverage was so good that I was surprised when it missed roads or got things wrong. Out in rural areas, or small towns, this happens from time to time: it thought that an abandoned railbed was an unnamed road and wanted me to drive on it (it was clearly not drivable), and in some other cases also thought some abandoned roads were still drivable. I don’t think this is a device issue though; it’s an underlying data issue, and everyone else probably has the same problem.

Its arrival time estimates are quite accurate, and its interface is smooth and easy.

It has some optional accessories I haven’t tried, such as real-time traffic reports from wireless sources, boating mode, etc.

Outdoor and Geocaching Mode

Our other main use for the device is outdoor hiking and geocaching. It really shines here. It has special support for the GPX files from geocaching.com. The device supports “paperless caching”. Not only can it put caches on the map, but if you download the GPX file for your caches, you’ll also get the full description, hint (behind a separate button, of course), most recent 5 logs, and the like right there on your screen. You can also log your finds on the device, and upload a file from it to geocaching.com later to log them on the site. This is an incredible time saver over my old method: printing out a bunch of maps, downloading waypoints to the eMap, taking notes, then logging things later.
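Since GPX is plain XML, it is easy to peek inside one of these files before copying it to the device. Below is a minimal Python sketch that lists the waypoints in a GPX document. The sample data is made up, the function names are illustrative, and the parser matches elements by local tag name so it does not depend on a particular GPX namespace version; real geocaching.com files carry extra cache details that this sketch ignores.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_waypoints(gpx_text: str) -> list:
    """Return a list of {'name', 'lat', 'lon'} dicts, one per <wpt> element."""
    root = ET.fromstring(gpx_text)
    waypoints = []
    for elem in root.iter():
        if local_name(elem.tag) != "wpt":
            continue
        name = ""
        for child in elem:
            if local_name(child.tag) == "name":
                name = (child.text or "").strip()
        waypoints.append({
            "name": name,
            "lat": float(elem.get("lat")),
            "lon": float(elem.get("lon")),
        })
    return waypoints


# Hypothetical sample; the waypoint names and coordinates are invented.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/0" version="1.0">
  <wpt lat="38.05" lon="-97.35"><name>CACHE-A</name></wpt>
  <wpt lat="38.10" lon="-97.40"><name>CACHE-B</name></wpt>
</gpx>"""

for wpt in read_waypoints(SAMPLE_GPX):
    print(f"{wpt['name']}: {wpt['lat']:.4f}, {wpt['lon']:.4f}")
```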

I found outdoor mode not quite as refined as the auto mode, however. It kept forgetting that I wanted to use an arrow as my current position indicator (the default hiking boots for walking mode didn’t provide an accurate enough direction indication for my tastes). I finally realized that saying “don’t ask me about the configuration” was the key to getting it to remember the configuration. It sometimes took a surprisingly long time to realize we weren’t standing still any longer.

On the other hand, the quality of the GPS receiver was amazing. It even got a strong signal in my house. And I wasn’t even sitting at a window. The topographical maps are a nice addition, and the breadcrumb mode is always helpful when geocaching and hiking, too.

My natural way of holding it meant that I accidentally turned it off a few times, because I had a finger holding it right over the power button. But it powers back up and re-obtains the signal quite rapidly.

The different modes (automobile, outdoor, bicycle, and scooter) are mainly different collections of settings: 3D map or flat, what indicator to use, to navigate off-road or along roadways, to ignore one-way indications or heed them, etc.

PC link

The nuvi 500 has a USB port. Plug it into the PC, and you see its internal flash drive, vfat formatted. You can upload and download GPX files there, store photos if you like. Be careful what you mess with though, because its built-in maps and system data are also on that drive. If you have the SD card inserted, that will also be presented to your computer when you plug the device in. Garmin has some Windows software to make it easier to upload/download stuff, but I haven’t tried it.

Annoyances

As you can tell, I really like this GPS, but there are a few things about it that annoy me.

The #1 thing that annoys me isn’t actually the GPS itself, but Garmin’s website. I went to register the device online, but it wouldn’t let me because I don’t have the Windows-only browser plugin on my Linux machine, and it wanted to talk to the GPS for some reason. I went to send them a support request about that, only to discover — after I had typed in the request — that their “email us” form is broken in Firefox on all platforms. Bad show, Garmin.

The on-screen keyboard (it’s a touch-screen device with only one hard button for power) isn’t QWERTY layout; it’s alphabetical layout, and it makes me inefficient when entering data. I found myself logging my finds on the unit, but taking notes about them on paper because that was faster. Garmin has a feature listed on their website for a toggle between QWERTY and alphabetical layout, which they apparently offer on only their more expensive GPSs. What? There is no reason to not put that in all your firmware.

The device lacks a few features I was used to on my GPS III and eMap. It doesn’t support any kind of real-time position indication to the PC; all communication is just accessing stored data on the internal flash drive. I used to think that was a nice feature, but in reality, I haven’t used it in years. It also lacks the display of the exact location of each GPS satellite, though the incredible quality of its receiver means that I don’t really care any more. (I used to use that information to help figure out which window to put it by if in a car/train or something.)

It also lacks the level of configuration that was present in the settings screens on the older units. There’s no “battery saver” mode (sample every 5 seconds when going at a constant velocity instead of every 1) like the older units had. The sun and moon screen likewise is gone, but added is awareness of timezones; the Nuvi 500 can show you the local time at your present position synced up with the GPS satellites.

The compass is not the most helpful screen, though after some practice, it is functional. The documentation about it is confusing, but really the thing that was more confusing was this: in walking mode, the arrow that indicates what direction you’re walking in updates faster than the arrow that indicates what direction you should be walking in. Once I realized what was going on there, it was easier to use.

The compass does tell you what your bearing is, in degrees, but only when you are not seeking a destination. It will not tell you the bearing to the destination, though you can estimate it from the simulated compass face on the display. When seeking a specific point, especially through terrain with obstacles such as trees, it is useful to be able to use a compass for the final approach because a GPS unit can’t tell you which direction it’s pointed — only which direction it’s moving, so when you’re not moving or moving slowly, it’s not helpful.
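The missing number, the bearing to the destination, can be worked out by hand from two coordinate pairs and then followed with a real compass. Here is a Python sketch using the standard initial great-circle bearing formula. This is not a claim about what the nuvi computes internally; the function name is illustrative, and the result is a true bearing, so a magnetic compass still needs the local declination applied.

```python
import math


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial true bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(east, north)) + 360.0) % 360.0


# A destination due east of the current position along the equator.
print(f"{initial_bearing(0.0, 0.0, 0.0, 1.0):.1f} degrees")
```

The same function also illustrates why a GPS needs movement to show a heading: feeding it two successive position fixes gives the direction of travel, and when the two fixes are nearly identical the result is dominated by noise.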

I did some experimentation today with the compass screen, as well as my real compass, and was able to navigate to my destination rather precisely using the combination of them. That said, a more functional compass screen would still be better.

Conclusion

Overall, I’m very happy with the nuvi 500. It’s not the same as a top-of-the-line device in either the outdoor or automotive category, but on the other hand, it’s cheaper and more convenient than buying two devices with similar features. The geocaching features are excellent, and the build quality is excellent as well. The system is stable and performs well. (Some other reviews worried about whether the case is solid enough; it seems quite solid to me.) I wish there were a faster way to toggle between 3D and flat map views, and forgetting about my walking mode icon is annoying, but other than that I have very little to complain about. Garmin’s geocaching features (found on this unit and several others in their lineup) are great.