I’ve been doing a fair amount of experimentation on my machines. Been playing with XBMC, Boxee, Ubuntu, and bleeding-edge Xfce and Freedesktop software stacks on Gentoo.

HTPC

First, I tinkered with the HTPC. It’s a Zotac MAG, dual-core Atom 330 with nVidia ION graphics. It’s just the thing for at 1080p media center. It was running an experimental community-built version of XBMC, installed back in January or February. While it was nice because it was extremely minimal, with just enough operating system to boot straight into XBMC, it was very buggy, and definitely alpha-quality.

So I wiped the disk and installed straight-up Ubuntu 10.04.1, and then installed XBMC and Boxee. I’m still tinkering with it; I need to setup the remote control I just bought, and I need to configure it to load a media center on boot, rather than the Gnome desktop. There have been a few hardware growing pains, mostly related to getting HDMI sound working correctly, and forcing 1080p output — detected my TV as a 720p device, so I had to fiddle with the nVidia utility to fix that. Also had to do a lot of tweaking in XBMC and especially Boxee to output sound over HDMI.

I’m somewhat familiar with XBMC, but I’ve wanted to try Boxee for awhile now. It’s supposed to be more user-friendly than XBMC, offering a simplified interface based on the XBMC code. It also has that whole “social media” aspect, whatever that’s about. Its real attraction is the user-friendliness; I hate having to do so much manual configuration in XBMC. I need something that my wife and I can just pick up and use; no hassle, no tinkering.

Initial XBMC impressions

It works somewhat better than the version I used several months ago. However, most of the video plugins are buggy and/or completely broken, as are some of the helper programs and utilities. All that’s supposed to change, though, with the next XBMC release. Supposedly there’s an entirely new architecture, so maybe I’ll finally get to watch my favorite shows and stream local media a bit easier, with better content scraper integration.

There are some things that don’t work — nothing on SyFy, and all too often the existing network TV plugins don’t work for all advertised shows. Mythbusters in particular is very buggy, offering only partial listings for 5 or 6 seasons, and those are extremely low-quality Flash streamed from some third-party download site. Still, my wife can watch her crime shows and 80s favorites, while I can get a fair amount of Mythbusters and other Discovery shows.

Initial Boxee impressions

Useless. Slow. Hard to configure. Totally not what I thought it’d be like. Boxee didn’t offer any of the easy internet TV watching I thought it’d have, instead offering a couple hundred useless channels for things I’ve never heard of. I expected it to have better integration with regular network TV, the same ones I can watch in a web browser. I expected a nice presentation of content, with the video wrapped in a full-screen Boxee experience. I expected hi-def streaming content; instead, the few shows I’m interested in are only available in stuttering low-quality Flash. Why bother, when I can watch The Guild in gorgeous 1080p on my Xbox?

At least XBMC has somewhat working plugins for several networks. Apparently XBMC plugins don’t work that well (or at all) on Boxee. I gave up trying to make ‘em work. All too often the shows are streamed from stuttering, blocky Flash videos. I tried using the integrated Boxee web browser to watch things like Hulu, but that turned out to be an even slower, laggier disaster.

On the hardware side, it was harder to get HDMI sound working on Boxee than on XBMC. And even though I have all the right VDPAU libraries installed, and I’m using the latest nVidia driver (195.x), nothing on Boxee seems to be accelerated. Even 720p content streamed from my LAN stutters, with the audio occasionally lagging the video. XBMC doesn’t have this problem, so I know it’s not the graphics stack.

Boxee is definitely beta. Maybe everything will work fantastically on the Boxee Box, but I don’t want to spend another $200+ dollars on essentially the exact same hardware. I want to like Boxee, I really do. But at this point, XBMC works where Boxee fails, and even though its user interface is more cumbersome, it still lets me watch videos and play music.

Initial Ubuntu impressions

Slick. Very slick. Installation from a USB key went very well. While I tried to do everything from the commandline in Gentoo, using syslinux and whatnot, none of the Ubuntu guides on creating LiveUSB media worked. I ended up just compiling Unetbootin and Qt4 on my Xfce laptop. Unetbootin worked perfectly on the first time, giving me a bootable 4GB USB stick loaded with Ubuntu 10.04.1.

Installation was simple and straightforward, with minimal user interaction. I did a bit more, because I wanted to create a custom partition layout, but otherwise the end-user part of the process was done in just a minute. The rest of the install proceeded automatically, booting me into a bright, shiny Gnome desktop. After a minute or so a few notifications popped up, advising me to install the proprietary nVidia driver and install some software updates. That took just a couple of clicks. Sweet! Oh, the joys of binary package management.

Even though the HTPC only runs a lowly 1.6ghz dual-core Atom chip with a mere 2GB RAM, the desktop still feels pretty responsive. Firefox starts up as fast or faster than my 1.5ghz Core2Duo laptop with twice the RAM. In my experience, Gnome is always pretty heavy, feeling fairly clunky and somewhat slow even when backed by speedy CPUs and gobs of RAM. Nautilus and Evolution windows always seem to load much slower than Thunar and Claws Mail, or even PCManFM and Thunderbird. Still, despite the anemic hardware, my Gnome experience in Ubuntu is surprisingly pleasant. Nothing seems particularly slow to start on a fresh login. The boot process itself could be faster, though; it takes more’n'a minute to get logged in. I need to cut that down to 10 seconds or less to get a true HTPC “instant-on” experience.

There are some quirks in most ION devices related to suspend and USB wakeups, and my Zotac MAG is no exception. I need to do some commandline hacking to get the computer to suspend every time, and to wake only when a button is pushed on the remote. And configuring LIRC is a whole ‘nother deal.

But still, I’m liking what I see. Ubuntu 10.10 will be released next month, and it promises even more improvements and nifty app integration than 10.04.

Laptop

In fact, I liked my brief experience with Ubuntu 10.04 enough to download a daily LiveCD beta of 10.10, “Maverick Meerkat.” I plan to create a LiveUSB and install it on my “Linux playground” partition. I’ve gotten just a taste of how Ubuntu works when using it as a special-purpose media center; now I want to see how it works as a mobile desktop OS.

The last time I had Ubuntu on my laptop, it was Ubuntu Studio 8.04 a few years ago. It lacked polish, lacked the cohesive desktop experience Ubuntu is known for. Its sole advantages were that it was optimized for media production, featuring a low-latency kernel and tons of preinstalled music software, with easy access to much more. I experienced numerous issues with JACK and my USB-to-MIDI adapter, though, so my plans for tinkering with music were shelved.

I’ll see how vanilla Ubuntu works on my laptop, and if it goes well, I may look into converting it into a low-latency/realtime audio production environment. I’m very interested in being able to quickly, easily use this machine to create tunes and link up with my piano. There’s been a lot of progress in the Linux audio world in two years.

Gentoo

I turned my Gentoo install into a bleeding-edge hardmasked/~arch/stable Xfce testbed. I decided to dump HAL and setup PolicyKit, ConsoleKit, udisks, upower, udev, that whole stack. Originally I wanted just to try out the experimental PackageKit features for Portage, which was a recent GSoC project. I knew that would require most of the aforementioned software stack, so I thought, “As long as it has to be installed anyway, why not dump HAL, too?”

One thing led to another, and pretty soon I had upgraded to Xserver 1.9, disabled HAL, rebuilt world a few times for USE flag changes, and pretty well screwed my system. Reconfiguring my input devices for xorg.conf.d took awhile, and I’ve been besieged by other difficulties. That’s the problem with a source-based distro, and the problem of running Xfce in particular: nothing is especially integrated, and in a lighter environment like Xfce (compared to Gnome), there isn’t much code that’s designed for a HAL-less system. Apps written for *Kit/udisks/upower are either only available in git, or still unported. A binary distro like Ubuntu would integrate all that stuff forcibly, by writing their own code if need be, so that various actions that require PolicyKit authentication would pop-up windows, prompting for passwords. None of that happens on my Gentoo system.

I ended up adding the Xfce overlay and adding several hardmasked 4.7/live versions of packages just to get udisks/upower support, which helped some. However, several bits of Xfce and other daily applications just don’t have the right code yet. So xfdesktop doesn’t display icons when drives are plugged, cameras won’t always mount and make their photos available, and not all power management options work, even when the user has the right PolicyKit credentials. That’s on top of having to reconfigure pretty much my entire working environment due to changes between 4.6 and 4.7 — everything from panel applets and their configs, to Thunar and window manager preferences. Despite USE="sound" installing libcanberra for event support, and adding sound-theme-freedesktop, event sounds cannot be enabled in the appropriate dialog, because Xfce doesn’t believe libcanberra is installed. It’s the same for Pidgin — it doesn’t believe there’s a working sound framework, either. I have to give it commands like “aplay /some/dir/foo.wav” for each kind of event.

The whole thing, is, quite frankly, a bloody mess. And all because I wanted to get some experimental Portage toys from an overlay. It’s my fault, I admit, and even after 8 hours of hacking at it, I’m probably nowhere near finished, assuming it’s even possible to get all the bleeding-edge pieces to play nicely together.

(Side note: I would like to thank my fellow Gentoo developer Samuli for taking the time to answer my numerous Xfce-related questions and do a bit of troubleshooting along the way. Thanks, man!)

So, what’s to learn from my adventures? First, trying to put everything back to its previous state would take another two days of work, if it’s even possible! Second, the price for living on the bleeding edge of *Kit integration is too high. I don’t mind running the occasional ~arch package, or using git X11 driver stacks like xf86-video-*, Mesa, and libdrm. But basic hardware abstraction stuff is nothing to be fooling around with. Not being able to use pluggable devices, enable Bluetooth, or properly adjust power management on a laptop is too high a price for being forward-thinking. I wish I’d stayed with my boring HAL system. While deprecated and a pain to configure, at least it worked reliably. More important, maintenance was nonexistent; it was simply a matter of copying several .fdi files into the right directory when I first compiled and installed my desktop. After that, I didn’t have to touch a thing.

The HAL-less desktop is supposed to be the future for every Linux distribution out there. I can only hope it is still some ways off, to give upstream coders more time to get their applications in order, so that distributions don’t have to do much patching or extensive repackaging and integration, and so that the end-users don’t have to spend hours configuring everything to their liking.

I've been doing a fair amount of experimentation on my machines. Been playing with XBMC, Boxee, Ubuntu, and bleeding-edge Xfce and Freedesktop software stacks on Gentoo.

HTPC

First, I tinkered with the HTPC. It's a Zotac MAG, dual-core Atom 330 with nVidia ION graphics. It's just the thing for at 1080p media center. It was running an experimental community-built version of XBMC, installed back in January or February. While it was nice because it was extremely minimal, with just enough operating system to boot straight into XBMC, it was very buggy, and definitely alpha-quality.

So I wiped the disk and installed straight-up Ubuntu 10.04.1, and then installed XBMC and Boxee. I'm still tinkering with it; I need to setup the remote control I just bought, and I need to configure it to load a media center on boot, rather than the Gnome desktop. There have been a few hardware growing pains, mostly related to getting HDMI sound working correctly, and forcing 1080p output -- detected my TV as a 720p device, so I had to fiddle with the nVidia utility to fix that. Also had to do a lot of tweaking in XBMC and especially Boxee to output sound over HDMI.

I'm somewhat familiar with XBMC, but I've wanted to try Boxee for awhile now. It's supposed to be more user-friendly than XBMC, offering a simplified interface based on the XBMC code. It also has that whole "social media" aspect, whatever that's about. Its real attraction is the user-friendliness; I hate having to do so much manual configuration in XBMC. I need something that my wife and I can just pick up and use; no hassle, no tinkering.

Initial XBMC impressions

It works somewhat better than the version I used several months ago. However, most of the video plugins are buggy and/or completely broken, as are some of the helper programs and utilities. All that's supposed to change, though, with the next XBMC release. Supposedly there's an entirely new architecture, so maybe I'll finally get to watch my favorite shows and stream local media a bit easier, with better content scraper integration.

There are some things that don't work -- nothing on SyFy, and all too often the existing network TV plugins don't work for all advertised shows. Mythbusters in particular is very buggy, offering only partial listings for 5 or 6 seasons, and those are extremely low-quality Flash streamed from some third-party download site. Still, my wife can watch her crime shows and 80s favorites, while I can get a fair amount of Mythbusters and other Discovery shows.

Initial Boxee impressions

Useless. Slow. Hard to configure. Totally not what I thought it'd be like. Boxee didn't offer any of the easy internet TV watching I thought it'd have, instead offering a couple hundred useless channels for things I've never heard of. I expected it to have better integration with regular network TV, the same ones I can watch in a web browser. I expected a nice presentation of content, with the video wrapped in a full-screen Boxee experience. I expected hi-def streaming content; instead, the few shows I'm interested in are only available in stuttering low-quality Flash. Why bother, when I can watch The Guild in gorgeous 1080p on my Xbox?

At least XBMC has somewhat working plugins for several networks. Apparently XBMC plugins don't work that well (or at all) on Boxee. I gave up trying to make 'em work. All too often the shows are streamed from stuttering, blocky Flash videos. I tried using the integrated Boxee web browser to watch things like Hulu, but that turned out to be an even slower, laggier disaster.

On the hardware side, it was harder to get HDMI sound working on Boxee than on XBMC. And even though I have all the right VDPAU libraries installed, and I'm using the latest nVidia driver (195.x), nothing on Boxee seems to be accelerated. Even 720p content streamed from my LAN stutters, with the audio occasionally lagging the video. XBMC doesn't have this problem, so I know it's not the graphics stack.

Boxee is definitely beta. Maybe everything will work fantastically on the Boxee Box, but I don't want to spend another $200+ dollars on essentially the exact same hardware. I want to like Boxee, I really do. But at this point, XBMC works where Boxee fails, and even though its user interface is more cumbersome, it still lets me watch videos and play music.

Initial Ubuntu impressions

Slick. Very slick. Installation from a USB key went very well. While I tried to do everything from the commandline in Gentoo, using syslinux and whatnot, none of the Ubuntu guides on creating LiveUSB media worked. I ended up just compiling Unetbootin and Qt4 on my Xfce laptop. Unetbootin worked perfectly on the first time, giving me a bootable 4GB USB stick loaded with Ubuntu 10.04.1.

Installation was simple and straightforward, with minimal user interaction. I did a bit more, because I wanted to create a custom partition layout, but otherwise the end-user part of the process was done in just a minute. The rest of the install proceeded automatically, booting me into a bright, shiny Gnome desktop. After a minute or so a few notifications popped up, advising me to install the proprietary nVidia driver and install some software updates. That took just a couple of clicks. Sweet! Oh, the joys of binary package management.

Even though the HTPC only runs a lowly 1.6ghz dual-core Atom chip with a mere 2GB RAM, the desktop still feels pretty responsive. Firefox starts up as fast or faster than my 1.5ghz Core2Duo laptop with twice the RAM. In my experience, Gnome is always pretty heavy, feeling fairly clunky and somewhat slow even when backed by speedy CPUs and gobs of RAM. Nautilus and Evolution windows always seem to load much slower than Thunar and Claws Mail, or even PCManFM and Thunderbird. Still, despite the anemic hardware, my Gnome experience in Ubuntu is surprisingly pleasant. Nothing seems particularly slow to start on a fresh login. The boot process itself could be faster, though; it takes more'n'a minute to get logged in. I need to cut that down to 10 seconds or less to get a true HTPC "instant-on" experience.

There are some quirks in most ION devices related to suspend and USB wakeups, and my Zotac MAG is no exception. I need to do some commandline hacking to get the computer to suspend every time, and to wake only when a button is pushed on the remote. And configuring LIRC is a whole 'nother deal.

But still, I'm liking what I see. Ubuntu 10.10 will be released next month, and it promises even more improvements and nifty app integration than 10.04.

Laptop

In fact, I liked my brief experience with Ubuntu 10.04 enough to download a daily LiveCD beta of 10.10, "Maverick Meerkat." I plan to create a LiveUSB and install it on my "Linux playground" partition. I've gotten just a taste of how Ubuntu works when using it as a special-purpose media center; now I want to see how it works as a mobile desktop OS.

The last time I had Ubuntu on my laptop, it was Ubuntu Studio 8.04 a few years ago. It lacked polish, lacked the cohesive desktop experience Ubuntu is known for. Its sole advantages were that it was optimized for media production, featuring a low-latency kernel and tons of preinstalled music software, with easy access to much more. I experienced numerous issues with JACK and my USB-to-MIDI adapter, though, so my plans for tinkering with music were shelved.

I'll see how vanilla Ubuntu works on my laptop, and if it goes well, I may look into converting it into a low-latency/realtime audio production environment. I'm very interested in being able to quickly, easily use this machine to create tunes and link up with my piano. There's been a lot of progress in the Linux audio world in two years.

Gentoo

I turned my Gentoo install into a bleeding-edge hardmasked/~arch/stable Xfce testbed. I decided to dump HAL and setup PolicyKit, ConsoleKit, udisks, upower, udev, that whole stack. Originally I wanted just to try out the experimental PackageKit features for Portage, which was a recent GSoC project. I knew that would require most of the aforementioned software stack, so I thought, "As long as it has to be installed anyway, why not dump HAL, too?"

One thing led to another, and pretty soon I had upgraded to Xserver 1.9, disabled HAL, rebuilt world a few times for USE flag changes, and pretty well screwed my system. Reconfiguring my input devices for xorg.conf.d took awhile, and I've been besieged by other difficulties. That's the problem with a source-based distro, and the problem of running Xfce in particular: nothing is especially integrated, and in a lighter environment like Xfce (compared to Gnome), there isn't much code that's designed for a HAL-less system. Apps written for *Kit/udisks/upower are either only available in git, or still unported. A binary distro like Ubuntu would integrate all that stuff forcibly, by writing their own code if need be, so that various actions that require PolicyKit authentication would pop-up windows, prompting for passwords. None of that happens on my Gentoo system.

I ended up adding the Xfce overlay and adding several hardmasked 4.7/live versions of packages just to get udisks/upower support, which helped some. However, several bits of Xfce and other daily applications just don't have the right code yet. So xfdesktop doesn't display icons when drives are plugged, cameras won't always mount and make their photos available, and not all power management options work, even when the user has the right PolicyKit credentials. That's on top of having to reconfigure pretty much my entire working environment due to changes between 4.6 and 4.7 -- everything from panel applets and their configs, to Thunar and window manager preferences. Despite USE="sound" installing libcanberra for event support, and adding sound-theme-freedesktop, event sounds cannot be enabled in the appropriate dialog, because Xfce doesn't believe libcanberra is installed. It's the same for Pidgin -- it doesn't believe there's a working sound framework, either. I have to give it commands like "aplay /some/dir/foo.wav" for each kind of event.

The whole thing, is, quite frankly, a bloody mess. And all because I wanted to get some experimental Portage toys from an overlay. It's my fault, I admit, and even after 8 hours of hacking at it, I'm probably nowhere near finished, assuming it's even possible to get all the bleeding-edge pieces to play nicely together.

(Side note: I would like to thank my fellow Gentoo developer Samuli for taking the time to answer my numerous Xfce-related questions and do a bit of troubleshooting along the way. Thanks, man!)

So, what's to learn from my adventures? First, trying to put everything back to its previous state would take another two days of work, if it's even possible! Second, the price for living on the bleeding edge of *Kit integration is too high. I don't mind running the occasional ~arch package, or using git X11 driver stacks like xf86-video-*, Mesa, and libdrm. But basic hardware abstraction stuff is nothing to be fooling around with. Not being able to use pluggable devices, enable Bluetooth, or properly adjust power management on a laptop is too high a price for being forward-thinking. I wish I'd stayed with my boring HAL system. While deprecated and a pain to configure, at least it worked reliably. More important, maintenance was nonexistent; it was simply a matter of copying several .fdi files into the right directory when I first compiled and installed my desktop. After that, I didn't have to touch a thing.

The HAL-less desktop is supposed to be the future for every Linux distribution out there. I can only hope it is still some ways off, to give upstream coders more time to get their applications in order, so that distributions don't have to do much patching or extensive repackaging and integration, and so that the end-users don't have to spend hours configuring everything to their liking.

If there is something annoying about reviewing PO files is that it is impossible. When there are two hundred messages in a PO file, how are you going to know which messages changed? Well, that's the way it works currently for Transifex but there are very good news, first a review board is already available which is a good step forward but second it is going to get some good kick to make it awesome. But until this happens, I have written two scripts to make such a review.

Bzip2: slow for both compression and decompression although very good compression.

7-Zip: LZMA algorithm, slower than Bzip2 for compression but very good compression.

Xz: LZMA2, evolution of LZMA algorithm.

Preparation

Be skeptic about compression tools and wanna promote the compression tool

Compare quickly old and new compression tools and find interesting results

So much for the spirit, what you really need is to write a script (Bash, Ruby, Perl, anything will do) because you will want to generate the benchmark data automatically. I picked up Ruby as it's nowadays the language of my choice when it comes to any kind of Shell-like scripts. By choosing Ruby I have a large panel of classes to process benchmarking data, for instance I have a Benchmark class (wonderful), I have a CSV class (awfully documented, redundant), and I have a zillion of Gems for any kind of tasks I would need to do (although I always avoid them).

I first focused on retrieving the data I was interested into (memory, cpu time and file size) and saving it in the CSV format. By doing so I am able to produce charts easily with existing applications, and I was thinking maybe it was possible to use GoogleCL to generate charts from the command line with Google Docs but it isn't supported (maybe it will maybe it won't, it's up to gdata-python-client). However there is an actual Google tool to generate charts, it is the Google Chart API that works by providing a URI to get an image. The Google Image Chart Editor website helps you to generate the chart you want in a friendly WYSIWYG mode, after that it is just a matter of computing the data into shape for the URI. But well while focusing on the charts I found the Ruby Gem googlecharts that makes it friendly to pass the data and save the image.

The Ruby script takes a path as argument, with which it creates a tarball inside a tmpfs directory in order to avoid I/O latencies from a hard-drive. Next it runs a number of commands over the tarball from which it collects benchmark data. The benchmark data is then saved inside CSV files that are reusable within spreadsheet applications. The data is also reused to retrieve charts from the Google Chart API and finally it calls the ImageMagick tool ''convert'' to collect the charts inside a single image. The summary displayed on the standard output is also saved inside a text file.

The script is a bit long for being pasted here (more or less 300 lines) so you can download it from my workstation. If the link doesn't work make sure the web browser doesn't encode ~ (f.e. to "%257E"), I've seen this happening with Safari (inside my logs)! If really you are out of luck, it is available on Pastebin.

Benchmarks

The benchmarks are available for three kinds of data. Compressed media files, raw media files (image and sound, remember that the compression is lossless), and text files from an open source project.

Media Files

Does it make sense at all to compress already compressed data. Obviously not! Let's take a look at what happens anyway.

As you see, compression tools with focus on speed don't fail, they still do the job quick while gaining a few hundred kilo bytes. However the other tools simply waste a lot of time for no gain at all.

So always make sure to use a backup application without compression over media files or the CPU will be heating up for nothing.

Raw Media Files

Will it make sense to compress raw data? Not really. Here are the results:

There is some gain in the order of mega bytes now, but the process is still the same and for that reason it is simply unadapted. For media files there are existing formats that will compress the data lossless with a higher ratio and a lot faster.

Let's compare lossless compression of a sound file. The initial WAV source file has a size of 44MB and lasts 4m20s. Compressing this file with xz takes about 90s, this is very long while it reduced the size to 36MB. Now if you choose the format FLAC, which is doing lossless compression for audio, you will have a record. The file is compressed in about 5s to a size of 24MB! The good thing about FLAC is that media players will decode it without any CPU cost.

The same happens with images, but I lack knowledge about photo formats so your mileage may vary. Anyway, except the Windows bitmap format, I'm not able to say that you will find images uncompressed just like you won't find videos uncompressed... TIFF or RAW is the format provided by many reflex cameras, it has lossless compression capabilities and contains many information about image colors and so on, this makes it the perfect format for photographers as the photo itself doesn't contain any modifications. You can also choose the PNG format but only for simple images.

Text Files

We get to the point where we can compare interesting results. Here we are compressing data that is the most commonly distributed over the Internet.

Lzop and Gzip perform fast and have a good ratio. Bzip2 has a better ratio, and both LZMA and LZMA2 algorithms even better. We can use an initial archive of 10MB, 100MB, or 400MB, the charts will always look alike the one above. When choosing a compression format it will either be good compression or speed, but it will definitely never ever be both, you must choose between this two constraints.

Conclusion

I never heard about the LZO format until I wanted to write this blog post. It looks like a good choice for end-devices where CPU cost is crucial. The compression will always be extremely fast, even for giga bytes of data, with a fairly good ratio. While Gzip is the most distributed compression format, it works just like Lzop, by focusing by default on speed with good compression. But it can't beat Lzop in speed, even when compressing in level 1 it will be fairly slower in matter of seconds, although it still beats it in the final size. When compressing with Lzop in level 9, the speed is getting ridiculously slow and the final size doesn't beat Gzip with its default level where Gzip is doing the job faster anyway.

Bzip2 is noise between LZMA and Gzip. It is very distributed as default format nowadays because it beats Gzip in term of compression ratio. It is of course slower for compression, but easily spottable is the decompression time, it is the worst amongst all in all cases.

Both LZMA and LZMA2 perform almost with an identical behavior. They are using dynamic memory allocation, unlike the other formats, where the higher the input data the more the memory is allocated. We can see the evolution of LZMA is using less memory but has on the other hand a higher cost on CPU time. And we can see they have excellent decompression time, although Lzop and Gzip have the best scores but then again there can't be excellent compression ratio and compression time. The difference between the compression ratio of the two formats is in the order of hundred of kilo bytes, well after all it is an evolution and not a revolution.

On a last note, I ran the benchmarks on an Intel Atom N270 that has two cores at 1.6GHz but I made sure to run the compression tools with only one core.

Bzip2: slow for both compression and decompression although very good compression.

7-Zip: LZMA algorithm, slower than Bzip2 for compression but very good compression.

Xz: LZMA2, evolution of LZMA algorithm.

Preparation

Be skeptic about compression tools and wanna promote the compression tool

Compare quickly old and new compression tools and find interesting results

So much for the spirit, what you really need is to write a script (Bash, Ruby, Perl, anything will do) because you will want to generate the benchmark data automatically. I picked up Ruby as it's nowadays the language of my choice when it comes to any kind of Shell-like scripts. By choosing Ruby I have a large panel of classes to process benchmarking data, for instance I have a Benchmark class (wonderful), I have a CSV class (awfully documented, redundant), and I have a zillion of Gems for any kind of tasks I would need to do (although I always avoid them).

I first focused on retrieving the data I was interested into (memory, cpu time and file size) and saving it in the CSV format. By doing so I am able to produce charts easily with existing applications, and I was thinking maybe it was possible to use GoogleCL to generate charts from the command line with Google Docs but it isn't supported (maybe it will maybe it won't, it's up to gdata-python-client). However there is an actual Google tool to generate charts, it is the Google Chart API that works by providing a URI to get an image. The Google Image Chart Editor website helps you to generate the chart you want in a friendly WYSIWYG mode, after that it is just a matter of computing the data into shape for the URI. But well while focusing on the charts I found the Ruby Gem googlecharts that makes it friendly to pass the data and save the image.

The Ruby script takes a path as argument, with which it creates a tarball inside a tmpfs directory in order to avoid I/O latencies from a hard-drive. Next it runs a number of commands over the tarball from which it collects benchmark data. The benchmark data is then saved inside CSV files that are reusable within spreadsheet applications. The data is also reused to retrieve charts from the Google Chart API and finally it calls the ImageMagick tool ''convert'' to collect the charts inside a single image. The summary displayed on the standard output is also saved inside a text file.

The script is a bit long for being pasted here (more or less 300 lines) so you can download it from my workstation. If the link doesn't work make sure the web browser doesn't encode ~ (f.e. to "%257E"), I've seen this happening with Safari (inside my logs)! If really you are out of luck, it is available on Pastebin.

Benchmarks

The benchmarks are available for three kinds of data. Compressed media files, raw media files (image and sound, remember that the compression is lossless), and text files from an open source project.

Media Files

Does it make sense at all to compress already compressed data. Obviously not! Let's take a look at what happens anyway.

As you see, compression tools with focus on speed don't fail, they still do the job quick while gaining a few hundred kilo bytes. However the other tools simply waste a lot of time for no gain at all.

So always make sure to use a backup application without compression over media files or the CPU will be heating up for nothing.

Raw Media Files

Will it make sense to compress raw data? Not really. Here are the results:

There is some gain in the order of mega bytes now, but the process is still the same and for that reason it is simply unadapted. For media files there are existing formats that will compress the data lossless with a higher ratio and a lot faster.

Let's compare lossless compression of a sound file. The initial WAV source file has a size of 44MB and lasts 4m20s. Compressing this file with xz takes about 90s, this is very long while it reduced the size to 36MB. Now if you choose the format FLAC, which is doing lossless compression for audio, you will have a record. The file is compressed in about 5s to a size of 24MB! The good thing about FLAC is that media players will decode it without any CPU cost.

The same happens with images, but I lack knowledge about photo formats so your mileage may vary. Anyway, except the Windows bitmap format, I'm not able to say that you will find images uncompressed just like you won't find videos uncompressed... TIFF or RAW is the format provided by many reflex cameras, it has lossless compression capabilities and contains many information about image colors and so on, this makes it the perfect format for photographers as the photo itself doesn't contain any modifications. You can also choose the PNG format but only for simple images.

Text Files

We get to the point where we can compare interesting results. Here we are compressing data that is the most commonly distributed over the Internet.

Lzop and Gzip perform fast and have a good ratio. Bzip2 has a better ratio, and both LZMA and LZMA2 algorithms even better. We can use an initial archive of 10MB, 100MB, or 400MB, the charts will always look alike the one above. When choosing a compression format it will either be good compression or speed, but it will definitely never ever be both, you must choose between this two constraints.

Conclusion

I never heard about the LZO format until I wanted to write this blog post. It looks like a good choice for end-devices where CPU cost is crucial. The compression will always be extremely fast, even for giga bytes of data, with a fairly good ratio. While Gzip is the most distributed compression format, it works just like Lzop, by focusing by default on speed with good compression. But it can't beat Lzop in speed, even when compressing in level 1 it will be fairly slower in matter of seconds, although it still beats it in the final size. When compressing with Lzop in level 9, the speed is getting ridiculously slow and the final size doesn't beat Gzip with its default level where Gzip is doing the job faster anyway.

Bzip2 is noise between LZMA and Gzip. It is very distributed as default format nowadays because it beats Gzip in term of compression ratio. It is of course slower for compression, but easily spottable is the decompression time, it is the worst amongst all in all cases.

Both LZMA and LZMA2 perform almost with an identical behavior. They are using dynamic memory allocation, unlike the other formats, where the higher the input data the more the memory is allocated. We can see the evolution of LZMA is using less memory but has on the other hand a higher cost on CPU time. And we can see they have excellent decompression time, although Lzop and Gzip have the best scores but then again there can't be excellent compression ratio and compression time. The difference between the compression ratio of the two formats is in the order of hundred of kilo bytes, well after all it is an evolution and not a revolution.

On a last note, I ran the benchmarks on an Intel Atom N270 that has two cores at 1.6GHz but I made sure to run the compression tools with only one core.

I was looking around for desktop search frameworks today, specifically something with a gtk frontend and that required the fewest resources to run.

I discovered Pinot, a dbus-based file index/monitor/search tool. It even comes with a minimal gtk+ interface. I found few reviews on Pinot, and even fewer recent reviews comparing it to other search frameworks like Strigi, Tracker, and Beagle. I also discovered Catfish, a lightweight frontend to several different search services. There's not much out there on integrating Catfish and Pinot, so I forged ahead and wrote my own code, then did some trial-and-error experiments.

All ebuilds are available on my overnight overlay. Instructions for adding the overlay are on the wiki.

Writing the ebuilds

The only ebuild I found for Pinot is sadly out-of-date, and is completely incorrect. Also, it depends on libtextcat, and I never found an ebuild for that.

So, I wrote my own ebuilds for the latest versions of Pinot and libtextcat.

Not content with Pinot's minimal gtk+ interface, I decided to try Catfish, a PyGtk frontend for several different search engines, including Pinot. Catfish is made by the same developer of Midori, a well-respected lightweight WebKit browser. While Catfish's development has been stalled for two years, I figured it was worth a shot, since its user interface is friendlier than Pinot's.

Catfish, like Pinot and libtextcat, is not in Portage, but there is an open bug for its inclusion. However, the ebuild for the latest version needed updating, as it didn't include Strigi or Pinot. So I rewrote it and added descriptive metadata.xml entries for Catfish's and Pinot's USE flags.

There's still a bit of work left on the Catfish ebuild, since there's a QA warning about not leaving precompiled Python object files in /usr/share/catfish. However, the application itself works perfectly. Just need to clean up the install process so that the bytecode doesn't clutter up the filesystem.

Pinot

On first run, Pinot will take a long time to index your files. I pointed it at my user's /home/ directory, which contains 51,000+ files, totaling 9.3GB on a Reiser3 filesystem with tail enabled. That operation took probably half an hour, and that's on a fast SSD! All of Pinot's indexes and databases take up 455MB, bringing my total /home/ usage to about 9.7GB. Pinot typically used about 50% of my CPU while doing so, sometimes dropping down to the 20s and 40s.

However, since Pinot is on a fast SSD, and it's running off a 2.3Ghz dual-core Athlon backed by 4GB RAM, I didn't notice any performance hit while indexing. I'm not running any special kernels or schedulers (like BFS) either; just vanilla-source-2.6.35.4. There was no noticeable lag or slowdown, despite viewing two Thunar windows, working with four terminals, and browsing nine Firefox tabs. My system was only laggy when compiling Pinot and its dependencies.

Once my /home/ was indexed, I searched around. Queries were pretty much instantaneous. There's no easy way to measure the speed of each query, since it's much too fast to time with a stopwatch. That's probably mostly because of the SSD -- as it is, without a desktop indexer/search app, most similar queries take less than a second. Once the initial filesystem index is complete, Pinot drops back to just monitoring directories if you've told it to do so, relying on the inotify feature in the kernel. That drops CPU and memory usage to zero, as near as I can tell. Nice!

Pinot's greatest advantage on my system, at least, is not its speed, but its usefulness for easily finding deeply buried files and folders.

Interestingly, even though Pinot by default is not supposed to index Git, CVS, or SVN repositories, it seems to ignore that setting. Searching for "catfish" turns up a document named catfish tricksand all the ebuilds and git logs that have "catfish" in the title. Apparently Pinot's regex filter isn't very reliable. I probably need to add in another asterisk to disable searching or indexing of any files within a git directory.

Catfish

Catfish mostly works as expected, though it defaults to using "find" rather than "pinot" as its search engine. I haven't yet found a way to set it to use Pinot as the default search provider. Catfish is quick to load, and its layout is fairly intuitive. Sometimes, however, it will just stop working with Pinot, and even though Pinot has indexed my entire home directory, Catfish won't return any search results, though I can get those results by using Pinot's interface. The rest of the time it works great.

Besides offering a friendlier UI for searches, Catfish's real strengths are its useful options, both for presentation and for tying in with my desktop's filemanager. With a couple of commandline switches, Catfish can display thumbnails of various filetypes, use larger icons in search results, use various wrappers for opening and working with files, or even use powerful regex search methods. No, it won't have the awesome preview capabilities of Gloobus, but you also don't have to install all of Gnome to get similar features.

Right out of the box, Catfish will allow you to open files and folders obtained from your search results just by clicking them. I don't know if that works for all filemanagers, but it works with Thunar, which is all I ask.

I like to use Catfish in combination with another powerful feature of Thunar: custom actions. Since Thunar lacks a built-in search bar (aside from a rudimentary go-to alphabetical list when you press a key), how do you integrate a search utility? One way is by adding search functions to the right-click menu.

Open a Thunar window, and go to Edit -> Configure custom actions.

Click the plus icon: +. Give the action a helpful title, description, and icon. "Search" is pretty standard among icon sets, so there should always be one available even when you change themes.

Add the action command: catfish --path=%F

Now go to the Appearance Conditions tab. I left the file pattern as * and checked all boxes, so that no matter where I browse or click, I can launch a Catfish search.

Save the new action and exit Thunar. The next Thunar window you launch will let you right-click anywhere in the browser to open a Catfish search.

You can add any commandline switch you like to the catfish command; just run catfish --help to see the available options.

Thunar's custom action feature is pretty nifty; there are all kinds of things you can put in the context menu. It comes with an example to open a terminal in the current directory. You can create actions to launch applications with a root prompt, convert one image type into another, play media, print or email documents, and more. If you can script it, you can write a trigger for it and stick it in the context menu. Just read the custom actions documentation for many more examples of what you can do with Thunar. Neat!

Looking forward

So, will I keep using Pinot and Catfish? Possibly. While I am leery of any process like Pinot that writes so often to my SSD, and I'm not at all happy with its database size compared to my actual directory size, I do like that it's fast, and responsive. It doesn't seem to have the huge memory leaks or lag that Strigi/Nepomuk do in KDE. In fairness, KDE is trying to get us to believe in the power of the "semantic desktop," while Pinot and Catfish just want to create an easy frontend for finding stuff, without worrying about associating them with various files or activities.

As long as the database doesn't get too much larger, or the indexing/monitoring services use too many resources, I'll keep it around. I've got five+ years of accumulated files in various folders, with more constantly being loaded to and from offline backups. Pinot and Catfish can help with my hard drive spring cleaning, and help me locate stuff that I've just plain forgotten about. The older you get, the less you remember, right?

What I'd really like is a search bar built-in to Thunar, maybe in the upper right corner, backed by Pinot. That'd place everything I need right up front, without having to drill down through right-click menus.

* * *

Speaking of Thunar:

Do you use Thunar? Do you use Dropbox? Xfce developer Mike Massonnet posted a message to the xfce-dev list this morning with a link to a new project: Thunar Dropbox. It integrates the Dropbox service right into your favorite lightweight filemanager. No longer do you have to run Nautilus just to use Dropbox easily. Now you can use it within Thunar.

I was looking around for desktop search frameworks today, specifically something with a gtk frontend and that required the fewest resources to run.

I discovered Pinot, a dbus-based file index/monitor/search tool. It even comes with a minimal gtk+ interface. I found few reviews on Pinot, and even fewer recent reviews comparing it to other search frameworks like Strigi, Tracker, and Beagle. I also discovered Catfish, a lightweight frontend to several different search services. There’s not much out there on integrating Catfish and Pinot, so I forged ahead and wrote my own code, then did some trial-and-error experiments.

All ebuilds are available on my overnight overlay. Instructions for adding the overlay are on the wiki.

Writing the ebuilds

The only ebuild I found for Pinot is sadly out-of-date, and is completely incorrect. Also, it depends on libtextcat, and I never found an ebuild for that.

So, I wrote my own ebuilds for the latest versions of Pinot and libtextcat.

Not content with Pinot’s minimal gtk+ interface, I decided to try Catfish, a PyGtk frontend for several different search engines, including Pinot. Catfish is made by the same developer of Midori, a well-respected lightweight WebKit browser. While Catfish’s development has been stalled for two years, I figured it was worth a shot, since its user interface is friendlier than Pinot’s.

Catfish, like Pinot and libtextcat, is not in Portage, but there is an open bug for its inclusion. However, the ebuild for the latest version needed updating, as it didn’t include Strigi or Pinot. So I rewrote it and added descriptive metadata.xml entries for Catfish’s and Pinot’s USE flags.

There’s still a bit of work left on the Catfish ebuild, since there’s a QA warning about not leaving precompiled Python object files in /usr/share/catfish. However, the application itself works perfectly. Just need to clean up the install process so that the bytecode doesn’t clutter up the filesystem.

Pinot

On first run, Pinot will take a long time to index your files. I pointed it at my user’s /home/ directory, which contains 51,000+ files, totaling 9.3GB on a Reiser3 filesystem with tail enabled. That operation took probably half an hour, and that’s on a fast SSD! All of Pinot’s indexes and databases take up 455MB, bringing my total /home/ usage to about 9.7GB. Pinot typically used about 50% of my CPU while doing so, sometimes dropping down to the 20s and 40s.

However, since Pinot is on a fast SSD, and it’s running off a 2.3Ghz dual-core Athlon backed by 4GB RAM, I didn’t notice any performance hit while indexing. I’m not running any special kernels or schedulers (like BFS) either; just vanilla-source-2.6.35.4. There was no noticeable lag or slowdown, despite viewing two Thunar windows, working with four terminals, and browsing nine Firefox tabs. My system was only laggy when compiling Pinot and its dependencies.

Once my /home/ was indexed, I searched around. Queries were pretty much instantaneous. There’s no easy way to measure the speed of each query, since it’s much too fast to time with a stopwatch. That’s probably mostly because of the SSD — as it is, without a desktop indexer/search app, most similar queries take less than a second. Once the initial filesystem index is complete, Pinot drops back to just monitoring directories if you’ve told it to do so, relying on the inotify feature in the kernel. That drops CPU and memory usage to zero, as near as I can tell. Nice!

Pinot’s greatest advantage on my system, at least, is not its speed, but its usefulness for easily finding deeply buried files and folders.

Interestingly, even though Pinot by default is not supposed to index Git, CVS, or SVN repositories, it seems to ignore that setting. Searching for “catfish” turns up a document named catfish tricksand all the ebuilds and git logs that have “catfish” in the title. Apparently Pinot’s regex filter isn’t very reliable. I probably need to add in another asterisk to disable searching or indexing of any files within a git directory.

Catfish

Catfish mostly works as expected, though it defaults to using “find” rather than “pinot” as its search engine. I haven’t yet found a way to set it to use Pinot as the default search provider. Catfish is quick to load, and its layout is fairly intuitive. Sometimes, however, it will just stop working with Pinot, and even though Pinot has indexed my entire home directory, Catfish won’t return any search results, though I can get those results by using Pinot’s interface. The rest of the time it works great.

Besides offering a friendlier UI for searches, Catfish’s real strengths are its useful options, both for presentation and for tying in with my desktop’s filemanager. With a couple of commandline switches, Catfish can display thumbnails of various filetypes, use larger icons in search results, use various wrappers for opening and working with files, or even use powerful regex search methods. No, it won’t have the awesome preview capabilities of Gloobus, but you also don’t have to install all of Gnome to get similar features.

Right out of the box, Catfish will allow you to open files and folders obtained from your search results just by clicking them. I don’t know if that works for all filemanagers, but it works with Thunar, which is all I ask.

I like to use Catfish in combination with another powerful feature of Thunar: custom actions. Since Thunar lacks a built-in search bar (aside from a rudimentary go-to alphabetical list when you press a key), how do you integrate a search utility? One way is by adding search functions to the right-click menu.

Open a Thunar window, and go to Edit -> Configure custom actions.

Click the plus icon: +. Give the action a helpful title, description, and icon. “Search” is pretty standard among icon sets, so there should always be one available even when you change themes.

Add the action command: catfish --path=%F

Now go to the Appearance Conditions tab. I left the file pattern as * and checked all boxes, so that no matter where I browse or click, I can launch a Catfish search.

Save the new action and exit Thunar. The next Thunar window you launch will let you right-click anywhere in the browser to open a Catfish search.

You can add any commandline switch you like to the catfish command; just run catfish --help to see the available options.

Thunar’s custom action feature is pretty nifty; there are all kinds of things you can put in the context menu. It comes with an example to open a terminal in the current directory. You can create actions to launch applications with a root prompt, convert one image type into another, play media, print or email documents, and more. If you can script it, you can write a trigger for it and stick it in the context menu. Just read the custom actions documentation for many more examples of what you can do with Thunar. Neat!

Looking forward

So, will I keep using Pinot and Catfish? Possibly. While I am leery of any process like Pinot that writes so often to my SSD, and I’m not at all happy with its database size compared to my actual directory size, I do like that it’s fast, and responsive. It doesn’t seem to have the huge memory leaks or lag that Strigi/Nepomuk do in KDE. In fairness, KDE is trying to get us to believe in the power of the “semantic desktop,” while Pinot and Catfish just want to create an easy frontend for finding stuff, without worrying about associating them with various files or activities.

As long as the database doesn’t get too much larger, or the indexing/monitoring services use too many resources, I’ll keep it around. I’ve got five+ years of accumulated files in various folders, with more constantly being loaded to and from offline backups. Pinot and Catfish can help with my hard drive spring cleaning, and help me locate stuff that I’ve just plain forgotten about. The older you get, the less you remember, right?

What I’d really like is a search bar built-in to Thunar, maybe in the upper right corner, backed by Pinot. That’d place everything I need right up front, without having to drill down through right-click menus.

* * *

Speaking of Thunar:

Do you use Thunar? Do you use Dropbox? Xfce developer Mike Massonnet posted a message to the xfce-dev list this morning with a link to a new project: Thunar Dropbox. It integrates the Dropbox service right into your favorite lightweight filemanager. No longer do you have to run Nautilus just to use Dropbox easily. Now you can use it within Thunar.