Best Linux compression tool: 8 utilities tested

The best old and new tools to compress your files

In the '80s and early '90s, compression was king. As you struggled to connect to a BBS (bulletin board system) to grab the latest Amiga utilities, you dreamed of a faster future - one where you didn't spend as long decompressing files as you did downloading them.

Fast forward a few decades and the sheer size of the data files we juggle about is pretty boggling. Many have built-in compression of some kind. Bandwidth isn't such an issue any more, and in some ways neither is disk space, but it would still be nice if there were a quick and convenient way of reclaiming a few GB here or there, or of not having to wait so long when uploading email attachments.

Compression technologies have moved on in the interim, but perhaps not as much as you may expect, because we're fighting against an exponential curve of just how far things can be compacted. Many data formats are nigh on incompressible, because they've already squeezed the redundancies out.

Nevertheless, there are some tools available that leverage our superfast CPUs and gargantuan memory reserves to try some new tricks. In this test, we're looking at a selection of old and new tools currently available.

Some don't get a review, but are included in our tabulated data, which you'll find in cut-down form here and as a full version online - gzip is there for comparative purposes, for instance.

Our selection

bzip2, rar, 7zip, lbzip2, xz, lrzip, PeaZip, arj

RAR 4.00 beta

Originally released way back in 1993, the RAR format has gone through quite a few revisions and tweaks in the meantime. The original author, Eugene Roshal, licensed the software to a German software company, which now produces the WinRAR variant and command line options for non-Windows platforms.

On the decompression side, RAR supports a lot of formats, including unusual ones, such as ISO files and CAB archives. The format is far more popular on the Windows platform, and is generally used for splitting large files into usable chunks. This makes it popular for posting large files to usenet groups, and the WinRAR utility for Windows is very well-used indeed.

The generation of parity and volume files alongside the chunks makes it easy to correct minor transmission errors and make sure you've got a perfect copy of whatever was sent. On Unix systems though, the native RAR format is pretty much nonexistent.
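To see why parity data makes that error correction possible, here's a hedged sketch of the simplest possible scheme: split a file into fixed-size volumes and add one XOR parity volume, from which any single missing volume can be rebuilt. The function names are ours, and real RAR recovery records use Reed-Solomon codes, which are considerably more capable than this single-parity toy.

```python
from functools import reduce

def split_with_parity(data: bytes, size: int):
    """Split data into fixed-size volumes plus one XOR parity volume.

    Illustration only: RAR's actual recovery records use Reed-Solomon
    codes, which can repair more than one damaged volume.
    """
    vols = [data[i:i + size] for i in range(0, len(data), size)]
    vols[-1] = vols[-1].ljust(size, b"\0")  # pad so all volumes match in length
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    parity = reduce(xor, vols)
    return vols, parity

def recover(vols, parity, missing: int):
    """Rebuild the volume at index `missing` from the rest plus parity."""
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    rest = [v for i, v in enumerate(vols) if i != missing]
    return reduce(xor, rest + [parity])
```

Because XOR is its own inverse, XORing the parity volume with every surviving volume leaves exactly the missing one - the same principle, with stronger codes, behind RAR's recovery volumes and the PAR2 files popular on Usenet.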

In performance terms, it does better than expected. While it is slower than most of the tools on test, it does actually manage some reasonable space savings across the different filetypes. Compression algorithms are usually focused on some particular type of data, and it may well be that better space savings would be recorded by testing against the sorts of files usually found on a Windows system.

It wasn't particularly troubled by the practically incompressible image files, and it did reasonably well with large disk images and the generic filesystem selection.

As a proprietary command line tool for Linux, though, its uses are limited, and it is probably best saved for occasions when interoperability with Windows platforms is required.

bzip2

Julian Seward released the original bzip2 in 1997 under a BSD licence. In case you are wondering, there was indeed a bzip before that, but it was withdrawn by the author after possible patent worries loomed menacingly (ah, software patents, don't we all love them?).

Not to worry though, because bzip2 is better than it anyway. Using a combination of different algorithms - such as run-length encoding (RLE), the Burrows-Wheeler transform, and other such cunning trickery - it immediately became noteworthy in Unix circles because of the impressive compression achieved compared to the standard utility of the day, gzip.
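To give a flavour of two of those stages, here's a hedged sketch in Python. The real bzip2 works on large blocks and layers Huffman coding on top, so this is a conceptual toy, not its actual pipeline; the function names are ours.

```python
def bwt(s: str) -> str:
    """Burrows-Wheeler transform: sort all rotations of the input
    (with a sentinel appended) and keep the last column, which tends
    to cluster repeated characters together."""
    s += "\0"  # sentinel marks the original rotation, making the transform reversible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def rle(s: str):
    """Run-length encoding: collapse the character runs the BWT creates."""
    runs, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        runs.append((s[i], j - i))
        i = j
    return runs
```

Running `bwt("banana")` gives `"annb\0aa"` - note how the transform has gathered the 'a's and 'n's into runs that RLE can then squash, which is the whole point of the combination.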

Cunningly coded to be almost identical to gzip in terms of usage, bzip2 soon became a drop-in replacement for all types of archiving purposes. Most notably, much source code was shipped using a tar/bzip2 combination instead of the usual tar/gzip combination of the time.

It's somewhat disappointing that in the intervening 14 years or so bzip2 hasn't replaced gzip entirely - changing the habits of Unix users is obviously like trying to steer a particularly fat continental shelf or something.

However, for large volumes of archiving, it seems the trade-off between space savings and compute time isn't always worth it. The figures we generated for Test 3 show that bzip2 running on maximum compression does shave a few per cent off the file size, but at the expense of taking around four times as long.
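You can get a feel for that trade-off yourself using Python's bundled bindings - zlib is the DEFLATE engine behind gzip, and bz2 wraps the real bzip2 library. The sample data here is ours, and the exact figures will vary with your machine and input, so treat this as a sketch of the comparison, not a reproduction of our benchmark.

```python
import bz2
import time
import zlib

# Highly repetitive sample text, standing in for our test corpus
data = b"the quick brown fox jumps over the lazy dog\n" * 20_000

for name, compress in [("gzip/zlib -9", lambda d: zlib.compress(d, 9)),
                       ("bzip2 -9", lambda d: bz2.compress(d, 9))]:
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data)} -> {len(packed)} bytes in {elapsed:.3f}s")
```

On typical text you'll see bzip2 squeeze harder while taking noticeably longer, which is exactly the bargain discussed above.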

So if speed is of paramount importance to you, gzip is still a better option… Hang on, before we say that, you should check out the review for lbzip2.

lbzip2

This is an intriguing contender for the modern age. Using POSIX threads, this tool parallelises the compression routines so they can be run in more than one thread and later combined. We care about this because lots of machines now have a multi-core processor.

Standard bzip2, and indeed many of the other tools on test, are only capable of running in a single thread. That means if you have a dual-core processor, such as the one we used for testing, only one core does the hard work of compressing while the other lies idle. Of course, the idle core can take care of the system overhead, but it is a bit of a waste.

Parallelising the task does include a bit of overhead in terms of processor time, because there has to be a 'dispatcher' component that allocates tasks to the threads and combines their results at the end. Even so, on a dual-core machine you should see a reduction in the time taken by around 40%, depending on the actual task.

This is borne out by our results - with the same settings, lbzip2 is between 35 and 45% faster. The significant thing is that it is by and large the same process, and you should end up with pretty much exactly the same files. In our tests, however, the resulting file sizes were a few bytes off in either direction, which may simply be due to slightly different application of the algorithms.

Importantly, files created with lbzip2 are valid bzip2 archives - the format hasn't changed, so they can be distributed to and uncompressed by those using bzip2. Lbzip2 is available in some repos, and some quarters suggest that it should just be aliased to the standard bzip2 commands - there is no real disadvantage to it even on a single core.
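The trick that keeps the output compatible is that a sequence of bzip2 streams concatenated together is itself a valid (multi-stream) bzip2 file. Here's a hedged toy version of the idea in Python - lbzip2 itself is a C program using POSIX threads, so this only mirrors the approach; it works here because Python's bz2.compress releases the GIL, letting threads compress in parallel.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # compress 1 MiB blocks independently

def parallel_bzip2(data: bytes, workers: int = 2) -> bytes:
    """Toy parallel bzip2: compress fixed-size chunks concurrently,
    then concatenate the resulting streams in order. Standard bzip2
    (and Python's bz2.decompress) happily unpacks multi-stream files.
    """
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.compress, chunks))
```

Compressing each chunk as its own stream is also why the output can differ by a few bytes from single-stream bzip2, as we saw in our tests - each stream carries its own header and footer.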

7zip

Released in 1999, 7zip (aka 7z or 7za) is a relative newcomer to compression. It was written by Igor Pavlov, who also designed the LZMA algorithm that forms the default compression mode.

The 7zip code also includes other compression methods, such as bzip2, so it can support formats other than the default .7z.
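The LZMA family is close enough to home that Python ships bindings for it in the lzma module (producing .xz streams, as the xz tool does - it can't write the .7z container itself). As a hedged sketch, this runs the same sample data through LZMA and bzip2, echoing 7zip's support for both; the payload and preset here are our own choices.

```python
import bz2
import lzma

data = b"an example payload with some repetition " * 5_000

# LZMA, the default algorithm in 7z, here as a raw .xz stream
packed_xz = lzma.compress(data, preset=6)

# bzip2, one of the alternative methods the 7zip code also includes
packed_bz2 = bz2.compress(data, 9)

print(f"original {len(data)}, xz/LZMA {len(packed_xz)}, bzip2 {len(packed_bz2)}")
```

Both round-trip losslessly, of course; the interesting differences are the ones measured in the tests below - how long each takes, and how fast each unpacks.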

Although it's open source, the main development focus is on the Windows platform, where 7z enjoys a great deal of popularity, and the code comes with a natty front-end. The basic source code has been tweaked by some, while other projects have made use of the LZMA SDK to produce very similar variants. One of these is xz, and others include p7zip. For this test we compiled from the original source code.

Looking at the test results, it's easy to think that 7z isn't making use of the multiple cores on offer. In fact, it is a threaded application, but even so takes slightly longer than the single-threaded bzip2 archiver, and twice as long as lbzip2. We could make some allowances for this code, since it's compiled from the generic source rather than being geared to work on Linux, but it fares better than pxz, the parallelised version of the derivative xz compressor.

One area in which this algorithm does perform well is decompression, as this and the xz utilities consistently perform better than the rest of the pack (apart from gzip, which isn't as compressed to begin with).

7z is certainly a useful tool, and one which may become more worthwhile on faster machines, or in cases where you want the compression to be good, but the decompression to be speedy (such as distributing apps and data).