Compression Tools Compared

Use top-performing but little-known lossless data compression tools to increase your effective storage and bandwidth by up to 400%.

Filters

Filters are tools that can be chained together
at the command line so that the output of one
is piped elegantly into the input of the next.
A common example is:

$ ls | more

Filtering is crucial for speeding up network
transfers. Without it, you have to wait for
all the data to be compressed before
transferring any of it, and you need to wait
for the whole transfer to complete before
starting to decompress. Filters speed up
network transfers by allowing data to be
simultaneously compressed, transferred and
decompressed. This happens with negligible
latency if you're sending enough data.
Filters also eliminate the need for an
intermediate archive of your files.
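
For example, an end-to-end pipeline for copying a directory to another machine might look like the following sketch, with gzip standing in as the compressor; backup-host and /restore/dir are placeholders for your own host and destination directory:

$ tar c a/dir | gzip | ssh backup-host "gzip -dc | tar x -C /restore/dir"

The data is compressed, transferred and decompressed in one pass, and no intermediate archive ever touches the disk.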

Check whether the data compression tool you want
to use is installed on both computers. If it's not, you
can see where to get it in the on-line Resources
for this article. Remember to replace
a/dir in the following examples
with the real path of the data to back up.

Unless your data already is in one big file, be
smart and consolidate it with a tool such as tar.
Aggregated data has more redundancy to winnow out,
so it's ultimately more compressible.

But be aware that the redundancy that bloats your
data also may make it easier to recover from
corruption; once it has been squeezed out, even a
small amount of damage can make an entire
compressed archive unreadable. If you're worried
about corruption, you might consider testing for it
with the cksum command or adding a limited amount
of redundancy back into your compressed data with a
tool such as parchive or ras.
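
For example, once you've created a compressed backup (backup.tar.lzo below is just a placeholder for whatever archive you produce), you can record its checksum and verify it again later before restoring:

$ cksum backup.tar.lzo > backup.cksum
$ cksum backup.tar.lzo | diff - backup.cksum

If diff prints nothing, the checksums still match.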

lzop often is the fastest tool. It finishes
about three times faster than gzip but still
compresses data almost as much. It finishes about
a hundred times faster than lzma and 7za.
Furthermore, lzop occasionally decompresses data even faster
than simply copying it! Use lzop on the command
line as a filter with the backup tool named tar:

$ tar c a/dir | lzop - > backup.tar.lzo

tar's c option tells it to create one big
archive from the files in a/dir. The | is the
shell's pipe operator, which feeds tar's output into
lzop's input. The - tells lzop to read from its
standard input, and the > is the shell's redirection
operator, which sends lzop's output to a file named
backup.tar.lzo.
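
If you want to double-check the archive before trusting it, lzop also can test its internal checksums with the -t option:

$ lzop -t backup.tar.lzo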

You can restore with:

$ lzop -dc backup.tar.lzo | tar x

The d and c options tell lzop to decompress
and write to standard output, respectively. tar's
x option tells it to extract the original files
from the archive.
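
Because the archive is just a compressed tar stream, you also can list its contents without extracting anything by swapping tar's x option for t:

$ lzop -dc backup.tar.lzo | tar t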

Although lzop is impressive, you can get much higher
compression ratios. Combining a little-known data
compression tool named lzma with tar can increase
your effective storage space by 400%. Here's how you would use it to back up:

$ tar c a/dir | lzma -x -s26 > backup.tar.lzma

lzma's -x option tells it to compress more, and
its -s option tells it how big of a dictionary
to use.

You can restore with:

$ cat backup.tar.lzma | lzma -d | tar x

The -d option tells lzma to decompress. You
need patience to increase storage by 400%; lzma
takes about 40 times as long as gzip. In other words, that
one-hour gzip backup might take all day with lzma.
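
If you'd like to see that trade-off on your own data before committing to an all-day run, you can time both pipelines on a small, representative directory first; small/dir here is a placeholder, and in a bash shell, time reports the elapsed time of the whole pipeline:

$ time tar c small/dir | gzip > backup.tar.gz
$ time tar c small/dir | lzma -x -s26 > backup.tar.lzma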

This version of lzma is the hardest compressor to
find. Make sure you get the one that acts as a
filter. See Resources for its two locations.

The data compression tool with the best trade-off
between speed and compression ratio is rzip.
With compression level 0, rzip finishes about 400%
faster than gzip and compacts
data 70% more. rzip accomplishes this feat by
using more working memory. Whereas gzip uses only
32 kilobytes of working memory during compression,
rzip can use up to 900 megabytes, but that's okay
because memory is getting cheaper and cheaper.

Here's the big but: rzip doesn't work as a filter—yet. Unless your
data already is in one file, you temporarily need some extra disk space for
a tar archive. If you want a good project to work on
that would shake up the Linux
world, enhance rzip to work as a filter. Until
then, rzip is a particularly good option for
squeezing a lot of data onto CDs or DVDs, because
it performs well and you can use your hard drive
for the temporary tar file.

Here's how to back up with rzip:

$ tar cf dir.tar a/dir
$ rzip -0 dir.tar

The -0 option says to use compression level 0.
Unless you use rzip's -k option, it
automatically deletes the input file, which in
this case is the tar archive, so pass -k if you
want to keep it.
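
For example, this compresses the archive but leaves dir.tar in place next to dir.tar.rz:

$ rzip -0 -k dir.tar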

rzipped tar archives can be restored with:

$ rzip -d dir.tar.rz
$ tar xf dir.tar

rzip's default compression level is another top
performer. It can increase your effective disk
space by 375% in only about a fifth of the
time lzma takes. Using it is almost
exactly the same as the example above; simply omit
compression level -0.
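
In other words, the backup becomes simply:

$ tar cf dir.tar a/dir
$ rzip dir.tar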

Comments

Hi. Congratulations on this very useful article. I use a backup script of my own that uses tar + gzip; after switching to tar + lzop, the backup takes less than half the time, while the backup size grows by only about 25%.
An idea to improve speed further is to replace tar with a more intelligent tool.
In fact, tar simply "cats all files to stdout", and then gzip or lzop compresses this huge stream of data, but some of that data is already compressed (images, movies, OpenDocument files) and doesn't need to be recompressed!
The idea is to have an archiver (like tar) that compresses each file by itself, storing the original file untouched in the case of images, movies, archives and other already-compressed files.
Is there any tool that can do this and still preserve all the privileges (owner, group, mode) associated with each file, like tar does?
Thank you. Paolo

(1) Found a typo:
"On the other hand, if you have a 1GHz network, but only a 100MHz CPU"

1 GHz network? Should maybe be 1 Gbps.

(2) Suggestion:
Multi-core CPUs are the big thing today; compression tools that could utilise multiple cores could run 2, 4 or soon even 8 times faster on "normal" desktop PCs, not to mention servers. Which compression tools can take advantage of this CPU power?

Believe it or not, compression is one of those application areas where all the research takes place on Windows PCs. In the last couple of years there have been some major breakthroughs in compression thanks to the new PAQ context-modeling algorithms. Have a look at http://www.maximumcompression.com/ for some results. Programs like gzip, rzip, 7-Zip and lzop are tested there too, so it should be easy to compare results.