A development blog of what Con Kolivas is doing with code at the moment with the emphasis on linux kernel, MuQSS, BFS and -ck.

Saturday, 4 June 2011

lrzip tarball of all 40 linux-2.6 kernels

With the 2.6 linux kernel now officially finished, I'm providing an lrzip tarball of all 40 point releases of the 2.6 kernel (2.6.0 through 2.6.39). This is a convenient way of getting all the releases in a relatively low-bandwidth form. Previous archives of this nature have had a few downloads, so I figured I'd complete the archive for those who want to download it, and use it as an ad for lrzip:

About this archive: It was compressed with lrzip version 0.606 on a quad-core 3GHz Core 2, writing to a relatively slow external USB2 hard drive, with the following options:

lrzip -UL 9 linux-2.6.0-2.6.39.tar
Total time: 00:56:19.88

It would have compressed a lot faster without the -L 9 option, but given this is the "final" archive of 2.6, I figured I'd push it a bit further. Lrzip can compress it even further with its zpaq option, but that makes decompression much slower, so I'd personally find the archive less useful.
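For anyone grabbing the archive, a minimal sketch of how it can be unpacked, assuming the file keeps the name from the command above with a .lrz extension (lrzip -d decompresses and leaves the .lrz in place):

lrzip -d linux-2.6.0-2.6.39.tar.lrz
tar xf linux-2.6.0-2.6.39.tar

The zpaq variant mentioned above would be made with lrzip -z instead of the default lzma backend, at the cost of much slower decompression.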

11 comments:

Wikipedia, unlike the linux kernel, does not have much in the way of redundant information, and in fact the first 100MB (enwik8) and first 1000MB (enwik9) of wikipedia text are a very common benchmark for compression. Lrzip does very well, but only compared to the "regular" compression algorithms; dedicated algorithms designed specifically for that kind of data do better. But then, lrzip is meant as a general purpose compression program for large files, not dedicated to one type of data only.
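As a rough illustration of how that benchmark is usually run (the enwik8 URL and the choice of gzip as the "regular" comparison here are my own assumptions, not from the comment above):

wget http://mattmahoney.net/dc/enwik8.zip
unzip enwik8.zip
lrzip -L9 enwik8
gzip -9 -c enwik8 > enwik8.gz
ls -l enwik8*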

Ok, thanks! Lrzip works with a dictionary as well, right? Would it make sense to use the dictionary from that file as a basis for kernel archives? If I understand compression correctly, a large part of the data would be the dictionary. Would it then be possible to make archives based on that dictionary that are smaller to download once you have the "main linux source" dictionary? Or, put another way: how much of those 164 MB is dictionary, and how much is each kernel's unique data?

Sorry about that. I thought it posted images, but I don't know how (or if) we can, so I'll just put the URLs to the images below. Inspired by ck's post, I thought I would give this a shot with the 2009 assembly of the complete Human Genome (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/) rather than the 2.6 kernel source, for shits and giggles.

The top tier in terms of file size was rar, rzip/lzma, and lzma2. Bzip2 and gzip gave the largest archives; gzip produced an archive that was over 112 MB larger than the next closest competitor.

Compression speed: http://img863.imageshack.us/img863/9987/compress.png

In terms of compression speed, bzip2 was the quickest. The second tier was occupied by rar and gzip. A distant fourth was rzip/lzma, and a very distant fifth was lzma2, which took almost 10x longer than the fastest. The label markers in the plot represent encode throughput (MB/s).

Gzip decompressed fastest. A close second tier contained both rzip/lzma and lzma2. Bzip2 took nearly 5x longer than the fastest and approximately 2x as long as the second tier. Rar's decompression time of over 6-1/2 minutes was the longest measured. Again, the label markers in the plot represent throughput (MB/s).
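For reference, a rough sketch of how such a comparison could be reproduced. The exact tools and flags used above aren't stated, so the choices here (the genome unpacked into a chromFa directory, xz standing in for "lzma2", lrzip's default rzip/lzma mode, the shell's time builtin) are assumptions:

tar cf chromFa.tar chromFa
time gzip -9 -c chromFa.tar > chromFa.tar.gz
time bzip2 -9 -c chromFa.tar > chromFa.tar.bz2
time xz -9 -c chromFa.tar > chromFa.tar.xz
time lrzip chromFa.tar
ls -l chromFa.tar.*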

Hey graysky. Those results you just posted, I assume the second one was actually lrztar -z. Note that sequential compression is also faster and smaller than using lrztar as well, i.e.:

tar cf chromFa.tar chromFa
lrzip chromFa.tar

will produce better compression than:

lrztar chromFa

lrztar is just there for convenience, but it is not as efficient at compressing really large files.
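For completeness, the corresponding decompression steps would look something like the following (a sketch, not from the post itself; lrzip -d and lrzuntar both ship with lrzip):

lrzip -d chromFa.tar.lrz
tar xf chromFa.tar

or, for an archive made with lrztar,

lrzuntar chromFa.tar.lrz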