Wednesday, April 27, 2011

This program looks more and more like a Pepsi brand each passing day...

In an earlier note, i presented the basics of advanced parsing strategies, a method which allows to grab a few more percentage points of compression ratio in exchange of massive number of CPU cycles. Yes, let's face it, from a pure "compression/speed" perspective, the trade-off is debatable.

However, advanced parsing comes with an excellent property : it keeps the data format intact, meaning it can be decoded normally, by any compression program using the same format. Decoding speed is barely affected, and if it is, it's likely to be positively. In situations where compression time does not matter, but decompression speed does, this trade-off is acceptable.

In a bid to provide a living example of this strategy, i created ZhuffMax, a simple extension to ZhuffHC using the first form of parsing, lazy matching. For this example, i used a recursive level 1 evaluation, meaning that at each byte, i'm also looking for the next byte, select this new one if it is better, and start again the process.
There is definitely a positive impact on compression ratio. At a couple of % points, it's typical of what can be achieved with lazy strategy.

version

threads

Compression Ratio

Speed

Decoding

Zhuff

0.7

-t2

2.584

285 MB/s

550 MB/s

ZhuffHC

0.1

-t2

2.781

55.6 MB/s

618 MB/s

ZhuffMax

0.1

-t2

2.822

35.0 MB/s

640 MB/s

The impact on compression speed, on the other hand, is pretty massive : 60% slower.

Note the positive impact on decompression speed, since we are once again promoting larger matches.

Such trade-off is unlikely to be interesting in a symmetric scenario, but there are situations where asymmetry is the rule. You can imagine for example a communication between a large, powerful server and a small embedded sensor : the superior compression rate will translate into less data exchanged, which means less battery used for the sensor. Another typical situation is the "distribution" process, where the file is compressed once, and will be downloaded and decoded many times afterwards. In such situations, the extra % of compression ratio is welcomed, even if it means more compression power.

ZhuffMax does not yet completely deserve its name, since there are stronger (and slower) parsing strategies available (namely Optimal parsing). But if i do find time to implement new parsing concepts, i will implement them into ZhuffMax. No miracle should be expected : even optimal can bring only a few more % compression, so we are already close to maximum.

Sunday, April 24, 2011

The "great" thing with all those statistics tools provided by Google is that you can know exactly when, where and why people got to consult your website. Call me "big brother"...
In this particular example, the most frequent Google request leading to this website has been "LZ4 source code" for quite a few weeks. So at least it shows a genuine interest for it.

Well, if you have been typing these few words lately, the wait is over : LZ4 is now an Open Source project, hosted on Google Code.

The version proposed is the "Ultra Fast" one, ready for multi-threading purposes. It has been ex-purged of all windows-specific calls, to increase the capability to build LZ4 for any other system, using a standard C compiler.
An example "main.c" code is also provided, featuring simple file i/o interface. Those willing to start from a ready-to-use Visual Studio project will find one in the "download" category.

Well, hope this helps. Now it's a matter of heating your compilers and test suites...