Friday, February 25, 2011

Putting into practice the conclusions from an earlier post on multi-threading, I finally created a code branch using multi-threading for LZ4.
This is an opportunity to present version 0.7, which for now limits multi-threading to the "benchmark mode" only.

The results are interesting: the speed improvement is very noticeable.
However, using 2 threads does not translate into exactly x2 performance (it is more in the +80-90% range).

There are at least two reasons for this (that I have identified).
First, not all segments compress equally fast. This is expected: compression speed partly depends on the type of data being compressed. Within a large file, some segments are easier and faster to compress than others. This is exactly what is happening here: the final speed of the algorithm is effectively decided by the slowest segment.
As an interesting side effect, using more threads than cores can sometimes lead to increased performance. This is counter-intuitive, but it simply shows that by distributing the load more finely, we avoid the situation where one thread gets an "easy" part to compress, then sits idle waiting for the other thread to finish a "worse" part.

As a consequence, a cleverer way to segment the input data would help with this issue.

Second, some limits simply cannot be overcome, in this case the bus speed between the CPU and main memory. Around 2 GB/s, there is a kind of "speed wall" which cannot be crossed. The effect is not too large with "only" 2 threads, but on quad-core systems it is likely to have an impact, especially on decoding speed. Nothing can be done to improve upon this. Hopefully it is not much of an issue, since working at memory-bus speed is quite fast enough.

Anyway, here are the results for the current version:

| Program | Version | Compression Ratio | Speed | Decoding |
|---|---|---|---|---|
| LZ4 "Ultra Fast" | 0.7 | 2.062 | 232 MB/s | 805 MB/s |
| LZ4 "Ultra Fast" - 2 threads | 0.7 | 2.062 | 430 MB/s | 1510 MB/s |

Since I only own a 2-core system, I can't really test with more threads. But the code allows up to 32 threads, so feel free to test on your own rig.

Tuesday, February 22, 2011

In my earlier release, I skipped the integration of a feature that is simple to understand but more difficult to implement: early detection of border-case segments.

This definition covers both "not compressible" segments and "single character" segments. The previous algorithm detected these cases correctly, but only during the construction of the Huffman tree.

The new algorithm ensures these situations are detected before any work is done to build the tree. It does so by counting differently, using a new structure created for this purpose. The net result is a speed improvement for files which feature such situations:

Detailed performance assessment:

| Category | File | v0.6 Ratio | v0.6 Compress | v0.6 Decoding | v0.7 Ratio | v0.7 Compress | v0.7 Decoding |
|---|---|---|---|---|---|---|---|
| Not compressible | enwik8.7z | 1.000 | 810 MB/s | 1.93 GB/s | 1.000 | 870 MB/s | 1.93 GB/s |
| Hardly compressible | win98-lz4hc-lit | 1.024 | 465 MB/s | 600 MB/s | 1.024 | 485 MB/s | 600 MB/s |
| Hardly compressible | audio1 | 1.070 | 285 MB/s | 280 MB/s | 1.070 | 285 MB/s | 280 MB/s |
| Distributed | enwik8-lz4hc-lit | 1.290 | 204 MB/s | 194 MB/s | 1.290 | 204 MB/s | 194 MB/s |
| Lightly Ordered | enwik8-lz4hc-offh | 1.131 | 197 MB/s | 184 MB/s | 1.131 | 197 MB/s | 184 MB/s |
| Ordered | enwik8-lz4hc-ml | 2.309 | 214 MB/s | 195 MB/s | 2.309 | 214 MB/s | 195 MB/s |
| Squeezed | office-lz4hc-run | 3.152 | 218 MB/s | 202 MB/s | 3.152 | 218 MB/s | 202 MB/s |
| Squeezed | enwik8-lz4hc-run | 4.959 | 245 MB/s | 224 MB/s | 4.959 | 245 MB/s | 224 MB/s |
| Ultra compressible | loong | 275 | 785 MB/s | 2.93 GB/s | 275 | 860 MB/s | 2.93 GB/s |

This (mostly) closes the gap with Range0 on the detection speed of not-compressible segments, which is the main case to consider in real-life situations.

Sunday, February 20, 2011

Applying the "sample" learnings to Zhuff results in a new version, v0.6, featuring a small compression improvement. The new sampling pattern improves compression by 1 to 4% depending on file type. However, it also decreases compression speed by about 5%, so this is a matter of taste. On average, I believe it is a fair trade-off.