I expected some ideas about using MMX (for example) or something of that kind.

Also, when I started the topic, I thought the bottleneck would be memory access, so reading dword/qword data would speed up the processing. This assumption turned out to be false; in fact, memory read speed is not important at all, at least in the context of the Inflate algorithm.

Note that I am searching for a high speed/size ratio, not just one of the two.

SHLD, MMX and memory access times are all CPU-specific optimisations and have nothing to do with the algorithms themselves. So the apparent contradiction is still there.

When doing algorithmic analysis, usually the asymptotic bounds are considered, because in the absence of a specific CPU or system there is probably no other way to compare algorithms.

Implementing algorithms in a CPU instruction set and then comparing the runtime is not a way to determine which algorithm is "faster". It will only give you a way to determine which algorithm is best suited to the system under test within the finite bounds tested.

02 Dec 2014, 10:34

JohnFound

Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria


Whatever... I think I have found a really fast solution for Huffman tree traversal. Working on it...

02 Dec 2014, 10:56

JohnFound

Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria


News: I tried to implement Huffman code decoding not by traversing the binary tree, but by searching in one big lookup table, in the hope that this technique would be much faster.

Well yes, it is faster. But not as much faster as I expected. In addition, the code became too ugly, and I decided the price was too high for such a small performance gain.

Anyway, after optimizing the previous implementation for size, it is now under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answer your previous post earlier. So, about CRC32 - gzip uses a CRC32 checksum simply because its implementation of Inflate misses a lot of checks inside the inner loops. As a result, on a broken stream it generates broken data and notices this only after the whole stream is processed, even if the invalid data is at the very beginning of the stream.

On the other hand, my implementation has a full set of integrity checks inside the main loops, so it detects invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.

Strictly speaking, neither is a hash function needed. But both do a very good job of it. I wonder how the false-negative rate of CRC32 compares to merely checking the compressed data for consistency, and thus how much risk you are potentially exposing the code to. I can't find any details about this, so perhaps no one has done a study?
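For reference, gzip stores the CRC32 of the uncompressed data (per RFC 1952), so it catches corruption in the output even when the deflate stream itself still decodes without a structural error. A minimal illustration using Python's zlib as a stand-in (the strings here are arbitrary examples):

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog"
checksum = zlib.crc32(data)

# A single corrupted byte in the reconstructed output changes the CRC32,
# even though stream-consistency checks alone would have nothing to flag
# if the damaged stream were still structurally valid deflate.
corrupted = b"The quick brown fox jumps over the lazy cog"
assert zlib.crc32(corrupted) != checksum
```

This is the distinction under debate: inline integrity checks catch invalid deflate structure early, while the trailing CRC32 catches valid-looking streams that decode to the wrong bytes.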

06 Dec 2014, 12:47

JohnFound

Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria


It depends on what is considered an "error". My implementation does not treat a difference between the uncompressed stream and the original data as an error, because it knows nothing about the original data. It reports an error only when the stream is not a valid deflate stream.

If the user of the library knows something about the original data (for example, a checksum, or even the original file itself), they can choose whether to compare the resulting data against that information.

06 Dec 2014, 14:05

Matrix

Joined: 04 Sep 2004
Posts: 1171
Location: Overflow


JohnFound wrote:

News: I tried to implement Huffman code decoding not by traversing the binary tree, but by searching in one big lookup table, in the hope that this technique would be much faster.

Well yes, it is faster. But not as much faster as I expected. In addition, the code became too ugly, and I decided the price was too high for such a small performance gain.

Anyway, after optimizing the previous implementation for size, it is now under 1K without any performance loss: deflate.asm

@redsock - sorry, I didn't answer your previous post earlier. So, about CRC32 - gzip uses a CRC32 checksum simply because its implementation of Inflate misses a lot of checks inside the inner loops. As a result, on a broken stream it generates broken data and notices this only after the whole stream is processed, even if the invalid data is at the very beginning of the stream.

On the other hand, my implementation has a full set of integrity checks inside the main loops, so it detects invalid data very early in the processing. This way, I simply don't need CRC32 as a way to detect errors in the stream.

Ugly looks are never too big a price for performance. Consider pretty macros.

CRC32 serves data integrity purposes; we could replace it with a sha1sum of the whole uncompressed block.
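That suggestion could look roughly like this, using Python's hashlib and zlib as stand-ins for an assembly implementation (the payload is an arbitrary example):

```python
import hashlib
import zlib

original = b"payload" * 100
stream = zlib.compress(original)

# Publish a SHA-1 digest of the uncompressed block alongside the stream,
# in place of the CRC32 that gzip would store.
digest = hashlib.sha1(original).hexdigest()

# On the receiving side: decompress, then verify against the digest.
restored = zlib.decompress(stream)
assert hashlib.sha1(restored).hexdigest() == digest
```

SHA-1 has a far lower collision probability than CRC32, at the cost of more computation per byte; for pure accidental-corruption detection, CRC32 is usually considered sufficient.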
