1. What are your thoughts? What have you tried? Have you done some experiments to compare their compression ratios? 2. We normally prefer that you ask one question per question. Those are two separate questions. I'd recommend you edit the question to edit out the former and just ask about the latter (the compression ratio); once you get the answer to that, you can post a subsequent question about the other one, if still needed.
– D.W. ♦ Dec 4 '15 at 19:48

2 Answers

In a nutshell, LZW is about the frequency of repeated sequences, while Huffman is about the frequency of single-byte occurrences.

Take the string 123123123.

(The following is an oversimplification, but it makes the point.) LZW will identify that 123 is repeated three times and essentially build a dictionary of codes for sequences. It will essentially say: when I write A, I mean 123; here is AAA (three bytes).
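To make the dictionary idea concrete, here is a minimal sketch of textbook LZW (not the simplification above: real implementations also pack the codes into bits and bound the dictionary size, which this sketch skips):

```python
def lzw_compress(text):
    # Start with a dictionary of every single character in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    w = ""
    codes = []
    for ch in text:
        if w + ch in dictionary:
            w += ch                       # keep extending the current match
        else:
            codes.append(dictionary[w])   # emit the code for the longest match
            dictionary[w + ch] = len(dictionary)  # learn the new sequence
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes

# "123123123" (9 characters) comes out as only 6 codes, because the
# repeated "123" gets learned by the dictionary as compression proceeds.
print(lzw_compress("123123123"))  # [0, 1, 2, 3, 5, 4]
```

Note the output shrinks precisely because later codes (3 and 5 here) stand for multi-character sequences learned from earlier repetitions.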

Huffman will detect the frequency of individual bytes. Let's assume the text above is ASCII or UTF-8 (which makes 1, 2 and 3 all single-byte code points), so 1 occurs 3 times, 2 occurs 3 times, 3 occurs 3 times, and there are no other symbols. With three equally frequent symbols, Huffman assigns one 1-bit code and two 2-bit codes, about 1.67 bits per character on average. So let's say 1=0, 2=10, 3=11. Huffman will then encode the text 123123123 as (in bits) 010110101101011, i.e. 15 bits, or 2 bytes since we are usually limited to whole bytes.
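A minimal Huffman coder shows where those code lengths come from; this sketch builds the tree by repeatedly merging the two least frequent subtrees (tie-breaking may vary between implementations, but the total bit count doesn't):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # prepend branch bits
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("123123123")
encoded = "".join(codes[ch] for ch in "123123123")
# Three equally frequent symbols get one 1-bit and two 2-bit codes,
# so the 9 characters need 15 bits total (2 bytes, rounding up).
print(len(encoded))  # 15
```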

What if we used Huffman on the LZW result?

Well, AAA can be represented with a single bit per symbol (let's choose 0), so 000: 3 bits, or 1 byte rounding up.

The unfortunate part is that both Huffman and LZW require some information (the code table or dictionary) to decode, so it won't be quite as impressive as sending a lone 0 and saying "decode with Huffman, then LZW". But in essence this combination gives very good compression on real-world payloads that aren't already compressed. (A JPG or ZIP file is unlikely to be compressible with this, but a .docx, an XML doc, a txt file, etc. will do well; the longer, more verbose and repetitive the input, the better.)

If you look at the characteristics of the algorithms and know your data, you will see that the order of the algorithms and the kind of repetition can make a real difference. Think of a 1 TB document consisting of nothing but "A"s and consider what would do better. Frankly, a naive run-length encoder would do best, since the minimal information here is just which character and how many times. Huffman would max out at roughly 8:1 (one byte down to one bit); LZW would do much better. However, if you had a document with every possible permutation of a sequence of letters, then generally Huffman would do better.

I think if you consider this carefully you will recognize that LZW is generally more helpful, but combinations often achieve the best results (assuming your goal is smallest size, not best performance).