"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I don't know whether gcc is just producing an inefficient CRC32 compared to
> 2x32 but the results seem very odd.... There must be something else we are
> missing?
I went back and looked at the code, and see that I was misled by
terminology: what we've been calling "2x32" in this thread is not two
independent CRC32 calculations, it is use of 32-bit arithmetic to execute
one CRC64 calculation. The inner loop looks like
while (__len-- > 0)
{
int __tab_index = ((int) (__crc1 >> 24) ^ *__data++) & 0xFF;
__crc1 = crc_table1[__tab_index] ^ ((__crc1 << 8) | (__crc0 >> 24));
__crc0 = crc_table0[__tab_index] ^ (__crc0 << 8);
}
whereas a plain CRC32 looks like
while (__len-- > 0)
{
int __tab_index = ((int) (crc >> 24) ^ *__data++) & 0xFF;
crc = crc_table[__tab_index] ^ (crc << 8);
}
where the crc variables are uint32 in both cases. (The true 64-bit
calculation looks like the latter, except that the crc variable is
uint64, as is the crc_table, and the >> 24 becomes >> 56. The "2x32"
code is an exact emulation of the true 64-bit code, with __crc1 and
__crc0 holding the high and low halves of the 64-bit crc.)
In my tests the second loop is about 10% faster than the first on an
Intel machine, and maybe 20% faster on HPPA. So evidently the bulk of
the cost is in the __tab_index calculation, and not so much in the table
fetches. This is still a bit surprising, but it's not totally silly.
Based on the numbers we've seen so far, one could argue for staying
with the 64-bit CRC, but changing the rule we use for selecting which
implementation code to use: use the true 64-bit code only when
sizeof(unsigned long) == 64, and otherwise use the 2x32 code, even if
there is a 64-bit unsigned long long type available. This essentially
assumes that the unsigned long long type isn't very efficient, which
isn't too unreasonable. This would buy most of the speedup without
giving up anything at all in the error-detection department.
Alternatively, we might say that 64-bit CRC was overkill from day one,
and we'd rather get the additional 10% or 20% or so speedup. I'm kinda
leaning in that direction, but only weakly.
Comments?
regards, tom lane