Dalibor, the ATI CAL compiler wasn't smart enough to use BFI_INT, mainly because this instruction exists only at the ISA level, while the lowest level available to the programmer is IL. With hacks it was possible to use BFI_INT, which brings another major speed-up for MD5 (~16%).
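
To show why BFI_INT matters for MD5, here is a rough sketch in plain C. It assumes BFI_INT behaves as a bitwise select, (src0 & src1) | (~src0 & src2), which is my reading of the ISA docs. The `bfi` helper is only a software model I made up for illustration, not a real intrinsic, and the function names are hypothetical. Under that assumption the MD5 round functions F and G each collapse into one such operation instead of three or four separate logic ops.

```c
#include <stdint.h>

/* Software model of a bitwise select (assumed BFI_INT semantics):
   where a mask bit is 1 take the bit from a, otherwise from b. */
static uint32_t bfi(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
}

/* MD5 round functions F and G as written in RFC 1321. */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) | (~x & z);
}

static uint32_t md5_g(uint32_t x, uint32_t y, uint32_t z) {
    return (x & z) | (y & ~z);
}

/* The same functions expressed as a single select each. */
static uint32_t md5_f_bfi(uint32_t x, uint32_t y, uint32_t z) {
    return bfi(x, y, z);   /* x selects between y and z */
}

static uint32_t md5_g_bfi(uint32_t x, uint32_t y, uint32_t z) {
    return bfi(z, x, y);   /* z selects between x and y */
}
```

F is used in rounds 1 and G in round 2, so half of the 64 MD5 steps could benefit if the hardware does the select in one instruction.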