This loop was the responsible to check for the last bytes; either because the words compare found a difference or we are at the end of the memory block and the last block is not 8 bytes wide. You may remember it easier because of the awful code it generated.

After optimization work that loop looked sensibly different:

And we can double-check that the optimization did create a packed MSIL.

Now let’s look at the different operations involved. In red, blue and orange we have unavoidable instructions, as we need to do the comparison to be able to return the result.

13 instructions to do the work and 14 for the rest. Half of the operations are housekeeping in order to prepare for the next iteration.

The experienced developer would notice that if we would have done this on the JIT output each one of the increments and decrements can be implemented with a single assembler operation using specific addressing mode. However, we shouldn’t underestimate the impact of memory loads and stores.