The CMPSB, CMPSW, and CMPSD instructions each compare a memory operand pointed to by ESI to a memory operand pointed to by EDI

Both of the descriptions explain what the instructions do but none of them says how. So I needed to do some experiments in Windbg to find the answer to the question.

The first experiment was not a good one. Initially, I thought I'd put processor breakpoint (aka memory breakpoint) at ESI and another one at EDI. I also thought to execute CMPS and let the debugger to break-in on either of the processor breakpoints. And here it goes why it was a bad idea. The execution of CMPS has to complete for debugger break-in. And, by the time the CMPS completes it hits both of the breakpoints.

The other experiment I came up with is like this. Set both ESI and EDI to point two distinct memory addresses that are unmapped. The assumption is when CMPS is executed it raises an exception when trying to read memory. By looking at the exception record we can tell the address the instruction tries to read from. Given that, we can tell if that value was assigned to ESI, or to EDI, and so we can tell whether byte at ESI or byte at EDI is read first.

Here is how I did the experiment in Windbg.

I opened an executable test file in Windbg. I assembled CMPS to be placed in the memory at EIP.

0:000> a011113be cmpsbcmpsb

I changed ESI to point to invalid memory. And I did the same with EDI.

0:000> resi=515151510:000> redi=d1d1d1d1

Here is the disassembly of CMPS. Also, you know from the highlighted text that both ESI and EDI point to unmapped memory addresses.

As you can see the access violation was occurred due to an attempt to read from d1d1d1d1 that is the value of EDI. Therefore, to answer the question at the beginning of the article, the byte at EDI is read first.

To give more to the fun, you may try how emulators handle this - that is the order of memory reads of CMPS instructions.

UPDATE 28/April/2014 Of course I don't encourage people to rely on undocumented behavior when developing software. Stephen Canon says Intel may optimize the microcode for operands from time to time, and so we don't have a good reason to believe this behavior to be stable.

UPDATE 29/April/2014 Here is a Windows C++ program to test this behavior on your architecture: cmps-probe.cpp