Now you have freed one register, which may speed up some time-sensitive loops. Unfortunately, this code doesn't break the register dependency, which is another drawback. The same applies to the versions where the XORs are replaced with ADDs and SUBs.
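For reference, the three-XOR exchange of a register with a memory operand can be modeled in C to check that it really swaps the two values without a temporary. This is a sketch of the data flow only. The function name `xchg_xor` is hypothetical, the `uint32_t` parameter stands in for EAX, the pointer stands in for the dword at [esi], and the model says nothing about latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the register/memory exchange done with three XORs.
   eax stands in for the register, *mem for the dword at [esi]. */
static uint32_t xchg_xor(uint32_t eax, uint32_t *mem)
{
    assert(mem != NULL);
    eax  ^= *mem;   /* xor eax,[esi] : eax   = a ^ m          */
    *mem ^= eax;    /* xor [esi],eax : [esi] = m ^ a ^ m = a  */
    eax  ^= *mem;   /* xor eax,[esi] : eax   = a ^ m ^ a = m  */
    return eax;     /* new register value = old memory value  */
}
```

Every step reads the result of the previous one, so the three operations form a single serial chain. That chain is the dependency the post is complaining about.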

Does anyone know an even better way to exchange a register and memory operand?


08 Jan 2008, 08:04

Madis731

Joined: 25 Sep 2003
Posts: 2141
Location: Estonia


XOR doesn't break the dependency??? That's new; I always thought it did. Better get back to the manuals.

EDIT: Oh, I get it: XOR breaks the dependency only when src = dest (as in xor eax,eax). And in this case we don't even have a false dependency, but a real one.

This version is better because the 6-clock wait on [esi] is already over before the final xor eax,[esi], so that instruction finishes immediately (without the additional latency listed for xor r32,m32). You'll get something like this:

I think an XOR with a memory operand will be split into more than one micro-op: reading, XORing and, for a memory destination, writing back. So, in my opinion, MOVing is faster. But it would be better, as Madis731 said, to consult Intel's manuals.

08 Jan 2008, 10:14

Madis731


Heh, and by the way, from Agner's manual:

Agner Fog wrote:

16.3 XCHG (all processors)
The XCHG register,[memory] instruction is dangerous. This instruction always has an implicit LOCK prefix which prevents it from using the cache. This instruction is therefore very time consuming, and should always be avoided.

The XCHG instruction with register operands may be useful when optimizing for size as explained on page 65.

EDIT: If you are running out of registers and you absolutely need XCHG r32,m32, there are other ways around it. Make your application even more memory-heavy and keep all your "registers" in named memory locations, e.g. r8 dd ?, r9 dd ?, etc.

The good thing is that when eax needs to be exchanged with one of them, where xchg eax,[r8] is not a good option, a rather fast alternative exists (provided you have at least MMX, or better SSE). If you have SSE, prefer it to MMX even if you don't need 128 bits.

The problem with your code is that it assumes you can write to r9, so you need additional space beyond r8 (the actual variable), which is not always available.

09 Jan 2008, 05:46

Madis731


The very fact that I write eax to [r9] means that it can be written to. The only assumption is with MOVDQA, where the other 8 bytes are not guaranteed. It's up to the coder (or maybe a macro) to guarantee that r8 and r9 are consecutive.
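One possible reading of the MMX trick discussed here, sketched in C under explicit assumptions: the poster's actual code is not shown, so the instruction sequence in the comments (MOV to the scratch slot, MOVQ load, MOVD, PSRLQ, MOVD store) is a hypothetical reconstruction. The 64-bit local `mm0` stands in for an MMX register, and `r8[0]`/`r8[1]` stand for the consecutive variables r8 and r9. The model only checks the data movement; on real hardware an EMMS would also be needed before any x87 code runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models a little-endian 64-bit load such as MOVQ mm0,[r8]:
   the low lane comes from r8 (slot[0]), the high lane from r9 (slot[1]). */
static uint64_t load64_le(const uint32_t *slot)
{
    return ((uint64_t)slot[1] << 32) | slot[0];
}

/* Hypothetical exchange of eax with the memory "register" r8,
   using the adjacent, writable slot r9 as scratch.
   r8[0] is r8, r8[1] is r9; the two must be consecutive. */
static uint32_t xchg_via_mmx(uint32_t eax, uint32_t *r8)
{
    assert(r8 != NULL);
    r8[1] = eax;                  /* mov   [r9],eax : r9 = old eax        */
    uint64_t mm0 = load64_le(r8); /* movq  mm0,[r8] : lo = r8, hi = r9    */
    eax = (uint32_t)mm0;          /* movd  eax,mm0  : eax = old r8        */
    mm0 >>= 32;                   /* psrlq mm0,32   : lo = old eax        */
    r8[0] = (uint32_t)mm0;        /* movd  [r8],mm0 : r8 = old eax        */
    return eax;                   /* r9 is left holding old eax           */
}
```

The scratch write to r9 is exactly the assumption debated in the thread: the sequence is only safe if that slot is reserved for this purpose.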

It's true, though, that alignment is needed because SSE can't read beyond page boundaries, and there are some other problems: it's a lot of coding and doesn't give much speed benefit, especially over a MOV sequence with a temp register. Still, using SIMD can do the trick sometimes.
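For comparison, the plain MOV sequence with a temp register can be modeled the same way. The register assignment in the comments (ECX as the temporary, ESI as the pointer) is an assumption for illustration. It costs one spare register, but it involves no implicit LOCK and no serial XOR chain: the only dependency is that the new EAX value waits for the load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register/memory exchange through a temporary, as a plain MOV sequence. */
static uint32_t xchg_mov_temp(uint32_t eax, uint32_t *mem)
{
    assert(mem != NULL);
    uint32_t ecx = *mem; /* mov ecx,[esi] : temp = old memory value */
    *mem = eax;          /* mov [esi],eax : memory = old eax        */
    eax = ecx;           /* mov eax,ecx   : eax = old memory value  */
    return eax;
}
```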
