No XOR'ing pointers?

I'm a big fan of the following trick to swap two values without consuming any extra memory for a transfer buffer:

Code:

a ^= b;
b ^= a;
a ^= b;

I acknowledge that there's little to no efficiency gain in doing it this way, especially in the big scheme of things, but I just find it really neat and so I like to employ it. :P

For the first time, I tried to do this on two pointers. I have an array of pointers and I am sorting those pointers using a very specific scheme. When I try to swap the two pointers, like so (simplified here, this is not my actual code):

Code:

uint8_t * a;
uint8_t * b;
(...)
a ^= b;
b ^= a;
a ^= b;

I get the following error:

Code:

invalid operands to binary ^ (have ‘uint8_t *’ and ‘uint8_t *’)

This is on gcc 4.7.2, but I have a suspicion that the C spec says that ^ (and other bitwise operators) are only defined for integer data types, not pointers, and I suspect not even floats/doubles (because that would seem particularly nonsensical, unless you're doing a trick like what I'm trying to do). Assuming I'm correct about that (if not, why is gcc giving me lip! :)), my question is why? To me it seems perfectly legitimate to want to perform the XOR operation on other data types, regardless of whether the compiler understands exactly what you've got in mind, and it's pretty straightforwardly defined, yes? Look at each pair of corresponding bits and obtain a 1 in the same slot if they don't match, obtain a 0 in the same slot if they do match. In other words, my feeling is that the compiler ought not to care what the data is *representing* if you ask for a bitwise operation. Thoughts? Thanks in advance!

> I'm a big fan of the following trick to swap two values without consuming any extra memory for a transfer buffer:
Please don't apply for any jobs then.

This trick belongs to assembler programmers from 40 years ago.
It has no place in a higher level programming language.

As you've already spotted, it's useless for swapping any other data type apart from unsigned integers.

It also confuses the heck out of any compiler optimiser, to the point where even if your machine has a genuine SWAP instruction, the compiler won't recognise it as a swap and it crunches out 3 instructions instead.

If the obvious swap (one using a temporary and 3 assignments) is inlined, I've seen good optimisers completely eliminate the swap (so zero instructions), whereas your cruft would force the compiler to always emit 3 xor instructions regardless, because it can no longer track the true intent of the code. This too will harm further attempts at optimisation in the following code.

Perhaps you meant this.

Code:

*a ^= *b;
*b ^= *a;
*a ^= *b;

All very well - but I've got an exercise for you.

Put that in a function, and call it with say

Code:

int a = 10;
swap( &a, &a );

and tell me what the new value stored in a is (along with what you expect).
I expect a proper swap function trying to swap a variable with itself to preserve the value.

The standard says that you can't do bitwise operations on pointers, so you are getting this error.

Obviously you didn't read the last paragraph of my post to find out what my actual question was. "A" for effort, though.

Originally Posted by salem

Please don't apply for any jobs then.

I'm quite happy at my current job -- and quite appreciated there as well -- so don't worry about me applying for any other ones.

Originally Posted by salem

This trick belongs to assembler programmers from 40 years ago.
It has no place in a higher level programming language.

Why not? If it's because it might take a whopping couple milliseconds more than the other way (again, I really don't know which is more efficient, but more on that later), that's ridiculous. Have you ever written even a slightly complex program or library, and profiled it? I'm pretty sure this method of swapping things would hardly be likely to jump to the top of the histogram of where time is spent. If even one programmer were to look at my code in the future and think, "oh, that's nifty, I never realized you could do it that way", then that moment of fun intrigue in that programmer's life would be worth the few extra milliseconds of runtime to me. :P

Originally Posted by salem

As you've already spotted, it's useless for swapping any other data type apart from unsigned integers.

If by "useless" you mean "not allowed by the compiler [or the standard, for that matter]", then you're mostly correct -- XOR is allowed on signed integers as well, not just unsigned. I've just tried it on my machine to verify. But if you take any two streams of bits -- it doesn't matter what they represent -- and perform this sequence of operations, it will swap them with each other. If you don't realize that / don't believe me, try it by hand on paper and see. So conceptually, no, it's not useless for any data type except unsigned integers -- it's useful for any two sequences of bits that need to exchange places.
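To make the "any two sequences of bits" idea concrete, here is a rough sketch that XOR-swaps the raw bytes of two objects through unsigned char pointers, which C does allow. The helper name xor_swap_bytes and its parameters are hypothetical, and it assumes both objects are n bytes long and do not partially overlap.

Code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: XOR-swap the object representations of two
   objects of n bytes each. Assumes the objects do not partially overlap. */
void xor_swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;

    assert(a != NULL && b != NULL);

    /* Swapping an object with itself must be a no-op; without this
       check the first XOR would zero every byte. */
    if (pa == pb)
        return;

    for (size_t i = 0; i < n; i++) {
        pa[i] ^= pb[i];
        pb[i] ^= pa[i];
        pa[i] ^= pb[i];
    }
}
```

Called as xor_swap_bytes(&x, &y, sizeof x) on two doubles or two pointers, it exchanges their contents. The language only defines ^ for integer operands, so this works by treating the objects as arrays of bytes rather than by XORing the values themselves.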

Originally Posted by salem

It also confuses the heck out of any compiler optimiser, to the point where even if your machine has a genuine SWAP instruction, the compiler won't recognise it as a swap and it crunches out 3 instructions instead.

I don't know this for myself, but for the sake of argument I will assume that you are correct.

Originally Posted by salem

If the obvious swap (one using a temporary and 3 assignments) is inlined, I've seen good optimisers completely eliminate the swap (so zero instructions)

Can you elaborate on this? Granted, I'm not as knowledgeable about how the machine code works as I'd ideally like to be, so correct me if I'm wrong: it would seem to me that "zero instructions" necessarily means "nothing happens" (i.e. it can't happen by the work of some magical pixies or something). Therefore, if the programmer needed two things to exchange places in order for the program to work right, the compiler has got to be producing machine code that makes it work somehow -- if it's producing zero instructions for the moment at which the programmer wants the swap to occur, then it must be doing something somewhere else to compensate and incorporate the effective logic. Perhaps less than three instructions, but certainly more than zero, no?

Originally Posted by salem

Perhaps you meant this.

Code:

*a ^= *b;
*b ^= *a;
*a ^= *b;

Nope, I didn't mean that. In my original post I bolded the text "I am sorting an array of pointers" to make sure that point was communicated, but apparently it didn't quite work. I'm sorting pointers here, not their targets -- albeit, when I say "sort" here I do not mean the most obvious interpretation of sorting the pointers by the addresses they contain, but rather by the use of a more involved comparison. That's beside the point, however.

Originally Posted by salem

All very well - but I've got an exercise for you.

Put that in a function, and call it with say

Code:

int a = 10;
swap( &a, &a );

and tell me what the new value stored in a is (along with what you expect).

I expect a proper swap function trying to swap a variable with itself to preserve the value.

Let me think of what I would expect, first. I assume you mean put your code (which, again, is not what I'm doing) in a function and try this call? Let's see, we're passing the address of the same integer twice to the swap-function. Then it XORs the target of A against the target of B. Well, a and b are both pointing to the same integer, so we're XORing a bit stream against itself, which obviously will result in all 0 bits, representative of the number 0. The successive calls will just be XORing 0 against 0, leaving 0. So I expect that your code will leave a 0 in the slot, and would do so no matter what the integer's initial value was. I will test my theory now.

Your trick is all fine and good, but there is a huge flaw in it. It's not easy for another programmer to look at that code and say "Oh, this is swapping two values." For most applications, it's more important to write readable code, than it is to write efficient code.

About the swapping two of the pointers that point to the same address. A robust swap function wouldn't modify the contents at that memory address. So, I would consider it a bug in your logic.

Your trick is all fine and good, but there is a huge flaw in it. It's not easy for another programmer to look at that code and say "Oh, this is swapping two values." For most applications, it's more important to write readable code, than it is to write efficient code.

A fair point, but it isn't hard to add a comment that says "Swap values", is it?

Originally Posted by Cameron0960

About the swapping two of the pointers that point to the same address. A robust swap function wouldn't modify the contents at that memory address. So, I would consider it a bug in your logic.

Who is "your" in this statement? If it's me, please read the posts more carefully and you will see that this is *not* what I'm doing, but something that salem incorrectly inferred I was doing.

I am frankly starting to feel that it's a waste of my time to post here, because you folks don't pay attention to what I'm saying.

A fair point, but it isn't hard to add a comment that says "Swap values", is it?

You're right that it wouldn't be hard to add a comment. However, readable code doesn't need to be commented. If code needs to be commented, then the code should be written in a clearer way where commenting isn't necessary.

Who is "your" in this statement? If it's me, please read the posts more carefully and you will see that this is *not* what I'm doing, but something that salem incorrectly inferred I was doing.

I wasn't attacking you when I said "your". I was just referring to the logic above that you had mentioned. I apologize if I gave you the wrong idea. I wanted to illustrate why writing a swap function in that way would lead to a bug that shouldn't exist in a swap function :)

I am frankly starting to feel that it's a waste of my time to post here, because you folks don't pay attention to what I'm saying.

The second poster was kind of a jerk, but your response wasn't that nice either. Text is easily misconstrued because there is no tone or body language behind it. I always take the stance that people have the best intentions to avoid misinterpreting text. Truly I think you should give us some slack. We went out of our way to help ya. Nobody here is trying to misguide you. We are simply here to help :cool:

You're right that it wouldn't be hard to add a comment. However, readable code doesn't need to be commented. If code needs to be commented, then the code should be written in a clearer way where commenting isn't necessary.

I've heard this philosophy before, and I respectfully disagree. I agree that readable code is much better than unreadable code. But I also think that commenting (in moderation! i.e., no comments that are stupid and unhelpful, such as some of the kinds of comments that university CS courses force students to use) is much better than not commenting. Put both together, and you end up with a product that has far less chance of confusing the programmer trying to maintain it than if you only rely on one or the other.

Originally Posted by Cameron0960

I wasn't attacking you when I said "your". I was just referring to the logic above that you had mentioned. I apologize if I gave you the wrong idea. I wanted to illustrate why writing a swap function in that way would lead to a bug that shouldn't exist in a swap function.

Okay, thanks for clarifying. I agree, that proposition would be a horrible way to go ^_^

Originally Posted by Cameron0960

I don't know, man. We're just trying to help. The second poster was kind of a jerk. But truly I think you should give us some slack. We went out of our way to help ya. Nobody here is trying to misguide you. We are simply here to help.

Thanks for that commentary as well, and I apologize if I came off as kind of a jerk myself -- please understand: most of my experience on forums of this sort is unpleasant. I very rarely post things on forums anymore -- only *very* occasionally, when I run across something that I think is *really* worth braving the hells of the Internet in a desperate attempt to have a respectful discussion about it with other people interested/knowledgeable on the topic. If you look at my post count, you'll see that I have undergone very little activity on this forum in particular, so for all intents and purposes I am a newbie here. When I make one of my first posts and am met with insulting comments ("Please don't apply for any jobs"), particularly from someone with 4,000 posts and a "reputation power" (whatever the hell that may be) of 1808, my natural internal response is "Oh boy, here we go with a f***ing forum and its 'community' of dipsh1ts again", and I tend to get cranky. That said, it's no excuse for causing collateral damage to other individuals who *aren't* acting that way, so I apologize and I hope there's no hard feelings between you and me. :)

so correct me if I'm wrong: it would seem to me that "zero instructions" necessarily means "nothing happens" (i.e. it can't happen by the work of some magical pixies or something)

No, it's called data flow analysis, which enables the compiler to track the values of all variables through the code. Doing this may enable the compiler to discover optimisations which could result in the apparent swap being optimised away.

If the compiler manages to optimise it out, then you have a zero instruction 'swap' and the code still does what it's supposed to.

Stop worrying about what the machine does, and focus on writing simple code which has the highest chance of success, and the highest likelihood of being maintainable after you've gone (unless obfuscated code is your job security).
Simple code stands the best chance of being optimised.

I take it that you do understand that the optimiser can remove instructions and still leave the program performing the intended function.

Stop thinking of the compiler as some dumb abacus that needs nursing through every optimisation you can think of. I mean, take a look at what gcc currently offers.

> Why not? If it's because it might take a whopping couple milliseconds more than the other way
Hey, you're the one who opened the micro-optimisation door, not me.

I presume that since you didn't want to write the obvious swap using a temp variable, and that "I'm a big fan" of this xor trickery, that somehow you think it is in some way better (hence this long thread discussing the merits or otherwise of the whole technique).

then you're mostly correct -- XOR is allowed on signed integers as well, not just unsigned. I've just tried it on my machine to verify.

Actually, I'm absolutely correct and you're misguided.
Let me quote from the standard, and highlight the point about signed types.

Originally Posted by c99

Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |,
collectively described as bitwise operators) are required to have operands that have
integer type. These operators return values that depend on the internal representations of
integers, and have implementation-defined and undefined aspects for signed types.

You also need to be wary of "Well I tried it with the foo compiler and it works for me". Local success is no guarantee of correct behaviour.
All very well, until you try it on another implementation and get a surprise.

> I didn't mean that. In my original post I bolded the text "I am sorting an array of pointers"
It doesn't matter what you have an array of, if you're making a swap function, you're going to end up with two pointers to whatever they are.
If you have two pointers pointing at the same object (of whatever type), and say you managed to mung the data type into something you can xor (say by casting), then you're up the same creek as if you had two pointers to integers.
http://c-faq.com/cpp/swapmacro.html

So to answer your original "thoughts" comment at the bottom of post 1, my thought is that you should just let it go, write the obvious swap and get on with adding some real functionality to your code.

You should perhaps acquaint yourself with StackOverflow; it's a completely different paradigm for Q&A than traditional forums, and tends to be more "civil". Rep has genuine meaning as it is awarded by anonymous voting on answers and questions or accepting answers if it is your question. And users with more rep have greater powers to edit and amend posts.

In the end, however, it is always better to stick to the technical facts to avoid attracting comment on anything other than the question in hand. Had you not said you were a "big fan" of a technique that most consider an interesting academic exercise at best and sheer folly in general then you might not have attracted the level of ridicule. Even if it were beneficial at all, it would be in the realms of micro-optimisation and hardly warranting any fanaticism of any size. It's down there with Duff's Device among interesting things with little place in modern programming.

Besides the fact that the temporary variable swap is an idiom trivially optimised by a modern compiler, there is the issue of aliasing too.

That said, the technique is of largely academic interest only - when was the last time you were down to your last byte of memory such that you could not afford a temporary variable!?

I would argue that even on a memory constrained system, it's better to write semantically clean code and let the optimizer do its job. I've done a lot of stress testing over the years, and the code most likely to fail under resource pressure, almost invariably, is the most obfuscated and mode dependent trash in the system.

Comments on this post

clifford agrees
: Of course; "memory constrained" is not the same as "down to your last byte", and more productive optimisations are likely to be had elsewhere in any case. "micro-optimisation" is usually pointless. Optimise the design instead.