
How to swap two integers in C++

Warning: this post got embarrassingly long. For the short version, read up through the paragraph that starts with "If you use."

This is something that's been bugging me for a while, and I'd like to lay it to rest. If you've done any work in C++, you've probably heard the riddle about how to swap two integers without using a temporary variable. The question is exactly what it sounds like: you have two integers (call them x and y), you want to swap their values, and you may or may not want to use a third, temporary variable while doing it. Here are some ways of swapping integers that I have actually seen in professional code written by professional coders:

Method A:

    {  // Limit the scope of t
      int t = x;
      x = y;
      y = t;
    }

Method B:

    x ^= y;
    y ^= x;
    x ^= y;

Method C:

    x ^= y ^= x ^= y;

Method D:

    x = x + y;
    y = x - y;
    x = x - y;

Method E:

    std::swap(x, y);

Which of these is the fastest way to swap integers? Which way is the fastest method that doesn't require extra memory? When will you want to use a different method? What is the overall best method?

Try to come up with some answers before reading further.

When you've decided on your answers, let's take a closer look. No, on second thought, this got really long. For those of you who don't want to read all the way to the end, let me just get my main point out of the way. After that, we'll take a closer look. If you don't want to spoil the ending, skip the next paragraph.

If you use the xor trick to swap integers, STOP IT! Unless you have a brain-dead compiler, the naive swap with a temporary integer doesn't use any more space than the xor swap, but it is easier to read and often requires less CPU time. On top of this, the xor swap actually has undefined behavior if you do it all on one line (a la Method C). You're not being cute or clever by using this trick; you're actually making your code worse. Yes, the xor swap is an interesting mathematical curiosity, but it should never be used in real code. Just write what you mean (i.e. use either Method A or E) and let the compiler handle it, because it can compile things better than you can. And in the (far-fetched) event that it can't do this right and you really do need those extra couple bytes of stack space, either get a better compiler or write it in assembly. But stop propagating the myth that the xor swap is a good coding trick.

Now that I've gotten that out of my system, we can continue. Throughout this post, I'll be using g++ 4 to compile stuff for a 32-bit x86 Linux distribution, because that's what I happen to have on hand at the moment. This post applies to many other compilers for most other architectures and all other OSes, too, but now you have a way of verifying my work. The g++ command line flags I will use are -Wall (to turn on the common warnings) and -g (to add debugging information so I can more easily see what's going on when I disassemble the binaries). I will also be using objdump -S to look at the disassembled machine code. We'll start out with something simple and straightforward, and compile in the simple and straightforward way:
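The original listing is not reproduced here, so what follows is only a minimal sketch of what such a test program might look like. The starting values x = 3 and y = 5 and the name PrintValues come from the discussion below; the body of PrintValues, the RunSwaps wrapper and the self-check are assumptions made for illustration (the post itself apparently did the swaps directly in main). Method C is left commented out because, as explained next, it is not valid.

```cpp
#include <algorithm>  // std::swap in C++98/03
#include <cassert>
#include <cstdio>
#include <utility>    // std::swap from C++11 on

// Assumed helper: the post names PrintValues, but this body is a guess.
void PrintValues(int x, int y) { std::printf("x = %d, y = %d\n", x, y); }

// Hypothetical wrapper that applies Methods A, B, D and E in turn.
// x and y must refer to two distinct ints.
void RunSwaps(int& x, int& y) {
  PrintValues(x, y);

  {  // Method A: limit the scope of t
    int t = x;
    x = y;
    y = t;
  }
  PrintValues(x, y);

  // Method B
  x ^= y;
  y ^= x;
  x ^= y;
  PrintValues(x, y);

  // Method C is deliberately omitted:
  // x ^= y ^= x ^= y;

  // Method D
  x = x + y;
  y = x - y;
  x = x - y;
  PrintValues(x, y);

  // Method E
  std::swap(x, y);
  PrintValues(x, y);
}

// Four swaps in a row should restore the starting values.
inline void CheckRunSwaps() {
  int x = 3;
  int y = 5;
  RunSwaps(x, y);
  assert(x == 3 && y == 5);
}
```

Something along these lines can be built with g++ -Wall -g and inspected with objdump -S, as described above.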

Well, there's our first snafu: Method C is not actually valid C++! From the fourth paragraph of Section 5 of the C++ standard (the 1998/2003 text), "[b]etween the previous and next sequence point a scalar object shall have its stored value modified at most once...; otherwise the behavior is undefined." In case you're not familiar with sequence points, they're defined in Section 1.9 Paragraph 7 as the points where "all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place." In this code, the important sequence points are all the semicolons (there are some others, but they don't have any effect in this example, so I will ignore them). In particular, the one-line xor swap modifies the value of x twice without a sequence point in between, so the behavior is undefined. One way to think about it is that we can't know the value of x after that statement: without any internal sequence points, we can't tell whether the store on the left or the right happens last. In practice, Method C usually does swap the integers correctly, but the standard makes no such guarantee. Indeed, because the behavior is undefined, a compiler could technically turn this line into a program that wipes your hard drive. It's unlikely that anyone will write a compiler which does that when faced with this code, but such a malicious compiler would totally conform to the C++ standard. So Method C is definitely out, and will be deleted from all further analysis.

With that removed, everything else compiles fine and runs fine and has the expected behavior. Let's disassemble the binary we just compiled and take a look at exactly what it's doing:

In case you're not familiar with assembly code, let me give a quick explanation of what's going on here (if you're already comfortable with it, skip this paragraph). The leftmost column is the position of this code within the program; you can ignore it. The bytes that come after it are the actual machine code generated by the compiler; you can ignore them. Then there is a column of the assembly instructions that correspond to the machine code we're ignoring, followed by the arguments to those instructions. These last two columns are the important part, because they're nearly readable and they describe exactly what the computer is doing. Interspersed throughout are the lines of C++ that were compiled to get the machine code just below them. Things that start with % are registers, which are the locations in the processor that store the data you're working with. Parentheses around registers denote that the register contains a pointer to somewhere in memory, and it's talking about that memory location instead of the value in the register itself. The assembly instructions used here can move data from memory to registers and back again (the mov and movl instructions), call other functions (the call instruction), take the exclusive or of two values (the xor instruction), load and add values (lea, which admittedly is confusing because it is used here both to add numbers together and to pass references into std::swap), and subtract values (sub). If you're not very comfortable with this, don't worry; I'll walk you through the important parts.

The compiled instructions are pretty straightforward here. x is stored 8 bytes below the pointer stored in register %ebp, and y is stored 12 (0xc) bytes below that same pointer. To call PrintValues, we push y and then x onto the top of the stack (by loading them into registers %eax and %edx and then putting them near where the stack pointer register, %esp, points) and then we call PrintValues (whose name has been changed slightly but is still recognizable). Note that the arguments are pushed onto the stack in the reverse order that we listed them in the code. Method A loads x into the register %eax, then stores it again 4 bytes before %ebp (presumably this is t). It then loads y and stores it where x goes, etc, just the way you'd expect. For Method B, we first load the variables into registers %edx and %eax, we xor them together so that the result is in %eax, and we store that value back into memory. We then repeat this twice more for the other two xors. The rest of the code should be equally straightforward (maybe not counting the lea instructions, but trust me that it's doing what you'd expect there).

If this were our final compiled program, I would totally agree that the naive swap requires an extra 4 bytes of memory in the stack frame, and therefore truly does require an extra temporary variable that the rest of the methods don't need. However, we compiled with the default optimization level, which has absolutely no optimizations at all whatsoever. If you use g++ and/or GCC, you should really use optimization level 2 for any release-worthy code, and only use level 0 for debugging purposes. So let's try this again with the -O2 flag passed to g++ as well, and see what machine code we get this time:

Whoops! Here, all the variables have been completely optimized away! We just push the values 3 and 5 onto the stack when we want to call PrintValues, and continually swap their order. In this case, all four swapping methods have the same performance: none of them take any CPU cycles at all, and none of them take any memory at all. Clearly, this is not what people have in mind when they ask about swapping integers. So, I need a way to force the values to be stored somewhere, and an easy way to do that is to conditionally modify their values. I'll make some test that the compiler hopefully can't figure out, and set x and y to 0 if the test turns out true. The compiler won't know that the result is always false and no variables are modified, so it will need to store their values somewhere just in case we do modify them. To accomplish this, I'll use the following function:
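The function itself is missing from this copy of the post. As a rough stand-in, here is a hypothetical reconstruction that matches what the text says about it: it is recursive, it resembles a primality check, it is called as NontrivialFunction(x, y, 2), and it returns false for x = 3 and y = 5. The parameter names, the exact test it performs and the MaybeClear wrapper are all assumptions, not the author's code.

```cpp
#include <cassert>

// Hypothetical stand-in: returns true if some d with c <= d <= min(a, b)
// divides both a and b. For a = 3, b = 5, c = 2 no such d exists, so the
// result is false, but the compiler has to do real work to discover that.
bool NontrivialFunction(int a, int b, int c) {
  assert(c >= 1);  // avoid dividing by zero below
  if (c > a || c > b) {
    return false;  // ran out of candidate divisors
  }
  if (a % c == 0 && b % c == 0) {
    return true;  // c divides both numbers
  }
  return NontrivialFunction(a, b, c + 1);  // try the next candidate
}

// Assumed shape of the conditional modification described in the text
// (the post apparently did this inline in main rather than in a helper).
void MaybeClear(int& x, int& y) {
  if (NontrivialFunction(x, y, 2)) {
    x = 0;
    y = 0;
  }
}
```

With the values from the post the branch is never taken, so x and y keep their values, yet the compiler can no longer treat them as plain constants without first evaluating the call.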

NontrivialFunction is reminiscent of a recursive primality checker, and because it's recursive, the compiler can't inline it (and probably won't try to precompute it). Since x and y may change depending on the result, the compiler will need to store them somewhere so that it's possible to set them to zero if necessary. So now let's compile again (with -O2 for optimization again), and look at the result:

This assembly is a bit harder to read, so let's start at the top: We push 2, then 5, then 3 onto the stack and call NontrivialFunction (remember that arguments on the stack are in reverse order, so this is really calling NontrivialFunction(3, 5, 2)) without yet assigning any of those values to anything you could call x or y. Once that is done, the result is stored in register %al, and we test whether it's zero (it looks like we're comparing it to itself, but that's just the usual idiom for setting the processor flags from the value in %al, including the zero flag that the following jump checks). If it's zero (false), we jump (je) down to the bottom (to line 80487ab). Once there, we store the values 3 and 5 into registers %ebx and %esi (and the value 6, which is the precomputed value of x ^ y, into register %edi; this will be used in Method B). Then we jump (jmp) back up to the first call to PrintValues, and continue on. If we had wanted to go through the if statement (i.e. if the result had been nonzero/true), we wouldn't have jumped and instead we'd have gone through the three xor's and zeroed out the values (a number xor'ed with itself is 0). Don't worry about the pop's and ret at the end; that's just some clean-up done as the main function finishes up, and we can safely ignore it.

So at this point, our trick has worked: x is stored in register %ebx and y is in %esi. But now take a look at the rest of the code: for most of it, the swapping code is removed completely, and we just call PrintValues(x, y) and then PrintValues(y, x) to "swap" them! Most of these swaps take absolutely no CPU time or extra memory. The one exception, however, is Method B, which only got half-optimized away (that first xor is precomputed, but the rest is still there). So in this case, all methods can be completely optimized away by the compiler except for the xor swap, which is slower and unambiguously worse.

"But surely this isn't what real code looks like either," I hear the defenders of xor say. "Surely real code stores the variables in memory instead of directly in registers, since the registers are presumably used for much more than just storing our two piddly little integers." Fair enough; let's take a look! To get the variables stored in memory, we'll need something that makes use of these locations, or in other words something that requires references to the variables and something that the compiler doesn't inline and/or optimize away. The primality tester worked before, so let's just adapt that a little:

    void NontrivialReferenceFunction(int& x, int& y) {
      // This is really just a fancy no-op, but the compiler doesn't know that.
      if (NontrivialFunction(x, y, 2)) {
        NontrivialReferenceFunction(x, y);
      }
    }
    ...
    // In main, replacing the part about NontrivialFunction that we added last time
    NontrivialReferenceFunction(x, y);

Now we're getting somewhere! This time our variables are stored in memory (4 and 8 bytes before the place that %ebp points to), and this time they really are loaded into registers, swapped by various means, and then stored back into memory, just like we wanted. Like the previous calls to std::swap, the lea command is used to get the address of (reference to) variables to pass to NontrivialReferenceFunction. We also now have an instance of the add command, which does just what you'd expect. Other than that, though, no new commands are used here. This looks like it's exactly what we wanted, so let's examine this in detail.

Method A loads both variables into registers, and then stores them back in the opposite locations. We don't actually allocate any memory at all for the temporary variable, which has been optimized away yet again. In Method B, we load the variables into registers, xor them, xor them again, store that result in memory (as y), xor them a final time, and stick that result back into memory as x. Like Method A, this requires two registers and four mov commands (two for loading and two for storing the data). Unlike Method A, Method B requires three extra xor commands to get the same result. Method D is a lot like Method B, in that it requires the same 4 mov's, but also requires some extra addition and subtraction work that Method A didn't need. On top of that, it uses a third register, %ecx, while the previous two methods only used two registers. Method E is barely recognizable because it's got all these strange pieces of code and variables with underscores in their names. The compiler has inlined std::swap, which turns out to be a templated function. Its two arguments have type _Tp (for our purposes, _Tp is int), and are called __a and __b. If you read the C++ code, it creates a temporary _Tp called __tmp and basically just does Method A. The assembly does the same 4 mov's as Method A with nothing extra.
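To make the description of std::swap concrete, here is a minimal sketch of a template with the shape described above: a temporary of the template type, then the same three assignments as Method A. The names SwapSketch, T, a, b and tmp are made up for this illustration (the library's own identifiers such as _Tp and __tmp are reserved for the implementation), and this is not a copy of any particular standard library's source.

```cpp
#include <cassert>

// Illustrative stand-in for the classic std::swap: Method A, templated.
template <typename T>
void SwapSketch(T& a, T& b) {
  T tmp = a;  // plays the role of __tmp
  a = b;
  b = tmp;
}

// Small self-check using the values from the post.
inline void CheckSwapSketch() {
  int x = 3;
  int y = 5;
  SwapSketch(x, y);
  assert(x == 5 && y == 3);
}
```

Once a template like this is inlined for int, there is nothing left to distinguish it from Method A, which is consistent with the identical assembly described above.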

So to sum up, Methods A and E were the same, Method B took more time, and Method D took more time and an extra register to accomplish the same thing. In other words, when we got down to it, the naive swap didn't use a temporary variable, and still ran faster than the xor swap. It turns out the same result happens if you make your variables volatile (provided that you remove the call to NontrivialReferenceFunction, because a plain int& can't bind to volatile data). For kicks, I tried the same thing on a 64-bit x86 Linux architecture with g++ 4.1, and found that everything was optimized away into the alternating calls to PrintValues(x, y) and PrintValues(y, x), with no actual swaps, even if I called NontrivialReferenceFunction. If I made the variables volatile, I could actually get the values to swap, but the result was the same: Methods A and E were by far the best, Method B took the same amount of space but more time, and Method D took extra space and extra time.
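For anyone who wants to repeat the volatile experiment, a minimal sketch of Method A on volatile operands might look like the following. The function name SwapVolatileA and the use of volatile int& parameters are assumptions for illustration; the post appears to have simply declared x and y as volatile locals in main.

```cpp
#include <cassert>

// Method A with volatile operands: the compiler has to perform every read
// and write of x and y, so the swap cannot be optimized away entirely.
void SwapVolatileA(volatile int& x, volatile int& y) {
  int t = x;  // volatile read of x
  x = y;      // volatile read of y, volatile write of x
  y = t;      // volatile write of y
}

// Small self-check using the values from the post.
inline void CheckSwapVolatileA() {
  volatile int x = 3;
  volatile int y = 5;
  SwapVolatileA(x, y);
  assert(x == 5 && y == 3);
}
```

Note that t itself is not volatile, so the compiler remains free to keep it in a register.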

So now that we've beaten this code to death, I have one final tangent: there is an assembly instruction called XCHG (or EXCH, or something like that, depending on what architecture you're on) which swaps the values of two words. Unfortunately, it's generally a bad idea to use it because if you use it on a location in memory, the processor (on x86, at least) treats it as a locked operation, which acts as a memory fence, messes up all sorts of pipelining, and slows down all the code around it. So the only time you should consider using XCHG is if you want to swap two registers, and in that case, it's probably better just to remap the registers in the rest of the program (like in the assembly example I showed for the NontrivialFunction version), rather than explicitly swapping them.

To bring this full circle, here are my answers to the original questions: Methods A and E are the fastest way to swap two integers. Methods A and E are also the fastest way to do it without using any extra memory (even though you write C++ with a temporary variable, it gets optimized away). I can't think of a realistic situation where I'd want to use a different method, and the only unrealistic situation I can come up with is when your compiler can't optimize anything at all and makes assembler instructions like the original version we examined, and you really, truly need those extra couple bytes of stack space. However, my first course of action there would be to get a better compiler. Only after I've found that I can't get a good compiler and I can't improve the compiler I already have and I can't easily write it in assembly would I consider using Method B. I'm inclined to call Method A the best because unlike Method E it explicitly shows you what it's doing and it doesn't require pulling in the entire <algorithm> header. However, there are definitely arguments for Method E, for instance that someone who knows what they're doing wrote the standard libraries and if there's a better way to do it then the standard library will probably be updated before your code. Another reason to pick Method E is that there is a whole system you can implement around std::swap that optimizes the swapping of larger objects, though I personally have never yet needed that optimization. But regardless, in this day and age there isn't a good excuse for using Methods B or D, despite any claims by GameDev and others to the contrary (and I strongly suspect that there wasn't any good excuse for the xor swap when that article was written in 2001, either). And under no circumstances should anyone ever use Method C because it isn't guaranteed to do what you want.
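The "whole system" around std::swap mentioned above usually means giving a type its own cheap swap and letting generic code find it. Below is a minimal sketch of that idiom. The namespace demo, the BigThing class and the SwapAny helper are all hypothetical names invented for this example; only std::swap, std::vector and the member std::vector::swap are real library facilities.

```cpp
#include <algorithm>  // std::swap in C++98/03
#include <cassert>
#include <cstddef>
#include <utility>    // std::swap from C++11 on
#include <vector>

namespace demo {

// Hypothetical "large" type, for illustration only.
class BigThing {
 public:
  explicit BigThing(std::size_t n) : data_(n) {}
  std::size_t size() const { return data_.size(); }

  // Cheap swap: the vectors exchange their internal buffers
  // instead of copying every element through a temporary.
  void swap(BigThing& other) { data_.swap(other.data_); }

 private:
  std::vector<int> data_;
};

// Non-member overload in the same namespace as BigThing, so that an
// unqualified call to swap(a, b) finds it via argument-dependent lookup.
inline void swap(BigThing& a, BigThing& b) { a.swap(b); }

}  // namespace demo

// Generic code opts in with "using std::swap" plus an unqualified call:
// custom overloads are preferred, and std::swap is the fallback.
template <typename T>
void SwapAny(T& a, T& b) {
  using std::swap;
  swap(a, b);
}

// Small self-check: works for the custom type and for plain ints.
inline void CheckSwapAny() {
  demo::BigThing a(3);
  demo::BigThing b(5);
  SwapAny(a, b);
  assert(a.size() == 5 && b.size() == 3);

  int x = 3;
  int y = 5;
  SwapAny(x, y);
  assert(x == 5 && y == 3);
}
```

For plain integers none of this machinery matters: SwapAny(x, y) falls back to std::swap, which is Method E.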