I don't know for sure, but it probably works out to the same machine instructions... so I'd say go for whichever one is more readable.
–
Jon SeigelDec 7 '10 at 4:47

2

@Jon Seigel: And "readable" means that which more clearly expresses the intent of the code. Are you (the OP) multiplying by two and adding one, or are you shifting left and setting the LSB?
–
JasonDec 7 '10 at 4:55

2

You are trying to do a job that the compiler would do, so you'd better not. ^^
–
pinichiDec 7 '10 at 4:57

I find the first version faster to read. The second version takes a bit of thought to work out what you are trying to achieve. As a result I would always use the first one, as it is the fastest to understand.
–
Loki AstariDec 7 '10 at 5:54

In fact, it's true that most cases will use LEA. Yet the generated code is not the same for the two expressions. There are two reasons for that:

addition can overflow and wrap around, while bit operations like << or | cannot

(x + 1) == (x | 1) is true only if !(x & 1); otherwise the addition carries over into the next bit. In general, adding one results in the lowest bit being set in only half of the cases.

While we (and the compiler, probably) know that the second is necessarily applicable, the first is still a possibility. The compiler therefore creates different code, since the "or-version" requires forcing bit zero to 1.
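The carry behaviour described above is easy to demonstrate in a few lines. This is a minimal sketch; the helper names are invented for illustration:

```c
/* add vs. or on the low bit: they agree only when bit 0 is clear */
static unsigned add_one(unsigned x) { return x + 1; }  /* may carry upward  */
static unsigned or_one(unsigned x)  { return x | 1; }  /* forces bit 0 to 1 */

/* the two forms from the question: x << 1 always leaves bit 0 clear,
   so OR-ing in 1 can never carry, and both spellings agree for all x */
static unsigned mul_form(unsigned x)   { return x * 2 + 1; }
static unsigned shift_form(unsigned x) { return (x << 1) | 1; }
```

For even x, `add_one` and `or_one` return the same value; for odd x the addition carries and they diverge — which is exactly why the compiler cannot treat `x + 1` and `x | 1` as interchangeable in general.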

Nice to see someone actually put speculation and wild assumptions to the test. But your explanation of why gcc doesn't optimize the shift version is wrong: your point 1 is invalid, since x<<1 wraps in exactly the same way as x+x for every x. Also, a recent enough compiler will optimize the shift version to the very same lea instruction.
–
hirschhornsalzAug 19 '12 at 10:31

Since the ISO standard doesn't actually mandate performance requirements, this will depend on the implementation, the compiler flags chosen, the target CPU and quite possibly the phase of the moon.

These sorts of optimisations (saving a couple of cycles) almost always pale into insignificance, in terms of return on investment, next to macro-level optimisations like algorithm selection.

Aim for readability of code first and foremost. If your intent is to shift bits and OR, use the bit-shift version. If your intent is to multiply, use the * version. Only worry about performance once you've established there's an issue.

Any decent compiler will optimise it far better than you can anyway :-)
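As a sketch of that advice: the same value can be spelled either way, so pick the spelling that names your intent. The function names below are made up for the example:

```c
/* arithmetic intent: "double it and add one" */
static unsigned double_plus_one(unsigned n) { return n * 2 + 1; }

/* bit-twiddling intent: "shift left and set the tag bit" */
static unsigned shift_and_tag(unsigned bits) { return (bits << 1) | 1; }
```

Both compute the same result for every input; the only difference the reader sees is what you were thinking when you wrote it.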

@Knoblauch have you profiled the performance? Maybe using multiply allows the CPU microcode to use SIMD/SSE2 instructions to do it faster than a bit shift?
–
Martin BeckettDec 7 '10 at 17:05

Not to mention preceding instructions. Many processors can execute multiple operations in parallel, but not multiple of the same type. Therefore, it makes sense to use a real multiply if the previous operation was a bit shift. You can even get the counterintuitive result that `a *= 2; b *= 2` uses two different operations, precisely because they're different!
–
MSaltersDec 8 '10 at 13:19

Any but the most brain-dead compiler will see those expressions as equivalent and compile them to the same executable code.

Typically it's not really worth worrying too much about optimizing simple arithmetic expressions like these, since it's the sort of thing compilers are best at optimizing. (Unlike many other cases in which a "smart compiler" could do the right thing, but an actual compiler falls flat.)

This will work out to the same pair of instructions on PPC, Sparc, and MIPS, by the way: a shift followed by an add. On the ARM it'll cook down to a single fused shift-add instruction, and on x86 it'll probably be a single LEA op.

If the compiler does no optimizations at all, then the second would probably translate to faster assembly instructions. How long each instruction takes is completely architecture-dependent. Most compilers will optimize them to be the same assembly-level instructions.

Actually, you can't say that. In general, the second won't necessarily be faster, since it's quite possible to have an architecture where adds are ten times the speed of shifts (unlikely, but my point is that it's platform-dependent). If you're limiting yourself to a specific platform, that may be the case, but you should probably make that clear in the answer.
–
paxdiabloDec 7 '10 at 7:27

1

And remember the proverb: Benchmarking without -O3 is like comparing F1 drivers in how fast they can go on skateboards.
–
KosDec 7 '10 at 11:20

Can we be less negative, or at least support your statement by saying "the compiler will treat the two forms equivalently"?
–
Seth JohnsonDec 12 '10 at 0:58

ok, ok, sorry. How about "you should probably be writing hand-crafted assembler if you care about speed in this detail"? No? In general, when writing C++ I strive for correctness, simplicity and DONE. If optimized doesn't follow from simplicity, then you're just begging the next poor slob who picks up this code to hunt you down and shoot ya...
–
Stephen HazelDec 15 '10 at 20:30

This answer is not helpful because it's a baseless guess, with not even a hint of profiling or disassembly to back it up. It encourages people to "micro-optimize", which, as other answers have stated, is wrong.
–
Seth JohnsonDec 12 '10 at 0:15

This is really a comment, not an answer to the question. Please use "add comment" to leave feedback for the author.
–
bitmaskAug 18 '12 at 13:38

The faster is the first form (the one with the shift): the shr instruction takes 4 clock cycles to complete in the worst case, while mul takes 10 in the best case. However, the best form should be decided by the compiler, because it has a complete view of the other (assembly) instructions.
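If you want actual numbers on your hardware rather than cycle-count folklore, a rough micro-benchmark sketch looks like this. The iteration count and the volatile sink are arbitrary choices to keep the loop from being optimized away:

```c
#include <time.h>

static unsigned mul_form(unsigned x)   { return x * 2 + 1; }
static unsigned shift_form(unsigned x) { return (x << 1) | 1; }

/* Time one form over `iters` iterations and return elapsed seconds.
   The volatile sink stops the optimizer from deleting the loop. */
static double time_loop(unsigned (*op)(unsigned), unsigned iters) {
    volatile unsigned sink = 0;
    clock_t t0 = clock();
    for (unsigned i = 0; i < iters; i++)
        sink = op(i);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Build it both with and without optimisation and compare; as the comments above note, at -O2 or higher you should expect the two timings to be indistinguishable because the compiler emits the same code for both.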