First, turnstep talks alot about trying to get the number of iterations to equal about 5 seconds of cpu time. I agree with this value, however benchmark will do this tuning for you! Simply provide -5 as the number of iterations and benchmark will iterate the loop for at least 5 cpu seconds. For your final benchmark you may want to bring that number up to -10 or -20, just to make sure.

Also, differences of less than 5% should probably be ignored -- not because they are not happening, but rather because they may disappear on a different computer setup. And more importantly, you probably don't need to trade a 5% speed up for other considerations, such as code readability.

Now, note that with this sort of run, you don't want to look at the CPU seconds used, rather you want to look at the last two numbers, the rate and the iteration count. These will tell you which is faster.