I tried implementing frame scaling with floats, just to see how it would perform. It was funny: I disabled it after a few tenths of a second and tried it with doubles. That was still 2x slower than packed signed bytes in ints, but it was considerably faster than floats. I looked at some of my old benchmarks and, wow, float arithmetic is slower than double. It doesn't matter for me; doubles are much more precise than floats and more useful for calculations. But I remember something from the Xith3D manual: "We are using floats, because they are faster..."

So I wonder: is there a reason for the poor float performance, or is it just a leftover in 5.0?

Most CPUs these days support double operations in silicon, but not float operations. Therefore, for float operations the floats will be converted to doubles internally and then converted back to floats, which takes some time (not much, though).

The only factor in favour of floats these days is the lower memory consumption for applications with large floating point arrays.
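To put a rough number on that: per IEEE 754, a float is 4 bytes and a double is 8, so a large array halves its raw payload. A trivial illustration (the figures below are just the element payload, ignoring object headers and alignment):

```java
public class ArrayFootprint {
    public static void main(String[] args) {
        int n = 1000000; // one million elements
        // IEEE 754 sizes: float = 4 bytes, double = 8 bytes per element
        long floatBytes  = n * 4L;
        long doubleBytes = n * 8L;
        System.out.println("float[]  payload: " + floatBytes + " bytes");  // 4000000
        System.out.println("double[] payload: " + doubleBytes + " bytes"); // 8000000
    }
}
```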

Also, with a platform-independent system like Java, I wouldn't bet on any implementation-specific performance paradigms such as 'always use floats' (or 'always use doubles', for that matter, or the 'avoid object creation at all costs' which was popular some while ago). The various VM implementations are always good for bizarre surprises performance-wise, and things that work splendidly on one platform fail abysmally on another.

The main reason for using floats is bus bandwidth when you're blasting things down to a graphics card. For calculations, doubles may well be faster on current VM/CPU architectures. Dunno about the Mac, mind you.

Actually, Intel's CPUs use 80 bits internally. They have floating point registers aliased with the MMX registers, organized as a revolver-like stack. (This doesn't mean you could use assembly and get a full 80 bits of precision; the lowest bits are just there to avoid bad rounding errors.) So conversion from and to floats shouldn't be as bad as 120/40, or is it?
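That 80-bit intermediate precision is, incidentally, the reason Java has the strictfp modifier: without it, a pre-Java-17 VM running on an x87 FPU was allowed to keep intermediates in the wider registers. A minimal sketch (whether the two methods actually produce different results depends on the VM and hardware):

```java
public class StrictDemo {
    // strictfp forces every intermediate to strict IEEE 754 double precision
    static strictfp double strictEval(double a, double b, double c) {
        return a * b / c; // a * b overflows to Infinity in strict 64-bit math
    }

    // Without strictfp, a pre-Java-17 VM on x87 could keep a * b (2e308) in an
    // 80-bit register, so the final result could come out finite. On SSE2-based
    // VMs both methods print Infinity.
    static double looseEval(double a, double b, double c) {
        return a * b / c;
    }

    public static void main(String[] args) {
        System.out.println(strictEval(1e308, 2.0, 4.0));
        System.out.println(looseEval(1e308, 2.0, 4.0));
    }
}
```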

This reminds me that I should look up the floating point formats in the NVIDIA FX5700 somewhere. It has some internal support for them, but I forgot its precision and maximum range. Are they in a 10 - 6 format, or in some variation on the FP32 format?

The results seem clear to me... Even if doubles were faster (which at least in Java doesn't seem to be the case; division was slower and multiplication was the same), floats would be better for 3D graphics anyway, because they're half the amount of data to transfer on the bus.

In C/C++ the preference for floats over doubles comes strictly from the bus cost of transferring doubles around. A lot of it comes from NVIDIA, which is a strong proponent of floats over doubles (or even half-floats over floats). The performance point seems to be even more true in Java.

Sorry, but I don't believe your results. Not because you are intentionally trying to give bad results, but because microbenchmarks can't be trusted when you give out little information about them.

I'm going to try my own results later on to see if I can get faster float performance vs double performance on my A64 3K+.

It's not a [edit]bad[/edit] microbenchmark. I did 100 million calculations in each loop before determining the final times using another run of 100 million calcs, ran the test several times, and varied the values to calculate, with identical results... but run your own.
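For reference, a harness along the lines described (a warm-up pass, then a timed pass; the method names and constants are mine, not the original poster's):

```java
public class FloatVsDouble {
    static final int ITERATIONS = 100_000_000;

    static float floatLoop() {
        float x = 1.0001f;
        for (int i = 0; i < ITERATIONS; i++) {
            x = x * 1.0000001f + 0.0000001f; // one multiply, one add per pass
        }
        return x;
    }

    static double doubleLoop() {
        double x = 1.0001;
        for (int i = 0; i < ITERATIONS; i++) {
            x = x * 1.0000001 + 0.0000001;
        }
        return x;
    }

    public static void main(String[] args) {
        // warm-up pass so the JIT compiles both loops before timing
        floatLoop();
        doubleLoop();

        long t0 = System.nanoTime();
        float f = floatLoop();
        long tFloat = System.nanoTime() - t0;

        t0 = System.nanoTime();
        double d = doubleLoop();
        long tDouble = System.nanoTime() - t0;

        // print the results so the JIT cannot eliminate the loops as dead code
        System.out.println("float:  " + tFloat / 1_000_000 + " ms (" + f + ")");
        System.out.println("double: " + tDouble / 1_000_000 + " ms (" + d + ")");
    }
}
```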

I'm seeing some weird floating point benchmark results on my Athlon 1.4 GHz using Java 5 -server. I'm benchmarking a very simple loop using floats, doubles and fixed point. The fixed point loop is as fast as similar C code compiled with GCC or Visual Studio, but the floating point math is running at half the speed of similar C code. That is the same speed as when "Strict" floating point is turned on in Visual Studio. Here's the loop I'm benchmarking:
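(The snippet itself did not survive the archive. Going by the later description of GCC's output as one fmul, one fadd and a jump per iteration, it was presumably something of this shape; this is my reconstruction, not the poster's code:)

```java
public class LoopSketch {
    public static void main(String[] args) {
        // hypothetical reconstruction: one multiply and one add per iteration,
        // matching the "fmul, fadd, jump" assembler output mentioned below
        float acc = 0.0f;
        for (int i = 0; i < 100000; i++) { // "100k iterations" per the replies
            acc = acc * 0.9999f + 1.0f;
        }
        System.out.println(acc); // printed so the loop is not dead code
    }
}
```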

The same code using doubles is even slower. Running the benchmark in JRockit produces the same results as compiled C code, i.e. about twice as fast as HotSpot server. Why can't HotSpot optimize the Java code as well as a C compiler?

100k iterations is very few. The overhead of compiling that into native code (if it even bothers doing so... enable some profiling) will take up a large percentage of the total time spent in that loop.

Point being: it quite possibly already is compiling that loop into native code as fast as a C compiler, but your microbenchmark is broken.

Doubles have higher performance. Replace the variables used in the calculation with 'float' to bench the float performance.

Float performance = 22 seconds. Double performance = 13 seconds.

I ran the bench a few times for consistency.

Try starting your app with -Xcomp and see what happens to your results... You are obviously measuring a difference in HotSpot's handling of floats vs. doubles here, not a real performance advantage of doubles.


It's not broken; on the contrary, I'm quite certain that my benchmark is correct. The code I posted is the loop I'm benchmarking, not the entire benchmark application. I do 10 s of warm-up and 10 s of benchmarking.

My tests seem to indicate that there is a flaw in the Hotspot optimizer.

When I run a simple micro benchmark on a P4, double performance is consistently slightly slower after warm-up.

It could be that this is an Athlon-only issue though (I've seen cases before where some numeric operations run slower than on a P4 because of some P4 specific shortcuts which are impossible on an Athlon), but I have the feeling the results are misleading somehow. I have to test at home (where I have an Athlon too).

It can. As a matter of fact, I once converted a little C/ASM fractal program to Java, and the Java version ran as fast as the ASM version of the program and even faster than the compiled C one. It surprised me almost as much as the author of the original program, who wanted to show how much faster C is compared to Java.

Note the 2x speed improvement in the float and double benchmarks. I get the same scores for a similar C test compiled with GCC or Visual Studio.

It's a VERY simple loop to optimize (fmul, fadd, jump in the assembler output from GCC), so it's really surprising that Hotspot can't optimize it properly. You're more than welcome to try to tweak the code to make it run fast in Hotspot.


Could you post the entire benchmark?

The reason double performance on Athlons (and P3s for that matter) is worse than P4s, is because we use SSE2 style registers (which are not available for Athlon/P3). Those double registers greatly speed up performance.


Need a quick clarification on this if you don't mind... The Athlon64s do have SSE2 support, don't they? My 2.0 GHz socket 939 Winchester 3200+ appears to have it for sure. I've been benching a P4 1.6 GHz Willamette against the Winchester 3200+, and the P4 doesn't seem to be doing that badly comparatively in particle tracking systems involving lots of double-based number crunching. Thanks


Yes, the Athlon64s have SSE2, but I'd need more info to say why you're not seeing a large performance difference. If I were to guess, I'd say the generated code might be slightly different, and that might be causing the performance anomaly.

OK! So with the Athlon64s, which have SSE2 support, can I take it that the JVM will indeed use SSE2-style registers, so that the double performance of the Athlon64s will be comparable to P4s?

In general, is the SSE2 performance of the Athlon64 as good as the P4's? And specifically, when running Java apps with the -server option?

And given the couple of reasons mentioned earlier, 1) some CPUs use extended 80-bit precision and 2) some do all the computations in doubles and convert back to floats (the IBM RS6000 workstation, IIRC, used to do that), is it worth the trouble to stick to floats for speed benefits if memory size is not a consideration?

So whatever is causing this difference, I don't know, but the only thing you can conclude from your benchmark is that JRockit optimizes it better (for whatever that's worth).

I was a bit confused about why your code gave such a big speed improvement (10x faster fixed point); then I noticed that you use the 'count' variable both for the iteration count and the inner loop count. This is a bug, right?
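The kind of bug described, one counter shared between the outer and inner loop, looks like this (schematic, not the original code):

```java
public class CounterBug {
    public static void main(String[] args) {
        int count = 1000;

        // Buggy: the inner loop overwrites 'count', so after the first pass the
        // outer loop compares i against 10 instead of 1000 and exits early.
        int buggyIterations = 0;
        for (int i = 0; i < count; i++) {
            for (count = 0; count < 10; count++) {
                buggyIterations++;
            }
        }
        System.out.println("buggy total: " + buggyIterations);   // 100, not 10000

        // Correct: separate counters for the two loops
        int n = 1000;
        int okIterations = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < 10; j++) {
                okIterations++;
            }
        }
        System.out.println("correct total: " + okIterations);    // 10000
    }
}
```

The buggy version runs only 10 outer iterations, which would make a benchmark look roughly 100x faster than intended.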

No difference in using doubles or floats. Since people don't run apps with -Xcomp, using doubles is faster in practice.

I thought -Xcomp just makes the VM do its optimizations on the first pass so you don't need to wait for the JIT to warm up? If so, then it just means your microbenchmark isn't running optimized the way a real-world app would be.
