No idea what -Xcomp does. In the end floats are converted to doubles.

Quote

I thought -Xcomp just makes the VM do its optimizations on the first pass, so you don't need to wait for the JIT to warm up? If so, then it just means your microbenchmark isn't running as optimized as a real-world app would be.

It forces HotSpot to compile all methods before using them. That way, you can make sure that you are not measuring compile time instead of execution time, and that you aren't suffering from different compilation behaviours for whatever reason.

Edit: For those who are interested in what HotSpot does, and when: start your app with -XX:+PrintCompilation. Combine that with -Xcomp and see what happens...
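
A minimal sketch of the kind of microbenchmark this distinction matters for (class and method names are made up; run with `java -XX:+PrintCompilation -Xcomp WarmupDemo` to watch what gets compiled when):

```java
// Hypothetical demo: the first timed call may include JIT compile time
// unless -Xcomp forced everything to be compiled up front.
public class WarmupDemo {
    static double work(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        work(1000000);
        long cold = System.nanoTime() - t0;   // may include compile time

        for (int i = 0; i < 20; i++) {
            work(1000000);                    // warm the method up
        }

        long t1 = System.nanoTime();
        work(1000000);
        long warm = System.nanoTime() - t1;   // compiled-code time only

        System.out.println("cold=" + cold + "ns, warm=" + warm + "ns");
    }
}
```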

OK! So with the Athlon64s - with SSE2 support - can I take it that the JVM will indeed use SSE2-style registers, so that the double performance of the Athlon64s will be comparable to P4s?

In general, is the SSE2 performance of the Athlon64 as good as the P4's? And specifically, when running Java apps with the -server option?

And for the couple of reasons mentioned earlier - 1) some CPUs use extended 80-bit precision, and 2) some do all the computations in doubles and convert back to floats (the IBM RS6000 workstation, IIRC, used to do that) - is it worth the trouble to stick to floats for speed benefits if memory size is not a consideration?
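
A small illustration of the precision side of this question (a sketch; the speed trade-off varies by CPU, but the rounding behaviour below is defined by the language):

```java
public class FloatVsDouble {
    // Sum 'count' copies of 0.1 at float precision.
    static float sumFloat(int count) {
        float s = 0f;
        for (int i = 0; i < count; i++) s += 0.1f;
        return s;
    }

    // Same sum at double precision.
    static double sumDouble(int count) {
        double s = 0d;
        for (int i = 0; i < count; i++) s += 0.1;
        return s;
    }

    public static void main(String[] args) {
        // Widening 0.1f to double exposes the float rounding error.
        System.out.println((double) 0.1f);  // prints 0.10000000149011612
        System.out.println(0.1);            // prints 0.1
        // Error accumulates much faster in the float sum.
        System.out.println(sumFloat(1000000) + " vs " + sumDouble(1000000));
    }
}
```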

First a little intro on how the VM works:

The JVM has two sections of code (basically): platform-independent and platform-dependent code. The platform-independent stuff is the code that operates on the bytecodes, the IR, and then the optimizations (parsing, constant folding, loop opts, register allocation, etc.).

The platform-dependent stuff is basically match rules for instructions. So if the VM requires a Multiply Node (MulNode), the VM matches that to the appropriate rule for the particular architecture. Now, this matching part is where AMD64 hasn't been fully optimized. It's mostly there, but there are parts missing, things we don't do, etc. So yes, the VM uses SSE2 on AMD64 machines, but we might be doing a few things suboptimally.

I've also heard that the Athlons (XP and 64) have slower SSE performance compared to P4s. It may no longer be true in later revisions of the chip, etc. Heck, I may have heard incorrectly as well. But anyway, as far as the JVM is concerned, the AMD64 is just another chip; most of the optimizations are platform-independent.

Oh, don't forget the AMD64 VM is a 64-bit VM, while the x86 VM is a 32-bit VM. Internally that means the 64-bit VM has to handle larger pointers, etc. On the other hand, the VM gains 8 extra registers on AMD64, so overall there is a win in performance.



Firstly, it's great to have you around here.

Hmm... so maybe I wasn't way off in suspecting that the AMD64 wasn't giving as good a performance boost as I thought it would, going by the gaming benchmarks dished out by the hardware review sites. And yes, I've also heard that AMD's SSE implementation still lags Intel's. There is a C-based benchmark called ScienceMark (http://www.sciencemark.org), developed by Dr. Wilkens, who I understand is currently with AMD, which can be used for testing, among other things, the SSE and SSE2 performance of the CPU. You've probably heard of it.

Hopefully you folks can get around to implementing the optimized SSE2 for the AMD64 too. What would it take to do that? An RFE, perhaps? That takes time, doesn't it? And with AMD's Venice core scheduled for release this quarter or next, there will be interest in SSE3-type optimizations too.

Thanks

An RFE is not needed; we know about this, and several people (including myself) are working on it. Granted, it's not high priority, as other work is taking up our time. But we'll get to it. I did some research into SSE3, and those instructions don't seem to be well suited to the VM. The only instruction I could think of might be FISTTP, but I haven't fully explored that area yet, so I'm not sure what else might come in handy.

I reran the tests and the result was 30 s for floats, 9 s for doubles, on a Prescott Celeron D underclocked to 1.8 GHz. JRockit showed similar behaviour, just 3x as slow. It seems they didn't do SSE2 optimizations. I might try some ASM programs; I'd just need to know how to set up FP precision in assembly - I never needed to go below doubles before.

Re FP16: NVIDIA views FP as a number from -1.0 to 1.0. This isn't necessarily very compatible with other FP formats. Raster drawing has a somewhat limited target range.

We're working on moving from VC6 to VC 2003 for Mustang. The 'free toolkit' from MS is not usable, as it's missing some important and widely used libraries that are not available for free.

I'm continually amazed how many people out there are still using VC++ 6.0. That compiler was initially released circa 10 years ago, and it hasn't been patched in about 5. My guess is that the compiler allowed such incredibly broken behavior that the migration effort to move up to a relatively conforming compiler is very significant.

It's an example of how Microsoft has really hurt cross-platform development. They focused so much on supporting their class libraries and proprietary, funky compiler extensions, while putting no effort into fixing a horribly broken compiler.

I think they should have been sued in countries like Germany, which have very strict laws about advertising and standards conformance. Had they had to put a sticker on their box from the get-go saying that their compiler had 152 known compliance issues, I think their market share wouldn't have been so high.

Switching to a new compiler is not a simple task. Just consider the cost alone: we'd need to buy a license for everyone who compiles the code. That's a lot of people in our case.

Another issue is that we already know the bugs in the old compiler, and switching to a new one is always risky, since it's very likely we'll run into new compiler bugs - and believe me, we run into them every time we try even a new compiler revision (i.e. there's a reason the current JDK requires VC6 SP3, not SP4), since the JDK is such a huge codebase. So prior to the switch, lots of testing needs to be done - functional, performance, footprint.

And then there's the problem that the new version of the compiler may not even be compatible - as is the case with VC7 (there's a page from MS with the list of incompatibilities) - so we actually have to port our code to the new compiler (which in some JDK areas is harder than in others).

But anyway, we're making the switch this time (which I personally am very happy about, as I finally get to use the new tools instead of the six-year-old Visual Studio).

GCC produces extremely poor code for Intel. In fact, last I heard, Intel's own compiler was significantly better than what MS was offering, but that was in the VC6 days.

I'm not sure how VC7 stacks up to the Intel compiler, but with VC6 I knew some guys who did very performance-sensitive image processing applications, and all of their release builds were done with the Intel compiler because they got a significant boost.

Please provide some links. GCC 3 optimizes much better than the old GCC.

Quote

In fact, last I heard, Intel's own compiler was significantly better than what MS was offering, but that was in the VC6 days.

I'm not sure how VC7 stacks up to the Intel compiler, but with VC6 I knew some guys who did very performance-sensitive image processing applications, and all of their release builds were done with the Intel compiler because they got a significant boost.

From what I've read, Intel's compiler is the best (which makes sense), followed by MS and then GCC 3. I think the difference between MS and GCC is small, though.

In any case, the performance of the C++ compiler used to compile a JVM is probably not as important as it is for other applications - hopefully most of the time will be spent in code generated by the JVM (which will be the same regardless of the C++ compiler used).

Correct - the JVM spends most of its time in generated code. We've upgraded the Solaris C++ compilers several times over the past couple of years, and we've never seen any substantial improvement in VM performance. Oh, and the new Sun C++ compilers for Solaris (SS10) are better than the Intel C++ compilers on x86.

I'm not a Swede, and I consider macro assembly simpler and more intuitive than C++. BTW, they have women in Sweden; they just don't care as much about them as the women would like, or so they say.

While it wouldn't be too beneficial to write a JIT in assembly, it could possibly help by giving cleaner code and smaller executables. It would help, however, if JIT writers had heavy experience with optimizations in assembly - for example, in keeping critical parts of code in the L1 cache, or estimating whether SSE2 registers would be faster for one operation than 4 operations in general-purpose registers.

It certainly wouldn't hurt if someone played with assembly for a week and created some GUI application, or a JNI library, in it. At least he wouldn't talk nonsense like that above link's "there is no lib.exe, so we can't create DLLs" - every respectable programmer is using link.exe with a nice library definition file. (It's much cleaner than decorating method names with compiler hints.)

It reminds me... what would the compiled version of this code look like:

for (int c1 = 0; c1 < array.length; c1++) { something }
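
Roughly speaking, HotSpot can hoist the array.length read out of such a loop and prove the index stays in range, so the per-iteration bounds check disappears. A rough, illustrative rendering of that transformation in Java source terms (names are made up; this is not actual generated code):

```java
public class LoopSketch {
    static int something(int x) {
        return x * 2;  // stand-in for the loop body
    }

    static int sum(int[] array) {
        int total = 0;
        // HotSpot effectively hoists this load out of the loop...
        int len = array.length;
        for (int c1 = 0; c1 < len; c1++) {
            // ...and since 0 <= c1 < array.length is provable here,
            // the bounds check on array[c1] can be eliminated.
            total += something(array[c1]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));  // prints 12
    }
}
```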

The difference between GCC and MSVC isn't small, however. In a half-year-old test of mine (translating code that wasn't easily optimizable), GCC was close to the JVM, rather below it for my purposes, while MSVC was faster by about 1/5.

The important bits here would be things like software blitting loops and that sort of thing, which are native anyway (I assume). But to be honest, that is where going to assembler would make a lot of sense. Coding software blits, stretches, etc. should be done in vectorized code - hand-tuned SSE2 and the like. This is one of my ongoing rants, which started when I realized how pathetically slow the JPEG loader in the JRE is. I bet going to a proper JPEG loader (e.g. Intel's old JPEG library, or their new DSP code that is optimized for the SSE2 instructions) would improve GUI startup time for an application that used JPEG images for button icons - just because the current loader is so extremely slow. (Can you tell I have a need to play motion JPEG in my Java UI? And no, JMF is too broken to use for that sort of thing.)
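
For anyone who wants to time that decode path themselves, here is a minimal sketch exercising it through ImageIO (the class name is made up; it round-trips an in-memory image so no file is needed):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class JpegRoundTrip {
    // Encode an in-memory image to JPEG bytes and decode it back,
    // using the same ImageIO path a GUI would use for icons.
    static BufferedImage roundTrip(BufferedImage src) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(src, "jpg", out);
            return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(32, 32, BufferedImage.TYPE_INT_RGB);
        long t0 = System.nanoTime();
        BufferedImage back = roundTrip(img);
        long t1 = System.nanoTime();
        System.out.println(back.getWidth() + "x" + back.getHeight()
                + " round-tripped in " + (t1 - t0) / 1000 + " us");
    }
}
```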

I'm not sure how much assembler would help things like the ZIP deflating code, but something as fundamental as loading your resources from a compressed JAR should be optimized.
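
That deflate/inflate path can be exercised from Java directly; a small sketch (class name is mine) of the round trip a compressed JAR entry goes through:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ZipRoundTrip {
    // Deflate a buffer and inflate it back - roughly what happens
    // per compressed entry when resources load from a JAR.
    static byte[] roundTrip(byte[] data) {
        try {
            ByteArrayOutputStream zipped = new ByteArrayOutputStream();
            DeflaterOutputStream dos = new DeflaterOutputStream(zipped);
            dos.write(data);
            dos.close();  // flushes the remaining deflated bytes

            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            InflaterInputStream iis = new InflaterInputStream(
                    new ByteArrayInputStream(zipped.toByteArray()));
            byte[] buf = new byte[256];
            int n;
            while ((n = iis.read(buf)) != -1) {
                restored.write(buf, 0, n);
            }
            return restored.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] back = roundTrip("hello, java-gaming".getBytes("UTF-8"));
        System.out.println(new String(back, "UTF-8"));  // prints the original string
    }
}
```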
