I see their point about possible architecture differences, but SIMD does matter for multimedia applications. And when the pipeline is used well you get even greater performance gains - alpha blending, for example, is much more than 4 times faster when the pipeline is utilized well.
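For concreteness, here is a minimal scalar sketch (plain Java, names invented for illustration) of the kind of per-pixel alpha-blend loop being discussed - with SSE-style SIMD, four of these channel blends can run per instruction instead of one:

```java
// Scalar per-pixel alpha blending: dst = (src * a + dst * (255 - a)) / 255
// per color channel. This is the loop body SIMD would vectorize.
final class AlphaBlend {
    // Blend two 0xRRGGBB pixels with an alpha value in [0, 255].
    static int blend(int src, int dst, int alpha) {
        int inv = 255 - alpha;
        int r = (((src >> 16) & 0xFF) * alpha + ((dst >> 16) & 0xFF) * inv) / 255;
        int g = (((src >> 8)  & 0xFF) * alpha + ((dst >> 8)  & 0xFF) * inv) / 255;
        int b = ((src & 0xFF) * alpha + (dst & 0xFF) * inv) / 255;
        return (r << 16) | (g << 8) | b;
    }
}
```

A scalar JIT emits these shifts, masks and multiplies one channel at a time; a SIMD version would process whole pixels (or several) per instruction, which is where the >4x figure comes from.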

But do you think more than 50% of the CPU-consuming code in a game uses SIMD? If not, the overall speedup factor shrinks accordingly (Amdahl's law).

Anyway, I would also prefer HotSpot using SIMD instructions wherever possible/useful instead of having an API. The reasons are:

I'm lazy: even if a dedicated method only gets a 2x speedup where I could have achieved 4x with the API, I'd likely also get improvements in other places where I would not have used SIMD calls explicitly.

Such an API probably changes frequently, which is why it is not a good idea to put it in the core library.*

*Game developers usually don't care, because once a game is done most changes are bug fixes, so one can stick with the old version (as with DirectX) - that's why IMHO a temporary API under com.sun.xxx would be great.

Regarding the "other RFE":

I remember asking about the details on a java.net-hosted blog (sorry, I don't remember which one):

IIRC it was a documentation bug. More precisely: one of the JVM developers identified one or a few circumstances where SIMD can be used and did an implementation. Afterwards, he/she filed the RFE to document that work. This is a step in the right direction, but still far from trying to use SIMD instructions wherever possible - which is what the title may suggest.

Carmack is mainly talking about array access - instead of using MIDP's image rendering routines, a 3D game on MIDP 2 (without OpenGL, etc.) needs to draw each pixel one at a time by setting a value in an array, and then copy that array to display memory. There are two slow things here:

1. A null check and an array bounds check every time you set a value in an array. Slow! HotSpot can sometimes eliminate those checks, but the VMs on these mobile devices aren't that great.
2. For MIDP 2, first you've got to render to a 24-bit color array, and then call the drawRGB() method, which converts the array to whatever color format the device uses - usually 16-bit color. Color conversion? Slow!

Compared to the original DOOM, which wrote pixel data directly to VGA memory, the MIDP way of doing things is, indeed, slow.
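The color conversion in point 2 looks roughly like this in plain Java (a sketch, not the actual drawRGB() implementation) - one bounds-checked array access plus shifts and masks per pixel:

```java
// Sketch of the 24-bit -> 16-bit (RGB565) conversion a MIDP device does
// inside drawRGB() when its display is 16-bit color.
final class Rgb565 {
    // Convert one 0xRRGGBB pixel: keep the top 5 bits of red,
    // top 6 bits of green, top 5 bits of blue.
    static short toRgb565(int rgb) {
        int r = (rgb >> 19) & 0x1F;
        int g = (rgb >> 10) & 0x3F;
        int b = (rgb >> 3)  & 0x1F;
        return (short) ((r << 11) | (g << 5) | b);
    }

    // The per-frame loop: every array access here is bounds-checked.
    static void convert(int[] rgb, short[] out) {
        for (int i = 0; i < rgb.length; i++) {
            out[i] = toRgb565(rgb[i]);
        }
    }
}
```

Running this over every pixel of every frame, on a mobile VM that can't eliminate the checks, is exactly the overhead being complained about.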

In the long run (possibly the next 5-10 years) it would probably become obsolete for most of its uses, once processors look more like the Cell - with multiple SIMD processors. But for the current situation it is an issue!

For the gap to continue to widen, the complexity of applications would have to increase enough that the time spent in program logic keeps pace with the growth in the hardware's processing power. I suspect that over the next 10 or so years our programming tasks will involve more processing of data and less programming logic. The hardware is going that way, so maybe the programming approach will follow.

But in the meantime, until that change happens, I do agree with wrapping native libraries - especially since that's just a possibility/idea.

I've read some EA documents stating how auto-vectorization completely failed performance-wise - they invested six months in it, with a group of experts in the field. Even C/C++ compilers fail to get the most out of SSE2 instructions; compilers generating code within 80% of optimal was considered a utopia.

Seeing how long it took for the HotSpot JIT to reach its current level of performance, and how fragile it still is - causing you to hand-tune source code by trial and error to optimize for a specific JIT (to squeeze out the last... 200%; should we call it the SweetSpot JIT?) - I'm fairly sure auto-vectorization will never be able to compete with hand-written instructions.

Feature X will probably never be able to compete with hand-written instructions - where have I heard that before?

It still doesn't. Of course, the number of places where that matters is shrinking due to complexity. The time-critical tasks of today are no more complex than they were 20 years ago, so the speed benefit offered by lower-level optimisations drops every couple of years. That is why I talk of the long term. I mean, I've spent some time coding for the GBA, and you don't even want to touch garbage collection, generalization or dynamic type casting, because it shatters your speed. One assembler written in assembler compiles much more quickly than anything written in C, and its search speeds trample everything the commercial (C) world has to offer - not because the author optimized heavily, but because he was able to make better decisions.

Also the size of programs has grown too large for a complete assembler solution to be viable; the same can be said for C, and maybe soon C++!

It's like scripting languages: you just wouldn't code your ray caster in one, but game logic is fine!

I believe there was a java.net project that recognised certain patterns (you could also help it along) and optimised the hell out of them. I was gonna fiddle with it, but I can't seem to find the bookmark :/

Hmmm, how could that library help? From what I understand it compiles Java source to bytecode? For SIMD or other low-level stuff the JVM would need to support it - or do you mean by redirecting specific calls to a JNI library?

Quote

I believe there was a java.net project that recognised certain patterns (you could also help it along) and optimised the hell out of them. I was gonna fiddle with it, but I can't seem to find the bookmark :/

Sounds interesting! Maybe you can remember a name or other reference? Would like to try it out too!

Hmmm, how could that library help? From what I understand it compiles Java source to bytecode? For SIMD or other low-level stuff the JVM would need to support it - or do you mean by redirecting specific calls to a JNI library?

Yes, redirecting can be done, provided one uses only certain instructions/library calls.

The idea is: if the method uses only certain instructions and APIs (with vector4 and matrix4x4 types), it can be translated to an intermediate, pre-compiled language like GLSL, and a JNI lib (e.g. JOGL) can execute that code instead of the Java one. Of course, you could also build your own simple vector language, with a compiler using SIMD and a JNI lib to compile and call the script.

=> Fully cross platform, scales with time.

Using instrumentation (and ASM) you can easily extract a method's code, transform it, and replace it.

Instrumentation can be used each time a VM starts, or as a compile step.

In the first case, a C++/GLSL/... compiler is needed on the user's PC, but one can do additional optimizations, since the exact CPU (number of cores, instruction set, ...) is known. In the latter case, you simply publish multiple versions: e.g. standard, intel-performance-pack, opengl-pack, ...
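The "multiple versions" option could be sketched like this (all names here are invented for illustration): ship a pure-Java fallback and swap in a native-backed implementation only when an optional pack is available.

```java
// Dispatch between a pure-Java implementation and an optional native
// "performance pack", chosen once at startup.
interface VecOps {
    float dot(float[] a, float[] b);
}

final class JavaVecOps implements VecOps {
    public float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}

final class VecOpsFactory {
    static VecOps create() {
        // A real version would probe for an optional JNI library here, e.g.
        //   try { System.loadLibrary("simd-pack"); return new NativeVecOps(); }
        //   catch (UnsatisfiedLinkError e) { /* fall through */ }
        // "simd-pack" and NativeVecOps are hypothetical names.
        return new JavaVecOps();
    }
}
```

Callers only ever see the VecOps interface, so the standard / intel-performance-pack / opengl-pack choice stays invisible to game code.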

But these methods you compile to native code still need to be called regularly through JNI at some point during execution... E.g. when working with vectors, such calls tend to be scattered all through the source... Could you explain why there wouldn't be many JNI calls that way?

Using instrumentation (and ASM) you can easily extract a method's code, transform it, and replace it.

I think this is an unrealistic expectation.

The moment there is one method call or object-field reference within the bytecode whose (return) value can change at any time, it's basically impossible to port it to GLSL and extremely hard to port to native code.

You'd really need a Domain Specific Language to truly transform code to vector math. Doing it on the GPU adds the significant overhead of copying the data back and forth all the time, so even once the transformer exists, you'd have to specifically design your code around it, instead of trying to find patterns in existing bytecode.


I'm NOT talking about translating a single math API call to JNI, like vector4.add(vector4 right), but about methods using such an API being compiled to a single native method using SIMD. For example, a method performing vertex blending: it's likely to iterate over, say, 2k-50k vectors, each transformed by 1-4 matrices, for a single virtual character - that's when it's worth using SIMD.

Now, we agreed that translating arbitrary bytecode to a native function using SIMD is very difficult (even for Sun people :-)). However, things become much easier if the method contains only operations on primitive types, no method calls except into a special math lib, and no (or only selected) control-flow constructs... (the more restrictions, the easier it is ;-))
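The shape of such a "restricted" method might look like this (a sketch; the row-major 3x4 matrix layout and packed xyz vertex arrays are my assumptions, not an existing API): primitives only, flat arrays, one simple loop - the easiest case for a hypothetical translator to map onto SIMD.

```java
// Vertex blending in the restricted form: transform 'count' 3-component
// vertices by a single 3x4 row-major matrix. No objects, no method calls,
// one counted loop - each iteration is independent, so the four
// multiply-adds per output row map directly onto SIMD lanes.
final class VertexBlend {
    static void transform(float[] m, float[] src, float[] dst, int count) {
        for (int i = 0; i < count; i++) {
            float x = src[3 * i], y = src[3 * i + 1], z = src[3 * i + 2];
            dst[3 * i]     = m[0] * x + m[1] * y + m[2]  * z + m[3];
            dst[3 * i + 1] = m[4] * x + m[5] * y + m[6]  * z + m[7];
            dst[3 * i + 2] = m[8] * x + m[9] * y + m[10] * z + m[11];
        }
    }
}
```

At 2k-50k vertices per character per frame, this one method dominates the profile, which is what makes the single-JNI-call translation worthwhile.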

All of the above is still next to impossible. Everything would have to be put into arrays or buffers (no objects anywhere) before this would have a chance, and if one is making such an effort, he might as well port it himself. Bytecode analysis and transformation won't get you anywhere, because traversing all those (dynamic) object references with random pointers is a horrendous data structure for SIMD-related code.


The moment there is one method call or object-field reference within the bytecode whose (return) value can change at any time, it's basically impossible to port it to GLSL and extremely hard to port to native code.

You'd really need a Domain Specific Language to truly transform code to vector math. Doing it on the GPU adds the significant overhead of copying the data back and forth all the time, so even once the transformer exists, you'd have to specifically design your code around it, instead of trying to find patterns in existing bytecode.

Actually, I'd prefer a DSL for vecmath. In principle that's also what it is, but embedded in Java. This has the advantage that the programmer doesn't need to call things like execute(vecMathProg). On the other hand, Java doesn't allow embedding it nicely: no operator overloading, no implicit casts.

However, it would be easier to translate a DSL built upon Groovy (or Scala) to a native function than to work at the bytecode level.
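Even within plain Java, the embedded-DSL idea can be sketched as an expression tree (all names invented): the program builds the expression as data, so a backend could compile the tree to SIMD or GLSL instead of interpreting it as done here.

```java
// A tiny embedded vecmath-DSL sketch. Building Expr.var(0).mul(Expr.var(1))
// yields an object tree describing the computation; eval() is a trivial
// interpreter standing in for a real SIMD/GLSL backend.
abstract class Expr {
    abstract float eval(float[] env);

    // Reference to input slot 'slot' in the environment array.
    static Expr var(final int slot) {
        return new Expr() {
            float eval(float[] env) { return env[slot]; }
        };
    }

    Expr add(final Expr other) {
        final Expr self = this;
        return new Expr() {
            float eval(float[] env) { return self.eval(env) + other.eval(env); }
        };
    }

    Expr mul(final Expr other) {
        final Expr self = this;
        return new Expr() {
            float eval(float[] env) { return self.eval(env) * other.eval(env); }
        };
    }
}
```

Without operator overloading this reads as method chains rather than `a * b + c`, which is exactly the ergonomic complaint above - Groovy or Scala would let the same tree be built from natural-looking expressions.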

All of the above is still next to impossible. Everything would have to be put into arrays or buffers (no objects anywhere) before this would have a chance, and if one is making such an effort, he might as well port it himself. Bytecode analysis and transformation won't get you anywhere, because traversing all those (dynamic) object references with random pointers is a horrendous data structure for SIMD-related code.

I think the usage of classes of type T, as well as arrays of them (T[]), should be fine if:
- they are declared final
- they contain only primitive fields and primitive-array fields

Checking the referenced classes' fields and method parameters should not be difficult.
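A class meeting those restrictions might look like this (a hypothetical example, not an existing API): final, primitive fields only, so its memory layout is fully known to a transformer.

```java
// Final, primitives only: no subclassing, no object references to chase,
// so a transformer could flatten Vec4[] into a plain float buffer.
final class Vec4 {
    float x, y, z, w;

    Vec4(float x, float y, float z, float w) {
        this.x = x; this.y = y; this.z = z; this.w = w;
    }

    float lengthSquared() {
        return x * x + y * y + z * z + w * w;
    }
}
```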

Why do you care so much about what he says? A game is not built by one person alone, and id Software is bigger than him. He never had any sympathy for Java, and he will say the worst he can against it simply because it is not C. Forget it.

What you should worry about is making Java games faster, not what he says. If what he says uncovered problems in the JVM, then you have a big problem - why didn't you foresee them?
