Just wanted to let you guys know that sin, cos, tan, ln, and log10 are all set up in the VM to use the x86 hardware (when possible). This is in addition to square root and pow. This applies to both x86 and AMD64, and there are small speed-ups on other platforms (from faster calls to these trig and transcendental functions). This pretty much does it for this sort of work by me. Anyone have other suggestions for things that need to be sped up?

Hello. Thanks for those, it is very much appreciated! I don't know if that is your field, or whether it is a proper answer to your question, but shifts and masking (on ints, for channel operations on pixels) turned out to be very, very slow. In fact, using an RGBA int was slower than using four floats for storing and handling pixel values, because of those operations. (I'm doing image filtering, and of course I need it to be fast.) If you could accelerate that as well, I'd be your slave for life.
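For readers who have not seen the pattern being discussed, here is a minimal sketch of the shift-and-mask channel operations on a packed pixel. The class name `PixelChannels`, the method names, and the 0xAARRGGBB layout are assumptions made for illustration; the poster's actual code is not shown in the thread.

```java
// Hypothetical helper, not the poster's code: channel access on a packed
// 32-bit pixel, assuming a 0xAARRGGBB layout.
final class PixelChannels {
    private PixelChannels() {}

    // Each getter shifts the wanted byte down to bits 0..7 and masks off the rest.
    static int alpha(int argb) { return (argb >>> 24) & 0xFF; }
    static int red(int argb)   { return (argb >> 16) & 0xFF; }
    static int green(int argb) { return (argb >> 8) & 0xFF; }
    static int blue(int argb)  { return argb & 0xFF; }

    // Repack four 0..255 channel values into one int.
    static int pack(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    public static void main(String[] args) {
        int pixel = 0x80FF4020;
        // Halve the colour channels and keep alpha unchanged.
        int halved = pack(alpha(pixel), red(pixel) / 2, green(pixel) / 2, blue(pixel) / 2);
        System.out.printf("%08X -> %08X%n", pixel, halved); // prints "80FF4020 -> 807F2010"
    }
}
```

Every pixel costs several shifts and masks on the way in and again on the way out, which is the overhead being compared against keeping four separate floats.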

That's interesting, but I think it's unnecessary. As the input can't be over 255, the result can't be illegal, that is, over 255.

That -1 means the result can be -1!

Quote

True, but that kind of optimisation should belong to the compiler, not the coder.

While it may be practical for a compiler to replace division by a constant float with multiplication by the reciprocal, there are complications in doing the same thing for integers.

Quote

What is your ratio between each?

float about 5, vs int about 8 (seconds in both cases). The original int version takes 20.

dranonymous: My revised int version is faster because multiplication is (usually) significantly faster than division. This is true for both integer and floating point; however, I suspect that in the floating-point case the division has been automatically replaced by a multiplication by the reciprocal.
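The revised int version itself is not shown in the thread, so the following is only a sketch of the general technique, assuming the division in question is a divide by 255 on a product of two 8-bit channel values. The class name `Div255` and the constant 0x8081 with a shift of 23 are my choices for illustration. It also shows one of the complications with integers: the replacement is exact only over a limited input range, which the `main` method checks exhaustively.

```java
// Illustrative sketch (not the poster's code): replacing an integer divide
// by the constant 255 with a multiply and a shift.
final class Div255 {
    private Div255() {}

    // Intended for 0 <= x <= 255 * 255. 0x8081 is 2^23 / 255 rounded up, so
    // multiplying by it and shifting right by 23 approximates x / 255.
    // The multiply is done in long so the intermediate cannot overflow.
    static int div255(int x) {
        return (int) ((x * 0x8081L) >>> 23);
    }

    // Scale one 8-bit channel by an 8-bit factor (255 meaning "full").
    static int scale(int channel, int factor) {
        return div255(channel * factor);
    }

    public static void main(String[] args) {
        // Compare against real division for every input the pixel code can produce.
        for (int x = 0; x <= 255 * 255; x++) {
            if (div255(x) != x / 255) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("div255 matches x / 255 for 0.." + (255 * 255));
    }
}
```

A floating-point divide by a constant needs no such range check to get a usable answer, which is part of why a compiler can apply the reciprocal trick there more readily.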

Current CPUs have lots of hardware devoted to floating point, so they can do additions and multiplications simultaneously. On the other hand, there is usually only one shifter, so the integer version probably makes less effective use of the chip (less scope for operations to be performed in parallel).

Pepe - In the int version you shift the alpha value, but then you never did anything with it. Did I miss where you manipulated the value again?

No. In the first versions, the values were even all copied into temporary variables, then pushed back. That class is a stripped-down version of another set where I tested how worthwhile it was to put the pixel processing in a method of another class. In that old test, I had to extract all the components and pass them to the filtering method, along with the image array and the poke offset. That was a pretty interesting test, because doing so was faster than simply putting all the code in a single loop (server JIT only).

Quote

Mark/Pepe - Have you looked at the compiled byte code to see how it differs for those small shifting/masking areas?

I would love to, but we can't have a look at how the JIT compiles bytecode, if that's what you meant.

I realize you can't see how the JIT compiled it down to native assembly, but you could see the bytecode produced in the class files and compare them. It would be interesting to see what was going on in each one.
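As a small sketch of that kind of comparison, the hypothetical class below (the name `HalveOps` and both methods are made up for illustration) puts a shift and a division side by side. After compiling it, running the JDK's `javap -c HalveOps` prints the bytecode of each method; the first should show an `ishr` instruction and the second an `idiv`.

```java
// Hypothetical example class for comparing bytecode with "javap -c".
final class HalveOps {
    private HalveOps() {}

    // Halves by shifting right one bit.
    static int halveByShift(int x) {
        return x >> 1;
    }

    // Halves by integer division. Gives the same result as the shift for
    // non-negative x; for negative odd x the two round differently.
    static int halveByDivide(int x) {
        return x / 2;
    }

    public static void main(String[] args) {
        System.out.println(halveByShift(200) + " " + halveByDivide(200)); // prints "100 100"
    }
}
```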

Bytecode (compiled Java source) is very basic. Almost no optimisations are done there, so that the JIT can recognise patterns, which simplifies its work and makes it more efficient. Assembly (compiled bytecode) is produced by the JIT, and we mortals don't have access to it. That assembly can be very different from what is in the bytecode.

Step 1: Run MS Visual Studio (Boo Hiss!)
Step 2: Set up a new (empty) project.
Step 3: Go to debug settings, set the exe to be your IE, and set the program arguments to the path of a simple HTML page with an applet on it (I only deal with applets, but you could do this with Java itself just as easily).
Step 4: Run a debugger session, and stop the debugger somewhere.

If you are in an area called something like WIN32, or NT40.DLL or something, then you are in a system call.
If you are in <unknown> or <some hex string>, then you are probably in the compiled code.
Sometimes you can tell more easily, as the compiled code will reside in memory at an address much greater than 0x00400000 (the default base address for code loaded from an exe file).

It is then possible to track down specific parts of the code by inserting operations that add known constants to a static volatile variable, and then searching the disassembly for those constants. Not saying it's easy, though, but it can work if you're desperate.
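A minimal sketch of that marker trick follows. The class name `JitMarker`, the field name, the two constants, and the stand-in loop are all assumptions for illustration. The field is volatile so that the JIT should not drop or merge the writes, which gives the constants a good chance of appearing as immediates in the generated code.

```java
// Illustrative sketch of bracketing a region with recognisable constants.
final class JitMarker {
    private JitMarker() {}

    // Volatile, so the additions below should survive JIT optimisation.
    static volatile int marker;

    // Sums the red channel of each pixel; the loop is just a stand-in for
    // the code you want to find in the disassembly.
    static int sumRed(int[] pixels) {
        int sum = 0;
        marker += 0x12345678; // start marker: search the disassembly for this value
        for (int p : pixels) {
            sum += (p >> 16) & 0xFF;
        }
        marker += 0x0BADF00D; // end marker
        return sum;
    }

    public static void main(String[] args) {
        int[] pixels = new int[1024];
        java.util.Arrays.fill(pixels, 0x00102030);
        long total = 0;
        // Call the method many times so the JIT has a reason to compile it.
        for (int i = 0; i < 100_000; i++) {
            total += sumRed(pixels);
        }
        System.out.println(total);
    }
}
```

With the debugger stopped, the code between the two constants in the disassembly is then a candidate for the compiled loop.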
