I have just downloaded 1.5 and started playing with it. I made a small test with the new enhanced for loops. So far I thought it was just a syntax improvement; I did not realize it can also improve the performance of loops. Isn't that cool, or did I just wake up after everybody else?

Before 1.5, one would write (some stupid test) like this:

// array is int[]

for (int i = 0, n = array.length; i < n; i++) { int p = array[i]; p = p + p + p; }

in 1.5, it becomes:

for (int p : array) { p = p + p + p; }

There is no access penalty for the elements of the array, and my first tests show it is actually faster (and yes, it is a micro-benchmark, but I have some audio processing code with tight loops gobbling most of the processing time). I wonder if the JVM can also optimize away the bounds checks, since it 'knows' that the whole array is going to be scanned?

Interesting. I guess in that case it is always guaranteed that the index is never out of bounds... I imagine the bytecode that is output is therefore exactly what HotSpot expects in order for it to do the bounds-check elimination. You should use javap to see how the bytecode differs between the two loop formats.
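As a rough sketch of why the two versions could differ at the bytecode level: for an array, the compiler desugars the enhanced for into an ordinary indexed loop over compiler-generated locals for the array reference and its length. The class and local names below are invented for illustration; to see the real difference, run `javap -c` on both versions.

```java
// Roughly what "for (int p : array) { ... }" desugars to for an array.
// Local names here are invented; the compiler uses synthetic slots.
// Compare the actual output of: javap -c Desugared
public class Desugared {
    static int sum(int[] array) {
        int total = 0;
        int[] a = array;                      // synthetic copy of the array reference
        for (int i = 0, n = a.length; i < n; i++) {
            int p = a[i];                     // plain indexed element access
            total += p + p + p;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```

Because the index provably runs from 0 to length-1 with stride 1, this is exactly the shape a bounds-check-eliminating JIT is looking for.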

The mere fact that the two code versions actually differ is already food for thought IMO (that was not obvious from the documentation I read about 1.5).

@Mark, I do not think so; this syntax is used in lots of places in the Java source files. At worst it does not optimize anything, but I would be surprised if it slowed down the test.

@Anders, did you wait for the warmup to finish, as Jeff suggested?

@Jeff, this kind of micro-benchmark can easily go wrong, but they are interesting when they mimic the hot spots of your program (small loops called zillions of times to process audio/video data, for instance).

Are you saying that the HotSpot JVM is 'smart' enough to remove array bounds checks before 1.5? (I thought bounds checks could only be removed by a flag.) That means it is able to analyze that the index is not modified inside the loop body and that the increment is constant (maybe it _must_ be 1 in order to remove bounds checks). If so, then it is not obvious there is any gain at all; otherwise, loops with the new syntax should be faster.
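To illustrate the conditions being discussed, here is a hedged sketch of loop shapes a bounds-check-eliminating JIT can and cannot easily prove safe. Exactly which shapes get optimized depends on the HotSpot release and on -client vs -server; the class and method names are invented.

```java
// Sketch: loop shapes vs. bounds-check elimination (VM-dependent).
public class Bce {
    // Canonical shape: index runs 0..length-1 with stride 1 and is never
    // modified in the body, so the VM can prove every access is in range
    // and may drop the per-access range check.
    static int friendly(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // Here each access goes through an index the VM cannot relate to
    // a.length, so the range check on a[idx[i]] generally has to stay.
    static int unfriendly(int[] a, int[] idx) {
        int sum = 0;
        for (int i = 0; i < idx.length; i++) sum += a[idx[i]];
        return sum;
    }
}
```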

The server JVM in 1.4 is supposed to be able to remove array bounds checks. The client VM does not.

Where did you get this information?

I think the only thing to compare in these cases is the bytecode that is generated. If the bytecode from the new method is larger, that is bad, because it will be less likely to be inlined. I don't see the relevance of the number of local variables, as the only thing that seems relevant is whether they will fit in registers, but of course that is only an issue AFTER compiling to native code... so it ultimately could be the same code that HotSpot generates, if some locals can be eliminated when the bytecode is compiled.

Java bytecode has special one-byte instructions for storing to and loading from the first four local variable slots. The example above uses five local variables, so one of the locals is being accessed using the less compact, generic instruction (one opcode byte plus a one-byte slot index; with the wide prefix the index is two bytes).

I'm sure it makes absolutely no difference once compiled to native code (if it did, I would be very worried). I was just pointing out that an automatically generated loop is less than optimal, which seems illogical to me.
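To make the slot arithmetic concrete, here is a sketch (slot assignments assume the usual javac layout; verify with `javap -c -l`): with five locals in a static method, the fifth one lands in slot 4, outside the compact iload_0..iload_3 range.

```java
// Five locals: args, array, sum, i, n -> n lands in slot 4, so every read
// of n in the loop condition uses the generic "iload 4" (opcode byte plus
// index byte) instead of a single-byte iload_0..iload_3 form.
public class Slots {
    static int run(String[] args) {           // slot 0: args
        int[] array = {1, 2, 3, 4};           // slot 1: array
        int sum = 0;                          // slot 2: sum
        for (int i = 0, n = array.length; i < n; i++) {  // slot 3: i, slot 4: n
            sum += array[i];
        }
        return sum;
    }
}
```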

Well, now - the above was just the largest difference the new for loop could make to execution speed: smart bytecode versus JITting.

When main is extracted to a separate static method (letting the JIT compile it) and run in a loop from main until it stabilizes, the following numbers show (a factor of 100 bigger than the initial run; size doesn't matter once it stabilizes, even in the 10000 case):

java [1.5beta1]:

Old took 224.790 ms and produced 17832936640
New took 218.237 ms and produced 17832936640
Old took 227.440 ms and produced 17832936640
New took 212.157 ms and produced 17832936640

and so on...

java -server:

Old took 92.369 ms and produced 17832936640
New took 95.806 ms and produced 17832936640
Old took 97.163 ms and produced 17832936640
New took 93.987 ms and produced 17832936640

java -Xint:

Old took 1905.819 ms and produced 17832936640
New took 1483.741 ms and produced 17832936640

The new for loop is approx. 5-10% faster (i.e. not much!) when run under the client VM; on the server VM the bounds checks are eliminated, making the comparison void. Interesting, though, that the server version is more than 20x faster...
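The harness described above can be sketched roughly as follows (class and method names invented; System.nanoTime requires 1.5): the timed loops live in separate static methods so HotSpot can compile them, and main repeats the measurement until the timings stabilize.

```java
// Sketch of the benchmark harness: timed loops extracted into static
// methods, repeated from main until the numbers settle.
public class Harness {
    static long runOld(int[] array) {
        long sum = 0;
        for (int i = 0, n = array.length; i < n; i++) sum += array[i];
        return sum;
    }

    static long runNew(int[] array) {
        long sum = 0;
        for (int p : array) sum += p;
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[100000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        for (int round = 0; round < 5; round++) {   // repeat until stable
            long t0 = System.nanoTime();
            long a = runOld(data);
            long t1 = System.nanoTime();
            long b = runNew(data);
            long t2 = System.nanoTime();
            System.out.println("Old took " + (t1 - t0) / 1e6 + " ms and produced " + a);
            System.out.println("New took " + (t2 - t1) / 1e6 + " ms and produced " + b);
        }
    }
}
```

The "produced" value is printed so the JIT cannot dead-code-eliminate the loops; both methods should always produce the same sum.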

Old took 1905.819 ms and produced 17832936640
New took 1483.741 ms and produced 17832936640

The new for loop is approx. 5-10% faster (i.e. not much!) when run under the client VM; on the server VM the bounds checks are eliminated, making the comparison void.

What do you mean by 'making the comparison void'? It looks to me like there is a significant speedup using the new method on the server VM; the new method is > 20% faster according to the numbers above.

Interesting, but slightly incomplete... we have to assume so much. That page states that array bounds-check elimination was added to the server VM in 1.4 (presumably 1.4.0). But I haven't seen anything that states what optimizations are done in the current client VM. From 1.4.1 through to 1.5 beta 1 (on which these tests were run) there could be additional optimizations to the client VM. It would be great to have a definitive answer as to what the precise differences between the client VM and server VM are in the later releases.

What do you mean by 'making the comparison void'? It looks to me like there is a significant speedup using the new method on the server VM; the new method is > 20% faster according to the numbers above.

Nah, those are the interpreted results you have there.

In the server version the difference is +/- < 5% (i.e. alternating, and within the error margin).

Those papers don't give definitive answers about what optimizations ARE done in the client VM. For instance, they don't even mention that the SSE/SSE2 instructions are only used by the server VM. Thanks for looking, though.

...and it's only in the much-unused, horribly-slow-to-start-up server VM.

Cas

The one with the incredibly sophisticated install procedure for adding it to the public JRE. Can you specify the server JVM with Web Start (and have it downloaded if not present)? The startup looks like being even slower relative to the client JVM in Tiger --- the class-sharing stuff seems only to apply to the client VM.

I don't believe you can explicitly specify the server VM in Web Start yet. It's not a great solution anyway - we still rather desperately need the hybrid two-stage VM instead of separate client and server VMs.

Sharing classes is enabled with the server VM but not initialized by default. Run java -server -Xshare:dump and it'll create the shared classes file for the server VM. Working well here.

Huh? Explain this? To my knowledge the only thing you need to do to get the server VM from the JRE is type -server on the command line.

No install required.

For the Java JRE 1.5 ?

For the Java JRE 1.4 it's different. Only you lucky Apple users get the server VM by default; we poor Windows users don't see any server VM bundled in the JRE 1.4. At dev time you can copy the JDK's server VM DLL folder into the JRE so that the following command works: java -server -version
