On 2011, Dec 27, at 1:50 PM, Edmondo Porcu wrote:
> Dear Scala users,
> is it necessary to turn all the vals into "final vals" to allow JVM to perform optimizations, or is that useless?

I am wondering why you are asking, because I really can't see a case where this kind of hand optimization would be in order.

Personally, I wouldn't bother thinking about this until I actually had a performance issue. In those cases where I do find performance issues, I almost never end up doing this kind of optimization.

To answer your question: If you are assigning a literal string to your val, the final would be useless because the string would be put into a lookup table anyway. If you are creating a val inside of a loop, try moving it outside the loop if you can (due to readability issues, not performance). If you are doing anything else I would expect the JIT compiler to sort it out after a few thousand iterations until I was proven wrong.

YMMV. I haven't actually tested any of this, and I have no idea how your code looks.

rule of thumb: trust the scala compiler and the vm. usually they can apply the obvious optimizations by themselves. especially the -server vm has been able to surprise me in positive ways.
if there is still a problem: trust the result of a profiler.

if you feel like doing experiments, read up on micro-benchmarking and just try which way is faster.

to answer your question:
i know about a multithreading-related optimization the vm can perform on vals, but not on vars. but i do not know of an optimization that applies only to final vals.

> On 2011, Dec 27, at 1:50 PM, Edmondo Porcu wrote:
> > Dear Scala users,
> > is it necessary to turn all the vals into "final vals" to allow JVM to
> perform optimizations, or is that useless?
>
> I am wondering why you are asking, because I really cant see in what kind
> of case this kind of hand optimization would be in order.
>
> Personally, I wouldn't bother thinking about this until I actually had a
> performance issue. In those cases where I do find performance issues, I
> almost never end up doing this kind of optimization.
>
> To answer your question: If you are assigning a literal string to your
> val, the final would be useless because the string would be put into a lookup
> table anyway. If you are creating a val inside of a loop, try moving it
> outside the loop if you can (due to readability issues, not performance).

i would suggest putting everything *inside* the loop to make it more readable - this way it is clear that the val is only used inside the loop :). it also avoids errors: if you have to make it a var, you might forget to update that var's value after a few code changes, and then you've got yourself a bug.

the server vm moves declarations out of the loop btw (at least it did some years ago; i tested that)

> If you are doing anything else I would expect the JIT compiler to sort it out
> after a few thousand iterations until I was proven wrong.
>
> YMMV. I haven't actually tested any of this, and I have no idea how your
> code looks.
>
> yours
> Geir
>
>
>

Maybe I've over-indulged on the Christmas spirit, and my head is not
clear on this subject - but what exactly is a "final val" - what does
the final do and why is it a necessary part of the language, even?

On 2011, Dec 27, at 2:25 PM, Dennis Haupt wrote:
>> To answer your question: If you are assigning a literal string to your
>> val, the final would be useless because the string would be put into a lookup
>> table anyway. If you are creating a val inside of a loop, try moving it
>> outside the loop if you can (due to readability issues, not performance).
> i would suggest to put everything *into* the loop to make it more readable - this way it is clear that the val is only used inside the loop :). it also avoids errors: if you have to make it a var, it might be possible that you forget to update that vars value after a few code changes and you got yourself a bug.
>
> the server vm moves declarations out of the loop btw (at least some years ago it did, i tested that)

We are not talking about the same thing: I am suggesting doing

val myConst = "xyzzy"
something.foreach { item => ... }

instead of

something.foreach { item => val myConst = "xyzzy"; ... }

The last val will eventually be moved outside the loop by the JIT compiler (or possibly the Scala compiler?), but I think it occludes what really goes on in the loop. It has nothing to do with the loop; get it out.

i know about a multithreading-related optimization the vm can perform on vals, but not on vars. but i do not know of an optimization that applies only to final vals.

The VM doesn't know anything about vals and vars; at the VM level there are only final and non-final fields. Final fields can be optimized by the VM much more aggressively, as they don't need to be re-read from shared memory at every potential monitorenter/volatile load (i.e. after every virtual/non-inlined call).
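As a sketch of how this maps back to Scala source (the class and member names here are invented for illustration):

```scala
// Scala 2 sketch: what val and var become at the JVM level.
class Counter(val id: Int, var count: Int)
// `id`    -> a private final field plus a getter method id()
// `count` -> a private non-final field plus getter count() and setter count_=()

object Limits {
  final val Max = 1000 // constant value definition: uses can be folded to the literal
  val Min = 0          // ordinary val: field + accessor; HotSpot may still inline the call
}
```

Since the JVM only sees the generated fields and methods, it is whether the field ends up `final` in the bytecode that matters for the re-read behaviour described above.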

Dear Pavel, thank you for your answer. You are getting to the point. Having a final val also makes the accessor final, right? That would make it much more probable that all the calls to the getter will be inlined... right?
Best Regards

Dear Pavel, thank you for your answer. You are getting to the point. Having a final val also makes the accessor final, right?

No. It would be impossible to override a val in that case - see Viktor's example.

That would make it much more probable that all the calls to the getter will be inlined... right?

Of course. But scalac can make accessors final for anonymous/private classes, closures, and other "good enough" cases. You can check whether it does this by playing with scalac + jad (or some other class-file decompiler).

On Tue, Dec 27, 2011 at 02:20:05PM +0100, Dennis Haupt wrote:
> rule of thumb: trust the scala compiler and the vm. usually they can
> apply the obvious optimizations by themselves. especially the
> (-server)vm was able to surprise me in positive ways.
>
> if there is still a problem: trust the result of a profiler.

I would say that the rule should be "hope that scalac and hotspot will
optimize your code, but don't trust them to." Anyone working with large
arrays of data has independently discovered that for/foreach are much
slower than while-loops (although this will probably change in 2.10,
hurrah!) and there are other similar situations.

Looking at the generated bytecode can often be useful (especially for
finding instances of boxing) but profiling and testing are the only
ways to be sure.
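For illustration, the for/foreach-versus-while gap being described looks roughly like this (a sketch for summing an array; no timings are claimed here):

```scala
// Summing an Array[Int] two ways. With 2.9-era scalac, the foreach version
// routes every element through a Function1 closure, which HotSpot may or
// may not manage to inline; the while version is a plain counted loop.
def sumForeach(xs: Array[Int]): Long = {
  var total = 0L
  xs.foreach(x => total += x)
  total
}

def sumWhile(xs: Array[Int]): Long = {
  var total = 0L
  var i = 0
  while (i < xs.length) {
    total += xs(i)
    i += 1
  }
  total
}
```

Both compute the same result; only profiling your own workload shows whether the difference matters.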

Surely an object is by definition final - since you can't extend it - so all its vals should be considered final as well? So Rex's example shows only that the compiler could be doing even more to optimise this case?

On 2011, Dec 27, at 5:06 PM, Rex Kerr wrote:
> My canonical example is the low-quality linear congruential random number generator from the Computer Language Benchmarks Game:
(...)
> is about 30% faster (even though there is apparently no difference whatsoever in the meaning of the code--how can an object's vals not be final?).
>
> So--final val for optimization, yes, good idea, at least for numeric constants.

I disagree.

What you say is only true if you are creating truly immense numbers of random numbers, and only then if the runtime is a problem. Who cares if the call takes thirty or fifty microseconds? 1)

I have worked with all too many developers who spent ages thinking about stuff like this, only to be blindsided by the O(2^n) algorithm they implemented. Most developers are not library developers. They are application developers who assemble library calls into cool stuff.

Prematurely optimizing is a slightly better idea than writing your own random number generator. That still doesn't mean it is a good idea.

yours
Geir

1) Some people do care. In that case, you know why, and you will be able to explain why you have chosen to use Scala as your tool of choice. The advice offered in this thread is general, which is why I think it is wrong.

I have seen the work of all too many developers who say things like that, and then find that they're unable to produce high-performance code even when they need to, because they have resolutely avoided learning anything that might help them, and because they have put so much work into a low-performance strategy that refactoring for performance is an impractical amount of work when they belatedly realize that performance is going to be an issue.

Just because some people don't understand the performance characteristics of the algorithms they are using (and yet pay attention to small performance improvements) is not an argument against knowing how to write high-performance code.

Algorithmic complexity is one of the top things to pay attention to if you're writing performance code. It's extremely important. Instead of saying "write first, then benchmark!", one could advise people to consider whether the code was heavily used, and if there are likely to be many items in a collection, and then either ignore performance considerations or choose an appropriate collection.

Memory usage is also quite important for large applications where garbage collection starts becoming expensive. Primitive vs. boxed types make a huge difference. Then multiple dispatch becomes an issue. And finally there are optimizations like this one--knowing what the JVM can do for you, what it can't, and how to help it out.

So, I reiterate: final val for optimization of numeric constants is a good idea. If you don't need your numeric constants optimized, of course, don't bother. In case you do, now you know what to try (for the time being--hopefully eventually the compiler will get smarter and will do more of these things for you).
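A sketch of the kind of constant-heavy object under discussion: a low-quality linear congruential generator along the lines of the benchmarks-game one (the IM/IA/IC values are the classic ones; the object name and exact shape are invented here):

```scala
object Rand {
  // Declared as `final val` so scalac can treat these as compile-time
  // constants: the hot loop then reads literals instead of calling accessors.
  final val IM = 139968
  final val IA = 3877
  final val IC = 29573

  private var seed = 42

  // Returns a pseudo-random Double in [0, max).
  def next(max: Double): Double = {
    seed = (seed * IA + IC) % IM
    max * seed / IM
  }
}
```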

On 2011, Dec 27, at 5:41 PM, Rex Kerr wrote:
> Algorithmic complexity is one of the top things to pay attention to if you're writing performance code. It's extremely important. Instead of saying "write first, then benchmark!", one could advise people to consider whether the code was heavily used, and if there are likely to be many items in a collection, and then either ignore performance considerations or choose an appropriate collection.

I agree wholeheartedly. I would be perfectly happy to accept well-reasoned explanations for why a bit of code is going to be called an immense number of times and why that can't be avoided. People who are good at managing resources - doesn't matter if it is money, time, power or memory - usually do that anyway when they create a resource budget.

Just measured it, and there is a speed difference with the server VM as of Sun JVM 1.6.0_26 and 1.7.0-ea-b143. It's actually 50%, not 30%, as I had said. (That is, the final val version is 2x faster.)

JRockit lessens the difference a bit, but everything is slower (as is typical).

A 2x slowdown in numeric code is too much to ignore out of perceived danger for numerically intensive operations. I wish it weren't there; it seems like it really shouldn't be. But that's the way it is for now.

i could not resist:

on a java 8 server vm, the final val object version is the fastest. the non-final val object is as fast as a static java equivalent. i could not get the java version as fast as the scala object.

after inlining everything everywhere - so there should have been no difference left - i still got differences :). then i changed the execution order. the one benchmarked first was always faster than its twin. repeating the benchmark twice then showed equal (but slower!) results compared to the first run. i have no idea how this is possible. it also happens on the java 7 vm.

i attached my file as proof. my output:
executing 100000000 warmup calls of not final object
warmup took 1023
executing 60000000 "real" calls of not final object
execution took 499, which is 1.2024048096192385E8 operations per
second
executing 100000000 warmup calls of final object
warmup took 586
executing 60000000 "real" calls of final object
execution took 344, which is 1.7441860465116277E8 operations per
second //first test of final vals
executing 100000000 warmup calls of not final object
warmup took 1279
executing 60000000 "real" calls of not final object
execution took 745, which is 8.053691275167786E7 operations per
second // second test of non final vals. became slower. huh?
executing 100000000 warmup calls of final object
warmup took 854
executing 60000000 "real" calls of final object
execution took 473, which is 1.2684989429175475E8 operations per
second // second test of final vals. wtf???

so basically, i am once more pretty sure that not trying to do micro-optimizations is a good choice. only do them if you are really forced to. your investment might be turned upside down by the next vm/hardware upgrade.

Am 27.12.2011 18:05, schrieb Rex Kerr:

Just measured it, and there is a speed difference with
the server VM as of Sun JVM 1.6.0_26 and 1.7.0-ea-b143. It's
actually 50%, not 30%, as I had said. (That is, the final val
version is 2x faster.)

JRockit lessens the difference a bit, but everything is slower (as
is typical).

A 2x slowdown in numeric code is too much to ignore out of
perceived danger for numerically intensive operations. I wish it
weren't there; it seems like it really shouldn't be. But that's
the way it is for now.

It seems that HotSpot doesn't bother optimizing access to final static fields, such as converting them into constants. In classfiles generated by javac this optimization makes no sense, since javac already performs constant folding.

On the other hand, a constant-returning method is an important optimization case besides final fields. And folding a call to such a method into a constant is much easier than a call to a field accessor - inlining is enough here.

So I think it is scalac's job to fold constants into accessors. Moreover, it can completely remove the fields for constant-initialized vals (it's hopeless to leave such an optimization to the VM).

Based on your other benchmarking code, I'm pretty sure you're measuring multiple inlining through the "lots" method (or the lack thereof), or something similar, in addition to the actual execution time. For microbenchmarking, I really don't know a good way to do it aside from writing a while loop by hand to get up to at least a few thousand processor cycles. Any convenience function like lots that I use eventually starts running afoul of optimization rules.

Microbenchmarking is not straightforward given the complexity of the JVM (including optimizations). That doesn't mean that micro-optimizations aren't important, just that you need to test carefully, run things multiple times and in multiple orders, and so on.
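A minimal sketch of the kind of harness this implies (warm up, then time both candidates in both orders; all names here are invented):

```scala
// Time `reps` executions of `body`, in milliseconds.
def time(label: String, reps: Int)(body: => Unit): Long = {
  val start = System.nanoTime()
  var i = 0
  while (i < reps) { body; i += 1 }
  val elapsed = (System.nanoTime() - start) / 1000000
  println(label + " took " + elapsed + " ms")
  elapsed
}

// Warm both candidates up first, then measure each in both orders,
// so JIT compilation order doesn't silently favour one of them.
def bench(reps: Int)(a: => Unit)(b: => Unit): Unit = {
  time("warmup a", reps)(a)
  time("warmup b", reps)(b)
  time("a then b: a", reps)(a); time("a then b: b", reps)(b)
  time("b then a: b", reps)(b); time("b then a: a", reps)(a)
}
```

Even a harness like this only narrows the noise; it doesn't eliminate the order effects described above.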

FWIW, even with the strangeness, your tests _do_ show a ~2x speedup with final val.

--Rex

Here you can see that a type annotation on a final val turns it from a constant into a non-constant one. Surprisingly enough (for me), such scalac behavior is not an error - it strictly corresponds to the language specification: "A constant value definition is of the form 'final val x = e' where e is a constant expression. The final modifier must be present and no type annotation may be given." (SLS 4.1)
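The distinction fits in a three-line object (the name is invented; the behaviour follows SLS 4.1):

```scala
object Constants {
  final val A = 125      // constant value definition: no type annotation, so
                         // scalac folds every use of A into the literal 125
  final val B: Int = 125 // type annotation given: an ordinary final val with
                         // a field and accessor, not a folded constant
  val C = 125            // plain val: field + accessor
}
```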

The above optimization will also be performed for non-final vals in Scala.

I cannot see this. Can you provide an example?

(I suppose, the same should be true for Java)

No. For Java, a non-final field is always a 'var', not a 'val'. So it is incorrect to use its initial value instead of re-reading it from memory.

On 2011-12-28 19:16, Pavel Pavlov wrote:
> Surprisingly enough (for me) such scalac behavior is not an error - it strictly corresponds to the
> language specification:
> "A constant value definition is of the form 'final val x = e' where e is a constant expression. The
> final modifier must be present *and no type annotation may be given*." (SLS 4.1)

That is not intuitive to me either.

> The above optimization will be also performed for non-final vals in Scala.
>
> I cannot see this. Can you provide an example?

I just compiled my previous example without 'final' modifiers, and the result was exactly the same
(binary identical).

> (I suppose, the same should be true for Java)
>
> No. For Java, non-final field is always 'var', not 'val'. So it is incorrect to use its initial
> value instead of re-reading it from memory.

Of course. But computing the initial value (at compile-time) could still benefit from
folding/inlining of constants.