If you want to look at even more biased benchmarking look at
http://shootout.alioth.debian.org/ it is fundamentally designed to show
that C is the one true language for writing performance computation.

I rather object to the baseless accusation that the benchmarks game is
"designed to show that C is the one true language for writing performance
computation."
Your accusation is false.
Your accusation is ignorant (literally).

It also strikes me as something rather random to say. Far as I can tell
the shootout comes with plenty of warnings and qualifications and uses a
variety of tests that don't seem chosen to favor C or generally systems
programming languages.
But I'm sure Russel had something in mind. Russel, would you want to
expand a bit?
Thanks,
Andrei

Not only is it random and baseless; my own personal experience is that
the shootout actually gives a fairly accurate picture of what one can
expect in the areas of speed and memory usage.
Less so on the side of code size, because the programs are still too
small to take advantage of some language features designed for
large-scale programs.
And I still pray to see D back in the shootout.

The system as set out is biased though, systematically so. This is not
a problem per se since all the micro-benchmarks are about
computationally intensive activity. Native code versions are therefore
always going to appear better. But then this is fine; the Shootout is
about computationally intensive comparison.

This is fine, so no bias so far. It's a speed benchmark, so it's
supposed to measure speed. It says as much. If native code usually comes
in top places, the word is "expected", not "biased".

Actually I am surprised
that Java does so well in this comparison due to its start-up time
issues.

I suppose this is because the run time of the tests is long enough to
bury VM startup time. Alternatively, the benchmark may only measure the
effective execution time.
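This second guess can be checked directly: time the workload from inside the process with System.nanoTime, so the figure excludes JVM start-up, and compare it with the external wall clock (e.g. `time java ...`), which includes it. A minimal sketch; the class name and workload below are made up for illustration:

```java
// InProcessTiming.java -- hypothetical demo separating workload time from
// total process time.  Run as:  time java InProcessTiming
// The external `time` figure includes JVM start-up; the printed one does not.
public class InProcessTiming {

    // Arbitrary busy work standing in for a benchmark kernel.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 1000;
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long checksum = workload(10_000_000);
        long t1 = System.nanoTime();
        System.out.println("checksum = " + checksum);
        System.out.printf("workload time (excluding JVM start-up): %.3f s%n",
                          (t1 - t0) / 1e9);
    }
}
```

If the workload runs for seconds, the gap between the two figures (the start-up cost) becomes negligible, which would explain Java's standing in the results.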

Part of the "problem" I alluded to was people using the numbers without
thinking. No amount of words on pages affects these people; they take
the numbers as-is and make decisions based solely on them.

Well, how is that a bias of the benchmark?

C, C++ and
Fortran win on most of them and so are the only choice of language.

The benchmark measures speed. If one is looking for speed, wouldn't the
choice of language be in keeping with these results? I'd be much more
suspicious of the quality and/or good will of the benchmark if other
languages frequently came out on top.

As I understand it, Isaac runs this basically single-handed, relying on
folk providing versions of the code. This means there is a highly
restricted resource issue in managing the Shootout. Hence a definite
set of problems and a restricted set of languages to make management
feasible. This leads to interesting situations, such as D not being part
of the set while Clean and Mozart/Oz are. But then Isaac is the final
arbiter here, as it is his project, and what he says goes.

If I recall things correctly, Isaac dropped the D code because it was
32-bit only, which was too much trouble for his setup. Now we have good
64-bit generation, so it may be a good time to redo the D implementations
of the benchmarks and submit them again to Isaac for inclusion in the
shootout.
Quite frankly, however, your remark (which, I must agree, for all the
respect I hold for you, is baseless) is a PR faux pas - and unfortunately
not the only one from our community. I'd find it difficult to go now and
say, "by the way, Isaac, we're that community that insulted you on a
couple of occasions. Now that we got to talk again, how about putting D
back in the shootout?"

I looked at the Java code and the Groovy code a couple of years back (I
haven't re-checked the Java code recently), and it was more or less a
transliteration of the C code.

That is contributed code. In order to demonstrate bias you'd need to
show that faster code was submitted and refused.

This meant that the programming
languages were not being shown off at their best. I started a project
with the Groovy community to provide reasonable versions of the Groovy
code and was getting some take-up. Groovy was always going to be with
Python and Ruby and nowhere near C, C++, Fortran, or Java, but the
results being displayed at the time were orders of magnitude slower than
Groovy could be, as shown by the Java results. The most obvious problem
was that the original Groovy code was written so as to avoid any
parallelism at all.

Who wrote the code? Is the owner of the shootout site responsible for
those poor results?

Of course Groovy (like Python) would never be used directly for this
sort of computation; a mixed Groovy/Java or Python/C (or Python/C++,
Python/Fortran) solution would be -- the "tight loop" being coded in the
static language, the rest in the dynamic language. Isaac said though
that this was not permitted; only pure single-language versions were
allowed. Entirely reasonable in one sense, unfair in another: fair
because it is about language performance in the abstract, unfair because
it is comparing languages out of real-world use context.

I'd find it a stretch to label that as unfair, for multiple reasons. The
shootout measures speed of programming languages, not speed of systems
languages wrapped in shells of other languages. The simpler reason is
that it's the decision of the site owner to choose the rules. I happen
to find them reasonable, but I get your point too (particularly if the
optimized routines are part of the language's standard library).

(It is worth noting that Python is represented by CPython, and I
suspect PyPy would be a lot faster for these micro-benchmarks - but only
when PyPy is Python 3 compliant, since Python 3, not Python 2, is the
representative in the Shootout. A comparison here is between using
Erlang and Erlang HiPE.)
In the event, Isaac took Groovy out of the Shootout, so the Groovy
rewrite effort was disbanded. I know Isaac says run your own site, but
that rather misses the point, and leads directly to the sort of hassles
Walter had when providing a benchmark site.

That actually hits the point so hard, the point is blown into such small
pieces you'd think it wasn't there in the first place. It's a website.
If it doesn't do what you want, at worst that would be "a bummer". But
it's not "unfair", as the whole notion of fairness is inappropriate
here. Asking for anything, including fairness, _does_ miss the point.

There is no point in a
language development team running a benchmark. The issue is perceived,
if not real, bias in the numbers. Benchmarks have to be run by an
independent party even if the contributions are from language
development teams.

But I'm sure Russel had something in mind. Russel, would you want to
expand a bit?

Hopefully the above does what you ask.
The summary is that Isaac is running this in good faith, but there are
systematic biases in the whole thing, which is entirely fine as long as
you appreciate that.

Well, to me your elaboration seems like one of those delicious
monologues Ricky Gervais gets into in the show "Extras". He makes some
remark, figures it's a faux pas, and then tries to mend it but instead
it all gets worse and worse.
Andrei

I rather object to the baseless accusation that the benchmarks game is
"designed to show that C is the one true language for writing performance
computation."
Your accusation is false.
Your accusation is ignorant (literally).

This is why I quit posting any benchmark results. Someone was always
accusing me of bias, sabotage, etc.

My feeling is that this used to happen much more often 4 or 5 years ago;
these days a third party has usually jumped in to challenge ignorant
comments about the benchmarks game before I even notice.
Such ignorant comments are just seen to reflect badly on the person who
made them.

I rather object to the baseless accusation that the benchmarks game is
"designed to show that C is the one true language for writing
performance computation."

Overstated perhaps, baseless, no. But this is a complex issue.

Your accusation is false.
Your accusation is ignorant (literally).

The recent thread between Caligo, myself and others on this list should
surely have displayed the futility of arguing in this form.

It also strikes me as something rather random to say. Far as I can tell
the shootout comes with plenty of warnings and qualifications and uses a
variety of tests that don't seem chosen to favor C or generally systems
programming languages.

The Shootout infrastructure and overall management is great. Isaac has
done a splendid job there. The data serves a purpose for people who
read between the lines and interpret the results with intelligence. The
opening page does indeed set out that you have to be very careful with
the data to avoid comparing apples and oranges. The data is presented
in good faith.
The system as set out is biased though, systematically so. This is not
a problem per se since all the micro-benchmarks are about
computationally intensive activity. Native code versions are therefore
always going to appear better. But then this is fine; the Shootout is
about computationally intensive comparison. Actually I am surprised
that Java does so well in this comparison due to its start-up time
issues.
Part of the "problem" I alluded to was people using the numbers without
thinking. No amount of words on pages affects these people; they take
the numbers as-is and make decisions based solely on them. C, C++ and
Fortran win on most of them and so are the only choice of language. (OK,
so Haskell wins on the quad-core thread-ring, which I find very
interesting.)

--
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

If you want to look at even more biased benchmarking look at
http://shootout.alioth.debian.org/ it is fundamentally designed to show
that C is the one true language for writing performance computation.

Overstated perhaps, baseless, no. But this is a complex issue.

False and baseless, and a simple issue.

Your words are clear - "... designed to show ...".

Your false accusation is about purpose and intention - you should take
back that accusation.

The opening page does indeed set out that you have to be very careful
with the data to avoid comparing apples and oranges.

No, the opening page says - "A comparison between programs written in
such different languages *is* a comparison between apples and
oranges..."

Actually I am surprised that Java does so well in this comparison due
to its start-up time issues.

Perhaps the start-up time issues are less than you suppose.

The Help page shows 4 different measurement approaches for the Java
program, and for these tiny tiny programs, with these workloads, the
"excluding start-up" "Warmed" times really aren't much different from
the usual times that include all the start-up costs -

http://shootout.alioth.debian.org/help.php#java

Part of the "problem" I alluded to was people using the numbers
without thinking.

Do you include yourself among that group of people?

I started a project with the Groovy community to provide reasonable
version of Groovy codes and was getting some take up.

You took on the task in the first week of March 2009

http://groovy.329449.n5.nabble.com/the-benchmarks-game-Groovy-programs-td366268.html#a366290

and iirc 6 months later not a single program had been contributed !

http://groovy.329449.n5.nabble.com/Alioth-Shootout-td368794.html

In the event, Isaac took Groovy out of the Shootout, so the Groovy
rewrite effort was disbanded.

Your "Groovy rewrite effort" didn't contribute a single program in 6
months !

There is no point in a language development team running a benchmark.

Tell that to the PyPy developers http://speed.pypy.org/

Tell that to Mike Pall http://luajit.org/performance_x86.html

Tell that to the Go developers

Actually I am surprised that Java does so well in this comparison due
to its start-up time issues.

Perhaps the start-up time issues are less than you suppose.

Very possibly the case; I have only switched to Java 7 recently and
haven't had time to assess start-up or JIT kick-in times. Great strides
in start-up time have been made with each release of Java, at least on
Linux, using mmap and preloaded runtime infrastructure.

The Help page shows 4 different measurement approaches for the Java
program, and for these tiny tiny programs, with these workloads, the
"excluding start-up" "Warmed" times really aren't much different from
the usual times that include all the start-up costs -
http://shootout.alioth.debian.org/help.php#java

My experience is similar: JIT warm-up is only a small effect when
using int and yet quite dramatic when using long. This is to be
expected though, because a long is not an atomic type in the JVM but
occupies two 32-bit slots.
[...]
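The warm-up effect described above can be probed with a crude harness: run the same loop three times and compare the first (cold) pass against later (warmed) passes, once with an int accumulator and once with a long. This is only a sketch under the assumption that repeated passes approximate the "warmed" time; a proper study would use a dedicated harness such as JMH, and the class and method names here are invented:

```java
import java.util.function.IntToLongFunction;

// WarmupProbe.java -- crude JIT warm-up probe.  The timings vary from run
// to run; only the computed results are deterministic.
public class WarmupProbe {

    static int intLoop(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) acc += i ^ (i >> 3);
        return acc;
    }

    static long longLoop(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i ^ (i >> 3);
        return acc;
    }

    static void probe(String label, IntToLongFunction f, int n) {
        for (int pass = 1; pass <= 3; pass++) {  // pass 1 is cold, pass 3 warmed
            long t0 = System.nanoTime();
            long result = f.applyAsLong(n);
            long t1 = System.nanoTime();
            System.out.printf("%s pass %d: %6.1f ms (result %d)%n",
                              label, pass, (t1 - t0) / 1e6, result);
        }
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        probe("int ", i -> intLoop(i), n);
        probe("long", WarmupProbe::longLoop, n);
    }
}
```

If the claim holds, the gap between pass 1 and pass 3 should be noticeably larger for the long variant than for the int one.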

Your "Groovy rewrite effort" didn't contribute a single program in 6
months !

In the interim Groovy had been ejected, so there was no point. No real
need, and probably highly inappropriate, to rehearse all the arguments
here. It's water under the bridge. There is no enthusiasm for
getting back into the shootout, since Groovy is not a language for
computationally intensive code; that is the realm of D, C, C++, Fortran,
and sometimes Haskell. If we want to progress this point we should do
so on the Groovy lists, or privately, rather than here.

There is no point in a language development team running a benchmark.

Do you have a particular URL in mind?
My point though was that (and this is where Walter received so much
flak) where a language vendor tries to do comparisons with other
languages there are always claims of bias whether true or not. In this
sense the Alioth Shootout has the benefit of being clearly independent
of any language vendor. The examples above are not at all inconsistent
with my point.

That is not true - Groovy had not been ejected.
Your 6-month failure to contribute a single program was a strong reason
measurements were not made for Groovy programs on the new hardware
following September 2009.

If we want to progress this point we should do
so on the Groovy lists, or privately, rather than here.

That is not true - Groovy had not been ejected.

Your 6 month failure to contribute a single program was a strong reason
measurements were not made for Groovy programs on the new hardware
following September 2009.
The lack of interest in the Groovy community for doing the work caused
me to lose energy. In the end the removal of Groovy from the Shootout
was a shame, but not really that big a deal.
[...]