On 4/22/07, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> > I wonder whether others have already noticed that allocations may
> > surprisingly be slower on 64bit platforms than on 32bit ones.
>
> As already mentioned, on 64-bit platforms almost all Caml data
> representations are twice as large as on 32-bit platforms (exceptions:
> strings, float arrays), so the processor has twice as much data to
> move through its memory subsystem.
Interesting; I was obviously under the wrong assumption that a 64-bit
machine would scale accordingly when accessing 64-bit words in
memory. Of course, I'm aware that cache effects also play a role, but
the minor heap should easily fit into the cache of any modern machine
in any case, and this experiment is hardly memory-hungry.
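To make the size doubling concrete, here is a small sketch (my own, not
from Xavier's mail) that uses the Obj module to inspect value
representations; the word counts in the comments assume a standard
OCaml runtime:

```ocaml
(* Sketch: why most Caml data doubles in size on 64-bit platforms.
   Heap blocks are counted in words, and a word is 4 bytes on 32-bit
   but 8 bytes on 64-bit. *)
let () =
  (* Plain ints are unboxed immediates: no heap block at all. *)
  assert (not (Obj.is_block (Obj.repr 42)));
  (* A pair is a heap block of one header word plus two field words:
     3 words total, i.e. 12 bytes on 32-bit but 24 bytes on 64-bit. *)
  let pair = Obj.repr (1, 2) in
  assert (Obj.is_block pair);
  assert (Obj.size pair = 2);  (* size excludes the header word *)
  (* Strings (and float arrays) are the exception Xavier mentions:
     they are packed byte-for-byte, so they don't double in size. *)
  print_endline "ok"
```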
> However, you certainly don't get a slowdown by a factor of 2, for two
> reasons: 1- the processor doesn't spend all its time doing memory
> accesses, there are some computations here and there; 2- cache lines
> are much bigger than 32 bits, meaning that accessing 64 bits at a
> given address is much cheaper than accessing two 32-bit
> quantities at two random addresses (spatial locality).
>
> Moreover, x86 in 64-bit mode is much more compiler-friendly than in
> 32-bit mode: twice as many registers, a sensible floating-point model
> at last. So, OCaml in 64-bit mode generates better code than in
> 32-bit mode.
>
> All in all, your 10% slowdown seems reasonable and in line with what
> others reported using C benchmarks.
This seems reasonable. It just surprised me that in some of my tests
a 64-bit machine could be slower at handling even "large" Int64
values than the same code in 32-bit mode, where every Int64 access
requires two memory accesses and possibly some additional computation
steps.
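For context, Int64.t is boxed on both word sizes, which is why such
code allocates heavily either way; a minimal sketch (my own example,
not from the thread) of the kind of loop I was timing:

```ocaml
(* Sketch: Int64.t is a boxed custom block on 32- and 64-bit
   platforms alike, so every intermediate Int64 result is freshly
   heap-allocated -- the loop below allocates on each iteration. *)
let sum_int64 n =
  let rec loop i acc =
    if i > n then acc
    else loop (i + 1) (Int64.add acc (Int64.of_int i))
    (* Int64.add boxes its result; only native ints stay unboxed. *)
  in
  loop 1 0L

let () =
  assert (sum_int64 10 = 55L);
  print_endline "ok"
```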
> Be careful with timings: I've seen simple changes in code placement
> (e.g. introducing or removing dead code) cause performance differences
> in excess of 20%. It's an unfortunate fact of today's processors that
> their performance is very hard to predict.
This surely also requires some caution when interpreting mini-benchmarks.
> ocamlopt compiles module initialization code in the so-called
> "compact" model, where code size is reduced by not open-coding some
> operations such as heap allocation, but instead going through
> auxiliary functions like "caml_alloc2". This makes sense since
> initialization code is usually large but not performance-critical.
> I recommend you put performance-critical code in functions, not in the
> initialization code.
Thanks, this is a very important bit of information that I wasn't
aware of! I used to run mini-benchmarks from initialization code in
most cases, which is obviously a bad idea...
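Following Xavier's advice, the fix is simply to move the timed loop
into a named function so ocamlopt open-codes its allocations, leaving
only a thin driver in the initialization code. A hedged sketch of the
restructuring (names are mine, for illustration):

```ocaml
(* Sketch: keep performance-critical code in a function, where
   ocamlopt open-codes heap allocation, instead of in module
   initialization code, which is compiled in the "compact" model
   and calls helpers like caml_alloc2 for each allocation. *)
let bench n =
  let rec loop i acc =
    if i = n then acc
    else loop (i + 1) ((i, i) :: acc)  (* allocates a pair and a cons *)
  in
  List.length (loop 0 [])

let () =
  (* Only this driver runs as initialization code. *)
  assert (bench 1000 = 1000);
  print_endline "ok"
```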
Regards,
Markus
--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com