On Monday, 12.05.2008, at 14:22 +0100, Richard Jones wrote:
> This is just barely faster than Jon's OCaml version using message
> passing (12% faster on my test machine[0]), which just seems to show
> that the overhead of message passing _isn't_ the problem here[1].
I've just written my own distributed version. You can find my comments
and timings here:
http://blog.camlcity.org/blog/parallelmm.html
The code is here:
https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/examples/rpc/matrixmult/
In this (very unoptimized) multiplier, message passing accounts for
about 25% of the runtime. Even with 2 cores there is already a speedup,
and 10 cores (over a network) are about 4 times faster than a single
core running without message passing.
Gerd
> Perhaps it's the bounds checking in the assignment back to the matrix?
>
> Anyhow, in real life situations we'd all be using a super-optimized
> hand-coded-in-assembly matrix multiplication library (LAPACK?), so
> this is all very academic.
>
> Rich.
>
> [0] Quad core Intel hardware:
> model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
>
> [1] Creation of the result matrix and copying it to shared memory is
> almost instantaneous in my tests.
>
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------