On Dec 20, 2009, at 8:08 PM, Gerd Stolpmann wrote:
>> The following web page describes a commercial machine sold by Azul
>> Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of
>> memory in a flat SMP configuration:
>>
>> http://www.azulsystems.com/products/compute_appliance.htm
>>
>> As you can see, a GC with shared memory can already scale across
>> dozens of cores, and memory access is no more heterogeneous than it
>> was 20 years ago. Also, note that homogeneous memory access is a red
>> herring in this context because it does not undermine the utility of
>> a shared heap on a multicore.
>
> The benchmarks they mention can all easily be parallelized - that
> stuff you can also do with multi-processing. The interesting thing
> would be an inherently parallel algorithm where the same memory
> region is accessed by multiple threads. Or at least a numeric
> program (your examples seem to be mostly from that area).
I'm not sure if it is relevant here, but it should be noted that a lot
of the performance gains Azul gets come from their custom chips, which
do a lot of tricks for you under the hood. The last time I used an
Azul appliance, it performed quite poorly when multiple threads hit
the same memory frequently (the machine I used was about 4x slower
than an equivalent Intel machine on a single core). If the Azul tricks
make it into desktop processors, that would likely be pretty great.
Also, for what it's worth, more cores have actually performed worse in
the type of computing I currently do. We want fewer cores and more
physical boxes, which makes multiple single-threaded processes a
better solution for us. With many cores we tend to become memory-I/O
bound (the bus cannot keep up with us). We are processing lots of
biological data. For the record, we are not using OCaml for our
project; this is just an observation of what model works well for us.
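To make the multi-process, single-thread model concrete, here is a
minimal OCaml sketch of the general idea (the function and chunk names
are invented for illustration, not taken from any real project): each
chunk of input is handed to a forked worker process with its own heap
and GC, and the parent collects the results over pipes.

```ocaml
(* Hypothetical sketch: one single-threaded worker process per chunk,
   created with Unix.fork; results come back over a pipe via Marshal.
   process_chunk stands in for the real per-chunk work. *)

let process_chunk chunk =
  List.fold_left (+) 0 chunk

let parallel_map_chunks chunks =
  let workers =
    List.map
      (fun chunk ->
         let rd, wr = Unix.pipe () in
         match Unix.fork () with
         | 0 ->
           (* Child: independent address space, so no shared-heap
              contention; write the result and exit. *)
           Unix.close rd;
           let oc = Unix.out_channel_of_descr wr in
           output_value oc (process_chunk chunk);
           close_out oc;
           exit 0
         | pid ->
           Unix.close wr;
           (pid, rd))
      chunks
  in
  (* Parent: read each child's result, then reap the child. *)
  List.map
    (fun (pid, rd) ->
       let ic = Unix.in_channel_of_descr rd in
       let result : int = input_value ic in
       close_in ic;
       ignore (Unix.waitpid [] pid);
       result)
    workers

let () =
  List.iter (Printf.printf "%d\n")
    (parallel_map_chunks [ [1; 2; 3]; [4; 5; 6] ])
```

Because each worker is its own process, the kernel can place them on
separate sockets, which also spreads the memory-bus load in the way
described above (compile with the `unix` library, e.g.
`ocamlfind ocamlopt -package unix -linkpkg`).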