> I just test things and go for the fastest. But if we do the theoretical
> math, SHMEM is difficult to beat, of course.
> Google for measurements with shmem; there are not many out there.
SHMEM within the node or between nodes?
> The fact that so few standardized/rewrote their floating-point software
> for GPUs already says enough about all the legacy code in the HPC world :)
> When, some years ago, I had a working 2-node cluster here with a QM500-A,
> it had, in 32-bit, 33 MHz PCI long-sleeve slots, a blocked-read latency of
> under 3 us; that is what I saw on my screen. Sure, I had no switch in
> between - a direct connection between the 2 Elan4s.
> I'm not sure what PCI-X adds to it when clocked at 133 MHz, but it won't
> be a big difference with PCIe.
There is a big difference between PCI-X and PCIe. PCIe has roughly half the latency - from about 0.7 down to 0.3 us, more or less.
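To put those two figures side by side: assuming the 0.7 and 0.3 numbers are end-to-end small-read latencies in microseconds (my reading of the sentence above, not something stated explicitly), the corresponding per-rail rate bounds look like this:

```python
# Sanity-check sketch: halving the latency doubles the maximum
# back-to-back blocked-read rate on a single rail.

def reads_per_second(latency_us: float) -> float:
    # A blocked read must complete before the next one is issued,
    # so the rate bound is simply the reciprocal of the latency.
    return 1e6 / latency_us

pcix_rate = reads_per_second(0.7)   # assumed PCI-X latency in us
pcie_rate = reads_per_second(0.3)   # assumed PCIe latency in us
print(f"PCI-X bound: {pcix_rate:,.0f} reads/s")
print(f"PCIe  bound: {pcie_rate:,.0f} reads/s")
```

These are upper bounds for strictly serialized reads on one rail; real software overhead sits on top of them.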
> PCIe probably just has more bandwidth, doesn't it?
Also more bandwidth ... :-)
> Beating such hardware second-hand is difficult: $30 on eBay and I can
> install 4 rails or so.
> Didn't find the cables yet, though...
> So I don't see how to outdo that with old InfiniBand cards, which are
> $130 and up for the ConnectX - say $150 soon - and that would allow only a
> single rail, or maybe at best 2 rails. So far I haven't heard from anyone
> who has more than a single IB rail.
> Is it possible to install 2 rails with IB?
Yes, you can do dual rails.
> So if I use your number in a pessimistic manner, meaning there is some
> overhead from PCI-X, then the ConnectX-type IB can theoretically do
> 1 million blocked reads per second with 2 rails. Which is $300 or so,
> cables not counted.
Are you referring to RDMA reads?
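For what it's worth, the quoted 1-million figure can be reproduced under one assumption: that a blocked (RDMA) read completes in roughly 2 us per rail - a pessimistic number that folds some PCI-X overhead on top of the raw fabric latency - and that the two rails operate independently:

```python
# Back-of-the-envelope check of the quoted dual-rail estimate.
# Assumed, not measured: ~2 us per blocked read on one rail.

latency_us = 2.0   # assumed per-read completion time on one rail
rails = 2          # dual-rail ConnectX setup

# Each rail issues reads strictly back to back; independent rails
# simply add their rates together.
reads_per_sec = rails * (1e6 / latency_us)
print(f"{reads_per_sec:,.0f} blocked reads/s")
```

If the real per-rail latency is lower (closer to the 0.3 us PCIe figure plus fabric time), the achievable rate would be correspondingly higher.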