Hi all,
We currently have an IB DDR cluster of 128 nodes with AMD quad-core
'Barcelona' CPUs (2 sockets * 4 cores), using DDR Flextronics switches
(http://www.ibswitches.com) with 2:1 over-subscription (two 24-port
switches per rack, linked to one modular 144-port Flextronics fat-tree
switch). Every node has a Mellanox DDR InfiniHost III Lx PCI Express x8
card (one port), and we use IB both for calculations (MPI) and storage
(Lustre: 80 TB using 1 MDS and 2 OSSs on a DDN 9900 unit).
Now we are planning to add new AMD 'Magny-Cours' nodes (16 or 24 cores)
using InfiniBand QDR, but linked to the 144-port Flextronics DDR switch
via hybrid passive copper QSFP-to-MicroGiGaCN cables, so that we can
reach the Lustre storage.
But there are two main issues that we are worried about:
1 - QP saturation on single-port IB cards
Using 16 cores per node (8 * 2) seems the 'safe' option, but 24 cores
(12 * 2) is better in terms of price per job. Our CFD applications using
MPI (OpenMPI) may need to do about 15 MPI_Allreduce calls per second or
less, and we will probably be using a pool of 1500 cores. Is anyone
seeing this kind of message rate on 24-core AMD nodes, and can you tell
me about your solution/experience?
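For what it is worth, here is the back-of-the-envelope arithmetic behind my worry, as a plain Python sketch. It assumes a recursive-doubling allreduce (one of the algorithms OpenMPI's tuned collective component may pick for small messages); the step count and the one-rank-per-core mapping are assumptions, not measurements:

```python
import math

# Assumptions (illustrative only):
# - recursive-doubling allreduce: ceil(log2(P)) communication rounds
# - one MPI rank per core, 1500 cores in the pool
# - ~15 allreduce calls per second (from our CFD application)
ranks = 1500
allreduces_per_sec = 15

steps = math.ceil(math.log2(ranks))                 # rounds per allreduce
msgs_per_rank_per_sec = steps * allreduces_per_sec  # one send+recv per rank per round

# A 24-core node funnels 24 ranks through one single-port HCA,
# so for these collectives alone the card sees:
sends_per_hca_per_sec = 24 * msgs_per_rank_per_sec

print(f"rounds per allreduce: {steps}")
print(f"messages per rank per second: {msgs_per_rank_per_sec}")
print(f"sends per 24-core node HCA per second: {sends_per_hca_per_sec}")
```

These raw rates are small for an HCA; my concern is more about the number of concurrent QPs per card (up to 24 ranks each talking to many peers) than the message rate itself.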
2 - I've heard that QLogic behaves better in terms of QP creation. I
also have to think about linking IB DDR with QDR to reach the Lustre
storage. I suppose the main issue is which QDR switch or switches to
link to the 144-port Flextronics DDR switch, but I do not know what role
the node card (Mellanox/QLogic) plays in this. Again, can anyone tell me
about their solution/experience?
Any comment or suggestion will be welcome.
Thanks in advance
Regards
--
Ramiro Alba
Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46
--