I am trying to map MPI processes to sockets in a somewhat compact pattern, and I am wondering what the best way to do it is.

Say there are 2 sockets (0 and 1), each with 4 cores (0, 1, 2, 3), and I have 4 MPI processes, each of which will use 2 OpenMP threads.

I've re-ordered my parallel work such that pairs of ranks (0,1 and 2,3) communicate more with each other than with other ranks. Thus I think the best mapping would be:

RANK  SOCKET  CORE
0     0       0
1     0       2
2     1       0
3     1       2
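
To make the pattern concrete, here is a small Python sketch of the rule I have in mind: fill one socket with consecutive ranks before moving to the next, with each rank starting on the first of its cores. The function name and parameters are just my own illustration, not anything from Open MPI.

```python
def compact_mapping(n_ranks, n_sockets, cores_per_socket, threads_per_rank):
    """Return (rank, socket, first_core) tuples, filling one socket
    with consecutive ranks before moving on to the next socket."""
    ranks_per_socket = cores_per_socket // threads_per_rank
    if n_ranks > n_sockets * ranks_per_socket:
        raise ValueError("not enough cores for the requested ranks")
    mapping = []
    for rank in range(n_ranks):
        socket = rank // ranks_per_socket                           # consecutive ranks share a socket
        first_core = (rank % ranks_per_socket) * threads_per_rank   # leave room for each rank's threads
        mapping.append((rank, socket, first_core))
    return mapping


if __name__ == "__main__":
    # 4 ranks, 2 sockets, 4 cores per socket, 2 OpenMP threads per rank
    for rank, socket, core in compact_mapping(4, 2, 4, 2):
        print(rank, socket, core)
```

With the numbers above this reproduces the table, so ranks 0,1 share socket 0 and ranks 2,3 share socket 1.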

My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 on socket 0 and ranks 1 and 3 on socket 1, which is not what I want.

It looks like --cpus-per-proc might be what I want, i.e. with a value of 2. But it is unclear to me whether I would also need to give --bysocket, and the FAQ suggests this combination is untested.

Maybe a rankfile is what I need?
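
If a rankfile is the way to go, this is roughly what I imagine it would contain. The Python sketch below just prints the lines. It assumes the "rank N=host slot=socket:core-range" form as I understand it from the mpirun man page, that each rank should be bound to both cores its 2 threads will use, and a made-up host name "node0". I have not tested this.

```python
def rankfile_lines(host, n_ranks, cores_per_socket, cores_per_rank):
    """Build rankfile lines that pack consecutive ranks onto the same socket.

    Assumes the 'rank N=host slot=socket:first-last' syntax; 'host' is a
    placeholder for the real node name.
    """
    ranks_per_socket = cores_per_socket // cores_per_rank
    lines = []
    for rank in range(n_ranks):
        socket = rank // ranks_per_socket
        first = (rank % ranks_per_socket) * cores_per_rank
        last = first + cores_per_rank - 1    # last core bound to this rank
        lines.append(f"rank {rank}={host} slot={socket}:{first}-{last}")
    return lines


if __name__ == "__main__":
    # "node0" is hypothetical; 4 ranks, 4 cores per socket, 2 cores per rank
    print("\n".join(rankfile_lines("node0", 4, 4, 2)))
```

I would then expect to pass the resulting file to mpirun with --rankfile, but I am not sure whether that interacts with the other binding options.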

I would appreciate some advice on the easiest way to get this mapping.