What Sam is alluding to is that the OpenFabrics driver code in OMPI is sucking up oodles of memory for each IB connection that you're using. The receive_queues param that he sent tells OMPI to use only shared receive queues, instead of the default of one per-peer receive queue plus shared receive queues for the rest. The per-peer RQ is what sucks up all the memory once you multiply it by N peers.
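
To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It is not OMPI code: the buffer sizes and counts are made-up numbers for illustration, and the model simply assumes that per-peer queue buffers are allocated once per peer while shared queue buffers are allocated once per process. It ignores any other per-connection costs.

```python
def receive_buffer_bytes(num_peers, per_peer_queues, shared_queues):
    """Estimate receive-buffer memory for one process (simplified model).

    Each queue is a (buffer_size_bytes, num_buffers) pair. Per-peer queues
    are counted once per peer; shared queues are counted once in total.
    """
    per_peer = sum(size * count for size, count in per_peer_queues) * num_peers
    shared = sum(size * count for size, count in shared_queues)
    return per_peer + shared


# Hypothetical queue shapes, chosen only for illustration (not OMPI defaults).
SMALL = (256, 128)
MEDIUM = (4096, 256)
LARGE = (65536, 256)


def compare(num_peers):
    """Return (mixed, all_shared) byte estimates for the given peer count."""
    # One per-peer queue plus shared queues for the rest.
    mixed = receive_buffer_bytes(num_peers, [SMALL], [MEDIUM, LARGE])
    # Every queue shared: the estimate no longer depends on the peer count.
    all_shared = receive_buffer_bytes(num_peers, [], [SMALL, MEDIUM, LARGE])
    return mixed, all_shared


if __name__ == "__main__":
    for n in (16, 256, 4096):
        mixed, all_shared = compare(n)
        print(f"{n:5d} peers: mixed={mixed} bytes, all shared={all_shared} bytes")
```

In this model the all-shared total stays flat as the job grows, while the mixed total grows linearly with the number of peers, which is the effect described above.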