What is DAPL UD?

Traditional InfiniBand* support involves MPI message transfer over the Reliable Connection (RC) protocol. While RC is long-standing and rich in functionality, it does have certain drawbacks: since it requires that each pair of processes set up a one-to-one connection at the start of execution, per-process memory consumption can, in the worst case, grow linearly as more MPI ranks are added and the number of pair-wise connections grows.

In recent years, the User Datagram (UD) protocol has emerged as a more memory-efficient alternative to the standard RC transfer. UD implements a connectionless model that allows for a many-to-one connection to be set up, using a fixed number of connection pairs even as more MPI ranks are started.
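The memory argument above can be sketched with some back-of-the-envelope arithmetic (the function names and per-connection counts below are illustrative, not taken from any actual DAPL implementation):

```python
# Rough sketch of connection (queue-pair) counts for RC vs. UD,
# assuming fully connected all-to-all communication.
# Actual per-connection memory cost depends on the stack and its settings.

def rc_connections(n_ranks):
    # RC: each rank keeps one connection per peer,
    # so the cluster-wide count grows with n * (n - 1).
    return n_ranks * (n_ranks - 1)

def ud_connections(n_ranks):
    # UD: connectionless; a fixed number of endpoints per rank
    # (here, one) regardless of how many peers it talks to.
    return n_ranks * 1

print(rc_connections(1024))  # 1047552
print(ud_connections(1024))  # 1024
```

Even at a modest 1,024 ranks, the RC model needs roughly a thousand times more connection state than the UD model, and the gap widens as the job scales.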

Availability

There are two aspects to DAPL UD support: availability in the InfiniBand* software stack, and support in the MPI implementation.

The Open Fabrics Enterprise Distribution (OFED™) stack is open source software for high-performance networking applications offering low latencies and high bandwidth. It is developed, distributed, and tested by the Open Fabrics Alliance (OFA) – a committee of industry, academic, and government organizations working to improve and influence RDMA fabric technologies. Support for the DAPL UD extensions is part of OFED 1.4.2 and later. Make sure you have the latest OFED installed on your cluster.

On the MPI side, the Intel® MPI Library has supported execution over DAPL UD since version 4.0. Make sure you have the latest Intel MPI version installed on your cluster. To download the latest release, log into the Intel® Registration Center or check our website.

Enabling DAPL UD

To enable usage of DAPL UD with your Intel MPI application, you need to set the following environment variables:

$ export I_MPI_FABRICS=shm:dapl
$ export I_MPI_DAPL_UD=enable

Note that shm:dapl is the default setting for the I_MPI_FABRICS environment variable. It uses the shm device for intra-node communication and the dapl device for communication between nodes.
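To confirm at run time that the DAPL fabric was actually selected, you can raise the library's debug level before launching (the exact wording of the startup output varies between releases; ./my_app below is a placeholder for your own binary):

```shell
# Raise the debug level so the library reports the selected fabric
export I_MPI_DEBUG=2
# Launch as usual and look for a line mentioning the dapl fabric in the output
mpirun -n 4 ./my_app
```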

Selecting the DAPL UD provider

Finally, select a DAPL provider that supports the UD InfiniBand* extensions. While several providers (e.g. scm, ucm) offer this functionality, we recommend the ucm device, as it offers better scalability and is more suitable for many-core machines. The provider is selected through an entry in your /etc/dat.conf file.
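For illustration, UD-capable (ucm) entries in an OFED /etc/dat.conf typically look like the following; the mlx4_0 device name and port numbers are examples, so check your own dat.conf for the names on your cluster:

```shell
# Example /etc/dat.conf entries for the ucm (UD-capable) provider
# (device name mlx4_0 and the port numbers are illustrative):
#   ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mlx4_0 1" ""
#   ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mlx4_0 2" ""

# Request the matching provider by name:
export I_MPI_DAPL_UD_PROVIDER=ofa-v2-mlx4_0-1u
```

The I_MPI_DAPL_UD_PROVIDER variable tells the Intel MPI Library which dat.conf entry to use for the UD path.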

Comments (1)

I am trying to test the Intel MPI Benchmarks (IMB) 4.0 beta on our Windows PC cluster. Both Intel MPI 5.0 and WinOFED 3.2 are installed on the cluster. When I ran the tests, the following errors always occurred:

C:\Users\dingjun\mpi5tests>mpiexec -configfile config_file

dapls_ib_init() NdStartup failed with NTStatus: The specified module could not be found.

Could you tell me why the above errors occurred? If you are not able to answer this question, could you point me to someone at Intel Corp. who can?

By the way, on our Linux PC cluster the Intel MPI DAPL option works very well; the above problem only occurred on our Windows PC cluster. In addition, what kind of hardware is used to pass the DAPL over InfiniBand test? We need hardware information such as vendor, model, and driver details (provided by the vendor or open source; if open source, what's the download link?).

I am looking forward to hearing from you and your early response is highly appreciated.