On 06/02/2014 21:31, Brock Palen wrote:
> Actually that did turn out to help. The nvml# devices appear to be numbered in the way that CUDA_VISIBLE_DEVICES sees them, while the cuda# devices are in the order that PBS and nvidia-smi see them.

By the way, did you have CUDA_VISIBLE_DEVICES set when you ran the lstopo
below? Was it set to 2,3,0,1? That would explain the reordering.

I am not sure in which order you want to do things in the end. One way
that could help is:
* Get the locality of each GPU by running with CUDA_VISIBLE_DEVICES=x
(for x in 0..number of GPUs-1). Each run exposes a single GPU to hwloc,
and you can retrieve the corresponding locality from the cuda0 object
(see the sketch after this list).
* Once you know which GPUs you want based on the locality info, take the
corresponding indexes x and y and put them in CUDA_VISIBLE_DEVICES=x,y
before you run your program. hwloc will then create cuda0 for x and
cuda1 for y.
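
FWIW, here's an untested sketch of step 1 (it assumes hwloc 1.x built
with CUDA support; the binary name and buffer size are arbitrary).
Run it once per GPU as CUDA_VISIBLE_DEVICES=x ./locality :

#include <stdio.h>
#include <hwloc.h>
#include <hwloc/cudart.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t osdev, ancestor;
    char cpuset[256];

    hwloc_topology_init(&topology);
    /* cuda and nvml OS devices only show up when I/O discovery is enabled */
    hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topology);

    /* CUDA runtime device 0 is the single GPU left visible by CUDA_VISIBLE_DEVICES */
    osdev = hwloc_cudart_get_device_osdev_by_index(topology, 0);
    if (osdev) {
        /* OS device objects have no cpuset, so look at the closest normal ancestor */
        ancestor = hwloc_get_non_io_ancestor_obj(topology, osdev);
        hwloc_bitmap_snprintf(cpuset, sizeof(cpuset), ancestor->cpuset);
        printf("%s is close to cpuset %s\n", osdev->name, cpuset);
    }

    hwloc_topology_destroy(topology);
    return 0;
}

Step 2 is then just something like CUDA_VISIBLE_DEVICES=2,3 ./yourprogram
(with whatever indexes the locality info told you to pick).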

If you don't set CUDA_VISIBLE_DEVICES, cuda* objects are basically
out-of-order. nvml objects are less likely to be shuffled; they should
follow the PCI bus ID order (lstopo -v would confirm that).
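
If you'd rather check that programmatically than by reading lstopo -v
output, here's another untested sketch (again hwloc 1.x, built with NVML
support so that nvml* objects exist) that lists each nvml object with
the bus ID of its parent PCI device:

#include <stdio.h>
#include <string.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t osdev = NULL;

    hwloc_topology_init(&topology);
    hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topology);

    /* walk all OS devices and keep those whose name starts with "nvml" */
    while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
        hwloc_obj_t parent = osdev->parent;
        if (!osdev->name || strncmp(osdev->name, "nvml", 4))
            continue;
        if (parent && parent->type == HWLOC_OBJ_PCI_DEVICE)
            printf("%s -> PCI %04x:%02x:%02x.%x\n", osdev->name,
                   parent->attr->pcidev.domain, parent->attr->pcidev.bus,
                   parent->attr->pcidev.dev, parent->attr->pcidev.func);
    }

    hwloc_topology_destroy(topology);
    return 0;
}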