I'm running the NCCL tests on 2 nodes, each with one A10G GPU, with GDR disabled.
Why does the log contain the line "DMA-BUF is available on GPU device 0"? Will DMA-BUF be used when GDR is disabled?
Appreciate the help!
[0] NCCL INFO NET/OFI Could not disable CUDA API usage for HMEM, disabling GDR
[0] NCCL INFO NET/OFI Setting NCCL_PROTO to "simple"
[0] NCCL INFO NET/OFI Could not disable CUDA API usage for HMEM, disabling GDR
[0] NCCL INFO NET/OFI Setting NCCL_PROTO to "simple"
[0] NCCL INFO DMA-BUF is available on GPU device 0
[0] NCCL INFO DMA-BUF is available on GPU device 0
[0] NCCL INFO comm 0x2515e00 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 1e0 commId 0x1c71981584deedae - Init START
[0] NCCL INFO comm 0x2b049f0 rank 1 nranks 2 cudaDev 0 nvmlDev 0 busId 1e0 commId 0x1c71981584deedae - Init START
[0] NCCL INFO NET/OFI Libfabric provider associates MRs with domains
[0] NCCL INFO NET/OFI Libfabric provider associates MRs with domains
[0] NCCL INFO Channel 00/02 : 0 1
[0] NCCL INFO Channel 01/02 : 0 1
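When output from both ranks ends up on a single line, as in the pasted log, it helps to split it back into entries before comparing the GDR and DMA-BUF messages. The sketch below is a hypothetical reading aid, not part of NCCL, nccl-tests, or the OFI plugin. It assumes every entry starts with a "[N] NCCL INFO " prefix and uses a shortened excerpt of the log as sample input.

```python
import re

# Shortened excerpt of the flattened log above (hypothetical sample input).
LOG = (
    '[0] NCCL INFO NET/OFI Could not disable CUDA API usage for HMEM, disabling GDR '
    '[0] NCCL INFO NET/OFI Setting NCCL_PROTO to "simple" '
    '[0] NCCL INFO DMA-BUF is available on GPU device 0 '
    '[0] NCCL INFO Channel 00/02 : 0 1'
)

# Assumed entry prefix: "[<number>] NCCL INFO ".
PREFIX = r"\[\d+\] NCCL INFO "


def split_entries(text):
    """Split a flattened NCCL log into one string per '[N] NCCL INFO' entry."""
    # The lookahead keeps each prefix attached to its own entry; empty pieces
    # produced by zero-width matches are dropped by the filter below.
    parts = re.split(r"\s*(?=" + PREFIX + ")", text)
    return [p.strip() for p in parts if p.strip()]


def filter_entries(entries, keywords):
    """Keep only entries that mention at least one of the given keywords."""
    return [e for e in entries if any(k in e for k in keywords)]


if __name__ == "__main__":
    for entry in filter_entries(split_entries(LOG), ["GDR", "DMA-BUF"]):
        print(entry)
```

Run against the full log, this prints the "disabling GDR" and "DMA-BUF is available" entries on separate lines, which makes it easier to see that they come from different stages of initialization.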