I’m developing a multi-GPU PyTorch application. The existing methods such as scatter/gather in torch.distributed don’t fulfill my requirements on their own, so I need to implement custom forward/backprop steps that send and receive gradients across the GPUs, built on top of those built-in scatter/gather collectives. Writing those steps is not the problem; I can do that myself. The final application will run on a multi-GPU server.
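For concreteness, here is a minimal sketch of the kind of custom step I mean. The class name GatherFromRanks, the gather-in-forward pattern, and the tensor shapes are illustrative placeholders rather than my actual model; the point is only that the forward and backward passes call the built-in collectives directly:

```python
import torch
import torch.distributed as dist


class GatherFromRanks(torch.autograd.Function):
    """Illustrative op: gather activations from every rank in forward,
    send the matching gradient slice back to each rank in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.rank = dist.get_rank()
        ctx.world_size = dist.get_world_size()
        gathered = [torch.zeros_like(x) for _ in range(ctx.world_size)]
        dist.all_gather(gathered, x)            # built-in collective
        return torch.cat(gathered, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Every rank's input contributed to every rank's output, so the
        # gradient of this rank's input is the sum, over all ranks, of the
        # grad slice at this rank's position: all_reduce, then take our slice.
        grad = grad_output.clone()
        dist.all_reduce(grad)
        return grad.chunk(ctx.world_size, dim=0)[ctx.rank].contiguous()
```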
For development, however, budget constraints limit me to a single-node, single-GPU machine, because our organization's large cluster servers are shared. The problem I've run into is simulating a multi-GPU setting on this single-GPU system.
How can I simulate a multi-GPU setting on a single-GPU system so that I can test these modules?
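To make the question concrete, this is roughly the kind of test harness I would like to run on the development machine. The gloo backend, the world size of 2, and the toy worker body are placeholders; the collectives here operate on CPU tensors because, as far as I understand, gloo only supports broadcast and all_reduce on CUDA tensors:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Placeholder for exercising the custom scatter/gather modules.
    x = torch.full((4,), float(rank))
    gathered = [torch.zeros_like(x) for _ in range(world_size)]
    dist.all_gather(gathered, x)
    print(f"rank {rank} sees: {[int(t[0]) for t in gathered]}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # pretend world size; the real target has more GPUs
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

This only emulates multiple ranks as processes, not multiple CUDA devices, so I'm unsure whether it adequately stands in for the real multi-GPU run.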