I am trying to perform distributed training using TensorFlow’s MultiWorkerMirroredStrategy on a server equipped with four GPUs with the following specifications:
GPU 1: 24 GB VRAM (Titan RTX)
GPU 2: 11 GB VRAM (2080 Ti)
GPU 3: 11 GB VRAM (2080 Ti)
GPU 4: 11 GB VRAM (2080 Ti)
I have a model that requires about 20 GB of VRAM for training (possibly more). My understanding is that each GPU involved in the training needs to hold a full replica of the model. Given the VRAM limits on GPUs 2, 3, and 4, I am concerned that the model will not fit into their memory.
I attempted to distribute the training across all four GPUs using MultiWorkerMirroredStrategy, expecting that the strategy might handle the varying memory capacities, or that TensorFlow might have a mechanism for loading the model differently on GPUs with different VRAM sizes.
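For reference, this is roughly how I set up the strategy (a simplified sketch; the tiny model and dataset here just stand in for my real ~20 GB model and input pipeline):

```python
import tensorflow as tf

# Simplified sketch of my setup; the real model is much larger (~20 GB),
# this small model only stands in for it.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Each replica holds a full copy of the model's variables.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder dataset; my real input pipeline reads from disk.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([256, 32]), tf.random.normal([256, 1]))
).batch(32)

model.fit(dataset, epochs=1)
```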
However, when I started the training using the All-Reduce strategy on Kubernetes, the job terminated during the initialization phase. Even the Titan RTX with 24 GB of VRAM hit an OOM (out-of-memory) error after allocating roughly 10 GB of VRAM.
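For context, each worker process in the Kubernetes job sees a TF_CONFIG roughly like the following (hostnames, ports, and the task index are placeholders here, not my actual values):

```python
import json
import os

# Placeholder cluster spec; the real pod hostnames/ports and the per-worker
# task index are set by the Kubernetes job, not hard-coded like this.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["worker-0:2222", "worker-1:2222",
                   "worker-2:2222", "worker-3:2222"]
    },
    "task": {"type": "worker", "index": 0},  # index differs per worker pod
})
```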
Is there a way to train this model across these GPUs despite their differing VRAM capacities? Or is it simply not feasible given the 20 GB requirement?