I am trying to do the following:
Let’s say I want to process 3 layers on the same GPU. Because memory is limited, I can hold the decompressed weights for only one layer at a time, so I keep the compressed weights for all three layers resident on the GPU.
When I want to process a layer, I decompress its weights, run the forward pass, and store the activations for the next layer. The backward pass would work the same way, decompressing each layer’s weights again as needed.
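To make the idea concrete, here is a rough sketch of what I have in mind. `CompressedLinear`, `compress`, and `decompress` are hypothetical names I made up, and the "compression" here is just a half-precision stand-in for whatever real scheme would be used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical compress/decompress helpers -- stand-ins for a real
# compression scheme (quantization, entropy coding, etc.).
def compress(w: torch.Tensor) -> torch.Tensor:
    return w.half()  # placeholder: store at half precision

def decompress(cw: torch.Tensor) -> torch.Tensor:
    return cw.float()

class CompressedLinear(nn.Module):
    """Linear layer that keeps only the compressed weights resident
    and decompresses them just-in-time inside forward()."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.01
        # Registered as a buffer (not a Parameter): only the compressed
        # copy lives on the device permanently.
        self.register_buffer("compressed_weight", compress(w))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decompress on demand; the full-precision tensor is a temporary
        # and can be freed once the layer's computation is done.
        w = decompress(self.compressed_weight)
        return F.linear(x, w, self.bias)

model = nn.Sequential(
    CompressedLinear(8, 16),
    nn.ReLU(),
    CompressedLinear(16, 4),
)
x = torch.randn(2, 8)
out = model(x)          # forward decompresses each layer in turn
out.sum().backward()    # backward runs; bias gradients are populated
```

This handles the forward side, but since the compressed weights are buffers, no gradient flows back into them. I suspect that routing gradients into the compressed representation (to update the weights) would need a custom `torch.autograd.Function`, which is part of what I’m unsure about.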
I am not sure whether this can be done entirely within a model class (where I define the forward/backward logic myself, e.g. via a custom `torch.autograd.Function`), or whether it requires changes to the PyTorch source code itself.
Any pointers would be greatly appreciated. Thanks!