I have trained a CNN model with DDP across 4 GPUs, a batch size of 16 per GPU, lr = 1e-5, and wd = 5e-5, and got satisfactory results.
Now I want to reproduce the same training process and accuracy on a single GPU. How should I scale my hyperparameters? That is, how should I change my batch size, lr, and wd?
Thanks
Read this paper: https://arxiv.org/pdf/1706.02677
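Following the linear-scaling reasoning in that paper, one way to reproduce the DDP effective batch size (4 GPUs × 16 = 64) on a single GPU is gradient accumulation over 4 micro-batches, keeping lr and wd unchanged. A minimal PyTorch sketch, with a hypothetical model and random data standing in for the real ones:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real CNN and data loader.
model = nn.Linear(32, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=5e-5)

accum_steps = 4   # one micro-batch per former DDP rank
micro_batch = 16  # same per-step batch as each GPU used before
# Effective batch per optimizer step = accum_steps * micro_batch = 64,
# matching the 4-GPU DDP run, so lr and wd stay as they were.

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 32)
    y = torch.randint(0, 10, (micro_batch,))
    loss = nn.functional.cross_entropy(model(x), y)
    # Divide by accum_steps so accumulated grads equal the mean over 64
    # samples, mirroring DDP's all-reduce averaging across ranks.
    (loss / accum_steps).backward()
optimizer.step()
```

Note this matches the gradient math but not everything else: per-GPU BatchNorm statistics and the data-shuffling order still differ from the DDP run, so exact reproduction is not guaranteed.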
I tried different batch sizes and learning rates, but alas I didn't recreate my results.