How Can I Optimize Machine Translation Model Training to Overcome GPU Memory Overflow Issues?
I’m trying to train a fairly standard machine translation transformer model in PyTorch, based on the “Attention Is All You Need” paper. When I ran it on my PC with standard hyperparameters and a batch size of 128 segments (source/target sentence pairs), it worked fine but was slow, as expected.
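For context, the skeleton of my setup looks roughly like the sketch below. The vocabulary sizes, model dimensions, and the dummy batch are placeholders rather than my real data pipeline, and positional encodings and padding masks are omitted for brevity:

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration only; my real vocabularies and
# hyperparameters differ, but the overall structure is the same.
SRC_VOCAB, TGT_VOCAB = 32000, 32000
D_MODEL, BATCH_SIZE, MAX_LEN = 512, 128, 64

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Standard "Attention Is All You Need" style encoder-decoder transformer.
model = nn.Transformer(
    d_model=D_MODEL, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    dim_feedforward=2048, dropout=0.1,
    batch_first=True,
).to(device)

src_embed = nn.Embedding(SRC_VOCAB, D_MODEL).to(device)
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL).to(device)
generator = nn.Linear(D_MODEL, TGT_VOCAB).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(src_embed.parameters())
    + list(tgt_embed.parameters()) + list(generator.parameters()),
    lr=1e-4,
)

# One dummy batch of 128 source/target segment pairs, just to show the shapes.
src = torch.randint(0, SRC_VOCAB, (BATCH_SIZE, MAX_LEN), device=device)
tgt = torch.randint(0, TGT_VOCAB, (BATCH_SIZE, MAX_LEN), device=device)

# Teacher forcing: feed the target shifted right, predict the next token.
tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1)).to(device)

out = model(src_embed(src), tgt_embed(tgt_in), tgt_mask=tgt_mask)
loss = criterion(generator(out).reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```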