I’m following the code exactly as given in the tutorial linked here. I’ve seen others on Reddit with 12 GB GPUs say they followed this tutorial and were able to run the code from the blog as-is, except that they lowered the batch size to 3 to avoid an OOM error.
However, I’m running the same code on a 24 GB RTX 3090 Ti and I keep hitting an OOM error, even after changing the batch size to 3. Here’s a screenshot of the nvidia-smi output after calling trainer.train(); the OOM error appears within about 2 seconds of execution.
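For reference, this is roughly how I’m constructing the trainer. The names model and tokenized_train are placeholders for the objects built in the earlier tutorial steps (model loading and dataset tokenization), so treat this as a sketch of my setup rather than the tutorial’s exact code:

```python
from transformers import TrainingArguments, Trainer

# `model` and `tokenized_train` come from the earlier tutorial steps
# (placeholder names, not the tutorial's exact identifiers).
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=3,  # lowered from the tutorial's default to avoid OOM
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
)

trainer.train()  # OOM is raised within ~2 seconds of this call
```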
Can anyone please explain how to fix this? I don’t understand why my 24 GB card runs into an OOM when the exact same code works on a 12 GB machine.
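In case it helps with diagnosis, here is a quick check I can run right before trainer.train() to confirm how much of the 24 GB is actually free (I want to rule out another process or a leftover notebook cell holding memory on the card):

```python
import torch

# How much of the GPU is already in use before training starts?
# If something else is holding memory, "free" will be far less than ~24 GiB.
free, total = torch.cuda.mem_get_info()
print(f"Free: {free / 1024**3:.2f} GiB / Total: {total / 1024**3:.2f} GiB")

# Detailed breakdown of PyTorch's own allocations on this device.
print(torch.cuda.memory_summary(abbreviated=True))
```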