I’m fine-tuning a small GPT model but I can’t get it to use my GPU
I’m currently fine-tuning a distilled GPT-2 model with approximately 130M parameters on 8 years’ worth of my WhatsApp chats. I’ve prepared a labeled dataset of around 1 million sequences, each 512 tokens long. Below is the portion of code I am using for training: