Teacher-student model training strategy
I am planning to use an LLM (say, Llama 3) to produce labeled training data via a prompt, and then to train a smaller model with a CLS token on that data, with the goal of matching the LLM's accuracy. Suppose that I can run the prompt on 1M+ examples (although I suspect I won't need that many).
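
To make the setup concrete, here is a minimal sketch of the loop I have in mind, under loud assumptions: `teacher_label` is a hypothetical stand-in for the LLM prompt call (a keyword rule here, not a real model), the example texts are made up, and the student is a tiny bag-of-words logistic regression rather than a transformer with a CLS head, so that the snippet runs with the standard library only. In the real setup, `featurize` would be replaced by the smaller model's CLS embedding.

```python
import math
import random
from collections import Counter


def teacher_label(text):
    """Hypothetical stand-in for the LLM prompt; returns a 0/1 label."""
    return 1 if "refund" in text.lower().split() else 0


def featurize(text):
    """Bag-of-words counts; a placeholder for the student's CLS embedding."""
    return Counter(text.lower().split())


def score(weights, bias, feats):
    """Linear score of one example under the current student parameters."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train_student(texts, labels, epochs=50, lr=0.5, seed=0):
    """Fit logistic regression to the teacher's labels with plain SGD."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            z = max(-30.0, min(30.0, score(weights, bias, feats)))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss w.r.t. the score
            bias -= lr * grad
            for tok, n in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * grad * n
    return weights, bias


def predict(model, text):
    weights, bias = model
    return 1 if score(weights, bias, featurize(text)) > 0 else 0


def agreement(model, texts):
    """Fraction of texts where the student matches the teacher."""
    hits = sum(predict(model, t) == teacher_label(t) for t in texts)
    return hits / len(texts)


# Made-up unlabeled pool; in practice this would be the 1M+ examples.
unlabeled = [
    "please refund my order",
    "i want a refund now",
    "refund the payment please",
    "where is my order",
    "thanks for the fast delivery",
    "how do i change my password",
    "need a refund for this item",
    "the delivery was late",
]

labels = [teacher_label(t) for t in unlabeled]  # step 1: teacher labels data
student = train_student(unlabeled, labels)      # step 2: student fits labels
print("agreement with teacher:", agreement(student, unlabeled))
```

Here `agreement` is measured on the same pool the student was trained on, which only shows that the loop works; to judge whether the student really matches the LLM, it would have to be computed on teacher-labeled examples held out from training.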