I am doing an internship at an LLM company and want to walk through the pretraining of a mini LLM.
I have 8 x A800 (80 GB) GPUs available, and I'd like to keep the pretraining time to 1-2 weeks; just going through the full pretraining process is enough for me.
Along the way, I hope to study current techniques such as Hugging Face Accelerate, DeepSpeed, etc.
Can anybody recommend a reference project? Advice on model parameters and dataset size would also be much appreciated. Thank you very much! Anyone with the same goal is also welcome; I'd like to find people to discuss this with.
I have looked at several projects, but some of them don't use current techniques such as Hugging Face Accelerate, so I'm not sure which one is appropriate to take as a reference.
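To pick a model size and token count for my budget, I did a rough back-of-envelope estimate. This is only a sketch: it assumes the common "training cost ≈ 6 × N × D FLOPs" approximation (N = parameters, D = tokens), an A800 bf16 peak of ~312 TFLOPS, and ~40% realized utilization (MFU), all of which are assumptions rather than measured numbers.

```python
# Rough compute-budget estimate for 8 x A800 over two weeks.
# Assumptions (not measured): 6*N*D FLOPs per run, 312 TFLOPS bf16 peak, 40% MFU.
GPUS = 8
PEAK_FLOPS = 312e12   # A800 bf16 peak, per GPU (assumed)
MFU = 0.40            # model FLOPs utilization (assumed)
SECONDS = 14 * 24 * 3600  # two-week budget

# Total FLOPs available in the budget.
budget = GPUS * PEAK_FLOPS * MFU * SECONDS

def days_to_train(n_params: float, n_tokens: float) -> float:
    """Estimated wall-clock days under the 6*N*D approximation."""
    flops = 6 * n_params * n_tokens
    return flops / (GPUS * PEAK_FLOPS * MFU) / 86400

# Example: a 1B-parameter model trained on a Chinchilla-style
# 20x token ratio (20B tokens) fits comfortably in the budget.
print(f"total budget: {budget:.2e} FLOPs")
print(f"1B params / 20B tokens: {days_to_train(1e9, 20e9):.1f} days")
```

By this estimate, a ~1B-parameter model on ~20B tokens finishes in well under two weeks, so the two-week budget leaves room for a larger model or more tokens (or, realistically, for debugging and restarts).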