Llama 3 8B generating text very slowly on a Tesla GPU
Please help me with this one. I am trying to improve the text in my CSV file using Llama 3 on a Tesla GPU. I have successfully loaded the model onto the GPU, but when I run `nvidia-smi` I do not see any spike in GPU usage. I am running the model on my local machine with the code below. Please take a look and let me know what I can do to improve the speed of text generation. Everything in the code looks correct to me, yet I am still unable to generate the text, and I cannot understand why.
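For context, my setup is roughly along these lines (this is a simplified sketch rather than my exact script; the model ID, prompt, and generation parameters are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; I am loading a local copy of Llama 3 8B
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 8B model fits in GPU memory
    device_map="cuda",          # place the weights on the GPU
)

# Example prompt built from one row of the CSV (placeholder text)
prompt = "Rewrite the following product description more clearly: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Even with a setup like this, generation takes an extremely long time per row and GPU utilization stays near zero in `nvidia-smi`.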