I am running Hugging Face's Llama-3-8B model (8 billion parameters) locally, and I asked it the question:
pipeline("What is the capital of France? Answer in one word.")
I left it running for 30 minutes and got no response.
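For context, here is roughly how my script is set up (a reconstruction from memory; the exact checkpoint id is my best guess, and I rebind the name `pipeline`, which is why the call above looks like `pipeline(...)`):

```python
from transformers import pipeline

# Rough sketch of my setup (checkpoint id assumed).
# With no dtype/device arguments, transformers loads the weights in
# float32 on CPU, which for an 8B model is roughly 32 GB of RAM for
# the weights alone.
pipeline = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")

result = pipeline("What is the capital of France? Answer in one word.")
print(result[0]["generated_text"])
```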
Even on a Ryzen 7 5800 it never finishes. I understand a GPU would be faster, but why does it take this long? For comparison, I tried Google's Gemma-2B, and it gives a HORRIBLE answer, but at least it responds in about a minute.
Any idea why Llama 3 takes so long and only uses about 50% of my CPU?
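The 50% figure is from Task Manager. In case it's relevant, this is the small diagnostic I can run to see how many threads PyTorch is actually using (just a sketch; note that PyTorch defaults to one thread per physical core, so 8 threads on an 8-core/16-thread Ryzen 7 5800 would show up as roughly 50% utilisation):

```python
import torch

# How many CPU threads is PyTorch configured to use?
print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())
```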