How to run an LLM from the transformers library on Windows without a GPU?
I have no GPU, but I can run openbuddy-llama3-8b-v21.1-8k from ollama; it generates at roughly 1 token/s. I would like to do the same with the Hugging Face transformers library directly.
Also, how can I adapt a Llama 2 model to fewer than 7B parameters (or otherwise make it small enough to run reasonably on CPU)?

Please help me with a Python example.
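For reference, this is the kind of CPU-only loading code I am trying to get working. It is a minimal sketch assuming `torch` and `transformers` are installed; the Hugging Face repo id `OpenBuddy/openbuddy-llama3-8b-v21.1-8k` is my assumption of where the model I use in ollama is hosted, and the actual call is commented out because it downloads several GB:

```python
# CPU-only loading sketch for a causal LM with transformers.
# Assumes `pip install torch transformers` has been run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_cpu(model_name: str):
    """Load a tokenizer and model pinned to CPU in float32."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float32,   # fp16 is poorly supported on most CPUs
        device_map="cpu",            # force CPU placement
        low_cpu_mem_usage=True,      # stream weights instead of double-allocating
    )
    return tokenizer, model


# Usage (not run here because the checkpoint is several GB;
# the repo id is an assumption, not confirmed):
# tokenizer, model = load_cpu("OpenBuddy/openbuddy-llama3-8b-v21.1-8k")
# inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# output = model.generate(**inputs, max_new_tokens=30)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Is this the right approach for CPU inference, or do I need something else (quantization, a smaller checkpoint) to get usable speed on Windows?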