I analyzed the problem in depth: I get faster responses when I run the model from the terminal than through the Ollama Python library.
Through Python it only uses the efficiency cores at 1.6 GHz :/ — from the terminal, all cores run at 5 GHz.
I have:
i9-13980HX
24 cores / 32 threads
(Task Manager screenshot)
How can I fix that?
import ollama

# Stream a chat completion, forcing CPU-only inference with 32 threads.
ollama_response = ollama.chat(
    model='llama3',
    stream=True,
    messages=[
        {
            'role': 'system',
            'content': '''You are an assistant with solid philosophical knowledge.''',
        },
        {
            'role': 'user',
            'content': 'what is the meaning of life?',
        },
    ],
    options={
        'num_gpu': 0,      # disable GPU offload, run entirely on CPU
        'num_thread': 32,  # should match the 32 hardware threads
    },
)

# Print tokens as they arrive.
for chunk in ollama_response:
    print(chunk['message']['content'], end='', flush=True)
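One thing I have not ruled out is whether the Python process itself is restricted to a subset of cores (a parent process, or Windows' Efficiency Mode, can impose an affinity mask that child processes inherit). A minimal sketch of that check — note `os.sched_getaffinity` is Linux-only; on Windows the third-party `psutil` package's `Process.cpu_affinity()` reports the same thing, which I have not shown here:

```python
import os

# Compare how many logical CPUs the machine has with how many this
# process is actually allowed to schedule on.
total = os.cpu_count()

if hasattr(os, "sched_getaffinity"):
    # Linux: 0 means "the current process".
    allowed = len(os.sched_getaffinity(0))
else:
    # Not available on Windows; use psutil.Process().cpu_affinity()
    # there to read the real mask.
    allowed = total

print(f"{allowed} of {total} logical CPUs available to this process")

# If allowed < total, something has pinned this process to a subset
# of cores, which would explain the E-core-only behaviour.
```

If the mask turns out to be restricted, that would point at the environment launching Python rather than at the Ollama library itself.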