Slow Ollama API – how to make sure the GPU is used
I am working on a toy project in Godot (based on this): a simple chat interface where the user writes some text and gets replies from a large language model running locally. I am using Ollama to run the model, and the interface between Godot and the model is the Ollama API. Each reply currently takes about 30 seconds, which makes me suspect the model is not running on the GPU. How can I make sure the GPU is actually being used?
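The Godot side of the call is essentially a plain HTTP POST. A minimal sketch of what it looks like (assuming Godot 4 syntax, Ollama's default endpoint at `http://localhost:11434`, and `llama3` as a placeholder for whichever model has been pulled):

```gdscript
extends Node

func _ready() -> void:
	var http := HTTPRequest.new()
	add_child(http)
	http.request_completed.connect(_on_request_completed)

	# POST a single prompt to Ollama's generate endpoint.
	# "stream": false asks for one complete JSON reply
	# instead of a stream of partial tokens.
	var body := JSON.stringify({
		"model": "llama3",  # placeholder: use the model you pulled
		"prompt": "Hello, who are you?",
		"stream": false
	})
	http.request(
		"http://localhost:11434/api/generate",
		["Content-Type: application/json"],
		HTTPClient.METHOD_POST,
		body
	)

func _on_request_completed(_result: int, _code: int, _headers: PackedStringArray, body: PackedByteArray) -> void:
	# The non-streaming reply is a JSON object whose "response"
	# field holds the model's full answer.
	var data: Variant = JSON.parse_string(body.get_string_from_utf8())
	if data is Dictionary:
		print(data["response"])
```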