I have written a Python script that uses Ollama and it is working well.
However, I have some doubts about how Ollama works. In my case, in one terminal I am running `ollama run llava`,
and I can also see that Ollama is serving on localhost port 11434. However, when I stop `ollama run`, the server on localhost keeps running.
I would like for someone to clarify:
- What is the difference between `ollama run <model>` and `ollama serve`? My hunch is that `ollama run` actually pulls a model and runs a client against the server. Is that correct? In my application I simply send a POST request to the local `/api/generate` endpoint with the name of the desired model (see the sketch after this list), so I suspect `ollama run` does something similar.
- If localhost port 11434 is already serving requests, do I need to run `ollama serve`? I do not remember running it before, but somehow the server seems to be running.
- If I download new models, can they be used in the same way without restarting anything?
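
For context, my application essentially does something like the following. This is a simplified sketch of my script, not the exact code: the prompt is just a placeholder, and it assumes the standard Ollama REST endpoint on port 11434.

```python
import requests

# Simplified version of what my script does: send a prompt to the
# local Ollama server and read back the generated text.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava",  # same model I start with `ollama run llava`
        "prompt": "Describe a llama in one sentence.",  # placeholder prompt
        "stream": False,  # request a single JSON response instead of a stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```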