Ollama, how can I use all the GPUs I have?
I am running Ollama on a 4xA100 GPU server, but it looks like only one GPU is used for the llama3:8b model.
How can I use all four GPUs simultaneously?
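For context, this is roughly how I am checking GPU usage (an assumption on my part that this is the right way; nvidia-smi ships with the standard NVIDIA driver tooling):

```sh
# Watch per-GPU memory and utilization while a prompt is running.
# In my case only one of the four A100s shows any load.
watch -n 1 nvidia-smi
```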
I am not using Docker, just ollama serve and ollama run.
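Concretely, this is all I run (a minimal sketch of my setup; llama3:8b stands in for the exact tag I pulled):

```sh
# Terminal 1: start the Ollama server with default settings.
ollama serve

# Terminal 2: load the model and start an interactive session.
ollama run llama3:8b
```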