Tag Archive for: parallel-processing, gpu, nvidia, large-language-model, ollama

Ollama, how can I use all the GPUs I have?

I am running Ollama on a 4xA100 GPU server, but it looks like only one GPU is being used for the llama3:7b model.
How can I use all four GPUs simultaneously?
I am not using Docker; I just start the server with ollama serve and then use ollama run.
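For reference, this is the invocation described above, followed by a minimal sketch of one way people commonly try to spread a model across multiple GPUs. The OLLAMA_SCHED_SPREAD variable is an assumption about the server build in use (it exists in recent Ollama releases); CUDA_VISIBLE_DEVICES is standard NVIDIA runtime behavior, with device IDs 0-3 assumed here:

    # Current setup, as described in the question (no Docker):
    ollama serve          # terminal 1: start the server
    ollama run llama3:7b  # terminal 2: run the model

    # Sketch: expose all four GPUs and ask the scheduler to spread
    # the model across them, assuming a build that supports
    # OLLAMA_SCHED_SPREAD and that the GPUs are CUDA devices 0-3.
    CUDA_VISIBLE_DEVICES=0,1,2,3 OLLAMA_SCHED_SPREAD=1 ollama serve

Note that by default Ollama loads a model onto a single GPU if it fits, so a 7B-class model on an 80 GB A100 would not normally be split; spreading it is an explicit scheduling choice rather than the default.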