I have a local server with multiple GPUs, and I am trying to load a local model while specifying which GPUs it should use, since we want to split the GPUs between team members.
I can successfully pin a smaller model to a single GPU with device_map='cuda:3'. How can I do the same with multiple GPUs, something like cuda:[4,5,6], for a larger model?
(I tried device_map='auto', 'balanced', and 'sequential', which spread the model across all GPUs automatically, but that is not what we want…)
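For reference, this single-GPU pattern works fine for a smaller model (a minimal sketch; the 7B path is just a placeholder):

import torch
from transformers import LlamaForCausalLM

# Pin the whole model to GPU 3; this works when the model fits on one card.
small_model = LlamaForCausalLM.from_pretrained(
    '/models/Llama-2-7b-chat-hf',  # placeholder path for the smaller model
    device_map='cuda:3',
)

For the larger 13B model, this is my current attempt: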
import torch
from transformers import LlamaForCausalLM
model_dir = '/models/Llama-2-13b-chat-hf'
# device_map values I have tried: 'auto', 'balanced', 'sequential', 'balanced_low_0', 'cuda:3'
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    device_map='cuda:[3,4,5]',  # how do I make this part work across several GPUs?
    torch_dtype=torch.float32,
)
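Is restricting the automatic map with max_memory the intended way to do this? Something like the sketch below (the per-GPU limits of '22GiB' are guesses on my part and would need to match the actual cards), or is there a cleaner way to say "use only GPUs 4, 5, and 6"?

# Sketch: list only the GPUs the model is allowed to use, so that
# device_map='auto' spreads the weights across exactly those devices.
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    device_map='auto',
    max_memory={4: '22GiB', 5: '22GiB', 6: '22GiB'},
    torch_dtype=torch.float32,
)

Setting CUDA_VISIBLE_DEVICES=4,5,6 when launching the script would also hide the other GPUs from the process; would that be the recommended approach instead?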