import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:3",
)
There are many GPUs on the server, but I can only use two of them. How should I configure device_map (or other parameters) so that the model runs on both GPUs?
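For context, this is the kind of configuration I'm considering (a minimal sketch, assuming GPUs 2 and 3 are the two I'm allowed to use; the memory budgets are placeholders, not measured values):

import torch
from transformers import AutoModelForCausalLM

# Sketch: let accelerate spread the model over GPUs 2 and 3 only.
# Passing max_memory with just those device indices should keep the
# "auto" device map from placing weights on the other GPUs; the
# "40GiB" budgets are placeholders for whatever each card has free.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={2: "40GiB", 3: "40GiB"},
)

The other approach I've seen mentioned is launching the script with CUDA_VISIBLE_DEVICES=2,3 so only those two GPUs are visible to PyTorch, and then using device_map="auto" to split the model across them. Is one of these the recommended way, or is there a better option?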