I am trying to run the Merlyn-education-corpus-qa model on my two RTX 4090 graphics cards. The code I am using is exactly the same as the one on the Hugging Face page for the model (https://huggingface.co/MerlynMind/merlyn-education-corpus-qa). The only change I've made is casting the model and the inputs to half precision with .half():
model.half()
...
inputs = tokenizer(prompt, return_tensors="pt").half().to(device)
The problem is that the process exits with 'Killed', which suggests the OS is terminating it (most likely for running out of memory).
Is there any way to use both of my GPUs instead of just one, perhaps with the Accelerate or DeepSpeed libraries?
Edit:
This is the code I am currently using, which only makes use of a single GPU:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path = "MerlynMind/merlyn-education-corpus-qa"
device = torch.device("cuda:0") # change device id as necessary
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model.half()
model.to(device) # move to device
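For reference, this is a sketch of the multi-GPU variant I have in mind using Accelerate's device_map="auto" (untested on my setup; it assumes accelerate is installed via pip install accelerate, and the prompt construction from the model card is elided):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "MerlynMind/merlyn-education-corpus-qa"

# device_map="auto" asks Accelerate to shard the model's layers across
# all visible GPUs (spilling to CPU RAM only if they don't fit), and
# torch_dtype=torch.float16 loads the weights directly in half precision
# instead of loading fp32 first and casting with .half() afterwards.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

prompt = "..."  # build the prompt as shown on the model card

# Token ids are integer tensors, so they are not cast with .half();
# only the tensors are moved to the device of the first model shard.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Loading with torch_dtype=torch.float16 should also avoid the peak host-memory usage of an fp32 load, which may be what triggers the 'Killed' in the first place.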