I have used encoder_model.onnx for Flan-T5 small from https://huggingface.co/datasets/bakks/flan-t5-onnx/tree/main,
but during inference it throws the error below:
Error during ONNX inference: argument 'ids': 'list' object cannot be interpreted as an integer
I have used the code below:
import time

import onnxruntime
import torch
from transformers import AutoTokenizer

# Tokenizer assumed to be the stock Flan-T5 small tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

# Input prompt for disease prediction
input_prompt = "Predict the disease name in this text: "
input_symptoms = "fever, cough, shortness of breath, fatigue"
input_text = input_prompt + input_symptoms

# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Load ONNX model
try:
    onnx_model_path = "encoder_model.onnx"
    onnx_session = onnxruntime.InferenceSession(onnx_model_path, providers=["CUDAExecutionProvider"])
except Exception as e:
    print(f"Error loading ONNX model: {e}")
    exit(1)

# Inference with ONNX model
try:
    start_time = time.time()
    onnx_outputs = onnx_session.run(
        None,
        {
            "input_ids": inputs["input_ids"].cpu().numpy(),
            "attention_mask": inputs["attention_mask"].cpu().numpy(),
        },
    )
    end_time = time.time()
    onnx_inference_time = end_time - start_time
    print(f"ONNX Inference Time: {onnx_inference_time:.4f} seconds")
    print(f"Predicted Disease: {tokenizer.decode(onnx_outputs[0], skip_special_tokens=True)}")
except Exception as e:
    print(f"Error during ONNX inference: {e}")
My aim is to compare the inference time of the Flan-T5 small PyTorch model against the ONNX file; a sketch of how I time the PyTorch side is below. Please guide me on what is going wrong.
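For reference, this is roughly how I time the PyTorch side of the comparison (a sketch, assuming the baseline is the encoder of google/flan-t5-small loaded as T5EncoderModel, to match encoder_model.onnx):

import time

import torch
from transformers import T5EncoderModel

# Assumption: compare against the plain PyTorch encoder of Flan-T5 small,
# since encoder_model.onnx contains only the encoder.
device = "cuda" if torch.cuda.is_available() else "cpu"
pt_encoder = T5EncoderModel.from_pretrained("google/flan-t5-small").to(device).eval()

# `inputs` is the tokenized batch from the snippet above.
with torch.no_grad():
    start_time = time.time()
    pt_outputs = pt_encoder(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
    )
    pt_inference_time = time.time() - start_time

print(f"PyTorch Encoder Inference Time: {pt_inference_time:.4f} seconds")
# pt_outputs.last_hidden_state and onnx_outputs[0] should have the same
# shape: (batch_size, sequence_length, d_model)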