I am encountering an error while trying to instantiate the EmbeddingModel using the ONNX model intfloat/multilingual-e5-large. The error message is as follows:
Failed to instantiate [org.springframework.ai.embedding.EmbeddingModel]: Factory method 'embeddingClient' threw exception with message: data did not match any variant of untagged enum PreTokenizerWrapper at line 69 column 3
I have exported the intfloat/multilingual-e5-large model to ONNX format and configured Spring AI to use the exported ONNX model and tokenizer. Here is the relevant portion of my tokenizer.json:
    "pre_tokenizer": {
      "type": "Sequence",
      "pretokenizers": [
        {
          "type": "WhitespaceSplit"
        },
        {
          "type": "Metaspace",
          "replacement": "▁",
          "prepend_scheme": "always",
          "split": true
        }
      ]
    }
I suspect the issue might be related to the pre_tokenizer configuration not matching the expected format, but I am not sure how to resolve it.
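To narrow down which pre-tokenizer entry fails to parse, a small script along these lines could list each entry and the fields it carries. This is only a minimal sketch in Python, assuming the tokenizer.json has the same structure as the snippet above; `describe_pre_tokenizer` and `SAMPLE` are hypothetical names introduced for illustration and are not part of Spring AI or any library.

```python
import json


def describe_pre_tokenizer(tokenizer_json_text):
    """Return a (type, sorted field names) pair for each pre-tokenizer entry."""
    config = json.loads(tokenizer_json_text)
    pre = config.get("pre_tokenizer") or {}
    # A "Sequence" wraps its entries in "pretokenizers"; any other type is a single entry.
    entries = pre.get("pretokenizers", [pre]) if pre else []
    return [(e.get("type"), sorted(k for k in e if k != "type")) for e in entries]


# Hypothetical sample mirroring the snippet quoted in this question.
SAMPLE = """
{"pre_tokenizer": {"type": "Sequence", "pretokenizers": [
  {"type": "WhitespaceSplit"},
  {"type": "Metaspace", "replacement": "\\u2581", "prepend_scheme": "always", "split": true}
]}}
"""

if __name__ == "__main__":
    for kind, fields in describe_pre_tokenizer(SAMPLE):
        print(kind, fields)
```

Comparing the listed fields with those accepted by the tokenizers version used on the Java side may show whether a field such as `prepend_scheme` or `split` is the one that is not recognized.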
Expected behavior
The EmbeddingModel should be instantiated without this error. Could someone explain what is wrong with the configuration or how to resolve the issue? Any help would be greatly appreciated.