When is the add_prefix_space option required, and why?
What is the purpose of add_prefix_space, and how can I tell which models require it?
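For context, a minimal sketch of what the option changes, using a byte-level BPE tokenizer such as GPT-2 (the model name here is just an example):

```python
from transformers import AutoTokenizer

# Byte-level BPE tokenizers (GPT-2, RoBERTa, ...) encode a leading space as
# part of the token, so "Hello" and " Hello" map to different ids.
default = AutoTokenizer.from_pretrained("gpt2")
prefixed = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)

print(default.tokenize("Hello world"))   # ['Hello', 'Ġworld']
print(prefixed.tokenize("Hello world"))  # ['ĠHello', 'Ġworld']

# The fast tokenizers of this family also refuse is_split_into_words=True
# unless add_prefix_space=True, since each word then needs its own
# leading space to be encoded consistently.
enc = prefixed(["Hello", "world"], is_split_into_words=True)
```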
How to get a custom-trained BERT tokenizer not to split certain characters
I am training my own tokenizer based on bert-base-cased. The problem I have is that in my data (a dead language), there are tokens that begin with =, and this character should not be split off from the rest of the token. How do I achieve that?
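To make the problem concrete, here is a minimal sketch of this kind of setup, assuming the tokenizer is retrained with train_new_from_iterator (the corpus lines are invented placeholders, not the real data):

```python
from transformers import AutoTokenizer

# Retrain the bert-base-cased tokenizer on the new corpus.
old_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
corpus = [
    "=ka example line one",
    "=ta example line two",
]
new_tokenizer = old_tokenizer.train_new_from_iterator([corpus], vocab_size=1000)

# The inherited BERT pre-tokenizer treats '=' as punctuation, so the
# leading '=' is always split off, which is the unwanted behaviour:
print(new_tokenizer.tokenize("=ka"))  # e.g. ['=', 'ka']
```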
Thanks for your help!