How can I remove the ## signs from tokens when using the BERT tokenizer?
Tokens length: 200
Tokens: [‘[CLS]’, ‘educational’, ‘background’, ‘computer’, ‘applications’, ‘masters’, ‘degree’, ‘software’, ‘along’, ‘strong’, ‘skills’, ‘p’, ‘##yt’, ‘##hon’, ‘sq’, ‘##l’, ‘j’, ‘##ava’, ‘l’, ‘##in’, ‘##ux’, ‘g’, ‘##it’, ‘certification’, ‘cloud’, ‘computing’, ‘well’, ‘##e’, ‘##qui’, ‘##pped’, ‘successful’, ‘career’, ‘tech’, ‘industry’, ‘given’, ‘interest’, ‘cloud’, ‘computing’, ‘could’, ‘ex’, ‘##cel’, ‘roles’, ‘cloud’, ‘solutions’, ‘architect’, ‘cloud’, ‘engineer’, ‘de’, ‘##vo’, ‘##ps’, ‘engineer’, ‘leverage’, ‘skills’, ‘cloud’, ‘technologies’, ‘design’, ‘implement’, ‘manage’, ‘cloud’, ‘infrastructure’, ‘organizations’, ‘recommend’, ‘continuing’, ‘enhance’, ‘expertise’, ‘cloud’, ‘computing’, ‘pursuing’, ‘advanced’, ‘certification’, ‘##s’, ‘like’, ‘a’, ‘##ws’, ‘certified’, ‘solutions’, ‘architect’, ‘micro’, ‘##so’, ‘##ft’, ‘certified’, ‘a’, ‘##zure’, ‘solutions’, ‘architect’, ‘expert’, ‘stand’, ‘competitive’, ‘tech’, ‘market’, ‘additionally’, ‘gaining’, ‘experience’, ‘real’, ‘##world’, ‘cloud’, ‘projects’, ‘internship’, ‘##s’, ‘freelance’, ‘opportunities’, ‘boost’, ‘profile’, ‘keep’, ‘networkin
I am trying this code with the BERT tokenizer:
seq_length = 200
tokens = tokenizer(
    df['answer'].tolist(),
    max_length=seq_length,
    truncation=True,
    padding='max_length',
    add_special_tokens=True,
    return_tensors='np',
)
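The result contains input_ids, and when I convert them back to tokens I see that some words are split into subtokens (for example 'p', '##yt', '##hon' for "python"). I am worried this may affect the overall results.
I would like each word to come out as a single token instead of several subtokens.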
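
To make clear what output I am after, here is a minimal sketch that glues the pieces back together after tokenization. It assumes the "##" prefix always marks a continuation of the previous token, as in the output above. `merge_wordpieces` is a hypothetical helper I wrote for illustration, not part of the transformers library, and it only changes the readable token list, not the input_ids.

```python
from typing import List


def merge_wordpieces(tokens: List[str], prefix: str = "##") -> List[str]:
    """Join WordPiece continuation tokens (those starting with `prefix`)
    onto the token that precedes them."""
    words: List[str] = []
    for tok in tokens:
        if tok.startswith(prefix) and words:
            # Continuation piece: strip the prefix and append to the last word.
            words[-1] += tok[len(prefix):]
        else:
            # Start of a new word (or a special token such as [CLS]).
            words.append(tok)
    return words


# A few tokens copied from the output above.
sample = ["[CLS]", "strong", "skills", "p", "##yt", "##hon", "sq", "##l", "j", "##ava"]
merged = merge_wordpieces(sample)
print(merged)
```

As far as I understand, the tokenizer's own `convert_tokens_to_string` does a similar merge but returns one string rather than a list. Either way the model still receives the subword ids, because the BERT vocabulary is fixed and words outside it are always split.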