SentenceTransformers seems to be the preferred open-source option for sentence embeddings, but I can't figure out what exactly sbert provides to simplify sentence embedding compared to using HuggingFace Transformers (tokenizer & model) directly.
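For reference, this is the sentence-transformers usage I am comparing against (a minimal sketch following the all-MiniLM-L6-v2 model card; the example sentences are just placeholders):

    from sentence_transformers import SentenceTransformer

    sentences = ["This is an example sentence", "Each sentence is converted"]

    # One call handles tokenization, the forward pass, pooling and (for this model) normalization
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = model.encode(sentences)  # one fixed-size vector per sentence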
The HuggingFace pages of all the pre-trained Sentence Transformers models listed on sbert contain the same wording. Taking all-MiniLM-L6-v2 as an example:
Without sentence-transformers, you can use the model like this: First,
you pass your input through the transformer model, then you have to
apply the right pooling-operation on-top of the contextualized word
embeddings.
What does "apply the right pooling-operation on-top of the contextualized word embeddings" mean? And what does the mean_pooling function below do?
    # Mean Pooling - take the attention mask into account so padding tokens are excluded from the average
    import torch

    def mean_pooling(model_output, attention_mask):
        token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        # sum the real-token embeddings and divide by the number of real tokens (clamped to avoid division by zero)
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
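For context, this is how the model card calls that function, as I understand it (lightly condensed from the card's "without sentence-transformers" example):

    from transformers import AutoTokenizer, AutoModel
    import torch
    import torch.nn.functional as F

    sentences = ["This is an example sentence", "Each sentence is converted"]

    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

    # Tokenize and run the transformer to get contextualized per-token embeddings
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Average the token embeddings (ignoring padding) to get one vector per sentence, then L2-normalize
    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)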
My second question is: what should I do if I want to use a pretrained model that is not listed as a pre-trained Sentence Transformers model on sbert? And is this a good idea?
Take google-bert/bert-base-multilingual-cased as an example: its HuggingFace page does not mention sentence-transformers at all, so if I want to use it for sentence embeddings, can I just "apply the right pooling-operation on-top of the contextualized word embeddings" as https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 suggests?
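In other words, is something like the following a reasonable way to get sentence embeddings from that model? This is only a sketch of what I have in mind, reusing the mean_pooling function above; I'm not sure mean pooling is actually the "right" pooling for a model that was never trained for sentence embeddings.

    from transformers import AutoTokenizer, AutoModel
    import torch

    tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("google-bert/bert-base-multilingual-cased")

    sentences = ["This is an example sentence", "Each sentence is converted"]
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Same mean pooling as above, applied to a plain BERT model
    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])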