Hello, everyone
I have an efficiency problem with the Reranker model.
The details are here: https://github.com/FlagOpen/FlagEmbedding/issues/988
I measured the time spent in self.tokenizer.
I ran inference with a batch size of 4 for 250 batches on a GPU (A800), and the time spent in the tokenizer was 29.99s.
But when I comment out the line scores = self.model(**inputs, return_dict=True).logits.view(-1, ).float(), the tokenizer time drops to 4.16s.
Why does this happen?
also here:
Can anybody help with that?
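
For reference, here is a minimal sketch of how the two stages could be timed separately. It is not the code from the issue: the helper name `timed` and the `totals` dict are hypothetical, and it assumes the GPU forward pass is launched asynchronously, so a synchronization function (for example `torch.cuda.synchronize`) can be passed as `sync` to keep pending GPU work from being counted against a neighbouring stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(totals, name, sync=None):
    """Accumulate wall-clock time for the enclosed block under totals[name].

    sync is an optional zero-argument callable (assumption: something like
    torch.cuda.synchronize) that is called before starting and before
    stopping the clock, so queued GPU work is finished at both boundaries.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        totals[name] = totals.get(name, 0.0) + (time.perf_counter() - start)


if __name__ == "__main__":
    # Stand-in workloads; in the real loop these would be the
    # self.tokenizer(...) call and the self.model(**inputs, ...) call.
    totals = {}
    for _ in range(3):
        with timed(totals, "tokenizer"):
            time.sleep(0.01)
        with timed(totals, "model"):
            time.sleep(0.02)
    print(totals)
```

In the real loop, the tokenizer call and the model call would each go inside their own `with timed(totals, ..., sync=torch.cuda.synchronize):` block. If the tokenizer total stays close to 4.16s with synchronization in place, the extra time in the original measurement was probably GPU work being waited on rather than tokenization itself.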