Grounding DINO is a very powerful zero-shot model, but inference is too slow.
On my GPU (NVIDIA GTX 950) it takes about 3 seconds per image (1100×840).
Are there any parameters I can use to speed up inference without losing too much quality?
In particular, I run the same text prompt in a loop, so how can I save the text-encoding time by encoding the text only once, on the first iteration?
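
To make the idea concrete, here is a rough sketch of what I have in mind. Note that `encode_text`, `detect`, and `run_loop` are hypothetical placeholders I made up for illustration, not Grounding DINO functions; the only point is that the text is encoded once and the result is reused for every image.

```python
from functools import lru_cache


@lru_cache(maxsize=8)
def encode_text(prompt: str) -> tuple:
    """Hypothetical stand-in for the text encoder; NOT a Grounding DINO API."""
    # Placeholder "features": a real encoder would return a tensor here.
    return tuple(ord(c) for c in prompt)


def detect(image, prompt: str) -> dict:
    """Hypothetical per-image step that reuses the cached text features."""
    text_features = encode_text(prompt)  # cache hit after the first call
    return {"image": image, "n_text_features": len(text_features)}


def run_loop(images, prompt: str) -> list:
    """Process every image with the same prompt; the text is encoded once."""
    return [detect(img, prompt) for img in images]
```

With three images and one prompt, `encode_text.cache_info()` should report one miss and two hits, i.e. the encoder body ran only for the first image. Is there a supported way to get the same effect with the real model?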
Thanks,
Joe