I am a hardware developer, and I want to map a ViT model from timm onto a custom accelerator that only supports FP16 precision. However, I have learned that the model cannot be quantized to FP16 with torch.quantization.quantize_static (actually, I am not aware of the difference between quantize_dynamic and quantize_static; someone just told me to use quantize_static). I think there must be some way to achieve this. Are there any tutorials?
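For context, here is roughly where I am — a minimal sketch, assuming vit_base_patch16_224 as the model name (my actual model may be a different variant), and assuming that naively casting the weights with .half() is the kind of FP16 result I am after, though I don't know if that is even the right direction compared to quantize_static:

```python
import torch
import timm

# Load a pretrained ViT from timm (vit_base_patch16_224 is just an example;
# my actual model may be a different variant).
model = timm.create_model('vit_base_patch16_224', pretrained=True)
model.eval()

# What I ultimately want on the accelerator: all weights and activations in FP16.
# Naively casting with .half() converts the floating-point parameters to FP16,
# but I don't know whether this is the correct approach:
fp16_model = model.half()

# Trying to run the cast model (on GPU here, since FP16 operator coverage
# on CPU is limited):
if torch.cuda.is_available():
    fp16_model = fp16_model.cuda()
    dummy = torch.randn(1, 3, 224, 224, device='cuda', dtype=torch.float16)
    with torch.no_grad():
        out = fp16_model(dummy)
    print(out.shape)  # expected: torch.Size([1, 1000]) for the ImageNet-1k head
```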
By the way, is it necessary to retrain the model if I find a way to use quantize_static? I am not good at AI software.