I have been analyzing the maximum throughput I can get out of my GPU for a specific CNN model. The GPU has both CUDA cores and Tensor Cores, so I want to run the model on both types of cores at the same time and measure the maximum achievable throughput.
I wrapped inference in with torch.cuda.amp.autocast() so that the model uses automatic mixed precision and should therefore be able to run on the Tensor Cores. However, when I ran full-precision and AMP inference simultaneously, the combined throughput was higher than full precision alone (run separately) but lower than AMP alone (run separately). From this I conclude that PyTorch is not actually using the Tensor Cores: if the two workloads were really running on separate hardware, the combined throughput should be close to the sum of the two separate runs.
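For reference, this is roughly how I measured the two separate baselines (a minimal sketch, not my exact benchmark; the tiny Sequential model, batch size, and iteration count are placeholders, and it falls back to CPU bfloat16 autocast when no GPU is present):

```python
import time
import torch

# Placeholder CNN standing in for the real model under test.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, padding=1),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
).eval().to(device)

x = torch.randn(8, 3, 32, 32, device=device)  # illustrative batch
N_ITERS = 10

def throughput(use_amp: bool) -> float:
    """Images/sec over N_ITERS batches, with or without autocast."""
    if device == "cuda":
        torch.cuda.synchronize()  # make sure timing excludes pending work
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(N_ITERS):
            if use_amp:
                # fp16 autocast on CUDA; bf16 is the CPU autocast dtype
                dtype = torch.float16 if device == "cuda" else torch.bfloat16
                with torch.autocast(device_type=device, dtype=dtype):
                    model(x)
            else:
                model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return N_ITERS * x.shape[0] / (time.perf_counter() - start)

fp32_tp = throughput(use_amp=False)
amp_tp = throughput(use_amp=True)
print(f"FP32: {fp32_tp:.1f} img/s, AMP: {amp_tp:.1f} img/s")
```

The simultaneous run was the same idea with the FP32 and AMP loops launched concurrently against the same GPU.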
Is there a way to explicitly toggle Tensor Core usage so that PyTorch uses them?
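For context, these are the only related switches I have found so far (a sketch of the TF32 flags in recent PyTorch releases; whether they address my case is an assumption on my part):

```python
import torch

# TF32 lets FP32 matmuls and convolutions run on Tensor Cores
# (Ampere and newer GPUs). These flags exist in PyTorch, but I am
# not sure they give the explicit toggle I am asking about.
torch.backends.cuda.matmul.allow_tf32 = True  # matmuls via TF32
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions via TF32

# Higher-level knob in newer releases; "highest" disables TF32 again.
torch.set_float32_matmul_precision("high")
```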