I am looking for CUDA sample code that measures TMA (Tensor Memory Accelerator) bandwidth. Where can I find it, or does NVIDIA not provide such code?
I looked through the CUDA 12.4 sample programs but could not find a bandwidth test that uses TMA.
Does anyone know where I can find relevant TMA bandwidth benchmarks?