I am not sure this is even the right question to ask, but when I compile the CUDA Visual Studio example project, there is a --ptxas-options=-v
option that turns on diagnostics, which print the local memory being used and so on. I think this option is what instructs the nvcc compiler to output that information.
Right now I am grappling with a stack overflow error in a large CUDA kernel, and I'd like to see exactly how much stack memory is being allocated.
import cupy as cp

# `kernel` holds the CUDA source string for the kernels being compiled.
options = []
options.append('--define-macro=NDEBUG')
options.append('--diag-suppress=550,20012')
options.append('--dopt=on')
options.append('--restrict')
options.append('--ptxas-options=-v')
raw_module = cp.RawModule(code=kernel, backend='nvcc', enable_cooperative_groups=True, options=tuple(options))
This is how I am compiling the kernels with CuPy, but no extra information gets printed, even with the ptxas
option set to verbose.
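To make clearer what output I am after, here is a rough sketch of getting it outside CuPy: run nvcc by hand with the same flags and pull the stack frame sizes out of the ptxas log. This is only a sketch under assumptions. The file names kernel.cu and kernel.cubin, the sm_80 architecture, and the helper names build_nvcc_command and parse_stack_frames are all made up for illustration, and the parser assumes the usual "Function properties for ..." / "N bytes stack frame" wording of the ptxas verbose log.

```python
import re
import shutil
import subprocess

# Same flags as in the CuPy snippet above.
OPTIONS = (
    "--define-macro=NDEBUG",
    "--diag-suppress=550,20012",
    "--dopt=on",
    "--restrict",
    "--ptxas-options=-v",
)


def build_nvcc_command(source_path, arch="sm_80", options=OPTIONS):
    """Build an nvcc command that compiles to a cubin, so that ptxas actually runs.

    The architecture and output file name are placeholders (assumptions).
    """
    return [
        "nvcc",
        f"--gpu-architecture={arch}",
        "--cubin",
        "--output-file", "kernel.cubin",
        *options,
        source_path,
    ]


FUNC_RE = re.compile(r"Function properties for (\S+)")
STACK_RE = re.compile(r"(\d+) bytes stack frame")


def parse_stack_frames(ptxas_log):
    """Map each function named in a ptxas verbose log to its stack frame size in bytes."""
    frames = {}
    current = None
    for line in ptxas_log.splitlines():
        match = FUNC_RE.search(line)
        if match:
            # Remember the function; its stack frame line should follow.
            current = match.group(1)
            continue
        match = STACK_RE.search(line)
        if match and current is not None:
            frames[current] = int(match.group(1))
            current = None
    return frames


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        proc = subprocess.run(
            build_nvcc_command("kernel.cu"), capture_output=True, text=True
        )
        # Scan both streams, since the diagnostics may land on either one.
        print(parse_stack_frames(proc.stdout + proc.stderr))
```

If something like this reports a large stack frame for the kernel, that would at least confirm where the memory is going, but I would still prefer to get the same information directly from the CuPy compile step.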