error: ‘cudaDriverEntryPointQueryResult’ was not declared in this scope
My CUDA compiler identification is NVIDIA 11.7.64, and I am including both cuda_runtime_api.h and cuda_runtime.h, but this error still persists. Any workarounds?
How to properly free a CUDA context?
I am implementing OptiX denoising inside my C++ path tracer, so I need to create a CUDA context before calling OptiX kernels. That context should be created every time I spawn a rendering thread, since each thread has its own CUDA context
identifier “atomicAdd” in cuda
I was running the k-means algorithm using CUDA and encountered a problem in this part of the code: if (idx < numPoints) { atomicAdd(&counts[points[idx].cluster], 1);
Perform quick flip operations on matrices using CUDA
I want to perform a fast flip operation, similar to MATLAB, on a 3D matrix in CUDA C++, but I have hit a speed bottleneck and need help. The following uses a 2×2×2 matrix A to demonstrate the flip function (A = reshape(1:8,2,2,2)):
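To make the flip semantics concrete, here is a minimal host-side C++ sketch of the index mapping, assuming MATLAB-style column-major storage. The function name flip3d and its parameters are hypothetical and are not taken from the question. It is a CPU reference for checking results, not a tuned kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: reverse a column-major R x C x P array along one
// dimension (dim 0 = rows, 1 = columns, 2 = pages).
std::vector<int> flip3d(const std::vector<int>& a,
                        std::size_t R, std::size_t C, std::size_t P, int dim) {
    assert(a.size() == R * C * P);
    assert(dim >= 0 && dim <= 2);
    std::vector<int> out(a.size());
    for (std::size_t p = 0; p < P; ++p) {
        for (std::size_t c = 0; c < C; ++c) {
            for (std::size_t r = 0; r < R; ++r) {
                // Source coordinates: mirror only the flipped dimension.
                const std::size_t sr = (dim == 0) ? R - 1 - r : r;
                const std::size_t sc = (dim == 1) ? C - 1 - c : c;
                const std::size_t sp = (dim == 2) ? P - 1 - p : p;
                // Column-major linear index: r + R * (c + C * p).
                out[r + R * (c + C * p)] = a[sr + R * (sc + C * sp)];
            }
        }
    }
    return out;
}
```

Each output element depends on exactly one input element, so in a CUDA kernel the loop body could be assigned to one thread per output index. For the 2×2×2 example holding 1..8, flipping along dim 0 swaps the two entries of every column.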
Shared memory value unstable
This is my first time using shared memory.
CUDA copy class object containing pointer to another class
I am trying to copy a class object containing pointers to another class. In particular, I have a class LikelihoodConstructor that contains an array of pointers to another class, DataModel, which contains an array ‘bins’ that I'm trying to access. Essentially, the kernel I would like to run is the following:
Summation of a polynomial in CUDA
I would like to perform a summation operation on a polynomial inside a CUDA kernel, which contains coefficients and a function as given
What are the risks of increasing cudaLimitDevRuntimePendingLaunchCount?
I encountered an error while using dynamic parallelism: