Inline struct initialization, “a nonstatic member reference must be relative to a specific object”
I have a small issue with referencing the outer struct's members from a nested struct. When I try to set x and y to width and height, I get the error “a nonstatic member reference must be relative to a specific object”. Here is the code:
Build issue with MatX concerning initialisation of shared variables
I’m attempting to build and install MatX onto my Linux machine.
CUDA equivalent of a C++ code that update parents array if there is a match
I am trying to write the CUDA C++ equivalent of the following C++ code. I have tried several approaches, but I cannot find a way to implement it in parallel.
CUDA C++: MD particle positions becoming NaN during MD/MPCD simulation
I’m working on a CUDA C++ simulation code that performs Molecular Dynamics (MD) and Multi-Particle Collision Dynamics (MPCD) for a system consisting of a fluid (MPCD particles) and a polymer (MD particles).
CUDA: Initialising constants from an array?
Hi there, I have a CUDA program with a __global__ kernel that takes an array as input containing the constants used across several kernels. If I set the constants inside the kernel without the array, it runs in 6400 ms; if I set them from the array, it slows right down to about 420000 ms. Any ideas? E.g.
Simple CUDA program flagged by Windows as Trojan
I’m new to CUDA and just doing some exercises to get myself started. The following program is adapted from a homework problem for matrix multiplication found here. I’m working on a Windows 10/x64 machine using
CUDA – access violation on each function
I have a C# app that runs CUDA code. I wanted to use cudaMalloc, but I get an access violation on every CUDA API function (cudaSetDevice, cudaMemGetInfo, cudaMalloc…). I am fairly sure the program ran once this morning (it called cudaMemGetInfo and returned right after that).
In CUDA, is it faster to generate matrices on the CPU or through a kernel in the GPU?
I’m a beginner on CUDA and C++ and I’m trying to multiply two 1000 x 1000 matrices that both contain randomly generated values from 0 to 100. For this I’m using Visual Studio. I’m also trying to compare how fast the entire program executes (calculation time + other operation times etc.) on GPU and on CPU. I made the multiplication that runs on CPU with C++. Sorry if my code looks clumsy, I’m still learning.
How to copy a 3D array to the device in CUDA, where the third dimension is not constant
I am implementing a parallel two-dimensional SPH method. I have a problem with efficiently copying a three-dimensional array to a device.
I partially solved the problem by copying the entire array to the device, but only by converting the three-dimensional array into a one-dimensional one with a constant third dimension; I could not figure out how to handle a variable one, since the number of particles in a grid cell can differ. As a result, a lot of the memory allocated on the device goes unused.
CUBLAS root kernel is called twice for a reduction operation
I am making a single call to the cublasSasum function from cuBLAS, yet profiling with nsys shows that the underlying kernel (asum_kernel) is launched twice. I am computing the sum of 4096^2 elements.