Tag Archive for parallel-processing, cuda, multi-gpu

Cannot Successfully Implement Parallel Reduction for Multiple CUDA GPUs

I am trying to run the following code, which computes the dot product of two vectors. The code runs correctly when the number of GPUs is 1, that is, when OpenMP isn't really used. When the number of GPUs is 2, however, the GPU result is always 0. I don't know what is wrong: I just use the usual parallel reduction in the GPU code and then separate the work across N GPUs. I have checked that the multi-GPU code runs correctly when I don't use parallel reduction in the GPU code, that is, when I let C[i] = A[i]+B[i] and compute the sum on the host.
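For comparison, here is a minimal sketch of the setup described above: a shared-memory reduction kernel run on each GPU, one OpenMP loop iteration per device, and the per-block partial sums added on the host. It is not the code from the question. The names `dotKernel`, `THREADS`, `MAX_BLOCKS`, and `CHECK`, the vector length, and the test data are all assumptions made for illustration. Two points it tries to respect: `cudaSetDevice` applies to the calling host thread, so it is called inside the loop before any allocation, and device memory belongs to the device it was allocated on, so every GPU gets its own buffers.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256;    // assumed block size; must be a power of two
constexpr int MAX_BLOCKS = 64;  // assumed cap on blocks per GPU

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)

// Each block reduces its share of a[i]*b[i] into one value in partial[].
__global__ void dotKernel(const float* a, const float* b, float* partial, int n) {
    __shared__ float cache[THREADS];
    int tid = threadIdx.x;
    float sum = 0.0f;
    // Grid-stride loop: every thread accumulates a private sum first.
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        sum += a[i] * b[i];
    cache[tid] = sum;
    __syncthreads();
    // Tree reduction in shared memory; the barrier stays outside the if.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = cache[0];
}

int main() {
    const int n = 1 << 20;                       // assumed vector length
    std::vector<float> a(n, 1.0f), b(n, 2.0f);   // assumed test data

    int numGpus = 0;
    CHECK(cudaGetDeviceCount(&numGpus));
    const int chunk = (n + numGpus - 1) / numGpus;   // elements per GPU
    std::vector<double> gpuSum(numGpus, 0.0);        // one slot per GPU, no sharing

    #pragma omp parallel for num_threads(numGpus)
    for (int dev = 0; dev < numGpus; ++dev) {
        const int begin = dev * chunk;
        const int len = std::min(chunk, n - begin);
        if (len <= 0) continue;
        CHECK(cudaSetDevice(dev));   // select the GPU before allocating on it
        const int blocks = std::min(MAX_BLOCKS, (len + THREADS - 1) / THREADS);
        float *dA, *dB, *dPartial;
        CHECK(cudaMalloc(&dA, len * sizeof(float)));
        CHECK(cudaMalloc(&dB, len * sizeof(float)));
        CHECK(cudaMalloc(&dPartial, blocks * sizeof(float)));
        CHECK(cudaMemcpy(dA, a.data() + begin, len * sizeof(float), cudaMemcpyHostToDevice));
        CHECK(cudaMemcpy(dB, b.data() + begin, len * sizeof(float), cudaMemcpyHostToDevice));
        dotKernel<<<blocks, THREADS>>>(dA, dB, dPartial, len);
        CHECK(cudaGetLastError());   // catches launch failures
        std::vector<float> partial(blocks);
        // This blocking copy also waits for the kernel to finish.
        CHECK(cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float), cudaMemcpyDeviceToHost));
        for (float p : partial) gpuSum[dev] += p;
        CHECK(cudaFree(dA)); CHECK(cudaFree(dB)); CHECK(cudaFree(dPartial));
    }

    double gpuTotal = 0.0, cpuTotal = 0.0;
    for (double s : gpuSum) gpuTotal += s;
    for (int i = 0; i < n; ++i) cpuTotal += double(a[i]) * b[i];
    printf("GPU: %f  CPU: %f\n", gpuTotal, cpuTotal);
    return 0;
}
```

Assuming a GCC host compiler, this could be built with something like `nvcc -Xcompiler -fopenmp dot.cu -o dot` (the file name is hypothetical). Without the OpenMP flag the pragma is ignored and the devices are processed one after another, which can be a useful way to check whether a wrong result comes from the threading or from the kernel itself.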