I've been asked to compute the dot product in a C program. The exercise is to compare the sequential version (CPU) with the parallel version (GPU) using OpenACC directives. The problem is that the GPU version is slower than the CPU version, which shouldn't be the case. What am I doing wrong?
Note: I'm running the code on a cluster.
Sequential:
double dot_product_cpu(int n, double* x, double* y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += x[i] * y[i];
    }
    return result;
}
Parallel (GPU):
double dot_product_gpu(int n, double* x, double* y)
{
    double result = 0.0;
    #pragma acc data copyin(x[0:n], y[0:n]) copy(result)
    {
        #pragma acc parallel loop reduction(+:result)
        for (int i = 0; i < n; i++)
        {
            result += x[i] * y[i];
        }
    }
    return result;
}
I tried running with n = 10^k for k = 1, 2, 3, 4, 5, 6, 7, and in none of these cases did the GPU version behave as expected.
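One way to measure the two versions for each n is sketched below. This is only a minimal sketch, not the timing code actually used: the helpers `wall_seconds` and `time_dot`, the `dot_fn` typedef, and the `reps` parameter are hypothetical names introduced for illustration. It assumes a POSIX system (for `clock_gettime`) and makes one untimed warm-up call, on the assumption that the first GPU call may include one-time device initialization. Note that each timed GPU call still includes the host-to-device copies made by the `data` region.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime (assumes POSIX) */
#include <time.h>

/* Hypothetical function-pointer type matching both dot product versions. */
typedef double (*dot_fn)(int, double*, double*);

/* Hypothetical helper: wall-clock time in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Sequential version, as in the question. */
double dot_product_cpu(int n, double* x, double* y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += x[i] * y[i];
    }
    return result;
}

/* OpenACC version, as in the question. Without an OpenACC compiler
   the pragmas are ignored and the loop runs sequentially. */
double dot_product_gpu(int n, double* x, double* y)
{
    double result = 0.0;
    #pragma acc data copyin(x[0:n], y[0:n]) copy(result)
    {
        #pragma acc parallel loop reduction(+:result)
        for (int i = 0; i < n; i++)
        {
            result += x[i] * y[i];
        }
    }
    return result;
}

/* Hypothetical helper: returns the average seconds per call of f over
   `reps` timed calls (reps >= 1), after one untimed warm-up call.
   The last computed value is stored in *out. */
double time_dot(dot_fn f, int n, double* x, double* y, int reps, double* out)
{
    *out = f(n, x, y);              /* warm-up call, not timed */
    double t0 = wall_seconds();
    for (int r = 0; r < reps; r++)
    {
        *out = f(n, x, y);
    }
    return (wall_seconds() - t0) / reps;
}
```

Calling `time_dot(dot_product_cpu, n, x, y, 10, &res)` and `time_dot(dot_product_gpu, n, x, y, 10, &res)` for each n would give per-call averages that can be compared directly.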