I would like to sum a polynomial-style expression inside a CUDA kernel, where each term is a coefficient times a function of x:

    y = a1*f1(x) + a2*f2(x) + a3*f3(x) + a4*f4(x);

My first idea is to loop over the four terms and use a switch statement:

    float y = 0;
    for (int i = 0; i < 4; i++) {
        if (a[i] != 0) {
            switch (i) {
                case 0: y += a[i] * f1(x); break;
                case 1: y += a[i] * f2(x); break;
                case 2: y += a[i] * f3(x); break;
                case 3: y += a[i] * f4(x); break;
            }
        }
    }

This solution seems very basic, and I expect it to be slow when the number of terms in the equation grows beyond 30. I am now thinking of creating an array of function pointers to remove the switch statement:

    float y = 0;
    for (int i = 0; i < 4; i++) {
        if (a[i] != 0) {
            y += a[i] * f[i](x);
        }
    }

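For concreteness, here is a minimal sketch of how the function table `f` in the loop above might be declared. Everything beyond the loop itself is an assumption for illustration: the bodies of `f1` to `f4` (here 1, x, x², x³) are placeholders, and the names `HD`, `basis_fn`, `kNumTerms`, `eval_sum`, and `eval_kernel` are hypothetical. The `HD` macro lets the same code compile with nvcc or with a plain C++ compiler.

```cpp
// Expand to CUDA qualifiers under nvcc, and to nothing under a plain C++
// compiler, so the same sketch can also be checked on the host.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Placeholder basis functions (assumed for illustration only).
HD inline float f1(float)   { return 1.0f; }
HD inline float f2(float x) { return x; }
HD inline float f3(float x) { return x * x; }
HD inline float f4(float x) { return x * x * x; }

// Pointer to a function taking and returning a float.
typedef float (*basis_fn)(float);

constexpr int kNumTerms = 4;

// Computes y = sum over i of a[i] * f_i(x), skipping zero coefficients.
HD inline float eval_sum(const float* a, float x) {
    // The table is built inside the function, so the function addresses are
    // taken in whichever compilation pass (host or device) compiles this body.
    const basis_fn f[kNumTerms] = {f1, f2, f3, f4};
    float y = 0.0f;
    for (int i = 0; i < kNumTerms; ++i) {
        if (a[i] != 0.0f) {
            y += a[i] * f[i](x);
        }
    }
    return y;
}

#ifdef __CUDACC__
// One thread per sample: y[idx] = eval_sum(a, x[idx]).
__global__ void eval_kernel(const float* a, const float* x, float* y, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        y[idx] = eval_sum(a, x[idx]);
    }
}
#endif
```

Calls made through the pointers may not be inlined by the compiler, so whether this version actually beats the switch is something I would have to measure rather than assume.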
Still, I feel this could be optimized further. Please let me know if you have a better approach to this problem.
Thanks!