Originally Posted by

**nburns**
Code:

for(i=0; i<n; i++) {
y = (y * x) + c[i];
}

Each iteration needs the value of y from the previous iteration, so the loop as written is one serial dependency chain and I don't think you can parallelize it directly. But if you rewrite it to compute the terms separately and then add them, there are opportunities for SIMD, because the terms don't depend on each other.

Like this:

Code:

/* assumes n is a multiple of 4 */
for (i=0; i<n; i+=4)
y = y*x*x*x*x + (c[i]*x*x*x + c[i+1]*x*x) + (c[i+2]*x + c[i+3]);

Here the powers of x (modulo the word size) are constants computed in advance, so they cost nothing inside the loop. The 4 multiplications can be done in parallel. Then 2 of the additions are done in parallel in the next step, followed by 2 sequential additions. Of course, you can unroll further.
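To make the comparison concrete, here is a minimal sketch in C of both versions side by side. It rests on assumptions that are not in the post: the coefficients are unsigned 32-bit values (so everything wraps modulo 2^32), y starts at 0, and the function names `horner` and `horner4` are made up for illustration. The powers of x are hoisted out of the loop as described, and a short tail loop handles an n that is not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain Horner evaluation: every step needs the y from the step before. */
uint32_t horner(const uint32_t *c, size_t n, uint32_t x)
{
    uint32_t y = 0; /* assumed starting value */
    for (size_t i = 0; i < n; i++)
        y = y * x + c[i];
    return y;
}

/* Four coefficients per step. The four products in each step are
 * independent of one another, so they can be computed in parallel. */
uint32_t horner4(const uint32_t *c, size_t n, uint32_t x)
{
    /* Powers of x computed once, outside the loop (wrapping mod 2^32). */
    const uint32_t x2 = x * x;
    const uint32_t x3 = x2 * x;
    const uint32_t x4 = x2 * x2;
    uint32_t y = 0; /* assumed starting value */
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        y = y * x4 + (c[i] * x3 + c[i + 1] * x2) + (c[i + 2] * x + c[i + 3]);

    /* Leftover 0 to 3 coefficients: finish with ordinary Horner steps. */
    for (; i < n; i++)
        y = y * x + c[i];
    return y;
}
```

Because unsigned arithmetic wraps consistently, both functions should return the same value for any input. Whether a given compiler actually turns the unrolled body into SIMD instructions depends on the target and optimization flags, so it is worth checking the generated assembly rather than assuming it.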