Sadly, all of the binary floating-point types are prone to rounding error. They can't even store 0.1 exactly (you can check this by doing the long division by hand: in binary, 0.1 is the repeating fraction 0b0.0001100110011001100...). You might try some workarounds, like expanding that pow into a hard-coded multiplication, but ultimately you'll need to write your program to anticipate and minimize the effects of rounding error. Here are a couple of ideas:
Never compare floating-point values for equality. Some alternative comparisons I have seen include abs(a-b) < delta, percent_difference(a,b) < delta, or even abs(a/b-1) < delta, where delta is a "suitably small" value that you have determined works for this specific test. Note that the relative forms need a guard for the case where b is zero.
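As a rough illustration of the comparisons above, here is a minimal C sketch that combines an absolute and a relative tolerance. The name nearly_equal and its parameters abs_tol and rel_tol are made up for this example; suitable tolerance values depend entirely on your data.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper (not from any library): returns true when a and b
   differ by at most abs_tol, or by at most rel_tol times the larger of
   their magnitudes. */
bool nearly_equal(double a, double b, double abs_tol, double rel_tol)
{
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);

    if (a == b)
        return true;               /* exact match, including equal infinities */
    if (isinf(a) || isinf(b))
        return false;              /* an infinity is not close to anything else */

    double diff = fabs(a - b);     /* NaN inputs make every test below false */
    if (diff <= abs_tol)
        return true;               /* absolute test: handles values near zero */

    /* Relative test, scaled by the larger magnitude so that it is
       symmetric in a and b and never divides by zero. */
    double mag_a = fabs(a);
    double mag_b = fabs(b);
    double scale = (mag_a > mag_b) ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}
```

Scaling by the larger magnitude instead of computing abs(a/b-1) avoids the division, so b == 0 needs no special case. A call such as nearly_equal(0.1 + 0.2, 0.3, 1e-12, 1e-9) is the kind of check that replaces a plain == test.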
Avoid adding a long array of numbers into a single accumulator; the elements near the end of the array may be completely lost to rounding error once the accumulator has grown large. In "CUDA by Example" by Jason Sanders and Edward Kandrot, the authors recommend recursively adding each pair of elements individually, so that each step produces an array half the size of the previous one, until you are left with a one-element array.
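The pairwise scheme just described could look something like the following in plain, single-threaded C. This is only a sketch of the idea, not the code from the book; the function names pairwise_sum and naive_sum are invented here, and pairwise_sum overwrites its input buffer as it goes.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward accumulator loop, shown only for comparison. */
float naive_sum(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Hypothetical sketch: sums buf[0..n-1] by repeatedly adding neighbouring
   pairs, halving the active length each pass. The contents of buf are
   overwritten. */
float pairwise_sum(float *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n == 0)
        return 0.0f;

    while (n > 1) {
        size_t half = n / 2;
        /* Writing slot i only after reading slots 2i and 2i+1 makes the
           in-place update safe, because 2i >= i. */
        for (size_t i = 0; i < half; ++i)
            buf[i] = buf[2 * i] + buf[2 * i + 1];
        if (n % 2 != 0) {
            buf[half] = buf[n - 1];   /* odd length: carry the last element */
            ++half;
        }
        n = half;
    }
    return buf[0];
}
```

Because each addition combines partial sums of similar size, small elements are less likely to be swallowed by a large running total. Copy the array first if you still need the original values.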