Rounding error reduction?

Question 1

Sadly, all of the floating point types are notorious for rounding error. They can't even store 0.1 without it (you can prove this using long division by hand: the binary equivalent is 0b0.0001100110011001100...). You might try some workarounds like expanding that pow to a hard-coded multiplication, but you'll ultimately need to code your program to anticipate and minimize the effects of rounding error. Here are a couple ideas:

Never compare floating point values for equality. Some alternative comparisons I have seen include: abs(a-b) < delta, or percent_difference (a,b) < delta or even abs(a/b-1) < delta, where delta is a "suitably small" value you have determined works for this specific test.
Avoid adding long arrays of numbers into an accumulator; the end of the array may be completely lost to rounding error as the accumulator grows large. In "Cuda by Example" by Jason Sanders and Edward Kandrot, the authors recommend recursively adding each pair of elements individually so that each step produces an array half the size of the previous step, until you get a one-element array.

Question 2

In a(), you lose precision when you add a0 (which is exactly 1) to the small and imprecise result of sqrt()*dx.

The function b() doesn't lose any precision using the supplied values.

When you call a() before b() as in the second output, you're doing mathematical operations on a number that's already imprecise, compounding the error.

Try to structure the mathematical operations so you do operations that are less likely to create floating point errors first and those more likely to create floating point errors last.

Or, inside your functions, make sure they are operating on "long double" values. For example, the following uses floating-point promotion to promote double to long double during the first mathematical operation (pay attention to operator precedence):

template <typename Type>
inline Type a(const Type dx, const Type a0, const Type z0, const Type b1)
{
    return (std::sqrt(std::abs(2*static_cast<long double>(b1)-z0))*dx)+a0;
}