Understanding truncation and rounding error in IEEE floating point system?

https://cs.stackexchange.com/questions/91815

05-11-2019
|

Question

I'm trying to understand the theory behind finding the optimal $h$ value for differentiation in this definition:

$$ \frac{f(x+h) - f(x)}{h}$$

as $h$ tends to 0.

Here is my understanding:

Truncation error comes from say approximating the value of $f(x+h)$ using the taylor series. In this case, it could be that:

$$ f(x+h) \approx f(x) + hf'(x) + \frac{h^2}{2!}f''(x) + O(h^3)$$

Letting $f'(x)$ to be the subject since we want to find gradient $f'(x)$, we have

$$f'(x) = \frac{f(x) - (hf'(x) + \frac{h^2}{2!}f''(x) + O(h^3))}{h}$$ $$\implies f'(x) = \frac{f(x+h)-f(x)}{h} - \frac{h}{2}f''(x) + O(h^2)$$

Intuitively, we can stop at here by letting $h \approx \sqrt\epsilon$ where $\epsilon$ is the machine epsilon, we can thus limit our truncation error. But suppose we continue, then our truncation error is $\frac{h}{2}f''(x)$.

Then rounding error comes from the interaction of the sums in the differentiation equation:

$$ \frac{f(x+h) - f(x)}{h} = \frac{[f(x) + hf'(x) + O(h^2)] - f(x)}{h}$$

But for $f'(x) \approx 1$ and since we have $h \approx \sqrt \epsilon$, we have

$$ \frac{f(x+h) - f(x)}{h} \approx 1 \pm \frac{\epsilon}{h}$$

Which means the rounding error is approximately $\frac{\epsilon}{h}$, which concurs with the fact that the left hand side of the equation is actually $f'(x)$ as well, just that we have managed to reduce the rounding error to what is seen on the right hand side.

Now, we see that our truncation error varies linearly with $h$ while our rounding error varies inversely with $h$. To find the optimal $h$ value, we equate both errors:

$$\frac{\epsilon}{h} = \frac{h}{2}f''(x) \approx \frac{h}{2}$$ since we have $f''(x) \approx 1$.

Letting $h$ be the subject, we have $$h = \sqrt{2\epsilon}$$ which means the optimal $h$ is around $h = \sqrt{\epsilon}$ as expected.

Is my understanding correct? Does the final factor of 2 in the end matter?

Big problem in the above argument: in point 3, one already use the fact that $h = \sqrt \epsilon$. Wouldn't that already defeat the purpose of finding the optimal $h$?

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange