Question

In the classical optimization literature, the numerical computation of derivatives is often described as a computationally expensive step. For example, quasi-Newton methods are presented as a way to avoid computing first and/or second derivatives when these are "too expensive" to compute.

What are the state-of-the-art approaches to computing derivatives, and what is their time complexity? If this is heavily problem-dependent, I am particularly interested in the computation of first- and second-order derivatives for nonlinear least squares problems, specifically the part concerning first-order derivatives (Jacobians).


Solution

The time to compute first-order derivatives (gradients) and second-order derivatives (Hessians) depends heavily on the particular function.

In general, I am aware of three approaches:

  1. Analytical: With pencil and paper, figure out an analytical expression for the derivatives using the rules of calculus, then implement those expressions. Here the running time depends entirely on how easy or hard it is to evaluate those expressions. In some cases, you may be able to use a computer algebra system to help with this calculation (a small sketch appears after this list).

  2. Automatic differentiation: Use the computer to compute the derivatives for you, given a program that computes the function itself. See, e.g., https://en.wikipedia.org/wiki/Automatic_differentiation. This will construct a program that evaluates the derivative at a value of your choice (a small sketch appears after this list). The running time and the ability to do this depend entirely on the program you are differentiating. Generally speaking, if you are differentiating a function $f$ that can be computed in $O(n)$ time as a straight-line expression (loop-free code that uses only elementary operations and conditionals, but no arrays, random-access memory lookups, or loops), then automatic differentiation can construct a program that evaluates the derivative at an arbitrary input in $O(n)$ time. This is true even if the function is over multiple variables and you want to compute the gradient. The Hessian is slower: if $f$ is a function of one variable, you can evaluate the Hessian at an arbitrary point in $O(n)$ time, but if $f$ is a function of $m$ variables, it will take $O(mn)$ time.

  3. Numerical differentiation: You can use the method of finite differences to approximate the gradient, given only a black box that can compute the function (you don't even need the code of that black box). See https://en.wikipedia.org/wiki/Numerical_differentiation and the sketch after this list. Here, if you have a function $f:\mathbb{R} \to \mathbb{R}$ of one variable that can be evaluated in $O(n)$ time using some black box, you can approximate the derivative or second derivative in $O(n)$ time. If $f$ is a function of $m$ variables, you can approximate the gradient at an arbitrary point in $O(mn)$ time, and estimate the Hessian in $O(m^2n)$ time.
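
To make item 1 concrete, here is a minimal sketch of the computer-algebra route, using SymPy purely as one possible tool (the answer does not prescribe any particular system); the function and the evaluation point are arbitrary illustrations:

```python
# Sketch of approach 1 with a computer algebra system (SymPy, as one choice):
# derive the gradient symbolically, then compile it to a numerical function.
import sympy as sp

x, y = sp.symbols("x y")
f = sp.exp(x * y) + sp.sin(x) ** 2       # example function, chosen arbitrarily

grad = [sp.diff(f, v) for v in (x, y)]   # symbolic gradient expressions
grad_fn = sp.lambdify((x, y), grad)      # turn the expressions into a callable

print(grad_fn(0.5, 2.0))                 # evaluates [df/dx, df/dy] at (0.5, 2.0)
```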
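For item 2, the sketch below implements forward-mode automatic differentiation with dual numbers on a toy function; it is an illustration of the idea, not a production tool. Note that the $O(n)$ gradient claim above is achieved by reverse mode (as implemented in libraries such as JAX or PyTorch via functions like `jax.grad`); forward mode, shown here, needs one pass per input variable.

```python
# Minimal sketch of forward-mode automatic differentiation using dual numbers.
# Each Dual carries a value together with the derivative of that value with
# respect to one seeded input; propagating both through the program costs only
# a constant factor more than evaluating f itself.
import math

class Dual:
    def __init__(self, val, der=0.0):
        self.val = val   # function value
        self.der = der   # derivative w.r.t. the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(x):
    # chain rule for the elementary operation sin
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

def f(x, y):
    # the program we differentiate: f(x, y) = x*y + sin(x)
    return x * y + sin(x)

# Seed x with derivative 1 to get df/dx at (0.5, 2.0); seeding y instead gives df/dy.
out = f(Dual(0.5, 1.0), Dual(2.0, 0.0))
print(out.val, out.der)   # f(0.5, 2.0) and df/dx(0.5, 2.0) = y + cos(x)
```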
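For item 3, here is a minimal sketch of a central-difference gradient approximation for a black-box function of $m$ variables; the step size `h` and the example function are arbitrary choices for illustration:

```python
# Sketch of approach 3: central finite differences for the gradient of a
# black-box function f: R^m -> R. Each gradient entry needs two extra
# evaluations of f, hence O(m n) time if one evaluation costs O(n).
import numpy as np

def fd_gradient(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)   # central difference
    return g

# Example black box (any callable works; its source code is not needed).
f = lambda x: np.exp(x[0] * x[1]) + np.sin(x[0]) ** 2
print(fd_gradient(f, [0.5, 2.0]))
```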

Finally, one more method that may be acceptable in some settings is to evaluate the function $f$ at several points, fit a model $\hat{f}$ based on those points, compute or estimate the first- or second-order derivatives of $\hat{f}$ using any of the above methods, and use that as an estimate of the corresponding derivative of $f$. A rough sketch follows.
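
Here is a rough sketch of that surrogate-model idea, kept one-dimensional and with an arbitrarily chosen sample window and polynomial degree purely for illustration:

```python
# Sketch of the surrogate-model idea: sample f at a few points near x0, fit a
# low-degree polynomial, and differentiate the fit instead of f itself.
import numpy as np

f = lambda x: np.exp(x) * np.sin(x)       # pretend this is expensive

x0 = 1.0
xs = x0 + np.linspace(-0.1, 0.1, 5)       # a few samples around x0
coeffs = np.polyfit(xs, f(xs), deg=2)     # local quadratic model f_hat

fhat_d1 = np.polyder(coeffs, 1)           # first derivative of the model
fhat_d2 = np.polyder(coeffs, 2)           # second derivative of the model
print(np.polyval(fhat_d1, x0), np.polyval(fhat_d2, x0))
# compare with the exact values e^x (sin x + cos x) and 2 e^x cos x at x0
```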

Saying you are interested in "nonlinear least squares" does not narrow things down, because "nonlinear" allows a completely arbitrary function, so long as it is not linear.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange