Implementing automatic differentiation for 2nd derivative: algorithm for traversing the computational graph?

StackOverflow https://stackoverflow.com/questions/3173359

  •  02-10-2019

Question

I am attempting to implement automatic differentiation for a Python statistics package (the problem formulation is similar to optimization problem formulations).

The computational graph is generated using operator overloading and factory functions for operations like sum(), exp(), etc. I have implemented automatic differentiation for the gradient using reverse accumulation. However, I have found implementing automatic differentiation for the second derivative (the Hessian) much more difficult. I know how to do the individual 2nd partial gradient calculations, but I have had trouble coming up with an intelligent way to traverse the graph and do the accumulations. Does anyone know of good articles that give algorithms for automatic differentiation for the second derivative or open source libraries that implement the same that I may try to learn from?

Solution

First you must decide whether you want to calculate a sparse Hessian or something closer to a fully dense Hessian.

If sparse is what you want, there are currently two competitive ways of doing this. The first uses only the computational graph, in a clever way: with a single reverse sweep of the graph, the edge_pushing algorithm calculates the Hessian matrix:

http://www.tandfonline.com/doi/full/10.1080/10556788.2011.580098

Or you can try graph colouring techniques to compress your Hessian matrix into a matrix with fewer columns, then use reverse accumulation to calculate each column of the compressed matrix:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.2603
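To make the compression idea concrete, here is a minimal sketch in plain Python. It assumes the sparsity pattern is already known and that a Hessian-vector product is available (in practice from one forward-over-reverse sweep; here a stand-in built from a known matrix). The names color_columns, compressed_hessian and hvp are hypothetical, and the greedy colouring is the simplest direct variant: it does not exploit symmetry, which more refined colourings do.

```python
def color_columns(pattern, n):
    """Greedy colouring: columns that share a nonzero row get different colours."""
    rows_of = [set() for _ in range(n)]
    for i, j in pattern:
        rows_of[j].add(i)
    colors = []
    used_rows = []  # used_rows[c] = rows already occupied by columns of colour c
    for j in range(n):
        for c, rows in enumerate(used_rows):
            if not (rows & rows_of[j]):   # no shared row: colour c can be reused
                rows |= rows_of[j]
                colors.append(c)
                break
        else:
            used_rows.append(set(rows_of[j]))
            colors.append(len(used_rows) - 1)
    return colors


def compressed_hessian(hvp, pattern, n):
    """Recover a sparse Hessian from one Hessian-vector product per colour."""
    colors = color_columns(pattern, n)
    k = max(colors) + 1
    products = []
    for c in range(k):
        # Seed vector: sum of the unit vectors of all columns with colour c.
        seed = [1.0 if colors[j] == c else 0.0 for j in range(n)]
        products.append(hvp(seed))
    H = [[0.0] * n for _ in range(n)]
    for i, j in pattern:
        # Column j is the only column of its colour with a nonzero in row i.
        H[i][j] = products[colors[j]][i]
    return H, k


# Stand-in problem (assumption): a 6x6 tridiagonal Hessian that we pretend
# we can only access through Hessian-vector products.
n = 6
H_true = [[0.0] * n for _ in range(n)]
for i in range(n):
    H_true[i][i] = 2.0 + i
    if i + 1 < n:
        H_true[i][i + 1] = H_true[i + 1][i] = -1.0

def hvp(v):
    return [sum(H_true[i][j] * v[j] for j in range(n)) for i in range(n)]

pattern = {(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1}
H_rec, n_colors = compressed_hessian(hvp, pattern, n)
```

For this tridiagonal pattern the greedy colouring needs three colours, so three Hessian-vector products replace the six that a column-by-column evaluation would need.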

If what you want is a dense Hessian (unusual in practice), then you're probably better off calculating one column of the Hessian at a time using reverse accumulation (search for Bruce Christianson and reverse accumulation).
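One common way to get a Hessian column from reverse accumulation is to run the reverse sweep over dual numbers (forward-over-reverse): seed the tangent of input j with 1, and the tangent part of each input adjoint is then the entry H[i][j]. The following is a minimal sketch of that idea, not the asker's package or any particular library; Dual, Node, exp and hessian_column are hypothetical names, and only +, * and exp are supported.

```python
import math


class Dual:
    """Dual number val + dot*eps, carrying a value and a directional derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        return Dual(self.val + o.val, self.dot + o.dot)

    def __mul__(self, o):
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)


class Node:
    """Computational-graph node; value, local partials and adjoint are all Duals."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents        # tuple of (parent node, local partial)
        self.adjoint = Dual(0.0)

    def __add__(self, o):
        return Node(self.value + o.value, ((self, Dual(1.0)), (o, Dual(1.0))))

    def __mul__(self, o):
        return Node(self.value * o.value, ((self, o.value), (o, self.value)))


def exp(n):
    e = math.exp(n.value.val)
    d = Dual(e, e * n.value.dot)      # exp is its own derivative
    return Node(d, ((n, d),))


def hessian_column(f, x, j):
    """Return (gradient, j-th Hessian column) of f at x from one reverse sweep."""
    inputs = [Node(Dual(xi, 1.0 if i == j else 0.0)) for i, xi in enumerate(x)]
    out = f(*inputs)

    order, seen = [], set()
    def visit(n):                     # post-order DFS gives a topological order
        if id(n) in seen:
            return
        seen.add(id(n))
        for p, _ in n.parents:
            visit(p)
        order.append(n)
    visit(out)

    out.adjoint = Dual(1.0)
    for n in reversed(order):         # every consumer is processed before its inputs
        for p, partial in n.parents:
            p.adjoint = p.adjoint + n.adjoint * partial
    grad = [n.adjoint.val for n in inputs]
    column = [n.adjoint.dot for n in inputs]
    return grad, column


def example_f(x, y):
    return x * x * y + exp(x * y)


grad, col0 = hessian_column(example_f, [1.0, 2.0], 0)
_, col1 = hessian_column(example_f, [1.0, 2.0], 1)
```

Each call costs a small constant multiple of one function evaluation, so a dense n-by-n Hessian takes n such sweeps.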

OTHER TIPS

The usual method for approximating the Hessian in 3 dimensions is the BFGS method.

The L-BFGS method is similar.
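For reference, the standard BFGS update of a Hessian approximation B uses only a step s = x_new - x_old and the matching gradient difference y = g_new - g_old. The sketch below is plain Python with a hypothetical function name, bfgs_update. The quadratic test problem, whose gradient difference is exactly A s, is an assumption made so that the result can be checked.

```python
def bfgs_update(B, s, y):
    """Return B + y y^T / (y^T s) - (B s)(B s)^T / (s^T B s)."""
    n = len(s)
    Bs = [sum(B[i][k] * s[k] for k in range(n)) for i in range(n)]
    ys = sum(y[i] * s[i] for i in range(n))
    sBs = sum(s[i] * Bs[i] for i in range(n))
    if ys <= 0.0:
        # Without positive curvature the update can lose positive definiteness.
        raise ValueError("curvature condition y^T s > 0 violated")
    return [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs
             for j in range(n)] for i in range(n)]


# Assumed test problem: f(x) = 0.5 x^T A x, so gradient differences are y = A s.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
s = [1.0, -0.5, 0.25]
y = [sum(A[i][k] * s[k] for k in range(3)) for i in range(3)]

B0 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
B1 = bfgs_update(B0, s, y)
B1s = [sum(B1[i][k] * s[k] for k in range(3)) for i in range(3)]
```

After the update, B1 reproduces the observed curvature along s (the secant condition B1 s = y) and stays symmetric. It is still only an approximation of the true Hessian, which is the main difference from the automatic-differentiation approaches above.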

Here you can find the source code for L-BFGS (which calculates the Hessian as an intermediate result for solving ODEs) in several languages (C#, C++, VBA, etc.), although not in Python. I think it is not easy to translate.

If you are going to translate the algorithm from another language, pay particular attention to numerical errors and do a sensitivity analysis (you'll need to calculate the inverse of the Hessian matrix).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow