Question

I was listening to the CS231n (2017) lectures and noticed that a lot of attention is given to Numerical Gradient Computation (NGC). It starts at 5:53 in this video and comes up a few times later.

Also, looking at the batch normalization materials (example), I found a lot of attention devoted to exactly the same topic (well, probably because it is the same backpropagation...).

As I understand it, the gradients we use in various optimization methods (vanilla SGD, Adam) require us to know the derivative of the activation function. I suppose that if the activation function is complex, or we are too lazy to take the derivative analytically, we have to compute the gradient numerically, and that is where NGC is used.
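
To make this concrete, here is a minimal sketch of what I mean (my own toy example, assuming numpy and a sigmoid activation; the function names are mine), comparing the analytical derivative with a centered-difference numerical estimate:

    import numpy as np

    def sigmoid(x):
        """Example activation function."""
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad_analytic(x):
        """Analytical derivative: sigma'(x) = sigma(x) * (1 - sigma(x))."""
        s = sigmoid(x)
        return s * (1.0 - s)

    def grad_numeric(f, x, h=1e-5):
        """Centered difference: (f(x + h) - f(x - h)) / (2 * h)."""
        return (f(x + h) - f(x - h)) / (2.0 * h)

    x = np.linspace(-3.0, 3.0, 7)
    # The two agree to roughly 1e-10, but the numerical version needs
    # two extra forward evaluations per input.
    print(np.max(np.abs(sigmoid_grad_analytic(x) - grad_numeric(sigmoid, x))))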

Questions:

  1. Is that the only purpose of NGC in backpropagation?

  2. Isn't it faster to use the analytical form of the activation function's derivative to calculate gradients?

