Question

I was listening to the CS231n (2017) lectures and noticed that a lot of attention is given to Numerical Gradient Computation (NGC). It starts at 5:53 in this video and comes up a few times later.

Also, looking at the batch normalization materials (example), I found a lot of attention devoted to exactly the same topic (well, probably because it is the same backpropagation...).

As I understand it, the gradients we use in various optimization methods (vanilla SGD, Adam) require us to know the derivative of the activation function. I suppose that if the activation function is complex, or we are too lazy to take the derivative analytically, we have to compute the gradient numerically, and that is where NGC is used.
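
To make this concrete, here is a minimal sketch of what I mean (my own toy example, assuming numpy and a sigmoid activation; the function names are mine), comparing the analytical derivative with a centered-difference numerical estimate:

    import numpy as np

    def sigmoid(x):
        """Example activation function."""
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad_analytic(x):
        """Analytical derivative: sigma'(x) = sigma(x) * (1 - sigma(x))."""
        s = sigmoid(x)
        return s * (1.0 - s)

    def grad_numeric(f, x, h=1e-5):
        """Centered difference: (f(x + h) - f(x - h)) / (2 * h)."""
        return (f(x + h) - f(x - h)) / (2.0 * h)

    x = np.linspace(-3.0, 3.0, 7)
    # The two agree to roughly 1e-10, but the numerical version needs
    # two extra forward evaluations per input.
    print(np.max(np.abs(sigmoid_grad_analytic(x) - grad_numeric(sigmoid, x))))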

Questions:

  1. Is that the only purpose of NGC in backpropagation?

  2. Isn't it faster to use the analytical form of the activation function's derivative to calculate gradients?

