Question

In a machine learning regression problem, why is the local minimum computed using the derivative of the function instead of the actual function?

Example: http://en.wikipedia.org/wiki/Gradient_descent

The gradient descent algorithm is applied to find a local minimum of the function

$$f(x) = x^4 - 3x^3 + 2, \tag{A}$$

with derivative

$$f'(x) = 4x^3 - 9x^2. \tag{B}$$

Here, to find the local minimum of function (A) using the gradient descent algorithm, they use the derivative of (A), which is function (B).
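For illustration, here is a minimal sketch (not from the question) of that procedure in Python; the starting point and step size are arbitrary illustrative choices:

```python
def f(x):
    """Function (A): f(x) = x^4 - 3x^3 + 2."""
    return x**4 - 3 * x**3 + 2

def f_prime(x):
    """Derivative (B): f'(x) = 4x^3 - 9x^2."""
    return 4 * x**3 - 9 * x**2

# Gradient descent: repeatedly step against the derivative,
# with the step length scaled by the derivative's magnitude.
x = 6.0          # illustrative starting point
gamma = 0.01     # illustrative step size (learning rate)
for _ in range(10_000):
    step = gamma * f_prime(x)
    x -= step
    if abs(step) < 1e-10:   # stop once updates become negligible
        break

print(x)  # converges to about 2.25, the point where f'(x) = 0
```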


Solution

The reason is that if the function is convex (or concave if you're doing maximisation; the two problems are equivalent), you know that there is a single minimum (maximum). This means that there is a single point where the gradient is equal to zero. There are techniques that use only the function values themselves, but if you can compute the gradient, you can converge much faster, because the gradient tells you which direction to move in and gives you a sense of how far you are from the optimum.
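As a hedged sketch of that contrast: a derivative-free method such as ternary search can locate the minimum of the question's function (A) using only function values, but it assumes the function is unimodal on the chosen bracket and each iteration merely shrinks that bracket, with no sense of how far away the optimum is. The bracket [0, 6] below is an illustrative choice:

```python
def f(x):
    return x**4 - 3 * x**3 + 2

# Ternary search: uses only function values, never the derivative.
# Assumes f is unimodal on [lo, hi]; here [0, 6] contains the minimum at 2.25.
lo, hi = 0.0, 6.0
while hi - lo > 1e-8:
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if f(m1) < f(m2):
        hi = m2           # the minimum lies to the left of m2
    else:
        lo = m1           # the minimum lies to the right of m1

print((lo + hi) / 2)      # about 2.25, found without ever using f'
```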

As well as gradient descent, there's an optimisation method known as Newton's method, which requires computing the second derivative (the Hessian, in multivariate optimisation). This converges faster still, but requires you to be able to invert the Hessian, which is not feasible if you have a lot of parameters. So there are methods to get around this that compute a limited-memory approximation of the Hessian, such as L-BFGS. These methods converge faster because they use information about the curvature of the function: it's a simple tradeoff, where the more you know about the function you're trying to optimise, the faster you can find the solution.
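A minimal sketch of Newton's method on the one-dimensional example from the question, where the "Hessian" is just the scalar second derivative f''(x) = 12x^2 - 18x; the starting point is an illustrative choice, and in several dimensions the division below would become a linear solve against the Hessian:

```python
def f_prime(x):
    return 4 * x**3 - 9 * x**2       # first derivative (B)

def f_double_prime(x):
    return 12 * x**2 - 18 * x        # second derivative (the 1-D "Hessian")

# Newton's method: divide the gradient by the curvature, so the step
# length adapts automatically -- no learning rate is needed.
# Note: this assumes f''(x) != 0 along the path, and it finds a
# stationary point of f (a root of f'), not specifically a minimum.
x = 6.0                               # illustrative starting point
for _ in range(100):
    step = f_prime(x) / f_double_prime(x)
    x -= step
    if abs(step) < 1e-12:
        break

print(x)  # reaches about 2.25 in far fewer iterations than gradient descent
```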

Other tips

I'm not a mathematician, so I can't give you an exact answer; however, you need to understand what differentiation does, e.g.:

http://en.wikipedia.org/wiki/Derivative
http://en.wikipedia.org/wiki/Differential_of_a_function

This is what you need (it shows what differentiation does): http://en.wikipedia.org/wiki/File:Graph_of_sliding_derivative_line.gif

The derivative at a point equals the slope of the tangent line to the graph of the function at that point, and this is exactly what you want when you are looking for a descent direction. Take this as a very informal point of view; the Wikipedia articles will give you much deeper and more precise knowledge.
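As a quick numerical illustration of the tangent-slope idea, using the question's function (A); the point x and step h below are arbitrary choices:

```python
def f(x):
    return x**4 - 3 * x**3 + 2

def f_prime(x):
    return 4 * x**3 - 9 * x**2

x, h = 3.0, 1e-6
# A central finite difference approximates the slope of the tangent at x.
slope = (f(x + h) - f(x - h)) / (2 * h)

print(slope)        # about 27.0 (numerical tangent slope)
print(f_prime(x))   # exactly 27.0: 4*27 - 9*9 = 108 - 81
# A positive slope means f is increasing at x, so descent moves left
# (toward smaller x) -- exactly the direction gradient descent takes.
```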

License: CC-BY-SA with attribution
Not affiliated with StackOverflow