To optimize the computation, it helps to apply the scaling of 1/2 directly to the filter kernels:
filt1 = filt1/2;
Otherwise, if the scaling is done afterward, N^2 additional multiplications have to be applied to the pixels of the NxN filtered image, instead of just 9 multiplications for a 3x3 kernel.
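As a rough sketch of why the two orderings are interchangeable: convolution is linear, so scaling the kernel before filtering gives the same result as scaling the filtered image. The image, its size N, and the kernel values below are placeholders I made up for illustration, since the actual image and filt1 aren't shown here. I'm also assuming the filtering is done with conv2.

```matlab
% Hypothetical stand-ins: the real image and filt1 are not shown in this thread.
N = 256;
img = rand(N);                          % placeholder N-by-N image
filt1 = [-1 0 1; -1 0 1; -1 0 1];       % assumed 3x3 kernel, for illustration only

% Option A: scale the kernel first (9 multiplications).
outA = conv2(img, filt1/2, 'same');

% Option B: scale the filtered image afterward (N^2 multiplications).
outB = conv2(img, filt1, 'same') / 2;

% Convolution is linear, so the two results should agree up to rounding.
maxDiff = max(abs(outA(:) - outB(:)));
```

Here maxDiff should come out at or very near zero, so the only difference between the two options is the amount of work done.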
Beyond that, I agree with McMa. Your computations don't look anything like a differentiation. In fact, you already apply gradient() in the very first line, so I don't understand what more you need.