first derivative by gradient of image by kernel

Question 1

To optimize the computation, it helps to apply the scaling of 1/2 directly to the filter kernels:

filt1 = filt1/2;

If the scaling is applied afterward instead, it costs N^2 additional multiplications over the NxN image pixels, rather than just 9 multiplications on a 3x3 kernel.
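As a quick sanity check, here is a minimal sketch showing that both orderings give the same result. The original filt1 and image are not shown here, so the 3x3 kernel and the small test image below are hypothetical stand-ins.

```matlab
% Hypothetical stand-ins for the original kernel and image
filt1 = [1 0 -1; 1 0 -1; 1 0 -1];   % assumed 3x3 difference kernel
im = magic(6);                      % small 6x6 test image

% Option A: scale the kernel first (numel(filt1) = 9 multiplications)
A = conv2(im, filt1/2, 'same');

% Option B: scale the result afterward (numel(im) = 36 multiplications)
B = conv2(im, filt1, 'same')/2;

% Largest difference between the two results
maxDiff = max(abs(A(:) - B(:)));
```

For a large image the gap between 9 and N^2 multiplications grows quickly, which is why scaling the kernel is the cheaper option.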

Beyond that, I agree with McMa. Your computations don't look anything like a differentiation. In fact, you already apply gradient() in the very first line, so I don't understand what more you need.

Question 2

Since associativity with scalar multiplication is a property of convolution, the order of the multiplication should not play any role.
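To spell that out, here is a short derivation. The symbols are my own notation, not from the question: c is the scalar, h the kernel, and f the image.

```latex
\big((c\,h) * f\big)[m,n]
  = \sum_{i,j} c\,h[i,j]\,f[m-i,\,n-j]
  = c \sum_{i,j} h[i,j]\,f[m-i,\,n-j]
  = c\,(h * f)[m,n]
```

The scalar factors out of the sum, so scaling the kernel and scaling the filtered image are the same operation.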

On the other hand, your filters don't look to me like they perform a differentiation. The classical filter for discrete differentiation would be a Sobel kernel, which looks something like this:

[1,0,-1
 2,0,-2
 1,0,-1]

and

[1,2,1
 0,0,0
-1,-2,-1]
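To see that these kernels really act as derivatives, here is a minimal sketch that applies them with conv2 to a synthetic ramp image. The ramp and the variable names are my own assumptions for illustration.

```matlab
% The two Sobel kernels shown above
sx = [1 0 -1; 2 0 -2; 1 0 -1];
sy = [1 2 1; 0 0 0; -1 -2 -1];

% Hypothetical test image: intensity increases by 1 per column,
% and is constant along each column
im = repmat(1:8, 8, 1);

% 'valid' keeps only pixels where the kernel fits entirely in the image
Gx = conv2(im, sx, 'valid');
Gy = conv2(im, sy, 'valid');

% The Sobel weights sum to 4 over a 2-pixel baseline, so dividing
% by 8 recovers the slope in intensity units per pixel
slopeX = Gx / 8;
```

On this ramp, Gx is constant and Gy is zero, which is what you would expect from a horizontal and a vertical derivative, respectively.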

Question 3

I'd be inclined to use the imgradient and imgradientxy functions in MATLAB. If you want directional gradients, use imgradientxy and if you want gradient magnitude and direction components, use imgradient.

You can choose to have derivatives computed using Sobel, Prewitt, or Roberts gradient kernels, or using central or intermediate differences. (The Roberts option is available in imgradient only, not in imgradientxy.)

Here's an example:

[Gx,Gy] = imgradientxy(im,'Sobel');
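For the magnitude and direction variant, a minimal sketch might look like the following. It assumes the Image Processing Toolbox is installed, and the ramp image is a hypothetical stand-in for your own data.

```matlab
% Hypothetical test image: intensity increases by 1 per column
im = repmat(1:8, 8, 1);

% Directional gradients (Sobel is also the default method)
[Gx, Gy] = imgradientxy(im, 'sobel');

% Gradient magnitude and direction (direction is returned in degrees)
[Gmag, Gdir] = imgradient(im, 'sobel');

% The magnitude should agree with the directional components
maxDiff = max(abs(Gmag(:) - hypot(Gx(:), Gy(:))));
```

Both functions return outputs of the same size as the input image, so the results can be compared pixel by pixel.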

If you want to continue using conv2 instead, you can get gradient kernels using the fspecial function.

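kernely = fspecial('sobel');  % [1 2 1; 0 0 0; -1 -2 -1], approximates the vertical gradient
kernelx = kernely';           % the transpose approximates the horizontal gradient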
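If you are unsure which orientation a kernel responds to, a quick check on a synthetic ramp settles it. This is only a sketch: it assumes the Image Processing Toolbox (for fspecial), and the names h, ht, and im are hypothetical.

```matlab
h  = fspecial('sobel');   % returns [1 2 1; 0 0 0; -1 -2 -1]
ht = h';                  % transposed kernel

% Hypothetical test image: changes only along the horizontal direction
im = repmat(1:8, 8, 1);

% Response of each kernel, restricted to the fully covered region
respH  = conv2(im, h,  'valid');
respHt = conv2(im, ht, 'valid');

% Which kernel picks up the horizontal change?
hResponds  = any(respH(:)  ~= 0);
htResponds = any(respHt(:) ~= 0);
```

Here only the transposed kernel responds, so h' is the one to use for the horizontal (x) derivative and h for the vertical (y) derivative. Keep in mind that conv2 flips the kernel, so the sign of the result is the opposite of what a correlation with the same kernel would give.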