Accelerate the calculation of inv(X'X)Qinv(X'X) in Matlab?

Question 1

I believe you can save a lot of computation time by explicitly doing an LU factorization, [l, u, p, q] = lu(X'*X); and use those factors when doing the calculations. Also, since X are constant for about 100 models, pre-calculating X'*X will most likely save you some time.

Note that in your case, the most time demanding operation might very well be the sqrt-function.

% Constant for every 100 models or so:
M = X'*X;
[l, u, p, q] = lu(M);

% Now, I guess this should be quite a bit faster (I might have messed up the order):
nwse = sqrt(diag(N.*(q * ( u \ (l \ (p * Q))) * q * (u \ (l \ p)))));

The first two terms are commonly used:

l the lower triangular matrix

u the upper triangular matrix

Now, p and q are a bit more uncommon.

p is a row permutation matrix used to obtain numerical stability. There is not much performance gain in doing [l, u, p] = lu(M) compared to [l, u] = lu(M) for sparse matrices.

q however offers a significant performance gain. q is a column permutation matrix that is used to reduce the amount of fill when doing the factorization.

Note that the [l, u, p, q] = lu(M) is only a valid syntax for sparse matrices.

As for why using full pivoting as described above should be faster:

Try the following to see the purpose of the column permutation matrix q. It is easier to work with elements that are aligned around the diagonal.

S = sprand(100,100,0.01);
[l, u, p] = lu(S);
spy(l)
figure
spy(u)

Now, compare it with this:

[ll, uu, pp, qq] = lu(S);
spy(ll);
figure
spy(uu);

^{Unfortunately, I don't have MATLAB here right now, so I can't guarantee that I put all the arguments in the correct order, but I think it's correct.}

Question 2

Following the helpful answer of Robert_P and the comments of Parag, I found the following to be the fastest for my particular large-scale sparse data:

L=chol(X'*X,'lower'); L=full(L);     
invXtX = L'\(L\ speye(size(X,2))); 
nwse = sqrt(N.*sum(invXtX.*(Q*invXtX)));

The last line computes the diagonal efficiently, idea taken from here.

Accelerate the calculation of inv(X'*X)*Q*inv(X'*X) in Matlab?

Accelerate the calculation of inv(X'X)Qinv(X'X) in Matlab?