Question

I have a very large sparse matrix in Octave and I want to get the variance of each row. If I use std(A,1); it crashes because memory is exhausted. Why is this? The variance should be very easy to calculate for a sparse matrix, shouldn't it? How can I make this work?

Solution

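If you want the standard deviation of just the nonzero entries in each column, you can do the following. (Like std(A,1), this works down the columns; for per-row statistics, apply the same code to the transpose A.'.)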

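[nrows, ncols] = size(A);

% Number of nonzero entries in each column.
counts = sum(spones(A),1);

% Mean of the nonzeros; max(counts, 1) avoids 0/0 for all-zero columns.
means = sum(A,1) ./ max(counts, 1);
% Place each column's mean at the positions where A is nonzero.
[i,j,v] = find(A);
v = means(j);
placedmeans = sparse(i,j,v,nrows,ncols);

% A - placedmeans is zero wherever A is zero, so only nonzeros contribute.
vars = sum((A - placedmeans).^2, 1) ./ max(counts, 1);

stds = sqrt(vars);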
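As a quick sanity check, the nonzero-only computation can be compared against a plain per-column loop on a matrix small enough to inspect by hand. This is a minimal sketch: the test matrix and the names m, mj and expected are made up for illustration, and the loop is only a reference, not something to run on a large matrix.

```octave
% Tiny test matrix; column 3 is entirely zero.
A = sparse([1 0 0; 3 0 0; 0 5 0; 2 0 0]);

[nrows, ncols] = size(A);
counts = sum(spones(A), 1);
means = sum(A, 1) ./ max(counts, 1);
[i, j, v] = find(A);
m = full(means);
mj = m(j);                                   % mean of the column each nonzero sits in
placedmeans = sparse(i, j, mj(:), nrows, ncols);
stds = sqrt(sum((A - placedmeans).^2, 1) ./ max(counts, 1));

% Reference: std over the nonzeros of each column, normalised by N.
expected = zeros(1, ncols);
for k = 1:ncols
  nz = full(nonzeros(A(:, k)));
  if ~isempty(nz)
    expected(k) = std(nz, 1);
  end
end

% Largest disagreement between the two computations.
disp(max(abs(full(stds) - expected)));
```

The all-zero column is included on purpose, to exercise the max(counts, 1) guard.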

I can't imagine a situation where you would want the standard deviation of every entry in each column of a sparse matrix (zeros included), but if you do, you only need to count the zeros in each column and include them in the calculation. Each zero contributes (0 - mean)^2 = mean^2 to the sum of squared deviations:

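[nrows,ncols] = size(A);

% Number of zero entries in each column.
zerocounts = nrows - sum(spones(A),1);

% Mean over all entries, zeros included.
means = sum(A,1) ./ nrows;
% Place each column's mean at the positions where A is nonzero.
[i,j,v] = find(A);
v = means(j);
placedmeans = sparse(i,j,v,nrows,ncols);

% Squared deviations of the nonzeros, plus mean^2 for every zero.
vars = (sum((A - placedmeans).^2, 1) + zerocounts .* means.^2) ./ nrows;

stds = sqrt(vars);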
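A possible shortcut, not part of the original answer: the same divide-by-N variance can be written as E[x^2] - mean^2, which needs only two column sums and no placedmeans matrix. The sketch below uses a tiny example matrix so that the dense std can serve as a reference; the names meansq, vars_alt and stds_alt are made up here. This form can lose precision when the mean is large relative to the spread, so treat it as an alternative to verify on your data rather than a drop-in replacement.

```octave
A = sparse([1 0 0; 3 0 0; 0 5 0; 2 0 0]);
nrows = size(A, 1);

means = sum(A, 1) ./ nrows;            % zeros count towards the mean
meansq = sum(A.^2, 1) ./ nrows;        % E[x^2]; A.^2 keeps the sparsity pattern
vars_alt = max(meansq - means.^2, 0);  % clamp tiny negative rounding errors
stds_alt = sqrt(vars_alt);

% Dense reference, feasible only because this example is tiny.
expected = std(full(A), 1);

% Largest disagreement between the two computations.
disp(max(abs(full(stds_alt) - expected)));
```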

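Also, I don't know whether you want the sample variance instead, in which case subtract one from the denominator of vars (counts in the first case, nrows in the second). As written, both versions divide by N, which matches the std(A,1) call in the question.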
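To illustrate the denominator choice, here is a minimal sketch of the zeros-included case that computes the sum of squared deviations once and divides it both ways. The names ss, vars_pop and vars_sample are made up for this example, and the max(nrows - 1, 1) guard for a single-row matrix is an assumption about the desired behaviour, not something the answer specifies.

```octave
A = sparse([1 0 0; 3 0 0; 0 5 0; 2 0 0]);
[nrows, ncols] = size(A);

means = sum(A, 1) ./ nrows;
[i, j, v] = find(A);
m = full(means);
mj = m(j);
placedmeans = sparse(i, j, mj(:), nrows, ncols);
zerocounts = nrows - sum(spones(A), 1);

% Sum of squared deviations per column, zeros included.
ss = sum((A - placedmeans).^2, 1) + zerocounts .* means.^2;

vars_pop = ss ./ nrows;                  % divide by N, as std(A, 1) does
vars_sample = ss ./ max(nrows - 1, 1);   % divide by N-1, as std(A) does by default

% Compare with the dense N-1 variance on this tiny example.
disp(max(abs(full(vars_sample) - var(full(A)))));
```

For the nonzero-only case the same idea would apply with counts in place of nrows, for example max(counts - 1, 1) as the denominator.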

EDIT: Fixed a bug that built the placedmeans matrix with the wrong size whenever A ends in a row or column of all zeros. Also, the first case now returns a mean/var/std of zero for any all-zero column (previously it returned NaN).
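The size issue is easy to reproduce: when sparse is called without explicit dimensions, it sizes the result from the largest row and column indices it is given. A small sketch, with a made-up matrix and the hypothetical names B_short and B_full:

```octave
% The last row and the last column of A are entirely zero.
A = sparse([1 0 0; 0 2 0; 0 0 0]);
[nrows, ncols] = size(A);
[i, j, v] = find(A);

B_short = sparse(i, j, v);                 % size inferred as max(i)-by-max(j)
B_full  = sparse(i, j, v, nrows, ncols);   % size given explicitly

disp(size(B_short));   % 2 2
disp(size(B_full));    % 3 3
```

With the inferred size, A - B_short would fail because the dimensions no longer agree, which is why both versions above pass nrows and ncols.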

Licensed under: CC-BY-SA with attribution