For large sparse matrices in MATLAB, compute the cumulative sum across the columns for non-zero entries?

Question 1

when CUMSUM is applied on rows, for each row it will go and fill with values starting with the first nonzero column it finds up until the last column, thats what it does by definition.

The worst case in terms of storage is when the sparse matrix contains values at the first column, the best case is when all nonzero values occur at the last column. Example:

% worst case
>> M = sparse([ones(5,1) zeros(5,4)]);
>> MM = cumsum(M,2);       % completely dense matrix
>> nnz(MM)
ans =
    25

% best case
>> MM = cumsum(fliplr(M),2);

If the resulting matrix does not fit in memory, I dont see what else you can do, except maybe use a for-loop over the rows, and process the matrix is smaller batches...

Note that you cannot apply the masking operation before computing the cumulative sum, since this will alter the results. So you cant say cumsum(transition_probs .* adj_mat, 2).

Question 2

You can apply cumsum on the non-zero elements only. Here is some code:

A = sparse(round(rand(100,1)));        %some sparse data
A_cum = A;                             %instantiate A_cum by copy A

idx_A = find(A);                       %find non-zeros
A_cum(idx_A) = cumsum(A(idx_A)); %cumsum on non-zeros elements only

You can check the output with

 B = cumsum(A);


   A_cum   B
     1     1
     0     1
     0     1
     2     2
     3     3
     4     4
     5     5
     0     5
     0     5
     6     6

and isequal(A_cum(find(A_cum)), B(find(A_cum))) gives 1.