For large sparse matrices in MATLAB, compute the cumulative sum across the columns for non-zero entries?

StackOverflow https://stackoverflow.com/questions/18780953

Вопрос

In MATLAB have a large matrix with transition probabilities transition_probs, and an adjacency matrix adj_mat. I want to compute the cumulative sum of the transition matrix along the columns and then element wise multiply it against the adjacency matrix which acts as a mask in this way:

cumsumTransitionMat = cumsum(transition_probs,2) .* adj_mat;

I get a MEMORY error because with the cumsum all the entries of the matrix are then non-zero.

I would like to avoid this problem by only having the cumulative sum entries where there are non zero entries in the first place. How can this be done without the use of a for loop?

Это было полезно?

Решение

when CUMSUM is applied on rows, for each row it will go and fill with values starting with the first nonzero column it finds up until the last column, thats what it does by definition.

The worst case in terms of storage is when the sparse matrix contains values at the first column, the best case is when all nonzero values occur at the last column. Example:

% worst case
>> M = sparse([ones(5,1) zeros(5,4)]);
>> MM = cumsum(M,2);       % completely dense matrix
>> nnz(MM)
ans =
    25

% best case
>> MM = cumsum(fliplr(M),2);

If the resulting matrix does not fit in memory, I dont see what else you can do, except maybe use a for-loop over the rows, and process the matrix is smaller batches...

Note that you cannot apply the masking operation before computing the cumulative sum, since this will alter the results. So you cant say cumsum(transition_probs .* adj_mat, 2).

Другие советы

You can apply cumsum on the non-zero elements only. Here is some code:

A = sparse(round(rand(100,1)));        %some sparse data
A_cum = A;                             %instantiate A_cum by copy A

idx_A = find(A);                       %find non-zeros
A_cum(idx_A) = cumsum(A(idx_A)); %cumsum on non-zeros elements only 

You can check the output with

 B = cumsum(A);


   A_cum   B
     1     1
     0     1
     0     1
     2     2
     3     3
     4     4
     5     5
     0     5
     0     5
     6     6

and isequal(A_cum(find(A_cum)), B(find(A_cum))) gives 1.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top