Formats for Storing Sparse Matrices

https://softwareengineering.stackexchange.com/questions/333637

30-12-2020
|

Pergunta

For storing sparse matrices I know there are multiple formats that are common, like COO, CSR, DIA, ELL, HYB, etc. (Link to some storage formats), but I haven't found that any of these stores the number of nonzeros per row.

I am considering creating a sparse matrix storing format similar to CSR except that instead of row pointers, the number of nonzeros entries per row is stored. Having this information will allow me to speed up my algorithm. I'm wondering if it is a fair expectation for a dataset to number of nonzeros per row, rather than row pointers?

I haven't found any myself, but I am wondering if there any examples of this?

Basically I want to create a new sparse matrix storage format but I am wondering if there is anything wrong with this format?

Solução

This should work fine when traversing the CSR matrix forward row by row, as you can simply keep an index accumulating. You'll need an extra counter variable, which most processors should easily accommodate; it doesn't even add a memory reference to the inner loop.

Here is the inner loop of common sparse matrix by dense vector multiply:

for (i = 0; i < N; i++)
{  
    for (k = RowPtr[i]; k < RowPtr[i+1]; k++)
    {  
        result[i] += Val[k]*d[Col[k]];
    }  
}

This would change to:

k = Count[0];
for (i = 0; i < N; i++)
{ 
    for (count = Count[i+1]; --count >= 0; k++)
    {  
        result[i] += Val[k]*d[Col[k]];
    }  
}

I can think of two drawbacks:

If some algorithm required seriously random access you might want to build a temporary RowPtr array to use. (Of course, we try to avoid those algorithms because they're not cache friendly.)
As a non-standard format your ability to use libraries (like Eigen) may be limited, but again you might be able to convert when needed.

Outras dicas

CSR format:

The CSR format uses a row pointer in order to quickly locate an item. To find item [i,j] requires:

1 indexed access to get the row "pointer" (row offset indeed, in an array AI)
maximum n indexed accesses to the column index (in array AJ) to find the matching column
1 indexed access to get the item (in array A) if it's not zero.

If for any reason you're interested in the number of non zero values in row i, you just have one mathematical operation to do: A[i]-A[i-1] (or A[i+1]-A[i] if starting i at 1).

The difficulty is if you set new non zero values in the matrix: you have to increment all row "pointers" after the row you are inserting in, and you have to insert the column index and the value (which implies moving the items that come afterwards).

Your new format

You intend to do keep trace of the number of non zero items in each line, instead of the row "pointer". This is perfectly feasible: just interpret A differently.

The drawback will be that accessing an item in row i, would then require each time to sum all the A[n] for n between 0 and i-1. So you'll need to make a lot of additions. Every time you access an element. This will be far less efficient than CSR.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange