Question

I am trying to subsample a scipy sparse matrix as a numpy matrix like this to get every 10th row and every 10th column:

connections = sparse.csr_matrix((data,(node1_index,node2_index)),
                                shape=(dimensions,dimensions))
connections_sampled = np.zeros((dimensions // 10, dimensions // 10))
connections_sampled = connections[::10,::10]

However, when I run this and query the shape of connections_sampled, I get the original dimensions of connections instead of dimensions reduced by a factor of 10.

Does this type of subsampling not work with sparse matrices? It seems to work when I use smaller matrices, but I can't get it to give the correct answer here.


Solution

In SciPy 0.12, at least, you cannot slice a CSR matrix with a step other than 1, so taking every 10th row and column fails:

>>> import scipy.sparse as sps
>>> a = sps.rand(1000, 1000, format='csr')
>>> a[::10, ::10]
Traceback (most recent call last):
...    
ValueError: slicing with step != 1 not supported

You can do it, though, by converting first to a LIL format matrix:

>>> a.tolil()[::10, ::10]
<100x100 sparse matrix of type '<type 'numpy.float64'>'
    with 97 stored elements in LInked List format>

As you can see, the shape is updated correctly. If you want a NumPy array rather than a sparse matrix, append .A (shorthand for .toarray()):

>>> a.tolil()[::10, ::10].A
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
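
The LIL-based approach can also be written as a small self-contained script. This is only a sketch under a few assumptions: it uses `scipy.sparse.random` (a newer spelling of the `sps.rand` call above, so it needs a more recent SciPy than 0.12), and the matrix size, density, and seed are illustrative choices rather than values from the question.

```python
import numpy as np
import scipy.sparse as sps

# Illustrative input: a random 1000x1000 CSR matrix (size, density and seed are assumed).
a = sps.random(1000, 1000, density=0.01, format='csr', random_state=0)

# Convert to LIL, which supports slicing with a step, then take every 10th row and column.
sampled = a.tolil()[::10, ::10]

# Convert the (still sparse) result to a dense NumPy array.
dense = sampled.toarray()

print(sampled.shape)
print(dense.shape)
```

Comparing `dense` against `a.toarray()[::10, ::10]` is a quick sanity check on small inputs, though densifying the full matrix defeats the purpose for large ones.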
Licensed under: CC-BY-SA with attribution