Question

I have a scipy sparse matrix (CSR, Compressed Sparse Row). I'd like to use Orange's feature selection methods (Orange.feature.scoring.score_all with InfoGain/MDL). However, as far as I understand, I have to create a Table, which only accepts a numpy array as an argument. Whenever I try to convert the CSR matrix to an array with .toarray(), I get the following error because of the size of the matrix:

Traceback (most recent call last):
  File "C:\Users\NMS\Desktop\PyExp\experiments_acl2013.py", line 249, in <module>
    print(X_train.toarray())
  File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 561, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "C:\Python27\lib\site-packages\scipy\sparse\coo.py", line 238, in toarray
    B = self._process_toarray_args(order, out)
  File "C:\Python27\lib\site-packages\scipy\sparse\base.py", line 635, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
ValueError: array is too big.

Is there another approach that lets me pass a sparse matrix to create a Table? Or is there a way to apply InfoGain or MDL in Orange directly to my sparse matrix, without creating a Table?

When passing a memmap to Table, I get the following error:

>>> t2 = Table(d2, mm)

Traceback (most recent call last):
   File "<pyshell#125>", line 1, in <module>
    t2 = Table(d2, mm)
   TypeError: invalid arguments

When passing the memmap without the domain, I get the following:

>>> mm
memmap([[0, 1, 2, 4],
       [9, 8, 6, 3]])
>>> t2 = Table(mm)

Traceback (most recent call last):
  File "<pyshell#128>", line 1, in <module>
    t2 = Table(mm)
TypeError: invalid arguments for constructor (domain or examples or both expected)

Solution

Here is a workaround. Given a coo_matrix called m (obtained from the CSR matrix with m = X_train.tocoo()):

1) Create a numpy.memmap array for writing:

mm = np.memmap('test.memmap', mode='w+', dtype=m.dtype, shape=m.shape)

2) Copy the nonzero entries into the memmap array, which should work:

for i,j,v in zip(m.row, m.col, m.data):
    mm[i,j] = v

3) You can access the memmap as detailed in the documentation...
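
The three steps above can be put together as a minimal sketch. It uses a tiny stand-in matrix in place of the real X_train, and the temporary file path is a hypothetical choice made only for illustration. Instead of the Python loop, it writes all nonzero entries with a single indexed assignment, which assumes the COO matrix has no duplicate (row, col) pairs, as is normally the case after converting from CSR:

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Tiny stand-in for the real CSR matrix (hypothetical data).
X_train = sparse.csr_matrix(np.array([[0, 1, 0, 4],
                                      [9, 0, 6, 0]]))

# COO format exposes the .row, .col and .data arrays used below.
m = X_train.tocoo()

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'test.memmap')

# 1) Create a disk-backed array of the full dense shape.
mm = np.memmap(path, mode='w+', dtype=m.dtype, shape=m.shape)

# 2) Write only the nonzero entries; the rest of the file stays zero.
mm[m.row, m.col] = m.data
mm.flush()

# 3) Reopen the file read-only to access the dense data later.
mm_read = np.memmap(path, mode='r', dtype=m.dtype, shape=m.shape)
```

Note that the file on disk has the full dense size (rows * columns * item size in bytes), so this moves the storage problem from RAM to disk rather than removing it.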

Licensed under: CC-BY-SA with attribution