scipy.linalg.norm different from sklearn.preprocessing.normalize?

https://stackoverflow.com/questions/20392328

29-08-2022

Question

from numpy.random import rand
from sklearn.preprocessing import normalize
from scipy.sparse import csr_matrix
from scipy.linalg import norm

w = (rand(1,10)<0.25)*rand(1,10)
x = (rand(1,10)<0.25)*rand(1,10)
w_csr = csr_matrix(w)
x_csr = csr_matrix(x)
(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

norm(w,ord='fro')*norm(x,ord='fro')

I am working with scipy csr_matrix and would like to normalize two matrices using the Frobenius norm and get their product. But norm from scipy.linalg and normalize from sklearn.preprocessing seem to evaluate the matrices differently. Since in both cases above I am technically computing the same Frobenius norm, shouldn't the two expressions evaluate to the same thing? Instead I get the following:

matrix([[ 0.962341]])

0.4431811178371029

for sklearn.preprocessing and scipy.linalg.norm respectively. I am really interested to know what I am doing wrong.

Solution

sklearn.preprocessing.normalize (with axis=1) divides each row by its norm and returns a matrix with the same shape as its input. scipy.linalg.norm returns the norm of the whole matrix as a single number. So your calculations are not equivalent.
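To make the difference concrete, here is a minimal sketch on a small dense array. The array `a` and the variable names are made up for illustration; only `normalize` and `norm` come from the question.

```python
import numpy as np
from scipy.linalg import norm
from sklearn.preprocessing import normalize

# Illustrative 2x2 input (not from the question).
a = np.array([[3.0, 4.0],
              [0.0, 5.0]])

# normalize rescales each row to unit L2 length and keeps the shape.
row_normalized = normalize(a, norm='l2', axis=1)

# norm collapses the whole matrix to one number: sqrt(9 + 16 + 25).
fro = norm(a, ord='fro')

print(row_normalized.shape)  # same shape as a
print(row_normalized)        # rows become [0.6, 0.8] and [0., 1.]
print(fro)                   # sqrt(50)
```

One call returns a rescaled matrix and the other returns a scalar, so a product of the first kind and a product of the second kind measure different things.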

Note that your code is not correct as it is written. This line

(normalize(w_csr,axis=1,copy=False,norm='l2')*normalize(x_csr,axis=1,copy=False,norm='l2')).todense()

raises ValueError: dimension mismatch. The two calls to normalize both return matrices with shapes (1, 10), so their dimensions are not compatible for a matrix product. What did you do to get matrix([[ 0.962341]])?
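If the goal was to scale each matrix by its Frobenius norm and then multiply, one possible reading is to transpose the second operand so the shapes are compatible. This is only an assumption, since the question does not say how the 1x1 result was produced. The sketch below uses made-up data and a hypothetical helper `fro_norm`; for a single-row matrix, dividing by the Frobenius norm is the same as L2 row normalization, so the result is the cosine similarity of the two rows.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative single-row sparse matrices (not the question's random data).
w = csr_matrix(np.array([[1.0, 0.0, 2.0, 0.0]]))
x = csr_matrix(np.array([[2.0, 0.0, 1.0, 0.0]]))

def fro_norm(a):
    # Hypothetical helper: Frobenius norm from the stored values.
    return np.sqrt((a.data ** 2).sum())

# Scale each matrix to unit Frobenius norm; the results stay sparse.
w_unit = w / fro_norm(w)
x_unit = x / fro_norm(x)

# (1, 4) @ (4, 1) -> (1, 1); without the transpose the shapes do not match.
product = (w_unit @ x_unit.T).toarray()
print(product)  # approximately [[0.8]]
```

Note that this value is generally not equal to `norm(w) * norm(x)`, which is just the product of two scalars.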

Here's a simple function to compute the Frobenius norm of a sparse (e.g. CSR or CSC) matrix:

import numpy as np

def spnorm(a):
    return np.sqrt((a.data**2).sum())  # uses only the stored values

For example,

In [182]: b_csr
Out[182]: 
<3x5 sparse matrix of type '<type 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>

In [183]: b_csr.A
Out[183]: 
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  2.,  0.,  4.,  0.],
       [ 0.,  0.,  0.,  2.,  1.]])

In [184]: spnorm(b_csr)
Out[184]: 5.0990195135927845

In [185]: norm(b_csr.A)
Out[185]: 5.0990195135927845
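Newer SciPy releases also provide `scipy.sparse.linalg.norm`, which accepts sparse input directly. The sketch below rebuilds the example matrix from the session above and checks that the hand-written formula and the library function agree; the variable names `manual` and `builtin` are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import norm as sparse_norm

# Same values as b_csr in the session above.
b = np.array([[1., 0., 0., 0., 0.],
              [0., 2., 0., 4., 0.],
              [0., 0., 0., 2., 1.]])
b_csr = csr_matrix(b)

# Hand-written Frobenius norm from the stored values.
manual = np.sqrt((b_csr.data ** 2).sum())

# Library routine operating on the sparse matrix without densifying it.
builtin = sparse_norm(b_csr, ord='fro')

print(manual, builtin)  # both are sqrt(26)
```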

Licensed under: CC-BY-SA with attribution
