NumPy Matrix Multiplication Efficiency for Matrix With Known Structure

https://stackoverflow.com/questions/20461271

30-08-2022
|

سؤال

I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:

import numpy as np
C = np.dot(A, B)

However, I happen to know that for matrix B only row n and column n are non-zero (this comes directly from the analytical formula that produced the matrix and is without a doubt always the case).

Hoping to take advantage of this fact and reduce the number of multiplications needed to produce C, I replaced the above with:

import numpy as np
for row in range(0, N):
    for col in range(0, N):
        if col != n:
            C[row, col] = A[row, n]*B[n, col]    #Just one scalar multiplication
        else:
            C[row, col] = np.dot(A[row], B[:, n])

Analytically, this should reduce the total complexity as follows: In the general case (not using any fancy tricks, just basic matrix multiplication) C = AB, where A and B are both NxN, should be O(N^3). That is, all N rows must multiply all N columns, and each of these dot products contains N multiplications => O(NNN) = O(N^3).#

Exploiting the structure of B as I've done above however should go as O(N^2 + N^2) = O(2N^2) = O(N^2). That is, All N rows must multiply all N columns, however, for all of these (except those involving 'B[:, n]') only one scalar multiplication is required: only one element of 'B[:, m]' is non-zero for m != n. When n == m, which will occur N times (once for each row of A that must multiply column n of B), N scalar multiplications must occur.#

However, the first block of code (using np.dot(A, B)) is substantially faster. I'm aware (via information like: Why is matrix multiplication faster with numpy than with ctypes in Python?) that the low level implementation details of np.dot are likely to blame for this. So my question is this: How can I exploit the structure of matrix B to improve multiplication efficiency without sacrificing the implementation efficiency of NumPy, without building my own low level matrix multiplication in c?

This method is part of a numerical optimization over many variables, hence, O(N^3) is intractable whereas O(N^2) will likely get the job done.

Thank you for any help. Also, I'm new to SO, so please pardon any newbie errors.

المحلول

If I understood A and B correctly, then I do not understand the for loops and why you are not just multiplying by the two non-zero vectors:

# say A & B are like this:
n, N = 3, 5
A = np.array( np.random.randn(N, N ) )

B = np.zeros_like( A )
B[ n ] = np.random.randn( N )
B[:, n] = np.random.randn( N )

take the non-zero row and column of B:

rowb, colb = B[n,:], np.copy( B[:,n] )
colb[ n ] = 0

multiply A by those two vector:

X = np.outer( A[:,n], rowb )
X[:,n] += np.dot( A, colb )

to verify check:

X - np.dot( A, B )

with N=100:

%timeit np.dot(A, B)
1000 loops, best of 3: 1.39 ms per loop

%timeit colb = np.copy( B[:,n] ); colb[ n ] = 0; X = np.outer( A[:,n], B[n,:] ); X[:,n] += np.dot( A, colb )
10000 loops, best of 3: 98.5 µs per loop

نصائح أخرى

I timed it, and using sparse is faster:

import numpy as np
from scipy import sparse

from timeit import timeit

A = np.random.rand(100,100)
B = np.zeros(A.shape, dtype=np.float)

B[3] = np.random.rand(100)
B[:,3] = np.random.rand(100)

sparse_B = sparse.csr_matrix(B)

n = 1000

t1 = timeit('np.dot(A, B)', 'from __main__ import np, A, B', number=n)
print 'dense way : {}'.format(t1)
t2 = timeit('A * sparse_B', 'from __main__ import A, sparse_B',number=n)
print 'sparse way : {}'.format(t2)

Result:

dense way : 1.15117192268
sparse way : 0.113152980804
>>> np.allclose(np.dot(A, B), A * sparse_B)
True

As number of rows of B increases, so should the time advantage of multiplication using sparse matrix.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow