Question

I am looking for a numpy matrix operation that would speed up the following calculation.

I have two 3D arrays, A and B. The first dimension indexes the example, and both arrays hold n_examples examples. What I want is to take the dot product of each example in A with the corresponding example in B and sum the results:

import numpy as np

n_examples = 10
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

# accumulate the (20, 5) product of each example pair
sum = np.zeros([20, 5])
for i in range(len(A)):
    sum += np.dot(A[i], B[i])

Solution

This is a typical application for np.tensordot():

sum = np.tensordot(A, B, [[0,2],[0,1]])
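
As a quick sanity check (reusing the arrays from the question; the variable names here are just for illustration), the axes argument pairs A's axes (0, 2) with B's axes (0, 1), so the example axis and the inner axis are both contracted and the result matches the loop:

import numpy as np

n_examples = 10
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

# loop version from the question
loop_sum = np.zeros([20, 5])
for i in range(len(A)):
    loop_sum += np.dot(A[i], B[i])

# tensordot: contract A's axes (0, 2) against B's axes (0, 1)
td_sum = np.tensordot(A, B, [[0, 2], [0, 1]])

print(td_sum.shape)                    # (20, 5)
print(np.allclose(loop_sum, td_sum))   # True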

Timing

Using the following code:

import numpy as np

n_examples = 100
A = np.random.randn(n_examples, 20,30)
B = np.random.randn(n_examples, 30,5)

def sol1():
    sum = np.zeros([20,5])
    for i in range(len(A)):
      sum += np.dot(A[i],B[i])
    return sum

def sol2():
    return np.array(list(map(np.dot, A, B))).sum(0)  # list() needed on Python 3, where map returns an iterator

def sol3():
    return np.einsum('nmk,nkj->mj',A,B)

def sol4():
    return np.tensordot(A, B, [[2,0],[1,0]])

def sol5():
    return np.tensordot(A, B, [[0,2],[0,1]])
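
Before timing, a quick consistency check (my addition, not part of the original benchmark) that all five variants return the same (20, 5) array:

reference = sol1()
for sol in (sol2, sol3, sol4, sol5):
    assert np.allclose(reference, sol())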

Results:

timeit sol1()
1000 loops, best of 3: 1.46 ms per loop

timeit sol2()
100 loops, best of 3: 4.22 ms per loop

timeit sol3()
1000 loops, best of 3: 1.87 ms per loop

timeit sol4()
10000 loops, best of 3: 205 µs per loop

timeit sol5()
10000 loops, best of 3: 172 µs per loop

On my computer tensordot() was the fastest solution, and changing the order in which the axes are listed changed neither the result nor the performance.

OTHER TIPS

Ha, it can be done in just one line: np.einsum('nmk,nkj->mj',A,B).

See Einstein summation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
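
To unpack the subscripts: n indexes the example, m and k are the row and inner axes of A, and k and j are the inner and column axes of B. k is contracted within each example, and n is summed out because it does not appear after the ->. A minimal sketch, using the shapes from the question:

import numpy as np

A = np.random.randn(10, 20, 30)  # n=10 examples, each (m=20, k=30)
B = np.random.randn(10, 30, 5)   # n=10 examples, each (k=30, j=5)

# contract k within each example, then sum over n
out = np.einsum('nmk,nkj->mj', A, B)
print(out.shape)  # (20, 5)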

Not the same problem, but the idea is much the same; see the discussion and alternative methods in this related topic: numpy multiply matrices preserve third axis

Don't name your variable sum; it shadows the built-in sum.
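
A quick illustration of what goes wrong (a hypothetical snippet, not from the original answer):

import numpy as np

print(sum(range(5)))     # built-in sum still works: 10
sum = np.zeros([20, 5])  # the name now points at an array
# sum(range(5))          # would raise TypeError: 'numpy.ndarray' object is not callable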

As @Jaime pointed out, the loop is actually faster for dimensions of this size. In fact, a solution based on map and sum is, albeit simpler, even slower:

In [19]:

%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i],B[i])

10000 loops, best of 3: 115 µs per loop

In [20]:

%timeit np.array(list(map(np.dot, A, B))).sum(0)
1000 loops, best of 3: 445 µs per loop

In [21]:

%timeit np.einsum('nmk,nkj->mj',A,B)
1000 loops, best of 3: 259 µs per loop

Things are different with larger dimensions:

n_examples = 1000
A = np.random.randn(n_examples, 20,1000)
B = np.random.randn(n_examples, 1000,5)

And:

In [46]:

%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i],B[i])

1 loops, best of 3: 191 ms per loop

In [47]:

%timeit np.array(list(map(np.dot, A, B))).sum(0)
1 loops, best of 3: 164 ms per loop

In [48]:

%timeit np.einsum('nmk,nkj->mj',A,B)
1 loops, best of 3: 451 ms per loop
Licensed under: CC-BY-SA with attribution