Question

I am looking for a numpy matrix operation that would speed up the following calculation.

I have two 3D arrays, A and B. The first dimension indexes the example, and both arrays hold n_examples examples. What I want is to take the dot product of each example in A with the corresponding example in B and sum the results:

import numpy as np

n_examples = 10
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

# accumulate the (20, 5) product of each example pair
sum = np.zeros([20, 5])
for i in range(len(A)):
    sum += np.dot(A[i], B[i])

Solution

This is a typical application for np.tensordot():

sum = np.tensordot(A, B, [[0,2],[0,1]])
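
As a quick sanity check (reusing the arrays from the question; the variable names here are just for illustration), the axes argument pairs A's axes (0, 2) with B's axes (0, 1), so the example axis and the inner axis are both contracted and the result matches the loop:

import numpy as np

n_examples = 10
A = np.random.randn(n_examples, 20, 30)
B = np.random.randn(n_examples, 30, 5)

# loop version from the question
loop_sum = np.zeros([20, 5])
for i in range(len(A)):
    loop_sum += np.dot(A[i], B[i])

# tensordot: contract A's axes (0, 2) against B's axes (0, 1)
td_sum = np.tensordot(A, B, [[0, 2], [0, 1]])

print(td_sum.shape)                    # (20, 5)
print(np.allclose(loop_sum, td_sum))   # True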

Timing

Using the following code:

import numpy as np

n_examples = 100
A = np.random.randn(n_examples, 20,30)
B = np.random.randn(n_examples, 30,5)

def sol1():
    sum = np.zeros([20,5])
    for i in range(len(A)):
      sum += np.dot(A[i],B[i])
    return sum

def sol2():
    return np.array(list(map(np.dot, A, B))).sum(0)  # list() needed on Python 3, where map returns an iterator

def sol3():
    return np.einsum('nmk,nkj->mj',A,B)

def sol4():
    return np.tensordot(A, B, [[2,0],[1,0]])

def sol5():
    return np.tensordot(A, B, [[0,2],[0,1]])
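
Before timing, a quick consistency check (my addition, not part of the original benchmark) that all five variants return the same (20, 5) array:

reference = sol1()
for sol in (sol2, sol3, sol4, sol5):
    assert np.allclose(reference, sol())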

Results:

timeit sol1()
1000 loops, best of 3: 1.46 ms per loop

timeit sol2()
100 loops, best of 3: 4.22 ms per loop

timeit sol3()
1000 loops, best of 3: 1.87 ms per loop

timeit sol4()
10000 loops, best of 3: 205 µs per loop

timeit sol5()
10000 loops, best of 3: 172 µs per loop

On my computer tensordot() was the fastest solution, and changing the order in which the axes are listed changed neither the result nor the performance.

OTHER TIPS

Ha, it can be done in just one line: np.einsum('nmk,nkj->mj',A,B).

See Einstein summation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
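
To unpack the subscripts: n indexes the example, m and k are the row and inner axes of A, and k and j are the inner and column axes of B. k is contracted within each example, and n is summed out because it does not appear after the ->. A minimal sketch, using the shapes from the question:

import numpy as np

A = np.random.randn(10, 20, 30)  # n=10 examples, each (m=20, k=30)
B = np.random.randn(10, 30, 5)   # n=10 examples, each (k=30, j=5)

# contract k within each example, then sum over n
out = np.einsum('nmk,nkj->mj', A, B)
print(out.shape)  # (20, 5)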

Not the same problem, but the idea is much the same; see the discussion and alternative methods in this related topic: numpy multiply matrices preserve third axis

Don't name your variable sum; it shadows the built-in sum.
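
A quick illustration of what goes wrong (a hypothetical snippet, not from the original answer):

import numpy as np

print(sum(range(5)))     # built-in sum still works: 10
sum = np.zeros([20, 5])  # the name now points at an array
# sum(range(5))          # would raise TypeError: 'numpy.ndarray' object is not callable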

As @Jaime pointed out, the loop is actually faster for dimensions of this size. In fact, a solution based on map and sum is, albeit simpler, even slower:

In [19]:

%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i],B[i])

10000 loops, best of 3: 115 µs per loop

In [20]:

%timeit np.array(list(map(np.dot, A, B))).sum(0)
1000 loops, best of 3: 445 µs per loop

In [21]:

%timeit np.einsum('nmk,nkj->mj',A,B)
1000 loops, best of 3: 259 µs per loop

Things are different with larger dimensions:

n_examples = 1000
A = np.random.randn(n_examples, 20,1000)
B = np.random.randn(n_examples, 1000,5)

And:

In [46]:

%%timeit
SUM = np.zeros([20,5])
for i in range(len(A)):
    SUM += np.dot(A[i],B[i])

1 loops, best of 3: 191 ms per loop

In [47]:

%timeit np.array(list(map(np.dot, A, B))).sum(0)
1 loops, best of 3: 164 ms per loop

In [48]:

%timeit np.einsum('nmk,nkj->mj',A,B)
1 loops, best of 3: 451 ms per loop
Licensed under: CC-BY-SA with attribution