Vectorising an equation using numpy

Question 1

A sum of products? sounds like a job for np.einsum:

import numpy as np
N = 150
K = 3
M = 4
x = np.random.random((N,M))
mu = np.random.random((K,M))
gamma = np.random.random((N,K))

xbar = x-mu[:,None,:] # shape (3, 150, 4)
sigma = np.einsum('nk,knm,kno->kmo', gamma, xbar, xbar)
sigma /= gamma.sum(axis=0)[:,None,None]

Decoding 'nk,knm,kno->kmo':

This subscript specification has three components to the left of the array (->) followed by one component to the right.

The three components on the left correspond with the subscripts for gamma, xbar and xbar, the operands being passed to np.einsum.

gamma has subscript nk, just as in the formula you posted. xbar has shape (3, 150, 4). You can think of it as having subscript knm, where k and n have the same meaning as in the formula you posted, and m is a subscript representing the axis of length 4 which is not explicitly mentioned in your formula, but is apparently there given your description of the shape of the arrays.

Now the third subscript component is kno. The o subscript is used because the o plays the same role as the m subscript, but we don't want summation over m. In fact, we want the m and o subscripts to be iterated over independently, not in lock-step. Hence we give the third subscript different letters.

Notice that n appears in the subscripts on the left (nk, knm, kno), but does not appear on the right (in kmo). That tells np.einsum to sum over n.

The k appears in the subscripts on the left and on the right. This tells np.einsum that we wish to advance the k subscript in lock-step, but (since it appears on the right) we don't want to sum over k.

Since kmo appears on the right, these subscripts remain in the result. This results in sigma having shape (K,M,M) (i.e. (3,4,4)).

Question 2

Just a couple of performance pointers on top of unubtu's great answer.

np.einsum has trouble optimizing calls with more than two arguments. Whenever possible, it is generally faster to manually split the calculations into groups of two arguments, e.g.:

def unubtu():
    xbar = x-mu[:,None,:] # shape (3, 150, 4)
    sigma = np.einsum('nk,knm,kno->kmo', gamma, xbar, xbar)
    sigma /= gamma.sum(axis=0)[:,None,None]
    return sigma

def faster():
    xbar = x-mu[:,None,:] # shape (3, 150, 4)
    sigma = np.einsum('knm,kno->kmo', gamma.T[..., None] * xbar, xbar)
    sigma /= gamma.sum(axis=0)[:,None,None]
    return sigma

In [50]: %timeit unubtu()
10000 loops, best of 3: 147 µs per loop

In [51]: %timeit faster()
10000 loops, best of 3: 129 µs per loop

A 12% improvement isn't much, but the difference can get (much) larger with larger arrays.

Also, even though np.einsum is a great tool, that makes very difficult things easy, it is by no means as optimized as np.dot is if your numpy is built with a good linear algebra library. In your case, given that K is small, using np.dot and a Python loop is even faster:

def even_faster():
    sigma = np.empty((K, M, M))
    for k in xrange(K):
        x_ = x - mu[k]
        sigma[k] = np.dot((x_ * gamma[:, k, None]).T, x_)
    sigma /= gamma.sum(axis=0)[:,None,None]
    return sigma

In [52]: %timeit even_faster()
10000 loops, best of 3: 101 µs per loop