Question

I am trying to learn Expectation-Maximization (EM) for parameter estimation in a 1D mixture of Gaussians. However, the algorithm rarely seems to find the right parameters, and I am wondering if I am doing something wrong.

The data is generated from three Gaussians centered at three different locations (x = -10, x = 5, and x = 10):

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

# dataset is generated from 3 Gaussians with means at -10, 10, and 5
x1 = 1.0 * np.random.randn(10000) - 10.0
x2 = 1.0 * np.random.randn(10000) + 10.0
x3 = 1.0 * np.random.randn(10000) + 5.0
x = np.hstack([x1,x2,x3]) # final data set

I checked the histogram and x is correct. Parameter learning is done with EM updates:

# model and initialization
M = 3 # number of mixtures
alpha = np.ones(M)*.5 # -> mixing weight of each component
mu = np.random.random(M)*10 # -> mean of the gaussian
sigma = np.ones(M)*1.0 # -> std of the gaussian

w_mt = np.zeros((M,len(x))) # -> q(mixture | data, parameter)

# EM
for i in range(100):
    print "alpha:", alpha, "mu:", mu, "sigma:", sigma

    # E-step
    for m in range(M):
        w_mt[m] = alpha[m] * mlab.normpdf(x,mu[m],sigma[m])
    C = np.sum(w_mt, axis=0) # normalization
    w_mt = w_mt / C

    # M-step
    alpha = np.sum(w_mt,axis=1) / len(x)
    mu = np.sum(w_mt*x,axis=1)/np.sum(w_mt,axis=1)
    sigma = np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1)

    sigma[sigma < 0.1] = 0.1 # avoid numerical problems

I would expect the algorithm to (at least sometimes) find the correct mu (i.e. -10, 5, 10) with std ≈ 1.0. However, it seems the algorithm is never able to do that. Any help is appreciated.

UPDATE:

Ted Kim's fix solved the issue: I forgot to take the square root when computing the standard deviation. If anyone is interested, here is the link with the updated code: link


Solution

sigma should be the standard deviation, but the sigma in your code is the variance (which is sigma ** 2).

try

sigma = np.sqrt(np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1))
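Note that mlab.normpdf has been removed from recent versions of matplotlib. Below is a minimal sketch of one EM iteration with the square-root fix applied, assuming x, M, alpha, mu, sigma, and w_mt are set up as in the question and that SciPy is installed (scipy.stats.norm.pdf stands in for mlab.normpdf):

from scipy.stats import norm

# E-step: responsibilities q(mixture m | data point)
for m in range(M):
    w_mt[m] = alpha[m] * norm.pdf(x, loc=mu[m], scale=sigma[m])
w_mt = w_mt / np.sum(w_mt, axis=0)

# M-step: mixing weights, means, and standard deviations
alpha = np.mean(w_mt, axis=1)
mu = np.sum(w_mt * x, axis=1) / np.sum(w_mt, axis=1)
var = np.sum(w_mt * (x - mu[:, np.newaxis]) ** 2, axis=1) / np.sum(w_mt, axis=1)
sigma = np.sqrt(var)                 # std is the square root of the variance
sigma = np.maximum(sigma, 0.1)       # avoid numerical problems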

OTHER TIPS

This can also be generalized to higher dimensions by treating the mean mu as a vector and Sigma as a covariance matrix instead of a scalar variance.

For example, the maximum-likelihood estimate of the covariance matrix Sigma_k for the k-th Gaussian can be computed in the M-step with something like the following, where phi[i,k] is the probability that the i-th data point belongs to the k-th cluster (computed earlier in the E-step) and n_k[k] is the sum of phi[:,k] over all data points.

Sigma[k,:,:] = sum([phi[i,k]*np.outer(X[i,:]-mu[k,:], X[i,:]-mu[k,:]) for i in range(n)]) / n_k[k]
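Here is a minimal sketch of that multivariate M-step vectorized with NumPy; the names X, phi, mu, Sigma, and n_k follow the notation above, and the function is an illustration rather than a full EM implementation:

import numpy as np

def m_step(X, phi):
    # X: (n, d) data, phi: (n, K) responsibilities from the E-step
    n, d = X.shape
    K = phi.shape[1]
    n_k = phi.sum(axis=0)                     # effective number of points per cluster
    alpha = n_k / n                           # mixing weights
    mu = np.dot(phi.T, X) / n_k[:, None]      # (K, d) component means
    Sigma = np.zeros((K, d, d))
    for k in range(K):
        diff = X - mu[k]                      # deviations from the k-th mean
        Sigma[k] = np.dot((phi[:, k, None] * diff).T, diff) / n_k[k]
    return alpha, mu, Sigma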

The figure below shows how GMM-EM soft clustering with k = 3 cluster centers works:

[figure: GMM-EM soft clustering with k = 3 cluster centers]

For more details, see this blog post.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow