Question
I am trying to learn Expectation Maximization for parameter estimation in Mixture of Gaussians (1D). However, it seems the algorithm rarely finds the right parameters. I am wondering if I am doing something wrong.
The data is generated by three Gaussians at three different locations (x = -10, x = 5, and x = 10):
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
# dataset is generated with 3 gaussians at with mean at -10, 10 and 5.
x1 = 1.0 * np.random.randn(10000) - 10.0
x2 = 1.0 * np.random.randn(10000) + 10.0
x3 = 1.0 * np.random.randn(10000) + 5.0
x = np.hstack([x1,x2,x3]) # final data set
I checked the histogram and x is correct. Parameter learning is done with EM updates:
# model and initialization
M = 3 # number of mixtures
alpha = np.ones(M)*.5 # -> likelihood of the mixture
mu = np.random.random(M)*10 # -> mean of the gaussian
sigma = np.ones(M)*1.0 # -> std of the gaussian
w_mt = np.zeros((M,len(x))) # -> q(mixture | data, parameter)
# EM
for i in range(100):
    print "alpha:", alpha, "mu:", mu, "sigma:", sigma

    # E-step
    for m in range(M):
        w_mt[m] = alpha[m] * mlab.normpdf(x, mu[m], sigma[m])
    C = np.sum(w_mt, axis=0) # normalization
    w_mt = w_mt / C

    # M-step
    alpha = np.sum(w_mt, axis=1) / len(x)
    mu = np.sum(w_mt*x, axis=1) / np.sum(w_mt, axis=1)
    sigma = np.sum(w_mt*pow(x - mu[:,np.newaxis], 2), axis=1) / np.sum(w_mt, axis=1)
    sigma[sigma < 0.1] = 0.1 # avoid numerical problems
I would expect the algorithm to (at least sometimes) find the correct mu (i.e., -10, 5, 10) with std ≈ 1.0. However, it never seems able to do that. Any help is appreciated.
UPDATE:
Ted Kim's fix solved the issue. I forgot to take the square root when calculating the std. If anyone is interested, here is the link with the updated code: link
Solution
sigma is the standard deviation, but the sigma in your code is the variance (which is sigma ** 2). Try:
sigma = np.sqrt(np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1))
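To see the fix end to end, here is a minimal, self-contained sketch of the corrected EM loop. It follows the question's setup, but a hand-rolled `normpdf` (the `mlab` version is deprecated) and quantile-based initialization of the means are my own assumptions, not part of the original code:

```python
import numpy as np

def normpdf(x, mu, sigma):
    # Gaussian density; stands in for the deprecated mlab.normpdf
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# same synthetic data as in the question: unit-variance Gaussians at -10, 10, 5
x = np.hstack([rng.normal(-10, 1, 10000),
               rng.normal(10, 1, 10000),
               rng.normal(5, 1, 10000)])

M = 3
alpha = np.ones(M) / M
mu = np.quantile(x, [0.25, 0.5, 0.75])  # spread the initial means across the data
sigma = np.ones(M)

for _ in range(100):
    # E-step: responsibilities q(mixture | data, parameters), shape (M, len(x))
    w_mt = alpha[:, None] * normpdf(x, mu[:, None], sigma[:, None])
    w_mt /= np.sum(w_mt, axis=0)
    # M-step
    alpha = np.sum(w_mt, axis=1) / len(x)
    mu = np.sum(w_mt * x, axis=1) / np.sum(w_mt, axis=1)
    sigma = np.sqrt(np.sum(w_mt * (x - mu[:, None]) ** 2, axis=1)
                    / np.sum(w_mt, axis=1))  # note the sqrt
    sigma = np.maximum(sigma, 0.1)  # avoid numerical problems

print(np.sort(mu))  # close to [-10, 5, 10]
```

With the square root in place and reasonably spread initial means, the estimated means land near -10, 5, and 10 with std ≈ 1.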
OTHER TIPS
This also generalizes to higher dimensions by treating the mean mu as a vector and Sigma as a covariance matrix instead of a variance.
For example, the covariance matrix Sigma_k for the k-th Gaussian can be computed in the M-step of EM with MLE as follows, where phi[i,k] is the probability (computed earlier in the E-step) that the i-th datapoint belongs to the k-th cluster, and n_k[k] = sum_i phi[i,k] is the effective size of cluster k:
Sigma[k,:,:] = sum([phi[i,k]*np.outer(X[i,:]-mu[k,:], X[i,:]-mu[k,:]) for i in range(n)]) / n_k[k]
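The per-cluster Python loop above can also be vectorized with einsum. A small self-contained sketch, where the shapes X: (n, d) and phi: (n, K) and the random responsibilities (standing in for a real E-step) are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 500, 2, 3
X = rng.normal(size=(n, d))
phi = rng.random((n, K))
phi /= phi.sum(axis=1, keepdims=True)  # responsibilities from the E-step
n_k = phi.sum(axis=0)                  # effective cluster sizes, shape (K,)

mu = (phi.T @ X) / n_k[:, None]        # M-step means, shape (K, d)

# M-step covariances, shape (K, d, d): responsibility-weighted outer products
diff = X[None, :, :] - mu[:, None, :]  # (K, n, d)
Sigma = np.einsum('kn,kni,knj->kij', phi.T, diff, diff) / n_k[:, None, None]
```

This computes the same sum of weighted outer products as the list-comprehension version, just without the per-point Python loop.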
The figure below shows how GMM-EM soft clustering with k=3 cluster centers works.
For more details, see this blog post.