Question
I am trying to learn Expectation-Maximization (EM) for parameter estimation in a 1-D mixture of Gaussians. However, the algorithm rarely seems to find the right parameters, and I am wondering if I am doing something wrong.
The data is generated by three Gaussians at three different locations (x = -10, x = 5, and x = 10):
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
# dataset is generated with 3 gaussians with means at -10, 10 and 5.
x1 = 1.0 * np.random.randn(10000) - 10.0
x2 = 1.0 * np.random.randn(10000) + 10.0
x3 = 1.0 * np.random.randn(10000) + 5.0
x = np.hstack([x1,x2,x3]) # final data set
I checked the histogram and x is correct. Parameter learning is done with EM updates:
# model and initialization
M = 3                          # number of mixtures
alpha = np.ones(M) * .5        # -> likelihood of the mixture
mu = np.random.random(M) * 10  # -> mean of the gaussian
sigma = np.ones(M) * 1.0       # -> std of the gaussian
w_mt = np.zeros((M, len(x)))   # -> q(mixture | data, parameter)

# EM
for i in range(100):
    print "alpha:", alpha, "mu:", mu, "sigma:", sigma

    # E-step
    for m in range(M):
        w_mt[m] = alpha[m] * mlab.normpdf(x, mu[m], sigma[m])
    C = np.sum(w_mt, axis=0)  # normalization
    w_mt = w_mt / C

    # M-step
    alpha = np.sum(w_mt, axis=1) / len(x)
    mu = np.sum(w_mt * x, axis=1) / np.sum(w_mt, axis=1)
    sigma = np.sum(w_mt * pow(x - mu[:, np.newaxis], 2), axis=1) / np.sum(w_mt, axis=1)
    sigma[sigma < 0.1] = 0.1  # avoid numerical problems
I would expect the algorithm to (at least sometimes) find the correct mu (i.e., -10, 5, 10) with std ≈ 1.0. However, it seems the algorithm is never able to do that. Any help is appreciated.
UPDATE:
Ted Kim's fix seems to have resolved the issue: I forgot to take the square root when calculating the std. If anyone is interested, here is the link with the updated code: link
Solution
sigma is the standard deviation, but the sigma in your code is the variance (which is sigma ** 2).
Try:
sigma = np.sqrt(np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1))
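Putting the fix together, here is a minimal runnable sketch of the corrected EM loop. It is an assumption-laden rewrite rather than the asker's exact script: it uses a hand-written normal pdf in place of mlab.normpdf (which was removed from recent matplotlib versions), NumPy's newer Generator API, and a deliberately rough but informative initialization of mu, since EM is sensitive to where it starts:

```python
import numpy as np

def normpdf(x, mu, sigma):
    # plain Gaussian density, replacing the removed mlab.normpdf
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)

# synthetic data: three unit-variance Gaussians centred at -10, 10 and 5
x = np.hstack([rng.normal(-10.0, 1.0, 10000),
               rng.normal(10.0, 1.0, 10000),
               rng.normal(5.0, 1.0, 10000)])

M = 3
alpha = np.ones(M) / M           # mixture weights
mu = np.array([-8.0, 4.0, 8.0])  # rough init; EM is init-sensitive
sigma = np.ones(M)               # standard deviations

for _ in range(100):
    # E-step: responsibilities q(mixture | data), shape (M, len(x))
    w_mt = alpha[:, np.newaxis] * normpdf(x, mu[:, np.newaxis],
                                          sigma[:, np.newaxis])
    w_mt /= w_mt.sum(axis=0)

    # M-step: weighted MLE updates; note the square root on sigma
    Nm = w_mt.sum(axis=1)
    alpha = Nm / len(x)
    mu = (w_mt * x).sum(axis=1) / Nm
    sigma = np.sqrt((w_mt * (x - mu[:, np.newaxis]) ** 2).sum(axis=1) / Nm)
    sigma = np.maximum(sigma, 0.1)  # avoid numerical collapse

print(np.sort(mu))  # ≈ [-10, 5, 10]
```

With the square root in place, the means settle near the true centres and the stds near 1; with the original (variance-as-std) update, sigma grows on every iteration and the components smear out.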
Other tips
This could also be generalized to higher dimensions by treating the mean mu as a vector and Sigma as a covariance matrix instead of a scalar variance.
For example, the covariance matrix Sigma_k for the k-th Gaussian can be computed in the M-step of EM with MLE as follows, where phi[i,k] represents the probability, computed earlier in the E-step, that the i-th datapoint belongs to the k-th cluster:
Sigma[k,:,:] = sum([phi[i,k]*np.outer(X[i,:]-mu[k,:], X[i,:]-mu[k,:]) for i in range(n)]) / n_k[k]
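As a sanity check on that formula, here is a small self-contained sketch. The values of X, phi, mu and n_k are hypothetical random stand-ins (in a real run they come out of the EM iterations); it computes the same covariance update with one matrix product per cluster instead of the np.outer loop:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 500, 2, 3

# hypothetical stand-ins for quantities computed earlier in the EM loop
X = rng.normal(size=(n, d))            # data, one row per point
phi = rng.random((n, K))               # E-step responsibilities
phi /= phi.sum(axis=1, keepdims=True)  # each row sums to 1
n_k = phi.sum(axis=0)                  # effective cluster sizes
mu = (phi.T @ X) / n_k[:, np.newaxis]  # M-step means, shape (K, d)

# M-step covariance update: the same quantity as the np.outer sum,
# vectorized over datapoints
Sigma = np.zeros((K, d, d))
for k in range(K):
    diff = X - mu[k]                                    # (n, d)
    Sigma[k] = (phi[:, k, np.newaxis] * diff).T @ diff / n_k[k]
```

Each Sigma[k] comes out symmetric and positive semi-definite, as a covariance matrix should.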
The figure below shows how GMM-EM soft-clustering with k=3 cluster centers works:
For more details, see this blog post.