I am trying to learn Expectation Maximization (EM) for parameter estimation in a 1D Mixture of Gaussians. However, the algorithm rarely seems to find the right parameters, and I am wondering if I am doing something wrong.

The data is generated from three Gaussians centered at three different locations (x = -10, x = 5, and x = 10):

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

# dataset is generated from 3 Gaussians with means at -10, 10 and 5
x1 = 1.0 * np.random.randn(10000) - 10.0
x2 = 1.0 * np.random.randn(10000) + 10.0
x3 = 1.0 * np.random.randn(10000) + 5.0
x = np.hstack([x1,x2,x3]) # final data set
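
A quick way to sanity-check the generated data is a histogram (a minimal sketch, reusing the pyplot import above); three modes should show up near -10, 5 and 10:

# sanity check: the histogram should show three modes near -10, 5 and 10
plt.hist(x, bins=100, density=True)
plt.show()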

I checked the histogram and x is correct. Parameter learning is done with EM updates:

# model and initialization
M = 3 # number of mixtures
alpha = np.ones(M)*.5 # -> mixing weights (prior probability of each component)
mu = np.random.random(M)*10 # -> mean of the gaussian
sigma = np.ones(M)*1.0 # -> std of the gaussian

w_mt = np.zeros((M,len(x))) # -> q(mixture | data, parameter)

# EM
for i in range(100):
    print "alpha:", alpha, "mu:", mu, "sigma:", sigma

    # E-step
    for m in range(M):
        w_mt[m] = alpha[m] * mlab.normpdf(x,mu[m],sigma[m])
    C = np.sum(w_mt, axis=0) # normalization
    w_mt = w_mt / C

    # M-step
    alpha = np.sum(w_mt,axis=1) / len(x)
    mu = np.sum(w_mt*x,axis=1)/np.sum(w_mt,axis=1)
    sigma = np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1)

    sigma[sigma < 0.1] = 0.1 # avoid numerical problems

I would expect the algorithm to (at least sometimes) find the correct mu (i.e. -10, 5, 10) with std ≈ 1.0. However, it seems the algorithm is never able to do that. Any help is appreciated.

UPDATE:

Ted Kim's fix solved the issue: I forgot to take the square root when calculating the std. If anyone is interested, here is the link with the updated code: link


Solution

sigma is the standard deviation, but the sigma in your code is the variance (which is sigma ** 2).

try

sigma = np.sqrt(np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1))
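
Putting the fix together, here is a minimal self-contained sketch of the corrected 1D EM loop. It uses scipy.stats.norm for the Gaussian density instead of mlab.normpdf (which has since been removed from matplotlib); the variable names follow the question.

import numpy as np
from scipy.stats import norm

# synthetic data: three 1D Gaussians with means -10, 10 and 5
x = np.hstack([np.random.randn(10000) - 10.0,
               np.random.randn(10000) + 10.0,
               np.random.randn(10000) + 5.0])

M = 3                          # number of mixture components
alpha = np.ones(M) / M         # mixing weights
mu = np.random.random(M) * 10  # component means
sigma = np.ones(M)             # component standard deviations

for it in range(100):
    # E-step: responsibilities w_mt[m, i] = q(component m | x_i)
    w_mt = np.array([alpha[m] * norm.pdf(x, mu[m], sigma[m]) for m in range(M)])
    w_mt /= np.sum(w_mt, axis=0)

    # M-step: re-estimate weights, means and standard deviations
    N_m = np.sum(w_mt, axis=1)
    alpha = N_m / len(x)
    mu = np.sum(w_mt * x, axis=1) / N_m
    sigma = np.sqrt(np.sum(w_mt * (x - mu[:, np.newaxis]) ** 2, axis=1) / N_m)
    sigma = np.maximum(sigma, 0.1)  # guard against collapsing components

print("alpha:", alpha, "mu:", mu, "sigma:", sigma)

On most runs this recovers means close to -10, 5 and 10 (up to a permutation of the components), although EM can still converge to a poor local optimum depending on how mu is initialized.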

Other tips

This can also be generalized to higher dimensions by treating the mean mu as a vector and Sigma as a covariance matrix instead of a scalar variance.

For example, the covariance matrix Sigma_k for the k-th Gaussian can be computed in the M-step by MLE with something like the following, where phi[i,k] is the probability that the i-th datapoint belongs to the k-th cluster, computed earlier in the E-step.

Sigma[k,:,:] = sum([phi[i,k]*np.outer(X[i,:]-mu[k,:], X[i,:]-mu[k,:]) for i in range(n)]) / n_k[k]
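
As a sketch (assuming the names X, phi, mu and n_k follow the snippet above: X is the (n, d) data matrix and phi the (n, K) responsibility matrix), a complete vectorized M-step for the multivariate case could look like this:

import numpy as np

def m_step(X, phi):
    """One M-step for a multivariate GMM.

    X   : (n, d) data matrix
    phi : (n, K) responsibilities from the E-step
    """
    n, d = X.shape
    K = phi.shape[1]
    n_k = phi.sum(axis=0)                    # effective count per cluster
    pi = n_k / n                             # mixing weights
    mu = (phi.T @ X) / n_k[:, None]          # (K, d) cluster means
    Sigma = np.zeros((K, d, d))
    for k in range(K):
        diff = X - mu[k]                     # (n, d) deviations from cluster mean
        Sigma[k] = (phi[:, k, None] * diff).T @ diff / n_k[k]
    return pi, mu, Sigma

The per-cluster loop could be replaced with np.einsum, but the explicit form above stays closer to the formula.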

The figure below shows how GMM-EM soft-clustering with k=3 cluster centers works:

[Figure: GMM-EM soft-clustering with k=3 cluster centers]

For more details, see this blog post.
