Question

I'm trying to apply the Expectation-Maximization (EM) algorithm to a Gaussian Mixture Model (GMM) using Python and NumPy. The PDF document I am basing my implementation on can be found here. Below are the equations:

E-step:

$$w_{ik} = \frac{\pi_k \cdot p_k(x_i|z_k, \mu_k, \Sigma_k)}{\sum_{m=1}^{K} \pi_m \cdot p_m(x_i|z_m, \mu_m, \Sigma_m)}, \; [1]$$

where:

$$p_k(x_i \mid z_k, \mu_k, \Sigma_k) = (2\pi)^{-1}\,|\Sigma_k|^{-\frac{1}{2}}\,e^{-\frac{1}{2}(x_i - \mu_k)^{\mathsf{T}}\,\Sigma_k^{-1}(x_i - \mu_k)}.$$

M-step:

$$\pi_k^{\text{new}} = \frac{N_k}{N}, \; [2]$$

where:

$$N_k = \sum_{i=1}^{N} w_{ik}.$$

$$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} \cdot x_i, \; [3]$$

$$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} (x_i - \mu_k)(x_i - \mu_k)^{\mathsf{T}}. \; [4]$$

When applying the algorithm I get the mean of the first and second cluster equal to:

array([[2.50832195],
       [2.51546208]])

When the actual vector means for the first and second cluster are, respectively:

array([[0],
       [0]])

and:

array([[5],
       [5]])

The same thing happens with the covariance matrices. I get:

array([[7.05168736, 6.17098629],
       [6.17098629, 7.23009494]])

When it should be:

array([[1, 0],
       [0, 1]])

for both clusters. Here is the code:

import numpy as np

np.random.seed(1)

# first cluster
X_11 = np.random.normal(0, 1, 1000)
X_21 = np.random.normal(0, 1, 1000)

# second cluster
X_12 = np.random.normal(5, 1, 1000)
X_22 = np.random.normal(5, 1, 1000)

X_1 = np.concatenate((X_11,X_12), axis=None)
X_2 = np.concatenate((X_21,X_22), axis=None)

# data matrix of d x n dimensions (2 x 2000)
X = np.concatenate((np.array([X_1]),np.array([X_2])), axis=0)

# multivariate normal distribution function gives n x 1 vector (2000 x 1 vector)
def normal_distribution(x, mu, sigma):
  mvnd = []
  for i in range(np.shape(x)[1]):
    gd = (2*np.pi)**(-2/2) * np.linalg.det(sigma)**(-1/2) * np.exp((-1/2) * np.dot(np.dot((x[:,i:i+1]-mu).T, np.linalg.inv(sigma)), (x[:,i:i+1]-mu)))
    mvnd.append(gd)
  return np.reshape(np.array(mvnd), (np.shape(x)[1], 1))

# Initialized parameters
sigma_1 = np.array([[10, 0],
                    [0, 10]])
sigma_2 = np.array([[10, 0],
                    [0, 10]])
mu_1 = np.array([[10], 
                 [10]])
mu_2 = np.array([[10], 
                 [10]])
pi_1 = 0.5
pi_2 = 0.5

Sigma_1 = np.empty([2000, 2, 2])
Sigma_2 = np.empty([2000, 2, 2])

for _ in range(10):
  # E-step: responsibilities, equation [1]
  p_1 = pi_1*normal_distribution(X, mu_1, sigma_1)
  p_2 = pi_2*normal_distribution(X, mu_2, sigma_2)
  w_i1 = p_1/(p_1 + p_2)
  w_i2 = p_2/(p_1 + p_2)
  # M-step: equations [2], [3] and [4]
  pi_1 = np.sum(w_i1)/2000
  pi_2 = np.sum(w_i2)/2000
  mu_1 = np.array([(1/np.sum(w_i1))*np.sum(w_i1.T*X, axis=1)]).T
  mu_2 = np.array([(1/np.sum(w_i2))*np.sum(w_i2.T*X, axis=1)]).T
  for j in range(2000):
    Sigma_1[j, :, :] = w_i1[j, :]*np.dot((X[:, j:j+1]-mu_1), (X[:, j:j+1]-mu_1).T)
    Sigma_2[j, :, :] = w_i2[j, :]*np.dot((X[:, j:j+1]-mu_2), (X[:, j:j+1]-mu_2).T)
  sigma_1 = (1/np.sum(w_i1))*np.sum(Sigma_1, axis=0)
  sigma_2 = (1/np.sum(w_i2))*np.sum(Sigma_2, axis=0)
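
One way to rule out a bug in normal_distribution itself is to compare it against scipy.stats.multivariate_normal (assuming SciPy is available):

from scipy.stats import multivariate_normal

# the hand-rolled density should match scipy's for the same parameters
check = multivariate_normal.pdf(X.T, mean=mu_1.ravel(), cov=sigma_1)
print(np.allclose(check.reshape(-1, 1), normal_distribution(X, mu_1, sigma_1)))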

I would really appreciate it if someone could point out the mistake in my code or in my understanding of the algorithm.


Solution

One reason why you aren't getting fitted values close to the true values could be the initial values of the parameters. In your code both components start with exactly the same mean, covariance, and mixing weight, so the E-step assigns every point a responsibility of 0.5 for each component, and no update ever breaks that symmetry: both components converge to the pooled mean and covariance of the whole data set. That matches your output exactly; the pooled mean of clusters centered at 0 and 5 is 2.5, and the pooled per-coordinate variance is $1 + 2.5^2 = 7.25$.

It's likely that what you have found is a local maximum of the likelihood. EM is only guaranteed to converge to a local optimum, so you have to try a number of random initial starts and then pick the one that gives the highest likelihood.
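
Below is a minimal sketch of that strategy in NumPy; the names em_gmm and log_gaussian are illustrative, and the log-space E-step is just a standard numerical-stability trick, not something from the linked PDF. Each restart draws its initial means from randomly chosen data points, so the components start out distinct:

import numpy as np

def log_gaussian(x, mu, sigma):
  # log N(x | mu, sigma) for a data matrix x of shape (n, d)
  d = x.shape[1]
  diff = x - mu
  maha = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(sigma), diff)
  _, logdet = np.linalg.slogdet(sigma)
  return -0.5*(d*np.log(2*np.pi) + logdet + maha)

def em_gmm(x, K, n_iter=50, seed=None):
  rng = np.random.default_rng(seed)
  n, d = x.shape
  # random initialization: means drawn from the data, identity covariances
  mu = x[rng.choice(n, size=K, replace=False)]
  sigma = np.stack([np.eye(d) for _ in range(K)])
  pi = np.full(K, 1/K)
  for _ in range(n_iter):
    # E-step: responsibilities w[i, k] in log-space (equation [1])
    log_p = np.stack([np.log(pi[k]) + log_gaussian(x, mu[k], sigma[k])
                      for k in range(K)], axis=1)
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    w = np.exp(log_p - log_norm[:, None])
    # M-step: equations [2], [3] and [4]
    N_k = w.sum(axis=0)
    pi = N_k/n
    mu = (w.T @ x)/N_k[:, None]
    for k in range(K):
      diff = x - mu[k]
      sigma[k] = (w[:, k, None]*diff).T @ diff/N_k[k]
  # log_norm.sum() is the log-likelihood from the final E-step
  return pi, mu, sigma, log_norm.sum()

# several random restarts; keep the run with the highest log-likelihood
runs = [em_gmm(X.T, K=2, seed=s) for s in range(5)]  # X.T has shape (2000, 2)
pi_hat, mu_hat, sigma_hat, loglik = max(runs, key=lambda r: r[-1])

With distinct starting means the responsibilities differ across components from the first E-step on, so each restart can actually separate the two clusters, and comparing log-likelihoods across restarts guards against the occasional bad start.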

Licensed under: CC-BY-SA with attribution