Question

I'm training a 2-state HMM on a large English text (the first 50,000 characters of the Brown Corpus, restricted to letters and spaces); my algorithm follows Mark Stamp's tutorial (www.cs.sjsu.edu/~stamp/RUA/HMM.pdf).

Since the observations include only the 26 letters and the space character, I initially gave each observation (within each state) a probability of 1/27, then perturbed each entry by about 0.0001 while keeping each row stochastic.
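For concreteness, here is a minimal sketch of that kind of initialization, assuming NumPy (the variable names and the exact perturbation scheme are mine, not from the question):

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 2, 27  # 2 hidden states; 26 letters + space

    # Start from a uniform row (1/27 per symbol), add small perturbations
    # on the order of 1e-4, then renormalize so each row sums to 1.
    B = np.full((N, M), 1.0 / M)
    B += rng.uniform(-1e-4, 1e-4, size=(N, M))
    B /= B.sum(axis=1, keepdims=True)

    assert np.allclose(B.sum(axis=1), 1.0) and (B > 0).all()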

Running the trainer for 50 iterations gives only a minute incremental improvement in log[P(O|lambda)], where lambda is the updated model. Furthermore, in the observation matrix of the final model, the probability of each observation is almost identical across the two states (see http://pastebin.com/xVVYNhGs).

I figured I was stuck on a local maximum, so I altered the initial guess for the observation matrix to match Stamp's, and it actually gave me an updated observation matrix that differs between the two states within the same number of iterations. (50 iterations: http://pastebin.com/U0AgrJ2N; 100 iterations: http://pastebin.com/yAkruNjs)

My question is: my altered initial observation matrix (emission probabilities) clearly broke me out of that poor local maximum, but how would I go about finding/optimizing such an initial guess in general?


Solution

The answer to this is given in Rabiner's HMM tutorial paper, Section V-C, p. 273:

Basically, there is no simple or straightforward answer to the above question. Instead, experience has shown that either random (subject to the stochastic and nonzero-value constraints) or uniform initial estimates of the prior probabilities and the transition matrix are adequate for giving useful re-estimates of these parameters in almost all cases.
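As a rough sketch of that kind of random-but-valid initialization (again assuming NumPy; this is not code from either tutorial):

    import numpy as np

    rng = np.random.default_rng(1)
    N = 2  # number of hidden states

    # Random initial estimates for the priors (pi) and transitions (A):
    # draw strictly positive values near uniform and normalize each row,
    # which satisfies the stochastic and nonzero-value constraints.
    pi = rng.uniform(0.4, 0.6, size=N)
    pi /= pi.sum()

    A = rng.uniform(0.4, 0.6, size=(N, N))
    A /= A.sum(axis=1, keepdims=True)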

However, for the emission matrix, experience has shown that good initial estimates are helpful in the discrete-symbol case, and are essential (when dealing with multiple mixtures) in the continuous-distribution case.

Such initial estimates can be obtained in a number of ways, including:

1) manual segmentation of the observation sequences into states with averaging observations within states,

2) maximum likelihood segmentation of observations with averaging,

3) segmental k-means segmentation with clustering,

etc. (A rough sketch of one such scheme follows below.)
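For instance, here is one possible sketch of a crude segmentation-plus-averaging initialization, roughly in the spirit of options 1-3: split the text into fixed windows, cluster the windows by symbol frequency with a tiny k-means, and average symbol counts within each cluster. The function name, window size, and smoothing are hypothetical choices of mine, not taken from the question or from Rabiner's paper; it assumes NumPy.

    import numpy as np

    def rough_emission_init(text, n_states=2, window=200, n_iters=20, seed=0):
        # Hypothetical helper: crude segmentation-plus-averaging initial B.
        symbols = " abcdefghijklmnopqrstuvwxyz"
        idx = {c: i for i, c in enumerate(symbols)}
        obs = np.array([idx[c] for c in text.lower() if c in idx])

        # Per-window symbol-frequency vectors.
        n_win = len(obs) // window
        freqs = np.zeros((n_win, len(symbols)))
        for w in range(n_win):
            counts = np.bincount(obs[w * window:(w + 1) * window],
                                 minlength=len(symbols))
            freqs[w] = counts / counts.sum()

        # Tiny k-means on the frequency vectors (one cluster per state).
        rng = np.random.default_rng(seed)
        centers = freqs[rng.choice(n_win, size=n_states, replace=False)]
        for _ in range(n_iters):
            labels = np.argmin(((freqs[:, None, :] - centers[None]) ** 2).sum(-1),
                               axis=1)
            for k in range(n_states):
                if (labels == k).any():
                    centers[k] = freqs[labels == k].mean(axis=0)

        # Average (count) symbols within each cluster; add-one smoothing keeps
        # every entry nonzero, then renormalize so each row is stochastic.
        B = np.ones((n_states, len(symbols)))
        for w, k in enumerate(labels):
            B[k] += np.bincount(obs[w * window:(w + 1) * window],
                                minlength=len(symbols))
        return B / B.sum(axis=1, keepdims=True)

The resulting B (together with random or uniform pi and A) could then be used as the starting model for the Baum-Welch re-estimation loop.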
