Pregunta

I'm trying to implement some code from here

And I have trained the HMM with my coefficients but do not understand how the Viterbi Decoder algorithm works, for example:

 viterbi_decode(MFCC, M, model, q);
 where MFCC = coefficents 
 M = size of MFCC
 model = Model of HMM training using the MFCC coefficients 
 q = unknown (believed to be the outputted path).

But here is what I do not understand: I am attempting to compare two speech signals (training, sample) to find out the closest possible match. With the DTW algorithm for example, a single integer is returned where I can then find the closest, however, with this algorithm it returns a int* array and therefore differentiating is difficult.

Here is how the current program works:

vector<DIMENSIONS_2> MFCC = mfcc.transform(rawData, sample_rate);

int N = MFCC.size();
int M = 13;

double** mfcc_setup = setupHMM(MFCC, N, M);

model_t* model = hmm_init(mfcc_setup, N, M, 10);

hmm_train(mfcc_setup, N, model);

int* q = new int[N];

viterbi_decode(mfcc_setup, M, model, q); 

Could anyone please tell me how the Viterbi Decoder works for the problem of identifying which is the best path to take from the training, to the input? I've tried both the Euclidean distance as well as the Hamming Distance on the decode path (q) but had no such luck.

Any help would be greatly appreciated

¿Fue útil?

Solución

In this example it seems to me that (q) is the hidden state sequence, so a list of numbers from 0->9. If you have two audio samples say, test and train, and you generate two sequences q_test and q_train, then thinking about |q_test - q_train|, where the norm is componentwise distance, is not useful because it isn't representing a notion of distance correctly, since hidden state labels in HMM may be arbitrary.

A more natural way to think about distance may be the following, given q_train, you are interested in the probability that your test sample took that same path, which you can compute once you have the transition matrix and emission probabilites.

Please let me know if I am misunderstanding your question.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top