Question

I'm trying to implement an MFCC algorithm based on this paper I found (http://arxiv.org/pdf/1003.4083.pdf). What I have done so far is:

step 1) Pre–emphasis

step 2) Framing

step 3) Hamming windowing

step 4) Fast Fourier Transform

step 5) Mel Filter Bank Processing

step 6) Discrete Cosine Transform

Basically, I took the Mel filter banks and multiplied them by the actual raw signal. I then performed the FFT on the result, which looks like this:

FFT on Frame 1:

[Figure: plot of the FFT of frame 1]

I then computed the DCT of the FFT output, and the result looks like this:

DCT on Frame 1:

[Figure: plot of the DCT of frame 1]

Does this look correct so far? Is there even a way for me to check this, so that I know that I am going in the right direction?

Also, I need to get 13 coefficients, but I do not know how to determine which ones to take. I get 256 values, so do I take the first 13 of them? Or do I take the total energy?

I hope someone can help me.


Solution 3

No, you are wrong.

You need to compute the logarithm of the mel filter bank energies after the FFT, and only then apply the DCT. The number of filter bank energies should be about 20 or 40, so after the DCT you get 20 or 40 numbers, of which you take the first 13.
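As a rough illustration of that last stage, here is a minimal sketch using only the Python standard library. The function names, the unnormalized DCT-II, and the `floor` value that guards against `log(0)` are assumptions made for the example; real implementations often scale the DCT differently.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II of a sequence of real numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(filterbank_energies, num_ceps=13, floor=1e-10):
    """Log of the mel filter bank energies, then DCT, then keep the first num_ceps."""
    # `floor` is an assumed guard so that an empty filter does not produce log(0).
    log_energies = [math.log(max(e, floor)) for e in filterbank_energies]
    return dct_ii(log_energies)[:num_ceps]


# Hypothetical input: 20 filter bank energies for one frame.
energies = [1.0 + 0.5 * i for i in range(20)]
coefficients = cepstral_coefficients(energies)
print(len(coefficients))  # 13 values for this frame
```

The input here is one frame's filter bank energies (20 numbers), not the 256 FFT values, which is why the DCT output is short enough that "take the first 13" makes sense.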

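What you did with the FFT is wrong: the mel filters are applied to the spectrum that the FFT produces, not to the raw signal before the FFT.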
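To show where the mel filters belong, here is a hedged sketch that builds triangular filters and applies them to a power spectrum, that is, to the output of the FFT. The mel formula `2595 * log10(1 + f / 700)` is the common textbook form; the function names, the 16 kHz sample rate, and the bin-rounding rule are assumptions made for this example, not taken from the paper.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular filters over the fft_size // 2 + 1 bins of a one-sided spectrum."""
    num_bins = fft_size // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge points, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(top_mel * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    edges = [int(math.floor((fft_size + 1) * hz / sample_rate)) for hz in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = [0.0] * num_bins
        for k in range(left, centre):      # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # peak and falling slope
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def filterbank_energies(power_spectrum, bank):
    """One energy per filter: weighted sum of the power spectrum bins."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filterbank(num_filters=20, fft_size=256, sample_rate=16000)
flat_spectrum = [1.0] * 129  # hypothetical power spectrum of one 256-sample frame
energies = filterbank_energies(flat_spectrum, bank)
print(len(energies))  # 20 energies for this frame
```

Each frame's 129 spectrum bins collapse into 20 energies here, and those energies are what the logarithm and the DCT operate on.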

You might want to read some existing MFCC code instead of writing everything from scratch. There are many implementations out there, for example in sphinxbase:

http://cmusphinx.sourceforge.net

Other tips

After days of searching for something similar, I stumbled upon a very useful tutorial on how to get the MFC coefficients: Mel Frequency Cepstral Coefficient (MFCC) tutorial

(although the thread is old, I hope the answer might help future readers)

I'm confused by what you just wrote. The only thing I need to know is this: I have split the signal into frames with n = 100 and m = 256 (I believe), which produces around 390 blocks. So, are there 13 coefficients for each of the blocks, or just 13 for the entire sound file?

The answer is that there are 13 coefficients for each block, not 13 for the entire sound file.
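A small sketch may make the resulting shape easier to picture. It assumes that 256 is the frame length and 100 is the step between frame starts, and it uses a made-up signal length of 39156 samples chosen so that the count comes out at 390; none of these readings are confirmed by the question.

```python
def frame_signal(samples, frame_length=256, frame_step=100):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_length
    return [samples[start:start + frame_length]
            for start in range(0, last_start + 1, frame_step)]


signal = [0.0] * 39156          # hypothetical signal length
frames = frame_signal(signal)
print(len(frames))              # 390 blocks
print(len(frames[0]))           # 256 samples per block
```

With 13 coefficients per block, the MFCC output for such a file would be a 390 x 13 table: one row per block.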

Also, your way of calculating the MFCC coefficients is wrong; you should follow steps 1-6 that you mentioned:

step 1) Pre–emphasis for the entire sound file.

step 2) Framing the entire sound file to get many blocks

step 3) Hamming windowing for each block

step 4) Fast Fourier Transform for each block

step 5) Mel Filter Bank Processing for each block

step 6) Discrete Cosine Transform for each block
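The first four steps can be sketched as follows, using only the standard library. The pre-emphasis coefficient 0.95, the function names, and the naive O(N^2) DFT standing in for a real FFT are all assumptions made to keep the example self-contained; the mel filter bank and the DCT would then run on each row of the result.

```python
import cmath
import math


def pre_emphasis(samples, alpha=0.95):
    """Step 1: y[n] = x[n] - alpha * x[n-1], applied once to the whole signal."""
    return samples[:1] + [samples[n] - alpha * samples[n - 1]
                          for n in range(1, len(samples))]


def hamming(length):
    """Step 3: Hamming window coefficients for one block."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def power_spectrum(frame):
    """Step 4: one-sided power spectrum via a naive DFT (use an FFT in practice)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def frame_power_spectra(samples, frame_length=256, frame_step=100):
    emphasized = pre_emphasis(samples)            # step 1, whole signal
    window = hamming(frame_length)
    spectra = []
    last_start = len(emphasized) - frame_length
    for start in range(0, last_start + 1, frame_step):   # step 2, framing
        frame = emphasized[start:start + frame_length]
        windowed = [w * x for w, x in zip(window, frame)]  # step 3, per block
        spectra.append(power_spectrum(windowed))           # step 4, per block
    return spectra


# Hypothetical test tone that sits exactly on bin 8 of a 64-point DFT.
tone = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(264)]
spectra = frame_power_spectra(tone, frame_length=64, frame_step=100)
print(len(spectra), len(spectra[0]))
```

Only pre-emphasis and framing touch the whole signal; everything from the window onward is computed block by block, which is why the final result has one set of coefficients per block.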

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow