Building MFCC filter banks in the same way as Intel's performance primitives

https://stackoverflow.com/questions/18903545

29-06-2022

Question

I'm trying to build the triangular filters used to generate MFCCs. My existing code is based on IPP 6, but with IPP 8 on its way I'd really like an implementation that works without relying on an old, now-unsupported library.

I've generated the relevant mel-scaled center frequencies (plus the two edge frequencies, one at either end).

I am then trying to build the filters as follows:

std::vector< std::vector< float > > ret;
int numFilters  = freqPositions.size() - 2;

for( int f = 1; f < numFilters + 1; f++ )
{
    float freqLow   = freqPositions[f - 1];
    float freqMid   = freqPositions[f];
    float freqHigh  = freqPositions[f + 1];

    float binLow    = (freqLow  / (sampleRate / 2)) * (numSamples + 1);
    float binMid    = (freqMid  / (sampleRate / 2)) * (numSamples + 1);
    float binHigh   = (freqHigh / (sampleRate / 2)) * (numSamples + 1);

    std::vector< float > fbank;
    for( int s = 0; s < (numSamples + 1); s++ )
    {
        if      ( s >= binLow && s < binMid )
        {
            const float fAmpl   = (s - binLow) / (float)(binMid - binLow);
            fbank.push_back( fAmpl );
        }
        else if ( s >= binMid && s <= binHigh )
        {
            const float fAmpl   = 1.0f - ((s - binMid) / (float)(binHigh - binMid));
            fbank.push_back( fAmpl );
        }
        else
        {
            fbank.push_back( 0.0f );
        }

    }

    ret.push_back( fbank );
}

I then multiply each of the above vectors element-wise with the FFT results (where bin 0 is the 0 Hz, or DC offset, bin) and sum the products (essentially a dot product).

This seems to work reasonably well, but my results differ from IPP's by enough to leave me slightly concerned.

Is there something I'm doing wrong?

The whole process consists of taking an FFT, calculating the magnitudes of the returned complex vector (std::abs) and then applying the filter banks that are calculated as above. The code is as follows:

std::vector< float > ApplyFilterBanks( std::vector< std::vector< float > >& filterBanks, std::vector< float >& fftMags )
{
    std::vector< float > ret;
    for( int fb = 0; fb < (int)filterBanks.size(); fb++ )
    {
        float res = 0.0f;
        Vec::Dot( res, &filterBanks[fb].front(), &fftMags.front(), filterBanks[fb].size() );
        ret.push_back( res );
    }
    return ret;
}

{
    const int kFFTSize      = 1 << mFFT.GetFFTOrder();
    const int kFFTSizeDiv2  = kFFTSize >> 1;
    std::vector< float > audioToFFT;
    audioToFFT.reserve( kFFTSize );
    std::copy( pAudio, pAudio + numSamples, std::back_inserter( audioToFFT ) );
    audioToFFT.resize( kFFTSize );

    std::vector< float > hammingWindow( numSamples );
    Vec::BuildHammingWindow( hammingWindow );
    Vec::Multiply( &audioToFFT.front(), &audioToFFT.front(), &hammingWindow.front(), numSamples );

    std::vector< std::complex< float > > fftResult( kFFTSize + 1 );

    // FFT the incoming audio.
    mFFT.ForwardFFT( &fftResult.front(), &audioToFFT.front(), kFFTSize );

    // Calculate the magnitudes of the resulting FFT.
    Vec::Magnitude( &audioToFFT.front(), &fftResult.front(), kFFTSizeDiv2 + 1 );
    //Vec::Multiply( &audioToFFT.front(), &audioToFFT.front(), &audioToFFT.front(), kFFTSizeDiv2 + 1 );

    // Apply the MFCC filter banks.
    std::vector< float > filtered   = ApplyFilterBanks( mFilterBanks, audioToFFT );
}
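
For readers without the `Vec` helpers, the magnitude and filter-application steps can be sketched with the standard library alone. This is a minimal sketch, not the original implementation: `Magnitudes` and `FilterEnergy` are hypothetical names, and they assume that `Vec::Magnitude` takes the absolute value of each complex bin and that `Vec::Dot` sums the element-wise products, which is what the call sites above suggest.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for Vec::Magnitude: |z| for the first `count` bins.
std::vector< float > Magnitudes( const std::vector< std::complex< float > >& fftResult, std::size_t count )
{
    assert( count <= fftResult.size() );
    std::vector< float > mags( count );
    for( std::size_t i = 0; i < count; i++ )
        mags[i] = std::abs( fftResult[i] );
    return mags;
}

// Hypothetical stand-in for Vec::Dot: the sum of filter[i] * fftMags[i].
// Each filter must have exactly one weight per magnitude bin.
float FilterEnergy( const std::vector< float >& filter, const std::vector< float >& fftMags )
{
    assert( filter.size() == fftMags.size() );
    return std::inner_product( filter.begin(), filter.end(), fftMags.begin(), 0.0f );
}
```

With an FFT of size kFFTSize, `count` would be kFFTSizeDiv2 + 1, so every filter needs kFFTSizeDiv2 + 1 weights for the size check to pass.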

Here is a plot where Series 1 is my MFCCs and Series 2 is IPP's:

[Image: My MFCCs vs IPP's]

After the log and liftering stages (which I have confirmed work the same way as IPP's), the results diverge even further.

Any ideas and pointers would be massively appreciated!

Edit: I should point out that there is some documentation on the IPP functions here:

http://software.intel.com/sites/products/documentation/hpc/ipp/ipps/ipps_ch8/functn_MelFBankInitAlloc.html

This appears to show the maths. I'm not sure, however, what exactly yk and ck are ...

Answer

OK, I've made a lot of progress on the problem now.

I found two problems. Firstly:

float binLow    = (freqLow  / (sampleRate / 2)) * (numSamples + 1);
float binMid    = (freqMid  / (sampleRate / 2)) * (numSamples + 1);
float binHigh   = (freqHigh / (sampleRate / 2)) * (numSamples + 1);

should be:

float binLow    = (freqLow  / (sampleRate / 2)) * (numSamples);
float binMid    = (freqMid  / (sampleRate / 2)) * (numSamples);
float binHigh   = (freqHigh / (sampleRate / 2)) * (numSamples);

Secondly, I was calculating my steps through mel space incorrectly. I was doing the following:

const float melStep     = melDiff / (numFilterBanks + 2);

when I should have been doing:

const float melStep     = melDiff / (numFilterBanks + 1);
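
Both fixes are off-by-one corrections, and a short sketch may make the reasoning easier to check. It rests on assumptions the post does not confirm: the mel mapping is the common 2595 * log10(1 + f / 700) form, `numSamples` in the filter-building code is half the FFT size (so bins 0..numSamples span 0 Hz to sampleRate / 2), and the helper names `HzToMel`, `MelToHz`, `BuildFreqPositions` and `FreqToBin` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed mel mapping; the original post does not show which formula it uses.
float HzToMel( float hz )  { return 2595.0f * std::log10( 1.0f + hz / 700.0f ); }
float MelToHz( float mel ) { return 700.0f * ( std::pow( 10.0f, mel / 2595.0f ) - 1.0f ); }

// Returns numFilterBanks + 2 edge frequencies in Hz, equally spaced in mel.
std::vector< float > BuildFreqPositions( float lowHz, float highHz, int numFilterBanks )
{
    assert( numFilterBanks > 0 && highHz > lowHz );
    const float melLow  = HzToMel( lowHz );
    const float melDiff = HzToMel( highHz ) - melLow;
    // numFilterBanks + 2 points are separated by numFilterBanks + 1 steps.
    const float melStep = melDiff / (numFilterBanks + 1);

    std::vector< float > freqPositions;
    for( int i = 0; i < numFilterBanks + 2; i++ )
        freqPositions.push_back( MelToHz( melLow + i * melStep ) );
    return freqPositions;
}

// Maps a frequency to a fractional FFT bin. The Nyquist frequency lands on
// bin numSamples, the last valid index, rather than one past the end.
float FreqToBin( float freqHz, float sampleRate, int numSamples )
{
    return (freqHz / (sampleRate / 2.0f)) * numSamples;
}
```

Dividing by numFilterBanks + 2 would leave the last edge one step short of the upper frequency, and scaling by numSamples + 1 would stretch every filter slightly toward higher bins.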

My results, while not identical, now show a MUCH better correspondence:

[Image: Pre-log and liftered MFCCs]

And the final MFCCs:

[Image: Final MFCCs]

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow