Time delay of sound files using cross correlation

https://stackoverflow.com/questions/23610415

20-07-2023

Question

I'm trying to speed up my algorithm for estimating the time delay between two sound files in Java. My idea was to use cross-correlation and search for the highest value, whose index gives the delay in samples.

I've done this and it's working just fine. I created some sample files and then calculated the time delay, and the result is pretty good. The problem is that the algorithm takes a lot of time because of the large number of operations.

Is there any way to speed this up?

/**
 * Input vector for signal x1 (reference).
 */
private double[] x1;

/**
 * Input vector for signal x2 (test).
 */
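private double[] x2;

/**
 * Cross-correlation result (declaration assumed; not shown in the original snippet).
 */
private double[] out;

/**
 * Estimated delay in samples (declaration and type assumed; not shown in the original snippet).
 */
private int delay;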


/**
 * Execute the cross correlation between signal x1 and x2 and calculate the time delay.
 */
public void execCorrelation()
{
    // define the size of the resulting correlation field
    int corrSize = 2*x1.length;
    // create correlation vector
    out = new double[corrSize];
    // shift variable
    int shift = x1.length;
    double val;
    int maxIndex = 0;
    double maxVal = 0;

    // we have to push the signal from the left to the right
    for(int i=0;i<corrSize;i++)
    {
        val = 0;
        // multiply sample by sample and sum up
        for(int k=0;k<x1.length;k++)
        {
            // x2 has reached its end - abort
            if((k+shift) > (x2.length -1))
            {
                break;
            }

            // x2 has not started yet - continue
            if((k+shift) < 0)
            {
                continue;
            }

            // multiply sample with sample and sum up
            val += x1[k] * x2[k+shift];
            //System.out.print("x1["+k+"] * x2["+(k+shift)+"] + ");
        }
        //System.out.println();
        // save the sample
        out[i] = val;
        shift--;
        // save highest correlation index
        if(out[i] > maxVal)
        {
            maxVal = out[i];
            maxIndex = i;
        }
    }

    // set the delay
    this.delay = maxIndex - x1.length;
}

Solution

If I remember correctly, a cross-correlation is the same as a convolution with one of the signals time-reversed. A convolution, in turn, is efficiently calculated by multiplying the spectra of the two signals: zero-pad each signal to at least the sum of the two lengths, take the FFT of each, multiply the spectra, apply the inverse FFT, and search for the peak. For real-valued signals you can skip the explicit time reversal and take the complex conjugate of one spectrum before multiplying; this only changes the index at which zero lag appears in the result.
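To make the identity concrete, here it is written out for two real signals of lengths L_1 and L_2. The symbols r, m, N, L_1 and L_2 are my own notation, not the original poster's, and x_2 is treated as zero outside its valid index range.

```latex
\begin{align}
r[m] &= \sum_{k=0}^{L_1-1} x_1[k]\, x_2[k+m], \qquad -(L_1-1) \le m \le L_2-1 \\
R[f] &= \overline{X_1[f]}\; X_2[f], \qquad r[m] = \mathrm{IDFT}\{R\}[\, m \bmod N \,], \qquad N \ge L_1 + L_2 - 1
\end{align}
```

Here X_1 and X_2 are the N-point DFTs of the zero-padded signals. The direct sum costs on the order of L_1 L_2 multiplications, while the FFT route costs on the order of N log N.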

For Java, you can use JTransforms to do the FFT/IFFT.
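The steps above can be sketched in plain Java roughly as follows. This is a minimal illustration, not a tuned implementation: the class name `FftDelay` and the method `estimateLag` are made up for this example, and the small hand-rolled radix-2 FFT only stands in for a real library such as JTransforms (whose API is deliberately not shown here). It assumes both inputs are non-empty real-valued sample arrays.

```java
import java.util.Arrays;

final class FftDelay {

    /** In-place iterative radix-2 FFT; re.length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes; the sign of the angle selects forward or inverse.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = (inverse ? 2 : -2) * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double tr = re[b] * wr - im[b] * wi;
                    double ti = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - tr; im[b] = im[a] - ti;
                    re[a] += tr;        im[a] += ti;
                }
            }
        }
        if (inverse) {
            for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
        }
    }

    /**
     * Returns the lag m that maximizes sum_k x1[k] * x2[k + m].
     * A positive result means x2 lags behind x1 by that many samples.
     */
    static int estimateLag(double[] x1, double[] x2) {
        // Pad to a power of two >= x1.length + x2.length - 1 to avoid wrap-around.
        int n = 1;
        while (n < x1.length + x2.length - 1) n <<= 1;
        double[] aRe = Arrays.copyOf(x1, n), aIm = new double[n];
        double[] bRe = Arrays.copyOf(x2, n), bIm = new double[n];
        fft(aRe, aIm, false);
        fft(bRe, bIm, false);
        // Multiply conj(X1) by X2: this is the spectrum of the cross-correlation.
        for (int i = 0; i < n; i++) {
            double re = aRe[i] * bRe[i] + aIm[i] * bIm[i];
            double im = aRe[i] * bIm[i] - aIm[i] * bRe[i];
            aRe[i] = re; aIm[i] = im;
        }
        fft(aRe, aIm, true);
        // Search for the peak of the (real) correlation.
        int best = 0;
        for (int i = 1; i < n; i++) {
            if (aRe[i] > aRe[best]) best = i;
        }
        // Indices 0..x2.length-1 hold non-negative lags; the tail holds negative lags.
        return best < x2.length ? best : best - n;
    }
}
```

You would call it as `int lag = FftDelay.estimateLag(x1, x2);`. Note the sign convention: as far as I can tell from the loop in the question, its `delay` field ends up as the negative of this lag, so flip the sign if you want the same result.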

If you want to play with this approach before actually implementing it, you can try my application FScape; it has a convolution module that takes two sound files (you tagged the question "audio-processing", so I assume you can generate sound files).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow