Question

I'm trying to detect echoes of my chirp in my sound recording on Android and it seems cross correlation is the most appropriate way of finding where the FFTs of the two signals are similar and from there I can identify peaks in the cross correlated array which will correspond to distances.

From my understanding, I have come up with the following cross correlation function. Is this correct? I wasn't sure whether to add zeros to the beginning as and start a few elements back?

public double[] xcorr1(double[] recording, double[] chirp) {        
    double[] recordingZeroPadded = new double[recording.length + chirp.length];

    for (int i = recording.length; i < recording.length + chirp.length; ++i)
            recordingZeroPadded[i] = 0;

    for (int i = 0; i < recording.length; ++i)
            recordingZeroPadded[i] = recording[i];

    double[] result = new double[recording.length + chirp.length - 1];

    for (int offset = 0; offset < recordingZeroPadded.length - chirp.length; ++offset)
        for (int i = 0; i < chirp.length; ++i)
            result[offset] += chirp[i] * recordingZeroPadded[offset + i];
    return result;
}

Secondary question:

According to this answer, it can also be calculated like

corr(a, b) = ifft(fft(a_and_zeros) * fft(b_and_zeros[reversed]))

which I don't understand at all but seems easy enough to implement. That said I have failed (assuming my xcorr1 is correct). I feel like I've completely misunderstood this?

public double[] xcorr2(double[] recording, double[] chirp) {
    // assume same length arguments for now
    DoubleFFT_1D fft = new DoubleFFT_1D(recording.length);
    fft.realForward(recording);
    reverse(chirp);
    fft.realForward(chirp);
    double[] result = new double[recording.length];

    for (int i = 0; i < result.length; ++i)
        result [i] = recording[i] * chirp[i];

    fft.realInverse(result, true);
    return result;
}

Assuming I got both working, which function would be most appropriate given that the arrays will contain a few thousand elements?

EDIT: Btw, I have tried adding zeros to both ends of both arrays for the FFT version.


EDIT after SleuthEye's response:

Can you just verify that, because I'm dealing with 'actual' data, I need only do half the computations (the real parts) by doing a real transform?

From your code, it looks as though the odd numbered elements in the array returned by the REAL transform are imaginary. What's going on here?

How am I going from an array of real numbers to complex? Or is this the purpose of a transform; to move real numbers into the complex domain? (but the real numbers are just a subset of the complex numbers and so wouldn't they already be in this domain?)

If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward? And how do I interpret the results? The magnitude of the complex number?

I apologise for my lack of understanding with regard to transforms, I have only so far studied fourier series.

Thanks for the code. Here is 'my' working implementation:

public double[] xcorr2(double[] recording, double[] chirp) {
    // pad to power of 2 for optimisation
    int y = 1;
    while (Math.pow(2,y) < recording.length + chirp.length)
        ++y;
    int paddedLength = (int)Math.pow(2,y);

    double[] paddedRecording = new double[paddedLength];
    double[] paddedChirp = new double[paddedLength];

    for (int i = 0; i < recording.length; ++i)
            paddedRecording[i] = recording[i];

    for (int i = recording.length; i < paddedLength; ++i)
            paddedRecording[i] = 0;

    for (int i = 0; i < chirp.length; ++i)
            paddedChirp[i] = chirp[i];

    for (int i = chirp.length; i < paddedLength; ++i)
            paddedChirp[i] = 0;

    reverse(chirp);
    DoubleFFT_1D fft = new DoubleFFT_1D(paddedLength);
    fft.realForward(paddedRecording);
    fft.realForward(paddedChirp);
    double[] result = new double[paddedLength];

    result[0] = paddedRecording[0] * paddedChirp[0]; // value at f=0Hz is real-valued
    result[1] = paddedRecording[1] * paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
    for (int i = 1; i < result.length / 2; ++i) {
        double a = paddedRecording[2*i];
        double b = paddedRecording[2*i + 1];
        double c = paddedChirp[2*i];
        double d = paddedChirp[2*i + 1];

        // (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
        result[2*i]     = a*c + b*d;
        result[2*i + 1] = b*c - a*d;
    }

    fft.realInverse(result, true);

    // discard trailing zeros
    double[] result2 = new double[recording.length + chirp.length - 1];
    for (int i = 0; i < result2.length; ++i)
        result2[i] = result[i];

    return result2;
}

However, until about 5000 elements each, xcorr1 seems to be quicker. Am I doing anything particularly slow (perhaps the constant 'new'ing of memory -- maybe I should cast to an ArrayList)? Or the arbitrary way in which I generated the arrays to test them? Or should I do the conjugates instead of reversing it? That said, performance isn't really an issue so unless there's something obvious you needn't bother pointing out optimisations.

Was it helpful?

Solution

Your implementation of xcorr1 does correspond to the standard signal-processing definition of cross-correlation.

Relative to your interrogation with respect to adding zeros at the beginning: adding chirp.length-1 zeros would make index 0 of the result correspond to the start of transmission. Note however that the peak of the correlation output occurs chirp.length-1 samples after the start of echoes (the chirp has to be aligned with the full received echo). Using the peak index to obtain echo delays, you would then have to adjust for that correlator delay either by subtracting the delay or by discarding the first chirp.length-1 output results. Noting that the additional zeros correspond to that many extra outputs at the beginning, you'd probably be better off not adding those zeros at the beginning in the first place.

For xcorr2 however, a few things need to be addressed. First, if the recording and chirp inputs are not already zero-padded to at least chirp+recording data length you would need to do so (preferably to a power of 2 length for performance reasons). As you are aware, they would both need to be padded to the same length.

Second, you didn't take into account that the multiplication indicated in the posted reference answer, correspond in fact to complex multiplications (whereas DoubleFFT_1D.realForward API uses doubles). Now if you are going to implement something such as a complex multiplication with the chirp's FFT, you might as well actually implement the multiplication with the complex conjugate of the chirp's FFT (the alternate implementation indicated in the reference answer), removing the need to reverse the time-domain values.

Also accounting for DoubleFFT_1D.realForward packing order for even length transforms, you would get:

// [...]
fft.realForward(paddedRecording);
fft.realForward(paddedChirp);

result[0] = paddedRecording[0]*paddedChirp[0]; // value at f=0Hz is real-valued
result[1] = paddedRecording[1]*paddedChirp[1]; // value at f=fs/2 is real-valued and packed at index 1
for (int i = 1; i < result.length/2; ++i) {
    double a = paddedRecording[2*i];
    double b = paddedRecording[2*i+1];
    double c = paddedChirp[2*i];
    double d = paddedChirp[2*i+1];

    // (a+b*j)*(c-d*j) = (a*c+b*d) + (b*c-a*d)*j
    result[2*i]   = a*c + b*d;
    result[2*i+1] = b*c - a*d;
}
fft.realInverse(result, true);
// [...]

Note that the result array would be of the same size as paddedRecording and paddedChirp, but only the first recording.length+chirp.length-1 should be kept.

Finally, relative to which function is the most appropriate for arrays of a few thousand elements, the FFT version xcorr2 is likely going to be much faster (provided you restrict array lengths to powers of 2).

OTHER TIPS

The direct version doesn't require zero-padding first. You just take recording of length M and chirp of length N and calculate result of length N+M-1. Work through a tiny example by hand to grok the steps:

recording = [1, 2, 3]
chirp = [4, 5]

  1 2 3
4 5

  1 2 3
  4 5

  1 2 3
    4 5

  1 2 3
      4 5


result = [1*5, 1*4 + 2*5, 2*4 + 3*5, 3*4] = [5, 14, 23, 4]

The FFT method is much faster if you have long arrays. In this case you have to zero-pad each input to size M+N-1 so that both input arrays are the same size before taking the FFT.

Also, the FFT output is complex numbers, so you need to use complex multiplication. (1+2j)*(3+4j) is -5+10j, not 3+8j. I don't know how your complex numbers are arranged or handled, but make sure this is right.

Or is this the purpose of a transform; to move real numbers into the complex domain?

No, the Fourier transform transforms from the time domain to the frequency domain. The time domain data can be either real or complex, and the frequency domain data can be either real or complex. In most cases you have real data with a complex spectrum. You need to read up on the Fourier transform.

If realForward is in fact returning imaginary/complex numbers, how does it differ to complexForward?

The real FFT takes a real input, while the complex FFT takes a complex input. Both transforms produce complex numbers as their output. That's what the DFT does. The only time a DFT produces real output is if the input data is symmetrical (in which case you can use the DCT to save even more time).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top