Question

I've implemented a simple autocorrelation routine against some audio samples at a rate of 44100.0 with a block size of 2048.

The general formula I am following looks like this:

r[k] = a[k] * b[k] = ∑ a[n] • b[n + k]

and I've implemented it in a brute-force nested loop as follows:

for k = 0 to N-1 do 
    for n = 0 to N-1 do
        if (n+k) < N 
            then r[k] := r[k] + a(n)a(n+k)
    else
        break;
    end for n; 
end for k;

I look for the max magnitude in r and determine how many samples away it is and calculate the frequency.

To help temper the tuner's results, I am using a circular buffer and returning the median each time.

The brute force calculations are a bit slow - is there a well-known, faster way to do them?

Sometimes, the tuner just isn't quite as accurate as is needed. What type of heuristics can I apply here to help refine the results?

Sometimes the OCTAVE is incorrect - is there a way to hone in on the correct octave a bit more accurately?

Was it helpful?

Solution

One simple way to improve this "brute force" autocorrelation method is to limit the range of k and only search for lags (or pitch periods) near the previous average period, say within +-0.5 semitones at first. If you don't find a correlation, then search a slightly wider range, say, a within a major third, then search a wider range but within the expected frequency range of the instrument being tuned.

You can get higher frequency resolution by using a higher sample rate (say, upsampling the data before the autocorrelation if necessary, and with proper filtering).

You will get autocorrelation peaks for the pitch lag (period) and for multiples of that lag. You will have to eliminate those subharmonics somehow (maybe as impossible for the instrument, or perhaps as an unlikely pitch jump from the previous frequency estimations.)

OTHER TIPS

The efficient way to do autocorrelation is with an FFT:

  • FFT the time domain signal
  • convert complex FFT output to magnitude and zero phase (i.e. power spectrum)
  • take inverse FFT

This works because autocorrelation in the time domain is equivalent to power spectrum in the frequency domain.

Having said that, bare bones autocorrelation is not a great way to implement (accurate) pitch detection in general, so you might want to have a rethink about your whole approach.

I don't fully understand the question, but I can point out one trick that you might be able to use. You say you look for the sample that is the max magnitude. If it is useful in the rest of your calculations, you can calculate that sample number to sub-sample precision.

Let's say you have a peak of 0.9 at sample 5 and neighboring samples of 0.1 and 0.8. The actual peak is probably somewhere between sample 5 and sample 6.

(0.1 * 4 + 0.9 * 5 + 0.8 * 6) / (0.1 + 0.9 + 0.8) = 5.39
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top