Question

I am currently working on my fourth-year project (computer science), which involves the automatic transcription of music into sheet music. I am doing it in MATLAB at the moment, but it will have to be converted to Java at some stage.

My problem: my program returns the correct notes for pure sine tones, but I have now encountered a problem retrieving the fundamental frequency of a note played by a natural instrument. With certain notes, the peak representing the fundamental seems to be missing entirely. For example, when I play a G3 note from GarageBand, it is shown as a G4, as only the 1st, 3rd, 5th and 7th harmonics appear in my plot. I tried to add the image, but as this is my first post it wouldn't allow me. Any pointers in the right direction would be greatly appreciated.

Solution

This is not unusual. It's very common for the fundamental to be missing, or nearly so, for male voices, big string instruments, and many other pitched sound sources.

That makes an FFT peak result alone an extremely poor way of determining musical notes from actual musical instruments, as opposed to sine-wave function generators. That's because pitch is different from peak spectral frequency. Pitch is a psychoacoustic perceptual phenomenon, so that's what you need to read up on. There are tons of research papers on the subject.

So you need to look at a completely different set of algorithms. Try cepstral analysis (cepstrums), the harmonic product spectrum, autocorrelation and similar lag-based methods (AMDF, ASDF, etc.), RAPT (Robust Algorithm for Pitch Tracking), YAAPT, etc.
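As a rough illustration of the autocorrelation approach, here is a minimal Python sketch (standard library only). It is not a production pitch tracker: the function name `autocorr_pitch`, the 100-1000 Hz search range, the 8000 Hz sample rate and the synthetic test tone are all assumptions made for the example. The tone is a G3 (196 Hz) with no energy at the fundamental, only at 2f, 3f and 4f, yet the strongest autocorrelation lag still corresponds to the 196 Hz period.

```python
import math

def autocorr_pitch(samples, sample_rate, fmin=100.0, fmax=1000.0):
    """Estimate pitch in Hz from the lag with the highest mean autocorrelation.

    fmin/fmax bound the search range and are assumed values for this sketch.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        # Mean product of the signal with a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic G3 with a completely missing fundamental: only 2f, 3f and 4f.
sample_rate = 8000
f0 = 196.0
tone = [sum(math.sin(2 * math.pi * k * f0 * t / sample_rate) for k in (2, 3, 4))
        for t in range(2048)]

estimate = autocorr_pitch(tone, sample_rate)
print(f"estimated pitch: {estimate:.1f} Hz")
```

The estimate is quantized to whole-sample lags, so it lands within a few Hz of 196 rather than exactly on it. Real trackers such as RAPT add normalization, interpolation and voicing decisions on top of this basic idea.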

ADDED: I wrote up a more detailed explanation of pitched sounds with missing fundamentals in a blog post.

OTHER TIPS

It isn't unusual for the fundamental frequency of a musical instrument note to be attenuated relative to the harmonics (also known as overtones), and in some cases the fundamental frequency magnitude may be well below the magnitude of the overtones.

Take a look at this frequency/magnitude plot of a real bassoon (not a synthesized bassoon) playing a G3 note. Observe the attenuated fundamental (196.39 Hz) relative to the first harmonic. But also observe that all the integer-multiple harmonics are visible up to the 10th harmonic. Actually, many more harmonics are present, but they aren't visible on this linear magnitude plot.

[Figure: bassoon G3, frequency vs. magnitude (linear scale)]

In your case, the additional fact that your G3 musical note's spectrum is showing only the 1st, 3rd, 5th and 7th harmonics suggests that something is wrong. Your test sound appears to be synthesized, so the problem could be with the way the sound was synthesized.

The spectra of real musical instruments typically show the fundamental frequency and many integer-multiple harmonics such as 1, 2, 3 and so on, as seen above. And the harmonics typically extend well above 6 kHz for most notes played on most instruments.
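Because the harmonics sit at integer multiples of the fundamental, a harmonic product spectrum (HPS) can pick out the fundamental even when it is not the tallest peak. The following Python sketch (standard library only, with a deliberately naive O(N^2) DFT) is one possible illustration, not a reference implementation. The helper names, the 4000 Hz sample rate, the 1000-sample window and the harmonic amplitudes are all assumed values, chosen so that every harmonic falls exactly on a DFT bin.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0 .. N/2 - 1, computed directly (O(N^2))."""
    n = len(samples)
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    return [abs(sum(x * twiddle[(k * i) % n] for i, x in enumerate(samples)))
            for k in range(n // 2)]

def hps_bin(mags, num_harmonics):
    """Bin k that maximizes mags[k] * mags[2k] * ... * mags[num_harmonics * k]."""
    best_k, best_val = 1, -1.0
    for k in range(1, (len(mags) - 1) // num_harmonics + 1):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= mags[h * k]
        if product > best_val:
            best_k, best_val = k, product
    return best_k

sample_rate, n = 4000, 1000        # assumed values: 4 Hz per bin
f0 = 196.0                         # G3
amps = [0.2, 1.0, 0.7, 0.5, 0.4]   # weak fundamental, strong second harmonic
tone = [sum(a * math.sin(2 * math.pi * (h + 1) * f0 * t / sample_rate)
            for h, a in enumerate(amps)) for t in range(n)]

mags = dft_magnitudes(tone)
raw_peak_hz = mags.index(max(mags)) * sample_rate / n   # tallest single peak
hps_hz = hps_bin(mags, 5) * sample_rate / n             # HPS estimate
print(raw_peak_hz, hps_hz)
```

On this test tone the tallest raw peak is the second harmonic, one octave too high, while the HPS estimate lands on the fundamental. Note that plain HPS multiplies by the magnitude at the fundamental bin, so it needs at least some energy there, and with too few factors it can still make octave errors.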

Take a look at this frequency/decibel-magnitude plot of a real bassoon (not a synthesized bassoon) playing a G3 note. Observe that a total of 37 integer-multiple harmonics are present, until they disappear at the noise floor near -104 dB.

[Figure: bassoon G3, frequency vs. magnitude (decibel scale)]

You can listen to this bassoon sample and see its spectrum here: Bassoon musical instrument spectrum

Also read this detailed post on analytical approaches to autonomous musical transcription

Have you tried running it through a spectrogram (function spectrogram in MATLAB) to identify what is happening?

I don't know what algorithms you use, and without that information we can't say what is going wrong. What alarms me is that your third harmonic (second peak in the plot) is much larger than your second harmonic (first peak in the plot).

Are you sure you have the sampling right, i.e. that your DFT only has frequencies up to half the sampling frequency (over both the positive and negative frequency ranges)? Also: how do you suppress any transient part of your signal?
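To make the sampling check concrete, here is a small Python sketch of how DFT bin indices map to frequencies. The function name `bin_frequencies` and the numbers used are assumptions for illustration. It follows the usual convention in which the first half of the bins are non-negative frequencies and the second half are negative ones, so nothing exceeds half the sampling frequency in magnitude.

```python
def bin_frequencies(n, sample_rate):
    """Frequency in Hz of each bin of an n-point DFT.

    Bins in the first half are non-negative frequencies; the remaining
    bins are negative frequencies (bin n/2 is -sample_rate/2 for even n).
    """
    return [(k if k < n - k else k - n) * sample_rate / n for k in range(n)]

freqs = bin_frequencies(8, 8000)
print(freqs)

# Hypothetical mistake: scaling by n/2 instead of n doubles every frequency,
# which moves every peak up by exactly one octave.
n, sample_rate, peak_bin = 1000, 4000, 49
correct_hz = peak_bin * sample_rate / n          # 196.0 Hz (G3)
mistaken_hz = peak_bin * sample_rate / (n / 2)   # 392.0 Hz (G4)
```

If the frequency axis is built with the wrong length or sample rate, every peak shifts by the same factor, which would look exactly like an octave error when that factor is two.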

The fact that you see peaks at 2f, 4f, 6f and 8f strongly implies that either your input data is actually an octave above what you think it is, or that you're misinterpreting the frequency scale of your results. If you were just missing the fundamental frequency, you'd see 3f, 5f and 7f as well.
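The reasoning above can be checked numerically: the spacing between adjacent harmonic peaks equals the fundamental, whether or not the fundamental itself is visible. A minimal Python sketch follows, assuming the peak frequencies have already been extracted somehow; the helper name `f0_from_peak_spacing` and the peak lists are made up for the example.

```python
from statistics import median

def f0_from_peak_spacing(peak_freqs):
    """Estimate the fundamental as the median gap between adjacent peaks."""
    peaks = sorted(peak_freqs)
    return median(b - a for a, b in zip(peaks, peaks[1:]))

f = 196.0  # G3

# Peaks only at 2f, 4f, 6f, 8f: the spacing is 2f, i.e. a note one octave up.
even_only = f0_from_peak_spacing([2 * f, 4 * f, 6 * f, 8 * f])

# Fundamental missing but 2f, 3f, 4f, 5f present: the spacing is still f.
missing_fundamental = f0_from_peak_spacing([2 * f, 3 * f, 4 * f, 5 * f])

print(even_only, missing_fundamental)
```

So a spectrum with peaks at 2f, 4f, 6f and 8f is indistinguishable from a complete harmonic series on 2f, which is why it reads as G4 rather than as a G3 with a missing fundamental.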

Suggestions:

  • Plot your input data before you Fourier transform it. You should be able to eyeball the frequency of the dominant term.
  • Listen to the note produced by GarageBand. Is it above or below middle C?
  • Check that you understand where the values on the frequency scale on your plot came from.
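
For the last two checks it helps to convert a measured frequency to a note name explicitly. Here is a hedged Python sketch using the standard equal-temperament relation (A4 = 440 Hz, MIDI note 69). The function name and the `middle_c_octave` parameter are assumptions for this example. The parameter exists because some software labels middle C as C3 rather than C4, and whether that applies to your setup is something to verify rather than assume.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(freq_hz, middle_c_octave=4):
    """Nearest equal-tempered note name for a frequency (A4 = 440 Hz).

    middle_c_octave selects the octave-numbering convention: 4 for
    scientific pitch notation, 3 for tools that call middle C "C3".
    """
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = midi // 12 - 5 + middle_c_octave
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(note_name(196.0))                      # G below middle C
print(note_name(392.0))                      # G above middle C
print(note_name(392.0, middle_c_octave=3))   # same pitch, other convention
```

A 392 Hz tone is G4 in scientific pitch notation but would be labelled G3 under a convention where middle C is C3, so a one-octave naming mismatch between two tools does not necessarily mean the analysis is wrong.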
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow