Parameters to improve a music frequency analyzer

https://stackoverflow.com/questions/7829713

27-10-2019
|

Question

I'm using a FFT on audio data to output an analyzer, like you'd see in Winamp or Windows Media Player. However the output doesn't look that great. I'm plotting using a logarithmic scale and I average the linear results from the FFT into the corresponding logarithmic bins. As an example, I'm using bins like:

16k,8k,4k,2k,1k,500,250,125,62,31,15 [hz]

Then I plot the magnitude (dB) against frequency [hz]. The graph definitely 'reacts' to the music, and I can see the response of a drum sample or a high pitched voice. But the graph is very 'saturated' close to the lower frequencies, and overall doesn't look much like what you see in applications, which tend to be more evenly distributed. I feel that apps that display visual output tend to do different things to the data to make it look better.

What things could I do to the data to make it look more like the typical music player app?

Some useful information: I downsample to single channel, 32kHz, and specify a time window of 35ms. That means the FFT gets ~1100 points. I vary these values to experiment (ie tried 16kHz, and increasing/decreasing interval length) but I get similar results.

Solution

With an FFT of 1100 points, you probably aren't able to capture the low frequencies with a lot of frequency resolution.

Think about it, 30 Hz corresponds to a period of 33ms, which at 32kHz is roughly 1000 samples. So you'll only be able to capture about 1 period in this time.

Thus, you'll need a longer FFT window to capture those low frequencies with sharp frequency resolution.

You'll likely need a time window of 4000 samples or more to start getting noticeably more frequency resolution at the low frequencies. This will be fine too, since you'll still get about 8-10 spectrum updates per second.

One option too, if you want very fast updates for the high frequency bins but good frequency resolution at the low frequencies, is to update the high frequency bins more quickly (such as with the windows you're currently using) but compute the low frequency bins less often (and with larger windows necessary for the good freq. resolution.)

OTHER TIPS

I think a lot of these applications have variable FFT bins.

What you could do is start with very wide evenly spaced FFT bins like you have and then keep track of the number of elements that are placed in each FFT bin. If some of the bins are not used significantly at all (usually the higher frequencies) then widen those bins so that they are larger (and thus have more frequency entries) and shring the low frequency bins.

I have worked on projects were we just spend a lot of time tuning bins for specific input sources but it is much nicer to have the software adjust in real time.

A typical visualizer would use constant-Q bandpass filters, not a single FFT.

You could emulate a set of constant-Q bandpass filters by multiplying the FFT results by a set of constant-Q filter responses in the frequency domain, then sum. For low frequencies, you should use an FFT longer than the significant impulse response of the lowest frequency filter. For high frequencies, you can use shorter FFTs for better responsiveness. You can slide any length FFTs along at any desired update rate by overlapping (re-using) data, or you might consider interpolation. You might also want to pre-window each FFT to reduce "spectral leakage" between frequency bands.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow