How to use NumPy with PortAudio to extract bass, mid and treble

https://stackoverflow.com/questions/1794010

22-09-2019

Question

Following the example in "How to extract frequency information from an input audio stream (using PortAudio)?", I'm curious about combining PortAudio and NumPy...

I'm not 100% sure about the FFT. How can I pass NumPy a chunk and get back three values from -1.0 to 1.0 for bass, mid and treble?

I don't mind if this is just for one channel, as I can make sense of the audio part of this; it's the maths that swims in front of me when I look at it :)

Solution

Actually, you would not use a Fourier transform to do this.

Splitting an audio signal into bass, mid and treble is usually done with filters. A filter is a signal-processing device that attenuates certain frequency ranges. Filters can be built digitally or electrically; for example, they are used in the crossover networks of loudspeakers.

To get the low-frequency bass part, you would use a low-pass filter. Low-pass filters attenuate high frequencies, which is why they are also called 'high-cut' filters.
To get the mid-frequency part, you would use a band-pass filter. Band-pass filters attenuate both low and high frequencies. Because of the bell-like shape of their response they are sometimes loosely called 'bell filters' (in equalizer terminology that name more often refers to a peaking filter).
To get the high-frequency treble part, you would use a high-pass filter. High-pass filters attenuate low frequencies, which is why they are also called 'low-cut' filters.
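A minimal NumPy-only sketch of the three filters, assuming 2nd-order Butterworth responses built from the widely used RBJ "Audio EQ Cookbook" biquad formulas, band edges of 250 Hz and 6 kHz, and a mid band made by running a high-pass and then a low-pass. The helper names (`biquad`, `lowpass`, `highpass`, `split_bands`) are hypothetical; in practice you might use `scipy.signal` instead of the pure-Python sample loop used here for clarity.

```python
import numpy as np

BUTTERWORTH_Q = 1.0 / np.sqrt(2.0)  # Q that gives a maximally flat 2nd-order response

def biquad(x, b, a):
    """Run a 2nd-order IIR (biquad) filter over x in direct form I."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs/outputs; the filter starts from silence
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x1, x2 = xn, x1
        y1, y2 = yn, y1
        y[n] = yn
    return y

def _cos_alpha(fc, fs, q):
    """Shared intermediate values of the cookbook formulas."""
    w0 = 2.0 * np.pi * fc / fs
    return np.cos(w0), np.sin(w0) / (2.0 * q)

def lowpass(x, fc, fs, q=BUTTERWORTH_Q):
    """2nd-order (12 dB/oct) low-pass with cutoff fc."""
    cw, alpha = _cos_alpha(fc, fs, q)
    b = ((1 - cw) / 2, 1 - cw, (1 - cw) / 2)
    a = (1 + alpha, -2 * cw, 1 - alpha)
    return biquad(x, b, a)

def highpass(x, fc, fs, q=BUTTERWORTH_Q):
    """2nd-order (12 dB/oct) high-pass with cutoff fc."""
    cw, alpha = _cos_alpha(fc, fs, q)
    b = ((1 + cw) / 2, -(1 + cw), (1 + cw) / 2)
    a = (1 + alpha, -2 * cw, 1 - alpha)
    return biquad(x, b, a)

def split_bands(x, fs, f_low=250.0, f_high=6000.0):
    """Split x into bass (below f_low), mid (f_low..f_high) and treble (above f_high)."""
    bass = lowpass(x, f_low, fs)
    mid = lowpass(highpass(x, f_low, fs), f_high, fs)  # band-pass = high-pass then low-pass
    treble = highpass(x, f_high, fs)
    return bass, mid, treble
```

For example, `split_bands(samples, 44100)` on a mono float array returns three arrays of the same length; a 50 Hz tone ends up almost entirely in the first one.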

Actually, you could also use only the high-pass and low-pass filters: if you subtract both filtered signals from the original signal, what remains is a band-pass filtered (mid) signal. This saves you one filter.
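The subtraction trick can be sketched as follows. To keep the block short it assumes simple 1st-order (one-pole) filters rather than the 2nd-order ones suggested below, and the function names are hypothetical. A useful side effect is that the three bands always add back up to the original signal, because the mid band is defined as the remainder.

```python
import numpy as np

def one_pole_lowpass(x, fc, fs):
    """1st-order (6 dB/oct) low-pass: y[n] = y[n-1] + k * (x[n] - y[n-1])."""
    k = 1.0 - np.exp(-2.0 * np.pi * fc / fs)
    y = np.zeros(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev += k * (xn - prev)
        y[n] = prev
    return y

def one_pole_highpass(x, fc, fs):
    """1st-order (6 dB/oct) high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    a = np.exp(-2.0 * np.pi * fc / fs)
    y = np.zeros(len(x))
    y_prev = x_prev = 0.0
    for n, xn in enumerate(x):
        y_prev = a * (y_prev + xn - x_prev)
        x_prev = xn
        y[n] = y_prev
    return y

def split_with_two_filters(x, fs, f_low=250.0, f_high=6000.0):
    """Bass and treble come from real filters; mid is whatever is left over."""
    bass = one_pole_lowpass(x, f_low, fs)
    treble = one_pole_highpass(x, f_high, fs)
    mid = x - bass - treble
    return bass, mid, treble
```

With 1st-order filters the separation is gentle (a 50 Hz tone still leaves roughly a fifth of its amplitude in `mid`), which is one reason a steeper design is usually preferred.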

Each filter has a cutoff (or threshold) frequency: the frequency at which the filter starts to attenuate the signal. Beyond it, depending on the filter order, the signal is attenuated by about 6 dB/oct (1st order), 12 dB/oct (2nd order), 18 dB/oct (3rd order), and so on. For your application, a 2nd-order design is probably fine.
Note that filters in general also alter your signal in other ways (for example, they shift its phase and can cause ringing), and the higher the order, the more audible this can get. By the way, this is pure physics and applies to all signal processing, including Fourier transforms.
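The "6 dB per octave per order" rule can be checked numerically. This sketch assumes an ideal analog Butterworth low-pass, whose magnitude is 1 / sqrt(1 + (f/fc)^(2n)) for order n, and measures how much extra attenuation one doubling of frequency adds far above the cutoff. The exact asymptotic value is 20·log10(2) ≈ 6.02 dB per order.

```python
import numpy as np

def butterworth_gain_db(f, fc, order):
    """Gain in dB of an ideal analog Butterworth low-pass of the given order."""
    return -10.0 * np.log10(1.0 + (f / fc) ** (2 * order))

def slope_db_per_octave(order, fc=250.0):
    """Extra attenuation per doubling of frequency, measured well above the cutoff."""
    f = 100.0 * fc
    return butterworth_gain_db(f, fc, order) - butterworth_gain_db(2.0 * f, fc, order)

for order in (1, 2, 3):
    print(f"order {order}: {slope_db_per_octave(order):.1f} dB/oct")
# order 1: 6.0 dB/oct
# order 2: 12.0 dB/oct
# order 3: 18.1 dB/oct
```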

Using these three filters can be thought of as (and can be made equivalent to) a Fourier transform with only three very wide spectral points.
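To make the connection concrete, here is a sketch of the Fourier-domain version: transform a chunk, keep only the bins belonging to one band, and transform back. This "brick-wall" FFT masking is a swapped-in technique, not the filter design described above; the 250 Hz / 6 kHz edges and the name `fft_band_split` are assumptions. Because every bin lands in exactly one band, the three outputs sum back to the input.

```python
import numpy as np

def fft_band_split(x, fs, f_low=250.0, f_high=6000.0):
    """Split x into three bands by zeroing the FFT bins outside each band."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each bin in Hz
    masks = (freqs < f_low,
             (freqs >= f_low) & (freqs < f_high),
             freqs >= f_high)
    # Keep only one band's bins at a time and return to the time domain.
    return tuple(np.fft.irfft(spectrum * mask, n=len(x)) for mask in masks)
```

On real, non-periodic chunks such sharp masks cause ringing and discontinuities at chunk boundaries, which is the same kind of side effect mentioned above for high-order filters.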

OTHER TIPS

The Fourier transform, mentioned in the selected answer to the SO question you point to, gives you the "spectrum": a large collection of values giving the sound intensity in each of many slices of frequency (expressed, for example, in hertz).
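A minimal sketch of getting that spectrum from one chunk with NumPy, assuming a mono chunk of float samples and a 44.1 kHz sample rate. The bins are spaced `fs / len(chunk)` Hz apart, so a 4410-sample chunk gives 10 Hz slices; the helper name `magnitude_spectrum` and the amplitude scaling are illustrative choices.

```python
import numpy as np

def magnitude_spectrum(chunk, fs):
    """Return (freqs, amplitudes) for one chunk of real-valued samples.

    Amplitudes are scaled so that a sine of amplitude A that fits a whole
    number of periods in the chunk shows up as about A in its bin
    (the DC and Nyquist bins are not scaled correctly by this factor).
    """
    n = len(chunk)
    amplitudes = np.abs(np.fft.rfft(chunk)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # centre frequency of each bin in Hz
    return freqs, amplitudes

fs = 44100                       # assumed sample rate in Hz
t = np.arange(4410) / fs         # 4410 samples: bins are 10 Hz apart
chunk = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 3000 * t)
freqs, amps = magnitude_spectrum(chunk, fs)
print(len(freqs), freqs[np.argmax(amps)])  # 2206 bins; the peak sits at about 440 Hz
```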

How to translate (say) a thousand intensities (one per 10-hertz slice of the spectrum, say) into just three numbers, as you desire, is of course quite a heuristic issue. For example, you could just decide which ranges of frequencies correspond to "bass" and "treble", with everything in between being "mid", and compute the average intensity in each. For what it's worth, I believe a common convention is "bass" up to 250 Hz and "treble" from 6 kHz up (in between being the "midrange"), cf. e.g. this page; but it's rather an arbitrary convention, so "pick your poison"!-)
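The band-averaging heuristic might look like this, assuming the 250 Hz / 6 kHz convention above as default band edges and taking "intensity" to mean spectral power (squared magnitude). The Hann window is an added assumption that reduces leakage of a loud tone into neighbouring bins; `band_intensities` is a hypothetical name.

```python
import numpy as np

def band_intensities(chunk, fs, bass_max=250.0, treble_min=6000.0):
    """Average spectral power of one chunk in the bass, mid and treble ranges."""
    window = np.hanning(len(chunk))                  # taper the chunk to reduce leakage
    power = np.abs(np.fft.rfft(chunk * window)) ** 2
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / fs)  # bin frequencies in Hz
    bass = power[freqs < bass_max].mean()
    mid = power[(freqs >= bass_max) & (freqs < treble_min)].mean()
    treble = power[freqs >= treble_min].mean()
    return bass, mid, treble
```

Averaging (rather than summing) makes the three numbers independent of how many bins each band happens to contain, which matters because the mid and treble ranges span far more bins than the bass range.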

Once you have the relative levels, you'll want to normalize them with respect to each other and scale them to lie in your desired range (presumably on a logarithmic scale, because that's how human hearing works ;-).
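Putting it together, here is a sketch that goes from one raw chunk to three values in [-1, 1]. It assumes mono, 16-bit signed samples; measures each band's RMS level via Parseval's theorem; converts it to dB relative to a full-scale sine (0 dBFS); and maps a hypothetical -60 dB floor to -1.0 and 0 dBFS to +1.0. The floor, band edges and function names are all illustrative choices, not a standard.

```python
import numpy as np

def chunk_to_float(raw_bytes):
    """Mono 16-bit signed PCM bytes -> float samples in [-1, 1)."""
    return np.frombuffer(raw_bytes, dtype=np.int16).astype(np.float64) / 32768.0

def band_rms(x, fs, lo, hi):
    """RMS of the part of x between lo (inclusive) and hi (exclusive) Hz.

    Uses Parseval's theorem; the factor 2 accounts for the negative-frequency
    half that rfft omits (it slightly overcounts the DC and Nyquist bins).
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(spectrum[(freqs >= lo) & (freqs < hi)]) ** 2
    return np.sqrt(2.0 * power.sum()) / n

def to_unit_range(rms, floor_db=-60.0):
    """Map a level onto [-1, 1] on a dB scale: floor_db -> -1.0, 0 dBFS -> +1.0."""
    db = 20.0 * np.log10(max(rms * np.sqrt(2.0), 1e-12))  # 1e-12 avoids log10(0) on silence
    return float(np.clip(2.0 * (db - floor_db) / -floor_db - 1.0, -1.0, 1.0))

def bass_mid_treble(raw_bytes, fs=44100, bass_max=250.0, treble_min=6000.0):
    """Three values in [-1, 1] for one chunk of mono int16 audio."""
    x = chunk_to_float(raw_bytes)
    bands = ((0.0, bass_max), (bass_max, treble_min), (treble_min, fs / 2 + 1))
    return tuple(to_unit_range(band_rms(x, fs, lo, hi)) for lo, hi in bands)
```

With PyAudio (one Python binding for PortAudio), `raw_bytes` would be the result of `stream.read(frames)` on a stream opened with `format=pyaudio.paInt16` and `channels=1`. On this scale a half-amplitude bass tone (about -6 dBFS) maps to roughly 0.8.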

Licensed under: CC-BY-SA with attribution
