Downsampling and applying a lowpass filter to digital audio

https://stackoverflow.com/questions/238382

04-07-2019
|

Question

I've got a 44Khz audio stream from a CD, represented as an array of 16 bit PCM samples. I'd like to cut it down to an 11KHz stream. How do I do that? From my days of engineering class many years ago, I know that the stream won't be able to describe anything over 5500Hz accurately anymore, so I assume I want to cut everything above that out too. Any ideas? Thanks.

Update: There is some code on this page that converts from 48KHz to 8KHz using a simple algorithm and a coefficient array that looks like { 1, 4, 12, 12, 4, 1 }. I think that is what I need, but I need it for a factor of 4x rather than 6x. Any idea how those constants are calculated? Also, I end up converting the 16 byte samples to floats anyway, so I can do the downsampling with floats rather than shorts, if that helps the quality at all.

Solution

Read on FIR and IIR filters. These are the filters that use a coefficent array.

If you do a google search on "FIR or IIR filter designer" you will find lots of software and online-applets that does the hard job (getting the coefficients) for you.

EDIT:

This page here ( http://www-users.cs.york.ac.uk/~fisher/mkfilter/ ) lets you enter the parameters of your filter and will spit out ready to use C-Code...

OTHER TIPS

You're right in that you need apply lowpass filtering on your signal. Any signal over 5500 Hz will be present in your downsampled signal but 'aliased' as another frequency so you'll have to remove those before downsampling.

It's a good idea to do the filtering with floats. There are fixed point filter algorithms too but those generally have quality tradeoffs to work. If you've got floats then use them!

Using DFT's for filtering is generally overkill and it makes things more complicated because dft's are not a contiuous process but work on buffers.

Digital filters generally come in two tastes. FIR and IIR. The're generally the same idea but IIF filters use feedback loops to achieve a steeper response with far less coefficients. This might be a good idea for downsampling because you need a very steep filter slope there.

Downsampling is sort of a special case. Because you're going to throw away 3 out of 4 samples there's no need to calculate them. There is a special class of filters for this called polyphase filters.

Try googling for polyphase IIR or polyphase FIR for more information.

Notice (in additions to the other comments) that the simple-easy-intuitive approach "downsample by a factor of 4 by replacing each group of 4 consecutive samples by the average value", is not optimal but is nevertheless not wrong, nor practically nor conceptually. Because the averaging amounts precisely to a low pass filter (a rectangular window, which corresponds to a sinc in frequency). What would be conceptually wrong is to just downsample by taking one of each 4 samples: that would definitely introduce aliasing.

By the way: practically any software that does some resampling (audio, image or whatever; example for the audio case: sox) takes this into account, and frequently lets you choose the underlying low-pass filter.

You need to apply a lowpass filter before you downsample the signal to avoid "aliasing". The cutoff frequency of the lowpass filter should be less than the nyquist frequency, which is half the sample frequency.

The process you're after called "Decimation". There are 2 steps:

Applying Low Pass Filter on the data (In your case LPF with Cut Off at Pi / 4).
Downsampling (In you case taking 1 out of 4 samples).

There are many methods to design and apply the Low Pass Filter.

You may start here:

http://en.wikipedia.org/wiki/Filter_design

You could make use of libsamplerate to do the heavy lifting. Libsamplerate is a C API, and takes care of calculating the filter coefficients. You to select from different quality filters so that you can trade off quality for speed.

If you would prefer not to write any code, you could just use Audacity to do the sample rate conversion. It offers a powerful GUI, and makes use of libsamplerate for it's sample rate conversion.

I would try applying DFT, chopping 3/4 of the result and applying inverse DFT. I can't tell if it will sound good without actually trying tough.

The "best" solution possible is indeed a DFT, discarding the top 3/4 of the frequencies, and performing an inverse DFT, with the domain restricted to the bottom 1/4th. Discarding the top 3/4ths is a low-pass filter in this case. Padding to a power of 2 number of samples will probably give you a speed benefit. Be aware of how your FFT package stores samples though. If it's a complex FFT (which is much easier to analyze, and generally has nicer properties), the frequencies will either go from -22 to 22, or 0 to 44. In the first case, you want the middle 1/4th. In the latter, the outermost 1/4th.

You can do an adequate job by averaging sample values together. The naïve way of grabbing samples four by four and doing an equal weighted average works, but isn't too great. Instead you'll want to use a "kernel" function that averages them together in a non-intuitive way.

Mathwise, discarding everything outside the low-frequency band is multiplication by a box function in frequency space. The (inverse) Fourier transform turns pointwise multiplication into a convolution of the (inverse) Fourier transforms of the functions, and vice-versa. So, if we want to work in the time domain, we need to perform a convolution with the (inverse) Fourier transform of box function. This turns out to be proportional to the "sinc" function (sin at)/at, where a is the width of the box in the frequency space. So at every 4th location (since you're downsampling by a factor of 4) you can add up the points near it, multiplied by sin (a dt) / a dt, where dt is the distance in time to that location. How nearby? Well, that depends on how good you want it to sound. It's common to ignore everything outside the first zero, for instance, or just take the number of points to be the ratio by which you're downsampling.

Finally there's the piss-poor (but fast) way of just discarding the majority of the samples, keeping just the zeroth, the fourth, and so on.

Honestly, if it fits in memory, I'd recommend just going the DFT route. If it doesn't use one of the software filter packages that others have recommended to construct the filter for you.

I recently came across BruteFIR which may already do some of what you're interested in?

You have to apply low-pass filter (removing frequencies above 5500 Hz) and then apply decimation (leave every Nth sample, every 4th in your case).

For decimation, FIR, not IIR filters are usually employed, because they don't depend on previous outputs and therefore you don't have to calculate anything for discarded samples. IIRs, generally, depends on both inputs and outputs, so, unless a specific type of IIR is used, you'd have to calculate every output sample before discarding 3/4 of them.

Just googled an intro-level article on the subject: https://www.dspguru.com/dsp/faqs/multirate/decimation

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow