Question

I'm trying to learn how to work with audio in as many different ways as possible.

Suppose a known audio stream (let's call it stream1) and an unknown audio stream (stream2) are mixed into a single stream (mix1).

Now, assuming that we know stream1 in advance but not stream2, would it be possible to use stream1 to cancel itself out of mix1, and therefore give us stream2 with a minimum of noise/interference?

To give it a real-world context, imagine a computer that has a microphone and speakers (not headphones). Because the computer knows the output to the speakers in advance (OK, only by milliseconds, but still), would it be possible to cancel that sound from the mix coming in on the microphone? In this real-world situation the known stream is not perfectly known, as there is likely to be some distortion between transmission and reception.

Assuming this is possible, can someone suggest some reading about the algorithms involved?

Solution

Yes, this is possible. Two methods:

Time Domain

If you can guarantee that the mix is sample-accurate with respect to the original stream1, then you can simply negate stream1 and add it to the mix. You might have to scale that waveform first, since the level of each source is usually reduced when audio is mixed.
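
As a minimal sketch of that subtraction, assuming NumPy, floating-point samples, and perfect sample alignment: the function name `subtract_known` and the synthetic test tones are made up for illustration, and the least-squares gain estimate is just one simple way to find the scale factor, not something the answer prescribes.

```python
import numpy as np

def subtract_known(mix, known):
    """Remove a sample-aligned known signal from mix, estimating its level."""
    mix = np.asarray(mix, dtype=np.float64)
    known = np.asarray(known, dtype=np.float64)
    energy = np.dot(known, known)
    if energy == 0.0:
        return mix.copy(), 0.0
    # Least-squares gain: the scale g that minimizes sum((mix - g * known)**2)
    gain = np.dot(mix, known) / energy
    # Negating the scaled known stream and adding it is the same as subtracting it
    return mix - gain * known, gain

# Synthetic demo: one second of two tones at an assumed 8000 Hz sample rate
rate = 8000
t = np.arange(rate) / rate
stream1 = np.sin(2 * np.pi * 440 * t)          # known stream
stream2 = 0.3 * np.sin(2 * np.pi * 1000 * t)   # unknown stream
mix1 = 0.5 * stream1 + stream2                 # stream1 was attenuated when mixed

recovered, gain = subtract_known(mix1, stream1)
print(gain)                                    # estimated level of stream1 in the mix
print(np.max(np.abs(recovered - stream2)))     # worst-case recovery error
```

The gain estimate is only exact when stream2 is uncorrelated with stream1, as it is for these two tones; with real recordings it is an approximation.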

If anything else has been done to the audio (such as level compression), it limits how cleanly this kind of subtraction can work.

Frequency Domain

While normal PCM-encoded audio is just a sampling of pressure many times per second, that is not how we perceive sound: we hear different frequencies. A Fourier transform (normally computed with an FFT algorithm) converts audio samples from the time domain to the frequency domain, giving you the level of sound in each of a number of frequency buckets.
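
To make the idea of frequency buckets concrete, here is a small sketch using NumPy's FFT routines. The 8000 Hz sample rate and the 440 Hz test tone are arbitrary choices for illustration.

```python
import numpy as np

rate = 8000                        # samples per second (assumed)
t = np.arange(rate) / rate         # one second of sample times
tone = np.sin(2 * np.pi * 440 * t)

spectrum = np.fft.rfft(tone)                     # one complex value per bucket
freqs = np.fft.rfftfreq(len(tone), d=1 / rate)   # centre frequency of each bucket, in Hz
levels = np.abs(spectrum)                        # level of sound in each bucket

# The loudest bucket sits at about 440 Hz, the frequency of the tone
peak_hz = freqs[np.argmax(levels)]
print(peak_hz)
```

With one second of audio the buckets are about 1 Hz apart; a shorter block of samples gives fewer, wider buckets.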

If you convert both stream1 and the mix to the frequency domain, subtract stream1's level from the mix's level in each frequency bucket, and then convert back to the time domain for output, you can effectively remove much of stream1 from the mix. The more frequency buckets you use, the more CPU is needed, but the finer the frequency resolution of the removal will be. Note that while this means you don't have to be quite sample-accurate, it does typically hurt the quality of whatever sound remains from the mix.
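
A rough sketch of this approach, assuming NumPy: it works on short overlapping frames, subtracts magnitudes per bucket, and keeps the mix's phase. The function name `spectral_subtract`, the frame and hop sizes, and the test tones are all illustrative assumptions, and stream1 is assumed to sit in the mix at its original level.

```python
import numpy as np

def spectral_subtract(mix, known, frame=512, hop=256):
    """Subtract known's magnitude spectrum from mix's, frame by frame."""
    mix = np.asarray(mix, dtype=np.float64)
    known = np.asarray(known, dtype=np.float64)
    n = min(len(mix), len(known))
    # Periodic Hann window: copies overlapping by half a frame sum to 1
    window = np.hanning(frame + 1)[:frame]
    out = np.zeros(n)
    for start in range(0, n - frame + 1, hop):
        seg = slice(start, start + frame)
        mix_spec = np.fft.rfft(mix[seg] * window)
        known_spec = np.fft.rfft(known[seg] * window)
        # Subtract levels per bucket, clamp at zero, reuse the mix's phase
        mag = np.maximum(np.abs(mix_spec) - np.abs(known_spec), 0.0)
        cleaned_spec = mag * np.exp(1j * np.angle(mix_spec))
        # Back to the time domain, overlap-adding the frames
        out[seg] += np.fft.irfft(cleaned_spec, n=frame)
    return out   # the first and last half frame are faded by the window

# Synthetic demo: two steady tones at an assumed 8000 Hz sample rate
rate = 8000
t = np.arange(rate) / rate
stream1 = np.sin(2 * np.pi * 440 * t)
stream2 = 0.3 * np.sin(2 * np.pi * 1000 * t)
mix1 = stream1 + stream2

cleaned = spectral_subtract(mix1, stream1)
# A copy of stream1 that is three samples out of alignment still works here
cleaned_shifted = spectral_subtract(mix1, np.roll(stream1, 3))
```

Two steady tones that occupy different buckets are close to a best case. With real audio, where the streams overlap in frequency and change over time, expect audible artefacts in the result.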

Many audio editing programs use this method to remove background noise.

OTHER TIPS

Sound is simply a curve that typically fluctuates above and below zero over time. 16-bit audio has 2^16 possible integer values, so raw PCM audio is just a stream of integers in the range -32768 to +32767. Once the audio is in this format, walk through stream1 and the mix one integer at a time: toggle the sign of the stream1 integer and add it to the corresponding mix integer. Then renormalize the result back to the full 16-bit range to regain your volume. This effectively erases stream1 from your mix. The audio tool Audacity gives you this option.
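
For 16-bit samples specifically, a sketch along these lines (assuming NumPy arrays of dtype `int16` and a mix that contains stream1 at its original level) shows the one practical trap: the arithmetic has to be done in a wider integer type. The function name `cancel_int16` and the sample values are hypothetical.

```python
import numpy as np

def cancel_int16(mix, known):
    """Invert known, add it to mix sample by sample, then renormalize to 16-bit."""
    # Widen to 32-bit first: -(-32768) and some sums do not fit in int16
    diff = mix.astype(np.int32) + (-known.astype(np.int32))
    peak = np.max(np.abs(diff))
    if peak == 0:
        return np.zeros(len(diff), dtype=np.int16)
    # Renormalize so the loudest sample uses the full 16-bit range
    scaled = diff * (32767.0 / peak)
    return np.round(scaled).astype(np.int16)

stream1 = np.array([1000, -2000, 3000, -32768], dtype=np.int16)
stream2 = np.array([100, 200, -400, 0], dtype=np.int16)
mix1 = stream1 + stream2    # these values were chosen so the sum fits in int16

result = cancel_int16(mix1, stream1)
print(result)               # stream2, scaled up to full volume
```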

Licensed under: CC-BY-SA with attribution