Remove known audio output from microphone input

Question 1

Yes, this is possible. Two methods:

Time Domain

If you can guarantee that the mix is sample-accurate with respect to the original stream1, then you can simply negate stream1 and add it to the mix. You may have to scale that waveform first, since the level of each source is usually reduced when audio is mixed.
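As a rough sketch of the time-domain idea, assuming both signals are already sample-aligned lists of floats: the scale factor is estimated with a least-squares fit, which is one possible way to pick it and not the only one. The function names `estimate_gain` and `subtract_known` are made up for this illustration.

```python
def estimate_gain(mix, stream1):
    """Least-squares gain g that minimizes sum((mix - g * stream1) ** 2).

    Assumes mix and stream1 are sample-aligned sequences of equal length.
    """
    energy = sum(s * s for s in stream1)
    if energy == 0:
        return 0.0  # stream1 is silent, nothing to remove
    return sum(m * s for m, s in zip(mix, stream1)) / energy


def subtract_known(mix, stream1):
    """Negate the scaled stream1 and add it to the mix, sample by sample."""
    g = estimate_gain(mix, stream1)
    return [m + (-g * s) for m, s in zip(mix, stream1)]


if __name__ == "__main__":
    stream1 = [1.0, -1.0, 1.0, -1.0]   # the known audio
    other = [1.0, 1.0, -1.0, -1.0]     # what we want to keep
    mix = [o + 0.5 * s for o, s in zip(other, stream1)]  # stream1 mixed in at half level
    print(subtract_known(mix, stream1))
```

The gain estimate is only exact when the remaining signal is uncorrelated with stream1; on real audio it is an approximation.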

If anything else has been done to the audio (such as level compression), the subtraction will no longer cancel cleanly.

Frequency Domain

While normal PCM-encoded audio is just a sampling of pressure many times per second, this is not how we perceive sound: we hear different frequencies. If you use a Fourier transform (normally computed with an FFT algorithm), you convert audio samples from the time domain to the frequency domain, giving you the level of sound in each of a number of frequency buckets.

If you convert both stream1 and the mix to the frequency domain, subtract stream1 from the mix (typically the level in each bucket, since subtracting the full complex values is equivalent to the time-domain method), and then convert back to the time domain for output, you can effectively remove much of stream1 from the mix. The more frequency buckets you use, the more CPU is needed, but the more accurate this removal will be. Note that while this means you don't have to be quite sample-accurate, it does typically hurt the quality of the sound that remains.
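A minimal sketch of the frequency-domain approach, under some loud assumptions: it handles a single frame, uses a naive O(n^2) DFT from the standard library instead of a real FFT, subtracts magnitudes per bucket while keeping the phase of the mix, and clamps negative results to zero. A practical version would process overlapping windowed frames with an FFT library. The name `spectral_subtract` is hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(mix_frame, ref_frame):
    """Subtract the reference's level from the mix in each frequency bucket.

    Assumes both frames have the same length. Only magnitudes are
    subtracted; the phase of the mix is kept, which is why the two
    signals do not need to be perfectly sample-aligned.
    """
    mix_spec = dft(mix_frame)
    ref_spec = dft(ref_frame)
    out = []
    for m, r in zip(mix_spec, ref_spec):
        magnitude = max(abs(m) - abs(r), 0.0)  # never go below silence
        out.append(cmath.rect(magnitude, cmath.phase(m)))
    return idft(out)


if __name__ == "__main__":
    n = 32
    stream1 = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    voice = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
    # stream1 arrives in the mix three samples late (circularly, for simplicity)
    late = [stream1[(t - 3) % n] for t in range(n)]
    mix = [v + s for v, s in zip(voice, late)]
    cleaned = spectral_subtract(mix, stream1)
    print(max(abs(c - v) for c, v in zip(cleaned, voice)))
```

The demo is deliberately idealized: the two tones fall in separate buckets, so the removal is nearly perfect. When the wanted sound and stream1 overlap in frequency, some of the wanted sound is removed too, which is the quality loss mentioned above.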

Many audio editing programs use this method to remove background noise.

Question 2

Sound is simply a curve that typically fluctuates above and below zero over time. 16-bit audio has 2^16 possible integer values, so raw PCM audio is just a stream of integers in the range -32768 to 32767. Once the audio is in this format, walk through stream1 and the mix one integer at a time: flip the sign of each stream1 sample and add it to the corresponding mix sample. Then renormalize the result back to the full 16-bit range to regain your volume. This effectively erases stream1 from your mix. The audio tool Audacity gives you this option through its Invert effect.
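The integer walk described above could look roughly like this, assuming both streams are already decoded to lists of signed 16-bit sample values of equal length and were mixed at full level. The helper names `cancel_pcm16` and `normalize_pcm16` are invented for the example, and clamping is added so results always stay within the 16-bit range.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value):
    """Keep a sample inside the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def cancel_pcm16(mix, stream1):
    """Flip the sign of each stream1 sample and add it to the mix sample."""
    return [clamp16(m + (-s)) for m, s in zip(mix, stream1)]


def normalize_pcm16(samples, peak=INT16_MAX):
    """Scale samples so the loudest one reaches the requested peak."""
    loudest = max((abs(v) for v in samples), default=0)
    if loudest == 0:
        return list(samples)  # silence stays silence
    scale = peak / loudest
    return [clamp16(round(v * scale)) for v in samples]


if __name__ == "__main__":
    mix = [1500, -1200, 300]
    stream1 = [500, -200, 300]
    residual = cancel_pcm16(mix, stream1)
    print(residual)
    print(normalize_pcm16(residual))
```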