Question

I'm using C# and DirectSound to record audio and send it across a local network to another computer that receives the data and saves it into a wave file.

My recording code and my wave-file-saving code are both based on existing samples (the saving side uses NAudio).

Saving all of the received bytes into a wave file works. However, I would like to receive audio from multiple computers and merge it into a single wave file.

I tried recording two separate wave files, one for each client, and then merging them. However, my naive approach provides no means of synchronization. The resulting wave files differ in length by at least 5 seconds and cannot be merged properly.

So, here are my questions:

1) How can I take audio bytes received from a network and save them into a wave file in such a way that the resulting file plays back the audio at the correct time?

For instance, I record 100 seconds of audio and send it across the network, but only 95 seconds' worth of audio bytes actually make it into the file. The missing 5 seconds result from the accumulation of small delays that occur while the packets are sent.

So, how can I synchronize those 95 seconds of audio so that each part plays back at the right moment within the full 100 seconds?

2) How can I mix the audio bytes I receive from multiple clients to obtain a single wave file that plays the audio of all clients in sync?

Please let me know if I need to clarify my question. I appreciate any assistance!


Solution

You need high-precision time synchronization on the participating computers, plus timestamps on the sent packets. To minimize clock drift, you can re-sync at pre-determined intervals (less preferred) or install higher-quality hardware clocks in the machines (preferred). If these machines are on a local network, setting one up as an NTP server and the others as clients will yield sufficiently synchronized clocks (typically within a millisecond of one another, often better).
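To show how the receiver might use those timestamps for question 1, here is a minimal C# sketch. It assumes a hypothetical packet layout, a sample offset measured from the sender's recording start (using its synchronized clock) plus raw 16-bit mono PCM bytes, and pads any gap with silence so that 95 seconds of received audio still span the full 100 seconds. Only the NAudio WaveFileWriter and WaveFormat calls are real API; the class and packet format are illustrative.

    using NAudio.Wave;

    // Hypothetical receiver-side writer: each packet carries the sample
    // offset at which its audio was captured, followed by raw PCM bytes.
    class TimestampedWaveWriter
    {
        private readonly WaveFormat format = new WaveFormat(44100, 16, 1);
        private readonly WaveFileWriter writer;
        private long samplesWritten; // samples written to the file so far

        public TimestampedWaveWriter(string path)
        {
            writer = new WaveFileWriter(path, format);
        }

        public void OnPacket(long sampleOffset, byte[] pcm)
        {
            // If this packet starts later than where the file currently
            // ends, pad the gap with silence so subsequent audio still
            // plays back at the correct time.
            long gap = sampleOffset - samplesWritten;
            if (gap > 0)
            {
                var silence = new byte[(int)(gap * format.BlockAlign)];
                writer.Write(silence, 0, silence.Length);
                samplesWritten += gap;
            }

            writer.Write(pcm, 0, pcm.Length);
            samplesWritten += pcm.Length / format.BlockAlign;
        }

        public void Close() => writer.Dispose();
    }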

OTHER TIPS

If you want to synchronize audio from multiple sources, then you need to buffer the data at the receiving end. Don't write or play any audio data until you have at least BUFFERSIZE samples from every participating source. You will have to tune BUFFERSIZE to your network latency; a sketch of this gating logic follows.
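One way to implement that gate, sketched in C# (the class name, the BufferSize value, and the sourceId scheme are all hypothetical): keep a queue of samples per source, and only release one block per source once every queue holds enough samples. Anything beyond BufferSize stays queued, which also covers the leftover handling described below.

    using System.Collections.Generic;

    // Hypothetical per-source sample buffers that only release data once
    // every source has at least BufferSize samples queued.
    class SourceBuffers
    {
        private const int BufferSize = 4096; // tune to your network latency
        private readonly Dictionary<int, Queue<short>> buffers =
            new Dictionary<int, Queue<short>>();

        // Called by the receiving thread for each decoded packet.
        public void Append(int sourceId, short[] samples)
        {
            if (!buffers.TryGetValue(sourceId, out Queue<short> q))
                buffers[sourceId] = q = new Queue<short>();
            foreach (short s in samples)
                q.Enqueue(s);
        }

        // Returns one BufferSize block per source, or null if any source
        // does not have enough samples yet. Leftover samples stay queued
        // for the next fill-up.
        public short[][] TryTakeBlocks()
        {
            foreach (Queue<short> q in buffers.Values)
                if (q.Count < BufferSize)
                    return null;

            var blocks = new short[buffers.Count][];
            int b = 0;
            foreach (Queue<short> q in buffers.Values)
            {
                var block = new short[BufferSize];
                for (int i = 0; i < BufferSize; i++)
                    block[i] = q.Dequeue();
                blocks[b++] = block;
            }
            return blocks;
        }
    }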

You can do this with two threads: one writes or plays the audio data, and the other buffers the incoming streams. When BUFFERSIZE samples have been received from each audio source, the receiving thread passes all the data to the other thread to write or play. It will usually be the case that some streams have more than BUFFERSIZE samples buffered; keep those leftovers around for the next fill-up.
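Once blocks are released in lockstep, the writing thread can mix them sample by sample, which addresses question 2. A minimal sketch, assuming every source uses the same 16-bit PCM format: sum the aligned samples and clamp to the 16-bit range so loud overlaps clip instead of wrapping around.

    // Mixes one aligned block from each source into a single block.
    static short[] MixBlock(short[][] sourceBlocks)
    {
        int length = sourceBlocks[0].Length;
        short[] mixed = new short[length];
        for (int i = 0; i < length; i++)
        {
            int sum = 0;
            foreach (short[] block in sourceBlocks)
                sum += block[i];

            // Clamp to the 16-bit range to avoid integer wrap-around.
            if (sum > short.MaxValue) sum = short.MaxValue;
            else if (sum < short.MinValue) sum = short.MinValue;
            mixed[i] = (short)sum;
        }
        return mixed;
    }

In the writing thread you would loop: call TryTakeBlocks() from the sketch above, and whenever it returns non-null, write MixBlock(blocks) to the output wave file (converting the shorts back to bytes before handing them to WaveFileWriter.Write).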
