Fast Audio Input/Output

https://stackoverflow.com/questions/846616

21-08-2019
|

Question

Here's what I want to do:

I want to allow the user to give my program some sound data (through a mic input), then hold it for 250ms, then output it back out through the speakers.

I have done this already using Java Sound API. The problem is that it's sorta slow. It takes a minimum of about 1-2 seconds from the time the sound is made to the time the sound is heard again from the speakers, and I haven't even tried to implement delay logic yet. Theoretically there should be no delay, but there is. I understand that you have to wait for the sound card to fill up its buffer or whatever, and the sample size and sampling rate have something to do with this.

My question is this: Should I continue down the Java path trying to do this? I want to get the delay down to like 100ms if possible. Does anyone have experience using the ASIO driver with Java? Supposedly it's faster..

Also, I'm a .NET guy. Does this make sense to do with .NET instead? What about C++? I'm looking for the right technology to use here, and maybe a good example of how to read/write to audio input/output streams using your suggested technology platform. Thanks for your help!

Solution

I've used JavaSound in the past and found it wonderfully flaky (and it keeps changing between VM releases). If you like C#, use it, just use the DirectX APIs. Here's an example of doing kind of what you want to do using DirectSound and C#. You could use the Effects plugins to perform your 250 ms echo.

http://blogs.microsoft.co.il/blogs/tamir/archive/2008/12/25/capturing-and-streaming-sound-by-using-directsound-with-c.aspx

OTHER TIPS

You may want to look into JACK, an audio API designed for low-latency sound processing. Additionally, Google turns up this nifty presentation [PDF] about using JACK with Java.

Theoretically there should be no delay, but there is.

Well, it's impossible to have zero delay. The best you can hope for is an unnoticeable delay (in terms of human perception). It might help if you describe your basic algorithm for reading & writing the sound data, so people can identify possible problems.

A potential issue with using a garbage-collected language like Java is that the GC will periodically run, interrupting your processing for some arbitrary amount of time. However, I'd be surprised if it's >100ms in normal usage. If GC is a problem, most JVMs provide alternate collection algorithms you can try.

If you choose to go down the C/C++ path, I highly recommend using PortAudio ( http://portaudio.com/ ). It works with almost everything on multiple platforms and it gives you low-level control of the sound drivers without actually having to deal with the various sound driver technology that is around.

I've used PortAudio on multiple projects, and it is a real joy to use. And the license is permissive.

If low latency is your goal, you can't beat C.

libsoundio is a low-level C library for real-time audio input and output. It even comes with an example program that does exactly what you want - piping the microphone input to the speakers output.

It's certainly achievable with JavaSound to get end to end latency in the ballpark of 100-150ms.

The primary cause of latency is the buffer sizes of the capture and playback lines. The size is set when opening the lines:
- capture: TargetDataLine#open(AudioFormat format, int bufferSize)
- playback: SourceDataLine#open(AudioFormat format, int bufferSize)

If the buffer is too big it will cause excess latency, but if it's too small it will cause stuttery playback. So you need to find a balance for your applications needs and your computing power.

The default buffer size can be checked with DataLine#getBufferSize when calling #open(AudioFormat format). The default size will vary based on the AudioFormat and seems to be geared for high latency, stutter free playback applications (e.g. internet streaming). If you're developing a low latency application, the default buffer size is much too large and should be changed.

In my testing with a 16-bit PCM AudioFormat, a buffer size of 1024 bytes has been pretty close to ideal for low latency.

The second and often overlooked cause of audio latency is any other activity being done in the capture or playback threads. For example, logging messages to console can introduce 10's of ms of latency. Turn it off.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow