Real time audio conversation iOS

https://stackoverflow.com/questions/12297190

30-06-2021
|

Question

I am designing an iOS app for a customer who wants to allow real-time (with minimum lag, max 50ms) conversations between users (a sort of Teamspeak). The lag must be low because the audio can also be live music, played with instruments, so all the users need to synchronize. I need a server, which will request audio recordings to every client and send to others (and make them hear the same sound at the same time). HTTP is easy to manage/implement and easy to scale, but very low-performing because an average HTTP request takes > 50ms... (with a mid-level hardware), so I was thinking of TCP/UDP connections kept open between clients and server. But I have some questions:

If I develop the server in Python (using TwistedMatrix, for example), how are its performance ?
I can't develop the server in C++ because it is hard to manage (scalable) and to develop.
Anyone used Nodejs (which is easy to scale) to manage TCP/UDP connections?
If I use HTTP, will it be fast enough with Keep-Alive? Becuase usually the time required for an HTTP Request to be performed is > 50ms (because opening-closing connection is hard), and I want the total procedure to be less than that time.
The server will be running on a Linux machine.

And finally: which type of compression can you suggest me? I thought Ogg Vorbis would be nice, but if there's anything better (and can be used in iOS), I am open to changes.

Thank you, Umar.

La solution

First off, you are not going to get sub 50 ms latency. Others have tried this. See for example http://ejamming.com/ a service that attempts to do what you are doing, but has a musically noticeable delay over the line and is therefore, in the ears of many, completely unusable. They use special routing techniques to get the latency as low as possible and last I heard their service doesn't work with some router configurations.

Secondly, what language you use on server probably doesn't make much difference, as the delay from client to server will be worse than any delay caused by your service, but if I understand your service correctly, you are going to need a lot of servers (or server threads) just relaying audio data between clients or doing some sort of minimal mixing. This is a small amount of work per connection, but a lot of connections, so you need something that can handle that. I would lean towards something like Java, Scala, or maybe Go. I could be wrong, but I don't think this is a good use-case for node, which, as I understand it, does not do multithreading well at this time. Also, don't poo-poo C++, scalable services have been built C++. You could also build the relay part of the service in C++ and the rest in whatever.

Third, when choosing a compression format, you'll have to choose one that can survive packet loss if you plan to use UDP, and I think UDP is the only way to go for this. I don't think vorbis is up to this task, but I could be wrong. Off the top of my head, I'm not sure of anything that works on the iPhone and is UDP friendly, but I'm sure there are lots of things. Speex is an example and is open-source. Not sure if the latency and quality meet your needs.

Finally, to be blunt, I think there are som other things you should research a bit more. eg. DNS is usually cached locally and not checked every http call (though it may depend on the system/library. At least most systems cache dns locally). Also, there is no such protocol as TCP/UDP. There is TCP/IP (sometimes just called TCP) and UDP/IP (sometimes just called UDP). You seem to refer to the two as if they are one. The difference is very important for what you are doing. For example, HTTP runs on top of TCP, not UDP, and UDP is considered "unreliable", but has less overhead, so it's good for streaming.

Edit: speex

Autres conseils

What concerns the server, the request itself is not a bottleneck. I guess you have sufficient time to set up the connection, as it happens only in the beginning of the session. Therefore the protocol is not of much relevance.

But consider that HTTP is a stateless protocol and not suitable for audio streaming. There are a couple of real time streaming protocols you can choose from. All of them will work over TCP or UDP (e.g. use raw sockets), and there are plenty of implementations.

In your case, the bottleneck with latency is not the server but the network itself. The connection between an iOS device and a wireless access point (AP) eats up about 40ms if the AP is not misconfigured and connection is good. (ping your iPhone.) In total, you'd have a minimum of 80ms for the path iOS -> AP -> Server -> AP -> iOS. But it is difficult to keep that latency stable. (Typical latency of AirPlay on my local network is about 300ms.)

I think live music over iOS devices is not practicable today. Try skype between two iOS devices and look how close you can get to 50ms. I'd bet no one can do it significantly better, what concerns latency.

Update: New research result!

I have to revise my claims regarding the latency of wifi connections of the iDevice. Apparently when you first ping your device, latency will be bad. But if I ping again no later than 200ms after that, I see an average latency 2ms-3ms between AP and iDevice.

My interpretation is that if there is no communication between AP and iDevice for more than 200ms, the network adapter of the iDevice will go to a less responsive sleep mode, probably to save battery power.

So it seems, live music is within reach again... :-)

Update 2

The ping-interval required for keep alive of low latency apparently differs from device to device. The reported 200ms is for an 3rd gen. iPad. For my iPhone 4 it's more like 50ms.

While streaming audio you probably don't need to bother with this, as data is exchanged on a more frequent basis. In my own context, I have sparse communication between an iDevice and a server, but low latency is crucial. A keep alive therefore is the way to go.

Best, Peter

Licencié sous: CC-BY-SA avec attribution

Non affilié à StackOverflow