Live voice chat with manipulated voice audio with RTMFP

https://stackoverflow.com/questions/9886794

26-05-2021

Question

We are building a live RTMFP voice chat application with Cumulus. While the basic voice transfer works pretty easily using NetStreams, we have one big problem:

There does not seem to be a way to manipulate the microphone data that the NetStream sends, and also not a way to manipulate the data the listening NetStream receives before it is played.

However, this is exactly what we need. We do not want to transfer the normal microphone-recorded audio, but first pitch it, then send it, then play it. Or first send it, then pitch it, then play it. But it seems that the whole chain of audio recording, Speex encoding, Speex decoding and audio playback is completely encapsulated within the NetStream class.

The only ways to achieve what we want (all of which remove NetStream completely) seem to be:

1. Send raw pitched audio data. That does work, but it is of course a lot of data to send and will likely not be fast enough outside of our local LAN testing.
2. Pitch the audio data, convert it to ogg/mp3 using existing encoders for Flash, send it, then decode the ogg/mp3 and play it. But this would mean encoding each and every sample packet received from the microphone, adding header data, etc., so it would likely not yield much of a benefit compared to raw audio data.

   2.1. This would actually be a good way if there were a Speex encoder/decoder for Flash. But ironically, there is none other than the built-in one (used for encoding/decoding audio in NetStreams), which cannot be used explicitly. Yeah, thanks a lot for not offering it, Adobe...
3. Send the data to the Cumulus server, pitch (and probably convert) it there, and send it to the recipient. This would likely not even be much faster than 1. and would also throw away the exact benefit of RTMFP: P2P communication.

Is there any solution to this problem that would work better than the ones I listed here, possibly a way to actually manipulate the microphone data before it is passed to the NetStream?
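
To make the "pitch" step concrete, here is a minimal sketch of the simplest possible pitch shift on a block of raw samples: resampling with linear interpolation. It is written in Python purely for illustration; the function name `pitch_shift_resample` and its parameters are hypothetical and not part of Flash, NetStream, or Cumulus. Note that this naive method also changes the block length (a higher pitch gives a shorter block), so a real-time chat would need a duration-preserving technique on top of it.

```python
def pitch_shift_resample(samples, factor):
    """Naively shift pitch by reading the input at `factor` times normal speed.

    factor > 1 raises the pitch (and shortens the block);
    factor < 1 lowers it (and lengthens the block).
    Values between two input samples are linearly interpolated.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []

    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor                         # fractional read position
        left = int(pos)                          # sample at or before pos
        right = min(left + 1, len(samples) - 1)  # clamp at the block end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Hypothetical usage: shift a tiny ramp up and down by one octave.
ramp = [0.0, 1.0, 2.0, 3.0]
octave_up = pitch_shift_resample(ramp, 2.0)
octave_down = pitch_shift_resample(ramp, 0.5)
```

The same loop would apply to whatever raw sample buffer the client can get hold of; the open problem in the question is getting that buffer into and out of the NetStream pipeline.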

Answer

To get something viable, the audio data has to be converted to a compressed format; raw data is simply too much to send. I think the second choice is the better one ;-)
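
A rough back-of-the-envelope sketch of why raw audio is so heavy, in Python for illustration only. The sample formats below (44.1 kHz mono 32-bit float, and 16 kHz mono 16-bit PCM) and the 20 kbit/s figure for a compressed speech stream are assumptions chosen as ballpark values, not measurements from Cumulus or Flash.

```python
def raw_bitrate_kbps(sample_rate_hz, bytes_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bytes_per_sample * channels * 8 / 1000.0


# Assumed raw formats (illustrative, not taken from the question):
raw_float_44k = raw_bitrate_kbps(44100, 4)  # mono 32-bit float -> 1411.2 kbit/s
raw_pcm16_16k = raw_bitrate_kbps(16000, 2)  # mono 16-bit PCM   -> 256.0 kbit/s

# Assumed ballpark for a compressed speech codec stream (hypothetical value).
ASSUMED_CODEC_KBPS = 20.0

for label, kbps in (("44.1 kHz float", raw_float_44k),
                    ("16 kHz 16-bit", raw_pcm16_16k)):
    ratio = kbps / ASSUMED_CODEC_KBPS
    print(f"{label}: {kbps:.1f} kbit/s raw, about {ratio:.0f}x the assumed codec rate")
```

Even after downsampling and dropping to 16-bit samples, the raw stream in this estimate is still more than ten times larger than the assumed compressed one, which is why some form of encoding before sending is hard to avoid.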

I have already developed an Ogg Vorbis decoder/encoder in Flash using Alchemy, and it always consumed less than 10% of the CPU! It's completely possible.

If you prefer the Speex format, I think that with a consistent effort it's possible to get the same result by building the Speex code with Alchemy.

If I can help you further, contact me at cumulus.dev@gmail.com ;-)

License: CC-BY-SA with attribution

Not affiliated with StackOverflow