Question

How can a server running an audio conference determine who the active speaker (or speakers) is? I want to show an icon next to users who are currently speaking and show the video of the most active speaker.

I think I need something like:

  • Calculate a score for each user based on audio energy/power/levels.
  • Normalize the scores across all users.
  • Calculate the score over several audio frames to prevent rapid changes.
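The three steps above could be sketched roughly like this. It is a minimal illustration, assuming the audio has already been decoded to lists of integer PCM samples, one frame per user per tick. The class name `SpeakerScorer` and the smoothing weight `alpha` are hypothetical choices; an exponential moving average is just one simple way to spread the score over several frames.

```python
class SpeakerScorer:
    """Hypothetical helper: per-user energy score, smoothed over frames and normalized."""

    def __init__(self, alpha=0.2):
        # alpha: weight of the newest frame in the exponential moving average
        # (assumed value; a smaller alpha reacts more slowly to changes)
        self.alpha = alpha
        self.smoothed = {}  # user id -> smoothed energy

    @staticmethod
    def frame_energy(samples):
        # Mean squared amplitude of one frame of PCM samples
        if not samples:
            return 0.0
        return sum(s * s for s in samples) / len(samples)

    def update(self, frames):
        """frames: dict of user id -> list of int PCM samples for the current frame.

        Users missing from `frames` keep their previous smoothed score.
        """
        for user, samples in frames.items():
            energy = self.frame_energy(samples)
            prev = self.smoothed.get(user, energy)  # first frame seeds the average
            self.smoothed[user] = (1 - self.alpha) * prev + self.alpha * energy
        return self.normalized()

    def normalized(self):
        # Scale scores so they sum to 1 across all users
        total = sum(self.smoothed.values())
        if total == 0:
            return {u: 0.0 for u in self.smoothed}
        return {u: v / total for u, v in self.smoothed.items()}
```

Because of the smoothing, a user who stops talking for a single frame does not immediately lose their score, which is what keeps the indicator from flickering.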

Do I need to do the calculation on the raw audio, or is it possible to get the score from the encoded packets (Speex/Opus)? Is there a way to extract this info from the protocol transferring the audio (RTMP or SDP)?

In FreeSwitch there is a status field for each participant, with flags for talking and floor owner. This is the code that calculates the score, but I can't understand how it actually works.
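A per-participant "talking" flag is often implemented as a level threshold plus a hangover period, so the flag stays on through short pauses between words. The sketch below shows that generic technique; it is not FreeSwitch's code, and the class name `TalkDetector`, the -40 dB threshold and the 10-frame hangover are all assumed values for illustration.

```python
class TalkDetector:
    """Hypothetical per-participant 'talking' flag with a hangover period.

    Generic threshold-plus-hangover sketch, not FreeSwitch's implementation.
    """

    def __init__(self, threshold_db=-40.0, hangover_frames=10):
        self.threshold_db = threshold_db        # assumed level that counts as speech
        self.hangover_frames = hangover_frames  # quiet frames tolerated before clearing the flag
        self.quiet_frames = hangover_frames     # start in the "not talking" state

    def update(self, level_db):
        """Feed the level of one frame (in dB); returns the current talking flag."""
        if level_db >= self.threshold_db:
            self.quiet_frames = 0
        else:
            self.quiet_frames += 1
        return self.talking

    @property
    def talking(self):
        # Still "talking" until enough consecutive quiet frames have passed
        return self.quiet_frames < self.hangover_frames
```

With 20 ms frames, a 10-frame hangover would keep the icon lit for about 200 ms after the participant goes quiet.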

Thanks


Solution

Usually the participant who sends the loudest audio stream is considered the 'active speaker'.
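Picking the loudest stream on every frame tends to make the video jump between participants, so a selector might only switch once another participant has been the loudest for several consecutive frames. This is a minimal sketch of that idea; `ActiveSpeakerSelector` and `hold_frames` are hypothetical names, and the per-frame levels are assumed to come from whatever volume measurement you use.

```python
class ActiveSpeakerSelector:
    """Hypothetical: follow the loudest participant, but only switch after a
    challenger has been loudest for `hold_frames` consecutive frames."""

    def __init__(self, hold_frames=25):
        self.hold_frames = hold_frames  # assumed: 25 x 20 ms frames = 0.5 s
        self.current = None             # participant whose video is shown
        self.candidate = None           # participant trying to take over
        self.candidate_count = 0

    def update(self, levels):
        """levels: dict of participant id -> loudness score for this frame."""
        if not levels:
            return self.current
        loudest = max(levels, key=levels.get)
        if self.current is None or self.current not in levels:
            # No speaker yet, or the current one left: switch immediately
            self.current = loudest
            self.candidate, self.candidate_count = None, 0
        elif loudest == self.current:
            # Current speaker is still loudest: reset any challenger
            self.candidate, self.candidate_count = None, 0
        else:
            if loudest == self.candidate:
                self.candidate_count += 1
            else:
                self.candidate, self.candidate_count = loudest, 1
            if self.candidate_count >= self.hold_frames:
                self.current = loudest
                self.candidate, self.candidate_count = None, 0
        return self.current
```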

So you have to calculate the volume of each audio stream. How you measure it depends on the stream's encoding. Check out this question on how to calculate the volume of PCM audio.
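For PCM audio, a common measure is the RMS level expressed in dBFS. The sketch below assumes signed 16-bit little-endian mono samples and uses one common convention, where 0 dBFS corresponds to an RMS equal to the full-scale value 32768; other conventions reference a full-scale sine wave instead, which shifts results by about 3 dB. The function name `pcm16_rms_dbfs` is hypothetical.

```python
import math
import struct


def pcm16_rms_dbfs(data: bytes) -> float:
    """RMS level of signed 16-bit little-endian mono PCM, in dBFS.

    Returns -inf for silence or for buffers shorter than one sample.
    """
    if len(data) < 2:
        return float("-inf")
    # Drop a trailing odd byte, if any, so the buffer holds whole samples
    usable = data[: len(data) - (len(data) % 2)]
    samples = [s for (s,) in struct.iter_unpack("<h", usable)]
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768.0)
```

For example, a frame whose samples alternate between 16384 and -16384 has an RMS of half full scale, which comes out at about -6.02 dBFS.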

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow