Question

I need to determine when someone speaks in an audio stream. I applied a Hamming window and calculated the FFT. How do I detect the human voice from here?

Solution

You don't need an FFT for this; what you need is a voice activity detection (VAD) algorithm.
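
To illustrate that a basic detector can work without any FFT, here is a minimal sketch of an energy-based VAD in Python with NumPy. It is only one of the simplest possible approaches, not a robust algorithm. The function name `energy_vad`, the 20 ms frame length and the -35 dB threshold are assumptions chosen for illustration, and the input is assumed to be a mono float signal scaled to [-1, 1].

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=20.0, threshold_db=-35.0):
    """Return one boolean per frame: True where the frame's RMS level
    (in dB relative to full scale) exceeds threshold_db.

    frame_ms and threshold_db are illustrative defaults, not tuned values.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    # Drop the incomplete tail and split the rest into equal frames.
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # The small constant avoids log10(0) on digitally silent frames.
    level_db = 20.0 * np.log10(rms + 1e-12)
    return level_db > threshold_db
```

A detector like this reacts to any loud sound, not only to voices, so it is best treated as a baseline to compare better methods against.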

OTHER TIPS

If you want to experiment with your own voice activity detection algorithms, an FFT can be used as an initial stage. Next you might want to try subtracting an estimate of the stationary background noise spectrum. Then you could try using the modified FFT results to calculate a cepstrum (or some weighted cepstral coefficients) for feature extraction. You could then do some statistical pattern matching on whatever feature vectors you decided to extract, and feed the results to a decision algorithm.
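
The stages described above can be sketched roughly as follows in Python with NumPy. This is an illustration under stated assumptions, not a reference implementation: all function names are hypothetical, the noise estimate is assumed to come from frames known to contain no speech, and the "pattern matching" is reduced to a Euclidean distance from the average noise feature vector with a mean-plus-three-standard-deviations threshold.

```python
import numpy as np

def magnitude_spectrum(frame):
    """Hamming-window one frame and return its FFT magnitude."""
    return np.abs(np.fft.rfft(frame * np.hamming(len(frame))))

def cepstral_features(frame, noise_mag, n_coeffs=13, floor=0.1):
    """Spectral subtraction, then a real cepstrum truncated to n_coeffs."""
    mag = magnitude_spectrum(frame)
    # Subtract the stationary noise estimate, but never go below a
    # fraction of it (plus a tiny constant) so the logarithm stays finite.
    cleaned = np.maximum(mag - noise_mag, floor * noise_mag + 1e-12)
    return np.fft.irfft(np.log(cleaned))[:n_coeffs]

def train_noise_model(noise_frames):
    """Summarise noise-only frames as (mean spectrum, mean features, threshold)."""
    noise_mag = np.mean([magnitude_spectrum(f) for f in noise_frames], axis=0)
    feats = np.array([cepstral_features(f, noise_mag) for f in noise_frames])
    centre = feats.mean(axis=0)
    dists = np.linalg.norm(feats - centre, axis=1)
    # Assumed decision rule: anything far from typical noise is "not noise".
    threshold = dists.mean() + 3.0 * dists.std()
    return noise_mag, centre, threshold

def is_voice(frame, model):
    """Decide whether a frame differs enough from the noise model."""
    noise_mag, centre, threshold = model
    dist = np.linalg.norm(cepstral_features(frame, noise_mag) - centre)
    return bool(dist > threshold)
```

Note that this rule only separates "noise-like" from "not noise-like" frames. Telling speech apart from other non-stationary sounds would need a model trained on labelled speech features, which is where the statistical pattern matching comes in.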

Each of the above steps has likely been a research topic, and a good implementation might involve studying dozens of published research papers, which perhaps can be found in your university library.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow