Question

I am developing a software which can auto record and extract every words in my voice. I used portaudio library to solve it. But I am stuck on detecting the sound: I set the silence's value is zero so if there is a sample which is zero, it must be a start or end point of a sound. But when I ran it, the program created many words. I think because the value I read by portaudio is raw data, so it can't be processed like that. Am I right? How can I fix it? By the way, I am coding in C++ :D

Était-ce utile?

La solution

To detect the presence of a signal in a PCM stream you be able to detect it. As dprogramz put said, the noise floor of your soundcard is probably not perfect and so there will be some noise signal recorded (even with no mic connected).

The solution is to use a VOX or VAD algorithm to detect the presence of your voice. VOX can be tricky, since in most consumer grade electronics the noise floor is just low enough to be "silence" to the human ear, relative to the signal. This means that the difference on amplitude between the noise floor and signal may be slight. If your sound card has AGC turned on this can make it even more difficult, since the noise floor may move. Having said that, VOX can be implemented successfully on consumer grade equipment. It just takes more effort to establish the threshold. When done best the threshold is calculated periodically while the stream is active.

If I were doing this I'd implement a VAD algorithm. Since your objective is to detect your voice this should provide a reliable result regardless of the equipment you use.

Autres conseils

I don't think it's because it is a RAW value. RAW sound files are a bitstream of frequency and volume information.

However, the value will rarely (if ever) be zero. You have to take into account there is a small amount of electrical noise that is made by the mic. Figure out the "idle" dB of your mic (just test the level when you aren't talking into it). You Then need to set a silence threshold (below a certain dB level for a certain number of samples) to detect the beginning/end. Attempting to detect a zero value is gonna be near impossible.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top