Question

I'm recording using the Wave API, and after I finish recording I want to detect whether the buffer has sound in it or whether it recorded nothing (just the silence of the room).

I wrote a function that takes the average of the absolute values in the buffer, and it works "OK", but it has several problems:

1) When the room is silent, the average is ~860, and when I'm talking, it is ~875, which is barely any difference at all. How can that be? I'm recording for 1 second.

2) Sometimes the average is ~860, sometimes ~500, and on some days even ~400. Why does it change every time? Shouldn't it be the same, since every time it captures silence and nothing has changed?

Here is the function I wrote:

bool isEmpty(short int *wave)
{
int avg = 0;

for (int i = 0 ; i < NUMPTS ; i++)
{
    if (wave[i] < 0)
        avg = avg + (wave[i]) * -1;

    else
        avg = avg + (wave[i]);
}

avg = avg / NUMPTS;

if (avg > avg_voice)
    return false;

return true;
}

This function isn't good enough: it isn't always right, and I constantly have to change avg_voice to something else. Sometimes the average with sound is only about 10 points higher than with silence, which makes it very hard to detect whether the buffer has voice in it or not.

So what can I do? How can I improve it? Maybe there's an option for this that I can set when I record the voice and fill in the WAVEFORMATEX and WAVEHDR settings?

Thanks!

Edit: wave is a short int array that contains 8000 cells and stores the voice data. It looks like this (example): wave[0] = -123; wave[1] = -205; wave[2] = -212;

and so on...

Second edit: I'm recording the data like this:

void StartRecord()
{
short int *waveIn = new short int[NUMPTS];

HWAVEIN hWaveIn;
WAVEHDR WaveInHdr;
MMRESULT result;
HWAVEOUT hWaveOut;

WAVEFORMATEX pFormat;
pFormat.wFormatTag = WAVE_FORMAT_PCM;
pFormat.nChannels = 1;
pFormat.nSamplesPerSec = sampleRate;
pFormat.nAvgBytesPerSec = 2 * sampleRate;
pFormat.nBlockAlign = 2;
pFormat.wBitsPerSample = 16;
pFormat.cbSize = 0;

result = waveInOpen(&hWaveIn, WAVE_MAPPER, &pFormat, 0, 0, WAVE_FORMAT_DIRECT);

if(result)
{
    char fault[256];
    waveInGetErrorTextA(result, fault, 256);
    MessageBoxA(NULL, fault, "Failed to open waveform input device.", MB_OK | MB_ICONEXCLAMATION);
    return;
}

WaveInHdr.lpData = (LPSTR)waveIn;
WaveInHdr.dwBufferLength = 2 * NUMPTS;
WaveInHdr.dwBytesRecorded = 0;
WaveInHdr.dwUser = 0;
WaveInHdr.dwFlags = 0;
WaveInHdr.dwLoops = 0;

while (true)
{
    waveInPrepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));
    result = waveInAddBuffer(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));

    result = waveInStart(hWaveIn);
    if(result)
    {
        MessageBoxA(NULL, "Failed to start recording", NULL, MB_OK | MB_ICONEXCLAMATION);
        return;
    }

    // Wait until finished recording 
    Sleep(seconds * 1000); //Sleep for as long as there was recorded
    waveInUnprepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));

    if (isEmpty(waveIn)) // Checks here
                 .....
}
 }

Solution

First, I suspect that the buffer hasn't been filled by the time you analyze it. Rather than a simple sleep, you should poll WaveInHdr.dwFlags until the WHDR_DONE bit is set.

result = waveInStart(hWaveIn);
if(result)
{
    MessageBoxA(NULL, "Failed to start recording", NULL, MB_OK | MB_ICONEXCLAMATION);
    return;
}

// Wait until finished recording 
while ((WaveInHdr.dwFlags & WHDR_DONE) == 0)
    Sleep(100);
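
A bare polling loop like this never returns if the driver never sets the bit (for example, if waveInAddBuffer failed). As a minimal sketch, assuming you are willing to give up after a deadline, the wait can be bounded. The helper name waitForFlag and its parameters are hypothetical; only dwFlags and WHDR_DONE come from the Wave API. It takes a plain unsigned long (the underlying type of DWORD on Windows) so it compiles without windows.h:

```cpp
#include <chrono>
#include <thread>

// Polls *flags until any bit in doneMask is set or timeoutMs has elapsed.
// Returns true if the bit was seen, false if the deadline passed first.
// The pointer is volatile because the flag is changed outside this thread.
bool waitForFlag(const volatile unsigned long *flags, unsigned long doneMask,
                 int timeoutMs, int pollMs = 10)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(timeoutMs);

    while ((*flags & doneMask) == 0)
    {
        if (clock::now() >= deadline)
            return false; // gave up: the buffer was never marked done
        std::this_thread::sleep_for(std::chrono::milliseconds(pollMs));
    }
    return true;
}
```

In the recording code this might be called as waitForFlag(&WaveInHdr.dwFlags, WHDR_DONE, (seconds + 1) * 1000), treating a false result as a recording error instead of analyzing the buffer.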

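Secondly, I'd suggest a better method of measuring loudness. RMS, perhaps:

#include <cmath>

// Returns the RMS level of 16-bit samples, scaled so that a
// full-scale sine wave gives a result of 1.0.
double Rms(const short int *wave, int length)
{
    if (length <= 0)
        return 0.0; // avoid dividing by zero on an empty buffer

    double sumSquared = 0;
    const double scaleShortToDouble = 1.0 / 0x8000; // maps shorts to about -1.0..1.0

    for (int i = 0; i < length; i++)
    {
        double s = wave[i] * scaleShortToDouble;
        sumSquared += s * s;
    }
    return sqrt(2.0) * sqrt(sumSquared / length);
}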

I've converted the shorts to doubles in the range of -1.0 to 1.0 because it's easier to compute with. The extra sqrt(2) scales the result so that if you fed a sine wave into the A/D converter and a full-scale digital sine came out (-32768 to 32767), the Rms result would be 1.0.

With that done, you can now convert the Rms value to dB and you'll have a number that is referred to as dBFS and is commonly used when talking about digital levels.

The conversion is dBFS = 20*log10(rms), and roughly:

  • 0 dBFS = 1.0
  • -6 dBFS = 0.5
  • -12 dBFS = 0.25

Each halving of the input level is another 6 dB down.
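
The conversion above can be sketched in code as follows. This is only an illustration: the names toDbfs, rmsLevel and isEmptyDbfs are hypothetical, and thresholdDbfs is a value you would have to measure for your own setup. rmsLevel uses the same full-scale-sine scaling as the RMS function in this answer and is repeated only so the example stands alone.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Converts an RMS level (1.0 = full-scale sine) to dBFS.
// A level of 0 has no finite logarithm, so it maps to -infinity.
double toDbfs(double rms)
{
    if (rms <= 0.0)
        return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(rms);
}

// RMS of 16-bit samples, scaled so a full-scale sine gives about 1.0.
double rmsLevel(const short *wave, std::size_t length)
{
    if (length == 0)
        return 0.0;
    double sumSquared = 0.0;
    for (std::size_t i = 0; i < length; i++)
    {
        const double s = wave[i] / 32768.0;
        sumSquared += s * s;
    }
    return std::sqrt(2.0) * std::sqrt(sumSquared / length);
}

// True when the buffer's level is below thresholdDbfs.
// thresholdDbfs is an assumption: pick it from your own measurements.
bool isEmptyDbfs(const short *wave, std::size_t length, double thresholdDbfs)
{
    return toDbfs(rmsLevel(wave, length)) < thresholdDbfs;
}
```

Comparing levels in dB rather than as raw averages means the threshold is expressed on the same scale as the noise floor discussed in this answer.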

It also happens that each halving of the input signal requires one fewer bit of the A/D converter. Since you have a 16-bit signal, your theoretical noise floor is going to be at around -96 dBFS. In practice though, since you have a mic hooked up, it's going to be somewhat higher than that, depending in large part upon the quality of your setup. And that's where you're going to need to experiment.
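
One way to run that experiment, as a rough sketch rather than a recommended procedure: record the quiet room a few times, note the dBFS level of each recording, and place the threshold a little above the loudest one. The function name calibrateThresholdDbfs and the 6 dB default margin are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// noiseLevelsDbfs: dBFS readings from recordings of the quiet room.
// marginDb: headroom above the loudest quiet reading; 6 dB is an
// arbitrary starting point, not a measured value.
double calibrateThresholdDbfs(const std::vector<double> &noiseLevelsDbfs,
                              double marginDb = 6.0)
{
    assert(!noiseLevelsDbfs.empty()); // at least one measurement is required
    const double loudestNoise =
        *std::max_element(noiseLevelsDbfs.begin(), noiseLevelsDbfs.end());
    return loudestNoise + marginDb;
}
```

Recalibrating at startup, instead of hard-coding a number, is one way to cope with a noise level that changes from day to day.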

OTHER TIPS

You must use RMS, because sinusoids have an average value of 0, so if you take a plain average you will just get the voltage offset of the microphone. That's why you are getting inconsistent but low values: 860/2^15 is approximately 2.6% of the dynamic range.
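
If a DC offset is what dominates those readings (an assumption, not something the question confirms), subtracting the mean before computing the RMS removes it. This is a minimal sketch; the name acRms is hypothetical, and the result is in raw sample units rather than scaled to 1.0.

```cpp
#include <cmath>
#include <cstddef>

// RMS of the samples after subtracting their mean (the DC offset).
// The result is in raw sample units (0 to 32768).
double acRms(const short *wave, std::size_t length)
{
    if (length == 0)
        return 0.0;

    // First pass: the mean is the DC offset of the buffer.
    double sum = 0.0;
    for (std::size_t i = 0; i < length; i++)
        sum += wave[i];
    const double mean = sum / length;

    // Second pass: RMS of what is left once the offset is removed.
    double sumSquared = 0.0;
    for (std::size_t i = 0; i < length; i++)
    {
        const double d = wave[i] - mean;
        sumSquared += d * d;
    }
    return std::sqrt(sumSquared / length);
}
```

A buffer that sits at a constant 860 gives 0 here, while a plain average of absolute values would report 860 for it.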

You have allocated memory for waveIn using:

short int *waveIn = new short int[NUMPTS];

However, that does not initialize the contents. Initialize them to something meaningful; then you will be able to see where things are not working. If 0 is a meaningful default value, use:

for (int i = 0; i < NUMPTS; ++i )
{
   waveIn[i] = 0;
}
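
As an alternative sketch, a std::vector zero-fills its elements on construction and frees the memory automatically. The helper name makeZeroedBuffer is hypothetical and exists only to make the example testable.

```cpp
#include <cstddef>
#include <vector>

// Returns a buffer of numSamples 16-bit samples, all set to 0.
// std::vector value-initializes its elements, so no explicit loop is needed.
std::vector<short> makeZeroedBuffer(std::size_t numSamples)
{
    return std::vector<short>(numSamples);
}
```

The pointer from the vector's data() member can be assigned to WaveInHdr.lpData with the same (LPSTR) cast used in the question, as long as the vector outlives the recording.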
Licensed under: CC-BY-SA with attribution