Calculating the amount of noise in a WAV file compared to a source file

https://stackoverflow.com/questions/20864651

23-09-2022

Question

Sorry for the length of the post. I want to illustrate what I have tried and what I am trying to accomplish.

What I am trying to do is write a VOIP network tester in C#. I've written all the VOIP code using the Ozeki VOIP SIP C# SDK. The client makes a VOIP call, which the server side picks up. The client plays a WAV file and the server side records it. I generated a tone file from audiocheck.net: a 3000 Hz sine wave, 5 seconds long, at a sample rate of 8000 Hz and 16 bits per sample. This is what the client plays. I chose the frequency arbitrarily, so it can always change. What I want is for the server side to then do a simple analysis of the recorded file to determine the amount of noise, which can be introduced through packet loss, latency, etc.

AudioProcessor.cs is a C# class that opens a WAV file and reads the header information. Since the file is a 16-bit wave, I use a "two's complement" conversion (thanks to http://www.codeproject.com/Articles/19590/WAVE-File-Processor-in-C) to read each 2-byte frame into an array. The code is:

Console.WriteLine("Audio: Filename: " + fileName);
FileStream stream = File.Open(fileName, FileMode.Open, FileAccess.Read);
BinaryReader reader = new BinaryReader(stream);

int chunkID = reader.ReadInt32();
int fileSize = reader.ReadInt32();
int riffType = reader.ReadInt32();
int fmtID = reader.ReadInt32();
int fmtSize = reader.ReadInt32();
int fmtCode = reader.ReadInt16();
int channels = reader.ReadInt16();
int sampleRate = reader.ReadInt32();
int fmtAvgBPS = reader.ReadInt32();
int fmtBlockAlign = reader.ReadInt16();
int bitDepth = reader.ReadInt16();

if (fmtSize == 18)
{
    // Read any extra values
    int fmtExtraSize = reader.ReadInt16();
    reader.ReadBytes(fmtExtraSize);
}

int dataID = reader.ReadInt32();
int dataSize = reader.ReadInt32();

Console.WriteLine("Audio: file size: " + fileSize.ToString());
Console.WriteLine("Audio: sample rate: " + sampleRate.ToString());
Console.WriteLine("Audio: channels: " + channels.ToString());
Console.WriteLine("Audio: bit depth: " + bitDepth.ToString());
Console.WriteLine("Audio: fmtAvgBPS: " + fmtAvgBPS.ToString());
Console.WriteLine("Audio: data id: " + dataID.ToString());
Console.WriteLine("Audio: data size: " + dataSize.ToString());

int frames = 8 * (dataSize / bitDepth) / channels;
int frameSize = dataSize / frames;
double timeLength = ((double)frames / (double)sampleRate);

Console.WriteLine("Audio: frames: " + frames.ToString());
Console.WriteLine("Audio: frame size: " + frameSize.ToString());
Console.WriteLine("Audio: Time length: " + timeLength.ToString());

// byte[] soundData = reader.ReadBytes(dataSize);

// Convert to two-complement
short[] frameData = new short[frames];
for (int i = 0; i < frames; i++)
{
    short snd = reader.ReadInt16();
    if (snd != 0)
        snd = Convert.ToInt16((~snd | 1));
    frameData[i] = snd;
}
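
A note on the conversion loop above: BinaryReader.ReadInt16 already returns a signed 16-bit value decoded from little-endian two's complement, which is how 16-bit PCM samples are stored in a WAV file. The `~snd | 1` step inverts the bits of each non-zero sample and forces the lowest bit on, so the stored values differ from the recorded ones. A minimal sketch of reading the samples unchanged follows; it assumes mono 16-bit PCM and a reader already positioned at the start of the data chunk, and the names `PcmReader` and `ReadSamples` are hypothetical.

```csharp
using System;
using System.IO;

public static class PcmReader
{
    // Reads `frames` 16-bit mono PCM samples from the reader's current position.
    // Assumes the reader is positioned just after the "data" chunk header.
    public static short[] ReadSamples(BinaryReader reader, int frames)
    {
        short[] frameData = new short[frames];
        for (int i = 0; i < frames; i++)
        {
            // ReadInt16 returns a signed, little-endian value; no extra conversion.
            frameData[i] = reader.ReadInt16();
        }
        return frameData;
    }
}
```

In the code above, this would replace the loop that fills frameData.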

The next step would be to calculate the amount of noise, or rather how much non-3000 Hz signal there is. Based on research, I initially tried using a Goertzel filter to detect a particular frequency. It appears to be used a lot to detect phone DTMF tones. This is the implementation I tried:

public static double Calculate(short[] samples, double freq)
{
    double s_prev = 0.0;
    double s_prev2 = 0.0;    
    double coeff,normalizedfreq,power,s;
    int i;
    normalizedfreq = freq / (double)SAMPLING_RATE;
    coeff = 2.0*Math.Cos(2.0*Math.PI*normalizedfreq);
    for (i=0; i<samples.Length; i++)
    {
        s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    power = s_prev2*s_prev2+s_prev*s_prev-coeff*s_prev*s_prev2;
    return power;
}

I would call the function, passing in the first 4000 frames (half a second at the 8000 Hz sample rate):

short[] sampleData = new short[4000];
Array.Copy(frameData,sampleData,4000);
for (int i = 1; i < 11; i++)
{
    Console.WriteLine(i * 1000 + ": " + Goertzel2.Calculate(sampleData, i * 1000));
}

The output is:

1000: 4297489869.04579
2000: 19758026000000
3000: 1.17528628051013E+15
4000: 0
5000: 1.17528628051013E+15
6000: 19758026000000
7000: 4297489869.04671
8000: 4000000
9000: 4297489869.04529
10000: 19758026000000

3000 Hz seems to have the biggest number, but so does 5000 Hz. I have no idea whether these numbers are accurate or not. If this were to work, I would run it against smaller samples, say 1/10 s, in an attempt to detect variations, which I would interpret as noise.
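
The matching values are consistent with aliasing, assuming SAMPLING_RATE is 8000 as for the tone file described above. The Goertzel coefficient depends only on cos(2π·f/fs), and cos(2π·f/fs) equals cos(2π·(fs − f)/fs), so the results mirror around fs/2 = 4000 Hz: 5000 Hz gives the same coefficient as 3000 Hz, 6000 Hz the same as 2000 Hz, and so on. The following sketch illustrates this; the class and method names are hypothetical.

```csharp
using System;

public static class GoertzelAliasDemo
{
    // Goertzel feedback coefficient for a target frequency,
    // computed the same way as in Calculate above.
    public static double Coefficient(double freq, double sampleRate)
    {
        return 2.0 * Math.Cos(2.0 * Math.PI * freq / sampleRate);
    }

    // Frequency in [0, sampleRate / 2] that a real sampled signal
    // cannot distinguish from freq.
    public static double Alias(double freq, double sampleRate)
    {
        double f = freq % sampleRate;                       // fold into [0, sampleRate)
        return f > sampleRate / 2 ? sampleRate - f : f;     // mirror around Nyquist
    }
}
```

With a sample rate of 8000, Alias maps 5000 to 3000, 6000 to 2000, and 10000 to 2000, which matches the pairs of identical values in the output above. Only frequencies up to half the sample rate carry independent information.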

I've also looked at a notch filter and an FFT. I'm not really sure what the best next step is. I don't need anything complex. I just want to be able to calculate roughly how much of the output WAV file is noise. As mentioned, I'm writing this in C#, but I can port code from C, C++, Python and Java.

Edit: Here is my updated code.

Calculates the total power at each frequency

// Number of frequency steps to scan between 0 and half of the sample rate
int _frequencyGranularity = 2000;
// Number of frames to use to create a sample for the filter
int _sampleSize = 4000;
int frameCount = 0;
while(frameCount + _sampleSize < frameData.Length)
{
    // Dictionary to store the power level at a particular frequency
    Dictionary<int, double> vals = new Dictionary<int, double>(_frequencyGranularity);
    double totalPower = 0;
    for (int i = 1; i <= _frequencyGranularity; i++)
    {
        // Only process up to half of the sample rate as this is the Nyquist limit
        // http://stackoverflow.com/questions/20864651/calculating-the-amount-of-noise-in-a-wav-file-compared-to-a-source-file
        int freq = i * wave.SampleRate / 2 / _frequencyGranularity;
        vals[freq] = Goertzel.Calculate(frameData, frameCount, _sampleSize, wave.SampleRate, freq);
        totalPower += vals[freq];
    }

    // Calculate the percentage of noise by subtracting the percentage of power at the desired frequency of 3000 from 100.
    double frameNoisePercentange = (100 - (vals[3000] / totalPower * 100));
    logger.Debug("Frame: " + frameCount + " Noise: " + frameNoisePercentange);
    noisePercentange += frameNoisePercentange;
    frameCount += _sampleSize;
}
double averageNoise = (noisePercentange / (int)(frameCount/_sampleSize));

Updated Goertzel method

public static double Calculate(short[] sampleData, int offset, int length, int sampleRate, double searchFreq)
{
    double s_prev = 0.0;
    double s_prev2 = 0.0;    
    double coeff,normalizedfreq,power,s;
    int i;
    normalizedfreq = searchFreq / (double)sampleRate;
    coeff = 2.0*Math.Cos(2.0*Math.PI*normalizedfreq);
    for (i=0; i<length; i++)
    {
        s = sampleData[i+offset] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    power = s_prev2*s_prev2+s_prev*s_prev-coeff*s_prev*s_prev2;
    return power;
}

Solution

One way to build a crude estimation of the noise would be to compute the standard deviation of the peak values of the signal.

Given that you know the expected frequency, you can divide the signal into chunks of one wavelength. For example, if your signal is 3 kHz and your sample rate is 16 kHz, the chunk size is 16000 / 3 ≈ 5.33 samples. For each chunk, find the highest value; then find the standard deviation of that sequence of peak values.
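
The peak-deviation idea could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the names `PeakNoiseEstimator`, `CyclePeaks` and `StdDev` are hypothetical, chunk boundaries are rounded to whole samples, and the population standard deviation is used.

```csharp
using System;
using System.Collections.Generic;

public static class PeakNoiseEstimator
{
    // Highest sample value in each chunk of one wavelength.
    public static List<double> CyclePeaks(short[] samples, double sampleRate, double freq)
    {
        double period = sampleRate / freq;   // samples per cycle, may be fractional
        if (period < 1.0) throw new ArgumentException("freq must not exceed sampleRate");
        var peaks = new List<double>();
        for (int k = 0; ; k++)
        {
            // Round fractional chunk boundaries to the nearest sample index.
            int start = (int)Math.Round(k * period, MidpointRounding.AwayFromZero);
            int end = (int)Math.Round((k + 1) * period, MidpointRounding.AwayFromZero);
            if (end > samples.Length) break;   // drop the incomplete last chunk
            short max = short.MinValue;
            for (int i = start; i < end; i++)
                if (samples[i] > max) max = samples[i];
            peaks.Add(max);
        }
        return peaks;
    }

    // Population standard deviation of the values (0 for an empty list).
    public static double StdDev(IReadOnlyList<double> values)
    {
        if (values.Count == 0) return 0.0;
        double mean = 0.0;
        foreach (double v in values) mean += v;
        mean /= values.Count;
        double sumSq = 0.0;
        foreach (double v in values) sumSq += (v - mean) * (v - mean);
        return Math.Sqrt(sumSq / values.Count);
    }
}
```

For example, with a period of four samples, the array { 0, 100, 0, -100, 0, 120, 0, -100 } has peaks 100 and 120, giving a standard deviation of 10. One caveat: for the 3000 Hz tone sampled at 8000 Hz in the question, a cycle is shorter than three samples, so the per-cycle peaks of even a clean tone can differ from chunk to chunk. Comparing the result with the same measurement on the source file may give a more useful figure.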

Alternatively, for each chunk you can track the min and max values. Then, over the whole sample, find the mean of the minima, the mean of the maxima, and the range of the minima (i.e. the difference between the highest and lowest of the per-chunk minima). The SNR is then approximately (mean_max - mean_min) / min_range.
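
A rough sketch of this min/max estimate is shown below, under the same assumptions as before: chunk boundaries are rounded to whole samples, and the names `MinMaxSnrEstimator` and `Estimate` are hypothetical. Returning infinity when the minima do not vary at all is a choice made for this sketch, not part of the description above.

```csharp
using System;

public static class MinMaxSnrEstimator
{
    // Approximates SNR as (mean_max - mean_min) / min_range over one-wavelength chunks.
    public static double Estimate(short[] samples, double sampleRate, double freq)
    {
        double period = sampleRate / freq;   // samples per cycle, may be fractional
        if (period < 1.0) throw new ArgumentException("freq must not exceed sampleRate");

        double sumMin = 0.0, sumMax = 0.0;
        double lowestMin = double.MaxValue, highestMin = double.MinValue;
        int chunks = 0;

        for (int k = 0; ; k++)
        {
            int start = (int)Math.Round(k * period, MidpointRounding.AwayFromZero);
            int end = (int)Math.Round((k + 1) * period, MidpointRounding.AwayFromZero);
            if (end > samples.Length) break;   // drop the incomplete last chunk

            short min = short.MaxValue, max = short.MinValue;
            for (int i = start; i < end; i++)
            {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            sumMin += min;
            sumMax += max;
            lowestMin = Math.Min(lowestMin, min);
            highestMin = Math.Max(highestMin, min);
            chunks++;
        }
        if (chunks == 0) throw new ArgumentException("need at least one full cycle");

        double minRange = highestMin - lowestMin;
        double amplitude = sumMax / chunks - sumMin / chunks;   // mean_max - mean_min
        return minRange == 0.0 ? double.PositiveInfinity : amplitude / minRange;
    }
}
```

For instance, with a period of four samples, the array { 0, 100, 0, -100, 0, 100, 0, -90 } has maxima 100 and 100 and minima -100 and -90, so the estimate is (100 - (-95)) / 10 = 19.5.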

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow