Sorry for the length of the post. I want to illustrate what I have tried and what I am trying to accomplish.
I am trying to write a VOIP network tester in C#. I've written all the VOIP code using the Ozeki VOIP SIP C# SDK. The client makes a VOIP call, which the server side picks up. The client plays a WAV file and the server side records it. I generated the tone file at audiocheck.net: a 3000 Hz sine wave, 5 seconds long, at a sample rate of 8000 Hz and 16-bit depth. This is what the client plays. I chose the frequency arbitrarily, so it can always change. I then want the server side to run a simple analysis on the recorded file to determine the amount of noise, which can be introduced through packet loss, latency, etc.
AudioProcessor.cs is a C# class that opens a WAV file and reads the header information. Since the file is a 16-bit wave, I use a "two's complement" conversion (thanks to http://www.codeproject.com/Articles/19590/WAVE-File-Processor-in-C) to read each 2-byte frame into an array. For example, I have:
0:0
1:-14321
2:17173
3:-9875
4:0
5:9875
6:-17175
7:14319
8:0
9:-14321
10:17173
11:-9875
The code is:
Console.WriteLine("Audio: Filename: " + fileName);
FileStream stream = File.Open(fileName, FileMode.Open, FileAccess.Read);
BinaryReader reader = new BinaryReader(stream);
// RIFF header followed by the "fmt " chunk
int chunkID = reader.ReadInt32();
int fileSize = reader.ReadInt32();
int riffType = reader.ReadInt32();
int fmtID = reader.ReadInt32();
int fmtSize = reader.ReadInt32();
int fmtCode = reader.ReadInt16();
int channels = reader.ReadInt16();
int sampleRate = reader.ReadInt32();
int fmtAvgBPS = reader.ReadInt32();
int fmtBlockAlign = reader.ReadInt16();
int bitDepth = reader.ReadInt16();
if (fmtSize == 18)
{
    // Read any extra values
    int fmtExtraSize = reader.ReadInt16();
    reader.ReadBytes(fmtExtraSize);
}
// "data" chunk header (assumes it directly follows the "fmt " chunk)
int dataID = reader.ReadInt32();
int dataSize = reader.ReadInt32();
Console.WriteLine("Audio: file size: " + fileSize.ToString());
Console.WriteLine("Audio: sample rate: " + sampleRate.ToString());
Console.WriteLine("Audio: channels: " + channels.ToString());
Console.WriteLine("Audio: bit depth: " + bitDepth.ToString());
Console.WriteLine("Audio: fmtAvgBPS: " + fmtAvgBPS.ToString());
Console.WriteLine("Audio: data id: " + dataID.ToString());
Console.WriteLine("Audio: data size: " + dataSize.ToString());
// Bytes per frame = channels * bitDepth / 8; dividing once avoids early integer truncation
int frames = dataSize / (channels * bitDepth / 8);
int frameSize = dataSize / frames;
double timeLength = ((double)frames / (double)sampleRate);
Console.WriteLine("Audio: frames: " + frames.ToString());
Console.WriteLine("Audio: frame size: " + frameSize.ToString());
Console.WriteLine("Audio: Time length: " + timeLength.ToString());
// byte[] soundData = reader.ReadBytes(dataSize);
// Convert to two's complement (the loop assumes mono, 16-bit data).
// Note: ReadInt16 already returns a signed two's-complement value, so this step
// only maps snd to ~snd | 1 (roughly -snd); the magnitude is essentially unchanged.
short[] frameData = new short[frames];
for (int i = 0; i < frames; i++)
{
    short snd = reader.ReadInt16();
    if (snd != 0)
        snd = Convert.ToInt16((~snd | 1));
    frameData[i] = snd;
}
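As a cross-check on the decoding loop above, here is a minimal sketch of turning two little-endian bytes into a signed 16-bit sample by hand. It assumes 16-bit PCM data, and the names `PcmDecode` and `ToSample` are made up for illustration. `BinaryReader.ReadInt16` reads a 2-byte signed integer in little-endian order, so it should return the same value without any further conversion.

```csharp
using System;

public static class PcmDecode
{
    // Combine the low and high byte (little-endian order) into one signed sample.
    // The cast to short reinterprets the 16-bit pattern as two's complement.
    public static short ToSample(byte low, byte high)
    {
        return unchecked((short)(low | (high << 8)));
    }
}
```

For example, the byte pair 0x0F, 0xC8 decodes to -14321 and the pair 0x15, 0x43 decodes to 17173.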
The next step would be to calculate the amount of noise, or rather how much non-3000 Hz signal there is. Based on research, I initially tried using a Goertzel filter to detect a particular frequency. It appears to be used a lot to detect phone DTMF tones. This is the implementation I tried:
public static double Calculate(short[] samples, double freq)
{
    double s_prev = 0.0;
    double s_prev2 = 0.0;
    double coeff,normalizedfreq,power,s;
    int i;
    normalizedfreq = freq / (double)SAMPLING_RATE;
    coeff = 2.0*Math.Cos(2.0*Math.PI*normalizedfreq);
    for (i=0; i<samples.Length; i++)
    {
        s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    power = s_prev2*s_prev2+s_prev*s_prev-coeff*s_prev*s_prev2;
    return power;
}
I would call the function, passing in a half-second sample (4000 frames at 8000 Hz):
short[] sampleData = new short[4000];
Array.Copy(frameData,sampleData,4000);
for (int i = 1; i < 11; i++)
{
    Console.WriteLine(i * 1000 + ": " + Goertzel2.Calculate(sampleData, i * 1000));
}
The output is:
1000: 4297489869.04579
2000: 19758026000000
3000: 1.17528628051013E+15
4000: 0
5000: 1.17528628051013E+15
6000: 19758026000000
7000: 4297489869.04671
8000: 4000000
9000: 4297489869.04529
10000: 19758026000000
3000 Hz has the biggest number, but 5000 Hz has exactly the same value. I have no idea whether these numbers are accurate. If this were to work, I would run it against shorter samples, say 1/10 s, to detect variations that I would interpret as noise.
I've also looked at a notch filter and an FFT. I'm not really sure what the best next step is. I don't need anything complex. I just want to be able to calculate roughly how much of the output WAV file is noise. As mentioned, I'm writing this in C#, but I can port code from C, C++, Python and Java.
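One possible explanation for the matching values, assuming `SAMPLING_RATE` is 8000 as in the test file: the frequency enters the recurrence only through `coeff`, and the cosine cannot tell $f$ apart from $f_s - f$ (or from $f + f_s$):

$$
\mathrm{coeff}(f_s - f) = 2\cos\!\left(2\pi\,\frac{f_s - f}{f_s}\right) = 2\cos\!\left(2\pi - 2\pi\,\frac{f}{f_s}\right) = 2\cos\!\left(2\pi\,\frac{f}{f_s}\right) = \mathrm{coeff}(f)
$$

With $f_s = 8000$, 5000 Hz uses the same coefficient as 3000 Hz, 6000 Hz and 10000 Hz the same as 2000 Hz, and 7000 Hz and 9000 Hz the same as 1000 Hz. That matches the pairs in the output up to floating-point rounding, so only frequencies up to $f_s / 2 = 4000$ Hz carry independent information.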
Edit: Here is my updated code.
Calculating the total power at each frequency:
// Number of frequencies to scan between 0 and half of the sample rate
int _frequencyGranularity = 2000;
// Number of frames to use to create a sample for the filter
int _sampleSize = 4000;
int frameCount = 0;
// Running sum of the per-sample noise percentages
double noisePercentage = 0;
while (frameCount + _sampleSize <= frameData.Length)
{
    // Dictionary to store the power level at a particular frequency
    Dictionary<int, double> vals = new Dictionary<int, double>(_frequencyGranularity);
    double totalPower = 0;
    for (int i = 1; i <= _frequencyGranularity; i++)
    {
        // Only process up to half of the sample rate as this is the Nyquist limit
        // http://stackoverflow.com/questions/20864651/calculating-the-amount-of-noise-in-a-wav-file-compared-to-a-source-file
        int freq = i * wave.SampleRate / 2 / _frequencyGranularity;
        vals[freq] = Goertzel.Calculate(frameData, frameCount, _sampleSize, wave.SampleRate, freq);
        totalPower += vals[freq];
    }
    // Calculate the percentage of noise by subtracting the percentage of power at the desired frequency of 3000 from 100.
    double frameNoisePercentage = (100 - (vals[3000] / totalPower * 100));
    logger.Debug("Frame: " + frameCount + " Noise: " + frameNoisePercentage);
    noisePercentage += frameNoisePercentage;
    frameCount += _sampleSize;
}
double averageNoise = (noisePercentage / (int)(frameCount/_sampleSize));
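A quick check of the frequency grid this loop scans, assuming `wave.SampleRate` is 8000 as in the test file, and writing $G$ for `_frequencyGranularity` and $N$ for `_sampleSize`:

$$
f_i = \frac{i \cdot f_s}{2G} = \frac{i \cdot 8000}{2 \cdot 2000} = 2i\ \text{Hz}, \qquad i = 1, \dots, 2000
$$

$$
\Delta f = \frac{f_s}{N} = \frac{8000}{4000} = 2\ \text{Hz}
$$

Under that assumption, the scan runs from 2 Hz to 4000 Hz in steps that coincide with the DFT bin spacing of a 4000-frame window, and `vals[3000]` is the entry for $i = 1500$.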
Updated Goertzel method:
public static double Calculate(short[] sampleData, int offset, int length, int sampleRate, double searchFreq)
{
    double s_prev = 0.0;
    double s_prev2 = 0.0;
    double coeff,normalizedfreq,power,s;
    int i;
    normalizedfreq = searchFreq / (double)sampleRate;
    coeff = 2.0*Math.Cos(2.0*Math.PI*normalizedfreq);
    for (i=0; i<length; i++)
    {
        s = sampleData[i+offset] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    power = s_prev2*s_prev2+s_prev*s_prev-coeff*s_prev*s_prev2;
    return power;
}
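A possible shortcut, sketched here under stated assumptions rather than as a definitive method: the value returned by the `Calculate` method above is the squared DFT magnitude at the search frequency, so by Parseval's theorem it can be compared directly with the sum of squared samples instead of with a scan over 2000 frequencies. The sketch assumes the tone sits exactly on a DFT bin (3000 Hz does for a 4000-frame window at 8000 Hz) and is neither 0 Hz nor half the sample rate. The names `ToneAnalysis`, `GoertzelPower`, `ToneFraction` and `MakeSine` are hypothetical.

```csharp
using System;

public static class ToneAnalysis
{
    // Goertzel power, using the same recurrence as the Calculate method in the post.
    public static double GoertzelPower(short[] data, int offset, int length, int sampleRate, double freq)
    {
        double sPrev = 0.0, sPrev2 = 0.0;
        double coeff = 2.0 * Math.Cos(2.0 * Math.PI * freq / sampleRate);
        for (int i = 0; i < length; i++)
        {
            double s = data[i + offset] + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        return sPrev2 * sPrev2 + sPrev * sPrev - coeff * sPrev * sPrev2;
    }

    // Fraction (0..1) of the window's energy that sits at freq.
    // Assumption: freq is exactly on a DFT bin and is neither 0 nor sampleRate / 2.
    public static double ToneFraction(short[] data, int offset, int length, int sampleRate, double freq)
    {
        double totalEnergy = 0.0;
        for (int i = 0; i < length; i++)
        {
            double x = data[i + offset];
            totalEnergy += x * x;
        }
        if (totalEnergy == 0.0) return 0.0; // silent window, nothing to measure

        // Parseval: a real tone on bin k contributes 2 * |X[k]|^2 / N to the sum of squares.
        double toneEnergy = 2.0 * GoertzelPower(data, offset, length, sampleRate, freq) / length;
        return toneEnergy / totalEnergy;
    }

    // Hypothetical helper for experiments: a pure sine rounded to 16-bit samples.
    public static short[] MakeSine(int length, int sampleRate, double freq, double amplitude)
    {
        short[] result = new short[length];
        for (int i = 0; i < length; i++)
        {
            result[i] = (short)Math.Round(amplitude * Math.Sin(2.0 * Math.PI * freq * i / sampleRate));
        }
        return result;
    }
}
```

With this sketch, a noise percentage for one window would be `100 * (1 - ToneAnalysis.ToneFraction(frameData, frameCount, _sampleSize, wave.SampleRate, 3000))`. If the recorded tone drifts off the bin, some of its energy leaks into neighbouring bins and would be counted as noise, so the figure is only a rough estimate.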