Efficient method for checking quality of a sound recording

https://stackoverflow.com/questions/18242352

24-06-2022
Question

We have various wave files from live, uncontrolled recordings that come in from one of our server-side processes, and most of them have good, clear speech throughout. However, sometimes they are garbled, they have static, or the speech volume isn't loud enough. Is there an efficient method for determining whether a recording is of "good" quality using C#?
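For the "volume isn't loud enough" case specifically, a level check may already be enough. The following is a minimal sketch, written in Python for brevity (the arithmetic should port directly to C#); the function names `read_pcm16` and `level_report` and the `clip_threshold` value are illustrative assumptions, not part of any library, and it only handles 16-bit PCM files.

```python
import math
import sys
import wave
from array import array


def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return samples scaled to [-1, 1).

    Multi-channel files are returned as one interleaved stream.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in samples]


def level_report(samples, clip_threshold=0.999):
    """Return RMS level in dBFS, peak amplitude and fraction of clipped samples.

    clip_threshold is an assumed value; tune it for your recordings.
    """
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    return {
        "rms_dbfs": rms_dbfs,
        "peak": peak,
        "clipped_fraction": clipped / len(samples),
    }
```

A recording could then be flagged when `rms_dbfs` falls below some floor or `clipped_fraction` rises above some ceiling. Both limits would have to be calibrated on recordings already known to be good or bad; this check says nothing about garbling or static.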

I thought about taking the spectrogram of a known good recording and comparing it to the spectrogram of a bad recording, but the recordings will have different speech each time, so this might not work. I've looked into libraries like Bass.Net and NAudio, but audio processing is not my field of expertise.

I could try comparing audio fingerprints, but I'm not entirely sure how this works. I saw that someone was attempting to compare two audio files using their audio fingerprint hashes and the Levenshtein Distance algorithm to find the degree of similarity between the two audio files. Unless the hashes produced by audio fingerprinting are similar between similar audio files, this method won't work.
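For reference, the edit-distance comparison mentioned above could be sketched roughly as follows. This is plain Levenshtein distance over two strings, written in Python for brevity; the `similarity` helper and the idea of feeding it fingerprint hash strings are assumptions for illustration, not the method of any particular fingerprinting library.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to a score in [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

The score is only meaningful if similar audio produces similar hash strings, which is exactly the open question here.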

Another thought of mine was to use some sort of speech recognition API for attempting to process speech and write a transcript of the audio to a text file. The problem is that speech recognition isn't extremely accurate and APIs like Microsoft's Speech API may still try to recognize speech even in a garbled recording or one with a bunch of static. I saw that Nuance has an SDK version of their speech recognition software, but I haven't had a chance to look at the SDK yet since they don't seem to offer a trial version of the SDK on their website.

Solution

You can use existing open source tools to measure SNR for noisy speech. For details see http://labrosa.ee.columbia.edu/projects/snreval/

I recommend trying WADA SNR:

http://www.cs.cmu.edu/~robust/archive/algorithms/WADA_SNR_IS_2008/

It's a pretty simple algorithm, but it's not trivial to design one like it yourself.

Fingerprinting and ASR definitely won't work, since both try to eliminate noise rather than detect it.
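To make the SNR idea concrete, here is a much cruder estimate than WADA SNR, and not an implementation of it: it splits the signal into frames, treats the quietest frames as the noise floor and the loudest as speech plus noise, and reports the ratio in dB. It is written in Python for brevity; the names `frame_powers` and `estimate_snr_db`, the frame length, and the percentile choices are all assumptions for illustration. It presumes the recording contains pauses where only noise is present.

```python
import math


def frame_powers(samples, frame_len=400):
    """Mean power of each non-overlapping frame (samples are floats in [-1, 1])."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
    return powers


def estimate_snr_db(samples, frame_len=400, low_pct=0.1, high_pct=0.9):
    """Rough SNR in dB from low and high percentiles of frame power.

    low_pct / high_pct are assumed values, not taken from WADA SNR.
    """
    powers = sorted(frame_powers(samples, frame_len))
    if len(powers) < 2:
        raise ValueError("need at least two frames")
    noise = powers[int(low_pct * (len(powers) - 1))]
    loud = powers[int(high_pct * (len(powers) - 1))]
    if noise <= 0:
        return float("inf")  # digitally silent noise floor
    signal = loud - noise    # remove the noise contribution from loud frames
    if signal <= 0:
        return float("-inf")
    return 10 * math.log10(signal / noise)
```

A recording with steady static and no clear pauses will defeat this percentile trick, which is one reason a purpose-built estimator such as WADA SNR is the better choice in practice.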

Other tips

I am also looking for a solution to a similar problem, and I found this open source project: https://github.com/dpwe/audfprint. You can create a database and then compare your query (the audio whose quality you're not sure of) against the database.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow