Question

I am trying to create a scream and whistle detector in my Android application. I can already detect the user's whistle using the musicg Android library. I have to implement the scream detector myself because no library is available for it.

In musicg, audio data is read and stored in a byte buffer, which is then used as the input to the whistle detector. I tried to understand what these bytes look like by printing them to LogCat. However, I have no idea what they represent or how the musicg library can use this input to detect when the user is whistling.

The audio data bytes look like this (printed with buffer[i] + ""):

10-25 23:43:54.412: E/1115(7542): 71 
10-25 23:43:54.412: E/1116(7542): 22
10-25 23:43:54.412: E/1117(7542): 58
10-25 23:43:54.412: E/1118(7542): -14
10-25 23:43:54.412: E/1119(7542): 36
10-25 23:43:54.412: E/1120(7542): 88
10-25 23:43:54.412: E/1121(7542): 8
10-25 23:43:54.413: E/1122(7542): -98
10-25 23:43:54.413: E/1123(7542): -24
10-25 23:43:54.413: E/1124(7542): 66
10-25 23:43:54.413: E/1125(7542): -51
10-25 23:43:54.413: E/1126(7542): 111
10-25 23:43:54.413: E/1127(7542): -67
10-25 23:43:54.413: E/1128(7542): 43
10-25 23:43:54.413: E/1129(7542): -68
10-25 23:43:54.413: E/1130(7542): 36
10-25 23:43:54.415: E/1131(7542): -58
10-25 23:43:54.415: E/1132(7542): -85
10-25 23:43:54.415: E/1133(7542): -46
10-25 23:43:54.415: E/1134(7542): 78
10-25 23:43:54.415: E/1135(7542): -40

So, can anyone tell me how this input can be used to detect the user's whistle?

Please give me some ideas.

Thank you


Solution

The stream of bytes is PCM audio. Each sample (for 16-bit PCM, two bytes combined) represents how loud the sound is at a specific instant of time. Audio processing is usually done in chunks. For example, in the library you're using, the WaveTypeDetector class loops through chunks of bytes and performs an FFT on each chunk to determine pitches.

A single instant of audio tells you nothing about the frequency of the sound (the pitch). To do any useful analysis, you need a chunk of audio like this array.

The FFT outputs a function of sound level versus frequency for the chunk of time represented by the array of bytes. This can be used to detect which pitches in the sound are the loudest, for example.
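To make that concrete, here is a minimal sketch (not code from musicg) of how a chunk of raw bytes might be turned into a spectrum. It assumes 16-bit little-endian mono PCM, combines byte pairs into samples, and uses a naive DFT so it stays self-contained; a real app would use a proper FFT implementation for speed:

```java
public class ChunkSpectrum {

    // Combine 16-bit little-endian PCM byte pairs into amplitude samples.
    static double[] toSamples(byte[] buffer) {
        double[] samples = new double[buffer.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int lo = buffer[2 * i] & 0xFF;   // low byte, unsigned
            int hi = buffer[2 * i + 1];      // high byte, sign-extended
            samples[i] = (hi << 8) | lo;
        }
        return samples;
    }

    // Naive DFT: magnitude of each frequency bin for one chunk.
    static double[] magnitudes(double[] samples) {
        int n = samples.length;
        double[] mags = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(angle);
                im -= samples[t] * Math.sin(angle);
            }
            mags[k] = Math.sqrt(re * re + im * im);
        }
        return mags;
    }

    // Frequency (Hz) of the loudest bin, given the recording sample rate.
    static double loudestFrequency(double[] mags, int sampleRate, int numSamples) {
        int peak = 0;
        for (int k = 1; k < mags.length; k++) {
            if (mags[k] > mags[peak]) peak = k;
        }
        return (double) peak * sampleRate / numSamples;
    }
}
```

The numbers you printed are the raw bytes before this kind of conversion, which is why they look like meaningless values between -128 and 127.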

And when this analysis is performed repeatedly on a series of chunks of sound, the library can compare how pitches change over time to determine what kind of sound is being played (whistling versus clapping, say), based on known patterns in the pitches that these types of sounds produce.

This library performs its analysis on a series of chunks of sound. For each chunk, it determines whether the sound fits a certain set of criteria (for example, whether it is within a certain range of frequency and a certain range of intensity). It repeats this for the entire length of the sound file and then divides the number of positive boolean responses from the WhistleApi class by the total number of chunks, yielding a probability that the sound file as a whole is a whistle.
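As a rough sketch of that aggregation step (the general idea, not the library's exact logic; fitsCriteria below is a hypothetical stand-in for musicg's per-frame frequency/intensity checks):

```java
import java.util.Arrays;

public class WhistleProbability {

    // Hypothetical stand-in for the library's per-chunk checks,
    // e.g. dominant frequency and intensity within whistle ranges.
    static boolean fitsCriteria(byte[] chunk) {
        return false; // placeholder
    }

    // Fraction of chunks that pass the criteria, used as a probability.
    static double probability(byte[] audio, int chunkSize) {
        int numChunks = audio.length / chunkSize;
        int passed = 0;
        for (int i = 0; i < numChunks; i++) {
            byte[] chunk = Arrays.copyOfRange(audio, i * chunkSize, (i + 1) * chunkSize);
            if (fitsCriteria(chunk)) {
                passed++;
            }
        }
        return numChunks == 0 ? 0.0 : (double) passed / numChunks;
    }
}
```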

To create scream detection, I think you will need to extend the DetectionApi class to create a ScreamApi class, using the WhistleApi class as an example. Then you will have to come up with your own criteria values to replace the ones used by WhistleApi.
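Something along these lines could be a starting point. This is a sketch modeled on WhistleApi; the protected field names and the isSpecificSound method follow my reading of musicg's DetectionApi, so verify them against the library source, and treat every threshold value below as a placeholder to be tuned:

```java
import com.musicg.api.DetectionApi;
import com.musicg.wave.WaveHeader;

// Sketch modeled on WhistleApi. Verify field and method names against
// the actual DetectionApi source; all values here are placeholders.
public class ScreamApi extends DetectionApi {

    public ScreamApi(WaveHeader waveHeader) {
        super(waveHeader);
    }

    @Override
    protected void init() {
        minFrequency = 280.0f;        // placeholder: screams sit lower than whistles
        maxFrequency = 3000.0f;       // placeholder
        minIntensity = 1000.0f;       // placeholder: screams are loud
        maxIntensity = 1000000.0f;    // placeholder
        minStandardDeviation = 0.1f;  // placeholder
        maxStandardDeviation = 1.0f;  // placeholder
        highPass = 100;               // placeholder
        lowPass = 10000;              // placeholder
        minNumZeroCross = 20;         // placeholder
        maxNumZeroCross = 200;        // placeholder
        numRobust = 4;                // placeholder
    }

    public boolean isScream(byte[] audioBytes) {
        return isSpecificSound(audioBytes);
    }
}
```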

To come up with your own scream criteria, you can make a few dozen recordings of the different types of screams you think should be accepted. I would crop the beginnings and ends of the files so there is no silence in them. Then temporarily modify the DetectionApi class to log the minimum and maximum values it reads for each of these criteria. You'll get a long series of values for each file, so you can put them in Excel to get a mean and standard deviation. For each criterion, I would use something like the mean +/- 3 standard deviations as the min and max values for that criterion. Compare these values across all your sound files and adjust them, possibly throwing out outlier files.
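If you prefer code to Excel for that last step, the mean +/- 3 standard deviations calculation is straightforward (the example values below are made up):

```java
public class CriterionStats {

    // Given the logged values of one criterion across recordings,
    // derive min/max thresholds as mean +/- 3 standard deviations.
    static double[] thresholds(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        double mean = sum / values.length;

        double sqDiff = 0;
        for (double v : values) sqDiff += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(sqDiff / values.length);

        return new double[] { mean - 3 * stdDev, mean + 3 * stdDev };
    }

    public static void main(String[] args) {
        // Example: made-up dominant-frequency values (Hz) from scream clips.
        double[] freqs = { 850, 910, 780, 995, 870, 930, 805 };
        double[] range = thresholds(freqs);
        System.out.printf("minFrequency=%.1f maxFrequency=%.1f%n", range[0], range[1]);
    }
}
```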
