Question

I want to build an app that responds to the sound you make when blowing out birthday candles. This is not speech recognition per se (that sound isn't a word in English), and the very kind Halle over at OpenEars told me that it's not possible using that framework. (Thanks for your quick response, Halle!)

Is there a way to "teach" an app a sound such that the app can subsequently recognize it?

How would I go about this? Is it even doable? Am I crazy or taking on a problem that is much more difficult than I think it is? What should my homework be?


Solution

The good news is that it's achievable and you don't need any third-party frameworks: AVFoundation is all you really need.

There's a good article from Mobile Orchard that covers the details but, somewhat inevitably for a four-year-old article, there are some gotchas you need to be aware of.

  1. Before recording on a real device, I needed to set the audio session category, like so:

    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
    
  2. Play around with the threshold in this line:

    if (lowPassResults > 0.95)
    

    I found 0.95 to be too high and got better results setting it somewhere between 0.55 and 0.75. Similarly, I played around with the 0.05 multiplier in this line:

    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
    

Other tips

Using simple thresholds on energy levels would probably not be robust enough for your use case.

A good way to go about this would be to first extract some features from the sound stream that are characteristic of the sound of blowing out candles. Then use a machine learning algorithm to train a model on training examples (a set of recordings of the sound you want to recognize). The trained model can then classify snippets of sound coming into your microphone in real time while the application is running.
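
As a rough illustration of that pipeline, here is a minimal sketch in plain C. It is only one possible instance of the idea: the two features (mean-square energy and zero-crossing rate) and the nearest-centroid classifier are assumptions picked for brevity, not a recommendation that they suit the candle-blowing sound, and all names (`Features`, `CentroidModel`, `extract_features`, `train_centroids`, `is_blow`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FEATURES 2

/* Feature vector for one frame of mono samples in [-1, 1]. */
typedef struct { double v[NUM_FEATURES]; } Features;

/* One mean feature vector per class, learned from labeled frames. */
typedef struct { Features blow; Features other; } CentroidModel;

/* Step 1: extract features (illustrative choice: energy and zero crossings). */
Features extract_features(const float *samples, size_t n) {
    assert(samples != NULL && n > 1);
    double energy = 0.0;
    size_t crossings = 0;
    for (size_t i = 0; i < n; i++) {
        energy += (double)samples[i] * (double)samples[i];
        if (i > 0 && (samples[i] >= 0.0f) != (samples[i - 1] >= 0.0f)) {
            crossings++;
        }
    }
    Features f;
    f.v[0] = energy / (double)n;                   /* mean-square energy */
    f.v[1] = (double)crossings / (double)(n - 1);  /* zero-crossing rate */
    return f;
}

/* Average a set of feature vectors. */
Features mean_features(const Features *items, size_t count) {
    assert(items != NULL && count > 0);
    Features m = { { 0.0, 0.0 } };
    for (size_t i = 0; i < count; i++) {
        for (int k = 0; k < NUM_FEATURES; k++) m.v[k] += items[i].v[k];
    }
    for (int k = 0; k < NUM_FEATURES; k++) m.v[k] /= (double)count;
    return m;
}

/* Step 2: "train" by storing the mean feature vector of each class. */
CentroidModel train_centroids(const Features *blow, size_t n_blow,
                              const Features *other, size_t n_other) {
    CentroidModel m;
    m.blow = mean_features(blow, n_blow);
    m.other = mean_features(other, n_other);
    return m;
}

/* Squared Euclidean distance between two feature vectors. */
double squared_distance(const Features *a, const Features *b) {
    double sum = 0.0;
    for (int k = 0; k < NUM_FEATURES; k++) {
        double diff = a->v[k] - b->v[k];
        sum += diff * diff;
    }
    return sum;
}

/* Step 3: classify a new frame by its nearest class centroid. */
int is_blow(const CentroidModel *m, const Features *f) {
    assert(m != NULL && f != NULL);
    return squared_distance(f, &m->blow) < squared_distance(f, &m->other);
}
```

The two features here live on different scales, so a real system would at least normalize them, and would likely use richer spectral features and a stronger classifier. The structure stays the same, though: extract features per frame, train on labeled recordings, then classify incoming frames.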

Given the environmental sounds likely to be going on while you blow out candles (birthdays are always noisy, aren't they?), it may be difficult to train a model that is robust against this background noise. This is not a simple problem if you care about accuracy.

It may be doable, though. Forgive the self-promotion, but my company developed an SDK that addresses the question you are asking: "Is there a way to 'teach' an app a sound such that the app can subsequently recognize it?" I am not sure whether the specific sound of blowing out candles would work, as the SDK was primarily aimed at applications involving somewhat percussive sounds, but it might still work for your case. Here is a link, where you will also find a demo program you can download and try if you like: SampleSumo PSR SDK

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow