سؤال

I'm looking for advices, for a personal project.

I'm attempting to create a software for creating customized voice commands. The goal is to allow user/me to record some audio data (2/3 secs) for defining commands/macros. Then, when the user will speak (record the same audio data), the command/macro will be executed. The software must be able to detect a command in less than 1 second of processing time in a low-cost computer (RaspberryPi, for example).

I already searched in two ways : - Speech Recognition (CMU-Sphinx, Julius, simon) : There is good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech Recognition could consume too much power for a small feature. - Audio Fingerprinting (Chromaprint -> http://acoustid.org/chromaprint) : It seems to be almost what I'm looking for. The principle is to create fingerprint from raw audio data, then compare fingerprints to determine if they can be identical. However, this kind of software/library seems to be designed for song identification (like famous softwares on smartphones) : I'm trying to configure a good "comparator", but I think I'm going in a bad way.

Do you know some dedicated software or parcel of code doing something similar ?

Any suggestion would be appreciated.

هل كانت مفيدة؟

المحلول 2

Song fingerprint is not a good idea for that task because command timings can vary and fingerprint expects exact time match. However its very easy to implement matching with DTW algorithm for time series and features extracted with CMUSphinx library Sphinxbase. See Wikipedia entry about DTW for details.

http://en.wikipedia.org/wiki/Dynamic_time_warping

http://cmusphinx.sourceforge.net/wiki/download

نصائح أخرى

I had a more or less similar project in which I intended to send voice commands to a robot. A speech recognition software is too complicated for such a task. I used FFT implementation in C++ to extract Fourier components of the sampled voice, and then I created a histogram of major frequencies (frequencies at which the target voice command has the highest amplitudes). I tried two approaches:

  1. Comparing the similarities between histogram of the given voice command with those saved in the memory to identify the most probable command.

  2. Using Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM and the results are considerably better than the first approach. However, one problem with SVM method is that you need a rather large data set for training. Another problem is that, when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong command detection). This can be avoided by the first approach where I had a threshold for similarity measure.

I hope this helps you to implement your own voice activated software.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top