Question

I need help converting speech to text with the Android Speech API. The API gives correct results on my device (Android 2.3.5), but on a device running Android 4.1.2 it gives abnormal results: the recognized text is repeated multiple times. If anybody has faced this problem, can you tell me how to handle it?

Following is the code I am using:

public class MainActivity extends Activity {

    protected static final int RESULT_SPEECH = 1;
    protected static final String TAG = "MY_TAG";

    private TextView spokenText;
    private TextView spokenAnswer; // declared so the assignment in onCreate compiles
    private Button spkButton;
    private Button stopButton;
    private SpeechRecognizer sR;
    private ClickListener clickListener;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);


        clickListener = new ClickListener();
        spokenText = (TextView) findViewById(R.id.spokenText);
        spokenAnswer = (TextView) findViewById(R.id.spokenAnswer);
        spkButton = (Button) findViewById(R.id.speakButton);
        stopButton = (Button) findViewById(R.id.stopButton);

        spkButton.setOnClickListener(clickListener);
        stopButton.setOnClickListener(clickListener);

        sR = SpeechRecognizer.createSpeechRecognizer(this);
        sR.setRecognitionListener(new listener());
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }

    public void startListening()
    {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
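        // EXTRA_LANGUAGE carries the locale tag; reusing EXTRA_LANGUAGE_MODEL here would overwrite the model set above.
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");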
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS,1); 
        intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,this.getPackageName());
        sR.startListening(intent);
    }

    public void stopListening()
    {
        sR.stopListening();
    }

    class ClickListener implements OnClickListener
    {

        @Override
        public void onClick(View v) {
            // TODO Auto-generated method stub
            if(v == spkButton)
            {
                startListening();
            }
            else if(v == stopButton)
            {
                stopListening();
            }

        }

    }
    class listener implements RecognitionListener{

        @Override
        public void onRmsChanged(float rmsdB) {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onRmsChanged");
        }

        @Override
        public void onResults(Bundle results) {
            // TODO Auto-generated method stub
            String str = new String();
            //Log.d(TAG, "onResults " + results);
            ArrayList<String> data = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            for (int i = 0; i < data.size(); i++)
            {
                      //Log.d(TAG, "result " + data.get(i));
                      str += data.get(i);
            }
            Log.d(TAG, str);
            spokenText.setText(str); 
        }

        @Override
        public void onReadyForSpeech(Bundle params) {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onReadyForSpeech");
        }

        @Override
        public void onPartialResults(Bundle partialResults) {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onPartialResults");
        }

        @Override
        public void onEvent(int eventType, Bundle params) {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onEvent");
        }

        @Override
        public void onError(int error) {
            // TODO Auto-generated method stub
            String mError = "";
            switch (error) {
            case SpeechRecognizer.ERROR_NETWORK_TIMEOUT:                
                mError = " network timeout"; 
                break;
            case SpeechRecognizer.ERROR_NETWORK: 
                mError = " network" ;
                break;
            case SpeechRecognizer.ERROR_AUDIO: 
                mError = " audio"; 
                break;
            case SpeechRecognizer.ERROR_SERVER: 
                mError = " server"; 
                break;
            case SpeechRecognizer.ERROR_CLIENT: 
                mError = " client"; 
                break;
            case SpeechRecognizer.ERROR_SPEECH_TIMEOUT: 
                mError = " speech time out" ; 
                break;
            case SpeechRecognizer.ERROR_NO_MATCH: 
                mError = " no match" ; 
                break;
            case SpeechRecognizer.ERROR_RECOGNIZER_BUSY: 
                mError = " recogniser busy" ; 
                break;
            case SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS: 
                mError = " insufficient permissions" ; 
                break;

            }
            //Log.d(TAG,  "Error: " +  error + " - " + mError);
            //startListening();

        }

        @Override
        public void onEndOfSpeech() {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onEndOfSpeech");
            //startListening();
        }

        @Override
        public void onBufferReceived(byte[] buffer) {
            // TODO Auto-generated method stub
            //Log.d(TAG, "onBufferReceived");
        }

        @Override
        public void onBeginningOfSpeech() {
            // TODO Auto-generated method stub
        }
    }
}

Following is the output I am seeing. The results are abnormal: the text should have been shown once rather than three times.

Solution

Here is a chunk of a response from one of the Google speech APIs on Android. Note the JSON array in the 'hypotheses' field:

{"status":0,"id":"a4ca9654c6cc684dc3279cd1aaa00cc7-1","hypotheses":[{"utterance":"map of the state of California","confidence":0.87869847}]}

You need to know the details of the response body of the API you are using and, if necessary, how to parse JSON arrays in the response, like the 'hypotheses' field above.

If the result is an array, as I suspect it is, then you just need a little parsing of that array to get the proper response without the duplication issue.
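
To illustrate, here is a minimal sketch of that idea in plain Java. It assumes the list returned by results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION) holds several candidate transcriptions with the most likely one first, and that the repetition comes from the loop in onResults concatenating all of them. The class name TopHypothesis, the helper pickTop, and the sample strings are hypothetical and only there for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class TopHypothesis {

    // Hypothetical helper: returns only the first candidate, which is
    // assumed to be the most likely one, or "" when the list is empty.
    public static String pickTop(List<String> candidates) {
        if (candidates == null || candidates.isEmpty()) {
            return "";
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // Made-up stand-in for the list the recognizer hands to onResults.
        List<String> candidates =
                Arrays.asList("hello world", "hello word", "hallo world");

        // What the loop in the question does: join every candidate together.
        StringBuilder joined = new StringBuilder();
        for (String candidate : candidates) {
            joined.append(candidate);
        }
        System.out.println("concatenated: " + joined);

        // Taking only the first element avoids the apparent repetition.
        System.out.println("top result:   " + pickTop(candidates));
    }
}
```

In the question's listener this would amount to calling spokenText.setText(pickTop(data)) in place of the for loop, provided the assumption about the cause holds on your device.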

Licensed under: CC-BY-SA with attribution