Kinect Speech Recognition only recognizing one grammar rule

https://stackoverflow.com/questions/23597969

20-07-2023

Question

I'm currently developing a speech recognition application using the Microsoft Kinect SDK. The goal of the application is to load any (valid) XML file containing a grammar and use it to process speech. For some reason that I haven't understood yet, the application only recognizes the words belonging to the first rule in the XML grammar file. For example, take the following grammar:

<grammar version="1.0" xml:lang="en-US" root="rootRule" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">
<rule id="rootRule">
<one-of>
  <item>
    <tag>PEOPLE</tag>
    <one-of>
      <item> team </item>
      <item> kara </item>
      <item> john </item>
      <item> george </item>
    </one-of>
  </item>
  <item>
    <tag>FOOD</tag>
    <one-of>
      <item> apple </item>
      <item> banana </item>
    </one-of>
  </item>
</one-of>
</rule>
<rule id="anotherRule">
<one-of>
  <item>
    <tag>COMMANDS</tag>
    <one-of>
      <item> close </item>
      <item> shut down </item>
      <item> stop the application </item>
    </one-of>
  </item>
  <item>
    <tag>TOYS</tag>
    <one-of>
      <item> doll </item>
      <item> teddy bear </item>
    </one-of>
  </item>
  </one-of>
 </rule>
</grammar>

The application only recognizes the words belonging to the rule with id "rootRule" and ignores all the ones in the rule with id "anotherRule". Why does this happen? I don't process the XML file manually; the SDK already does that. I only supply the location of the file using:

spRecEng.LoadGrammar(new Grammar(filename));

It works fine for the first rule, so in theory shouldn't it work for all the following ones as well?

I'm basing my application on an already existing one (both have this exact same problem). Its source code can be found at: https://dl.dropboxusercontent.com/u/28555145/KinectForWindowsSpeech.rar

Solution

You specified the root rule in the grammar element with root="rootRule":

 <grammar version="1.0" xml:lang="en-US" root="rootRule" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">

so the engine takes rootRule as the base. If you need alternatives, you can construct them on top of that rule. The second rule can be referenced from the first rule, and it will then be used in recognition too. For an example, see here:

http://msdn.microsoft.com/en-us/library/hh362887(v=office.14).aspx

However, a grammar has a single entry point that goes to recognition: only the root rule, and whatever it references, is active. This is how the engine works.
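
As a minimal sketch of referencing the second rule from the root rule, the grammar from the question could be extended with a ruleref element, which is part of the W3C SRGS format this grammar uses. The word lists are shortened here for brevity, and this has not been tested against the Kinect SDK, so treat it as an illustration rather than a verified fix. In particular, how the tag values of the referenced rule show up in the recognition result is an assumption you would need to check.

```xml
<grammar version="1.0" xml:lang="en-US" root="rootRule" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>PEOPLE</tag>
        <one-of>
          <item> kara </item>
          <item> john </item>
        </one-of>
      </item>
      <item>
        <tag>FOOD</tag>
        <one-of>
          <item> apple </item>
          <item> banana </item>
        </one-of>
      </item>
      <!-- New alternative: makes everything in anotherRule reachable from the root rule -->
      <item>
        <ruleref uri="#anotherRule"/>
      </item>
    </one-of>
  </rule>
  <rule id="anotherRule">
    <one-of>
      <item>
        <tag>COMMANDS</tag>
        <one-of>
          <item> close </item>
          <item> shut down </item>
        </one-of>
      </item>
      <item>
        <tag>TOYS</tag>
        <one-of>
          <item> doll </item>
          <item> teddy bear </item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```

The only change from the original structure is the third item in rootRule. The loading code from the question stays the same, because the grammar still has one entry point.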

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow