How do you efficiently create a grammar file for speech recognition given a large list of words?

StackOverflow https://stackoverflow.com/questions/653262

19-08-2019

Question

It's easy to write a grammar file for speech recognition from only 50 words because you can just do it manually. What is the easiest, most efficient way to do it if you have 10,000 or 100,000 words?

Example:
Say we have "RC cola" and "Pepsi cola". We would have a grammar file consisting of two rules:

DRINK: (COLANAME ?[coke cola soda])
COLANAME: [rc pepsi]

This will recognize "RC", "RC Coke", "RC Cola", "RC Soda", "Pepsi", "Pepsi Coke", "Pepsi Cola" and "Pepsi Soda".

Edit: I'm talking about grammar for speech recognition. Speech recognition systems need an accompanying grammar file so they know what to recognize (gsl, grxml). And I was actually also thinking about not just any words but something like names where you can't classify into categories.
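For large word lists, the grammar file itself can be generated by a script rather than written by hand. A minimal sketch in Python, using the GSL-like syntax from the example above (the function name and the default tail words are illustrative assumptions; a real deployment would emit your engine's actual format, e.g. GrXML):

```python
def build_grammar(brand_names, optional_tail=("coke", "cola", "soda")):
    """Generate a GSL-style grammar from a flat list of brand names.

    Each brand becomes an alternative in the COLANAME rule; the
    optional tail words become an optional group in the DRINK rule.
    """
    # Normalize to lowercase tokens, as in the example grammar.
    brands = " ".join(name.lower() for name in brand_names)
    tail = " ".join(optional_tail)
    return (
        f"DRINK: (COLANAME ?[{tail}])\n"
        f"COLANAME: [{brands}]\n"
    )

if __name__ == "__main__":
    # Works the same for 2 names or 100,000 names read from a file.
    print(build_grammar(["RC", "Pepsi"]))
```

Reading 100,000 names from a file and passing them in works the same way; the hard part that remains is deciding on the categories.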


Solution 4

I don't have an answer that solves my problem, but Yuval's answer clearly suggests that this subject is still under development and not yet mature. I understand that there is probably no easy grammar fix right now (at least outside the research labs). The only way to build a good grammar at the moment is probably continuous learning from user inputs and agile refactoring of the grammar files.

OTHER TIPS

Now I see. You do mean grammars. The grammar formats you specify are cousins of context-free grammars. There is a research field around automatic learning of context-free grammars, in which probabilistic context-free grammars (PCFGs) are central. See Roni Rosenfeld's notes (PostScript) on learning PCFGs, the Bayesian version (zipped PostScript) and unsupervised PCFG learning (PDF). This is an active research field and has changed since these papers were written. Eugene Charniak is a prolific researcher in this area.

For a 50-100 thousand word lexicon, you're almost certainly better off building a dictation grammar, rather than trying to build a context-free grammar. Microsoft has their Dictation Resource Kit available for free; I haven't used it, so I can't comment on how usable it is.

I assume you mean part-of-speech tagging; the fastest approach is to use an automated tagger and manually verify (and correct) the results. Even if the tagger has a hit rate as low as 60-70%, it will still significantly reduce the amount of work.
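That workflow can be sketched as follows. The lookup table here is a hypothetical stand-in for a real automated tagger; the point is that only the disagreements with a verified sample need human review:

```python
# Hypothetical lookup tagger standing in for a real POS tagger.
AUTO_TAGS = {"pepsi": "NOUN", "drink": "VERB", "cola": "NOUN", "cold": "NOUN"}

def auto_tag(words):
    """Tag each word automatically; unknown words get UNKNOWN."""
    return [(w, AUTO_TAGS.get(w, "UNKNOWN")) for w in words]

def flag_for_review(tagged, gold):
    """Return indices where the tagger disagrees with a verified
    gold sample, so a human only checks those words."""
    return [i for i, (auto, ref) in enumerate(zip(tagged, gold)) if auto != ref]

words = ["drink", "cold", "pepsi", "cola"]
gold = [("drink", "VERB"), ("cold", "ADJ"), ("pepsi", "NOUN"), ("cola", "NOUN")]
tagged = auto_tag(words)
review = flag_for_review(tagged, gold)
print(review)  # only "cold" was mistagged, so only index 1 needs review
```

With a 60-70% tagger, the review list covers roughly a third of the lexicon instead of all of it.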

Totally random/vague ideas off the top of my head:

-You could try to classify words into categories (noun, verb, etc.) and then form potentially correct patterns for whole statements/sentences based on the word classes. You could then try to fit new test data to a previously defined model based on the words and the order in which they're used.

-I'd also be curious about using some sort of machine learning algorithm to learn proper use of words based on some training data or literature. Once you've trained the algorithm, you could try to classify new incoming data based on previous results.
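The first idea above can be sketched concretely: classify each word, record the class sequences seen in verified training sentences as patterns, and check whether new input matches a known pattern. The word classes and sentences below are made up for illustration:

```python
# Toy word classes; in practice these would come from a tagger or lexicon.
WORD_CLASS = {"i": "PRON", "you": "PRON", "want": "VERB", "drink": "VERB",
              "pepsi": "NOUN", "cola": "NOUN"}

def classify(sentence):
    """Map a sentence to its sequence of word classes."""
    return tuple(WORD_CLASS.get(w, "UNK") for w in sentence.lower().split())

# Patterns learned from previously seen, verified sentences.
known_patterns = {classify("i want pepsi"), classify("you drink cola")}

def fits_model(sentence):
    """A new sentence fits if its class sequence was seen in training."""
    return classify(sentence) in known_patterns

print(fits_model("you want cola"))   # PRON VERB NOUN: a seen pattern
print(fits_model("pepsi want you"))  # NOUN VERB PRON: never seen
```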
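The second idea, learning word usage from training data, has a simple form in an n-gram model: count adjacent word pairs in a training corpus, then accept new input only if all of its pairs were observed. A minimal bigram sketch (the training corpus is made up):

```python
from collections import Counter

def train_bigrams(corpus):
    """Count adjacent word pairs across training sentences."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def plausible(sentence, bigrams):
    """Classify a new sentence as plausible if every adjacent
    word pair was observed at least once in training."""
    words = sentence.lower().split()
    return all(pair in bigrams for pair in zip(words, words[1:]))

corpus = ["i want a pepsi cola", "i want a cold drink", "drink a cold cola"]
model = train_bigrams(corpus)
print(plausible("i want a cold cola", model))  # every pair was seen
print(plausible("cola cold a want", model))    # reversed pairs were not
```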

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow