Question

I am currently using pocketsphinx and building my own language model offline instead of using the online lmtool.

I followed the steps on the Sphinx tutoriallm wiki page. Here are my steps:

#!/bin/bash
text2wfreq < 1.txt | wfreq2vocab > 1.vocab
text2idngram -vocab 1.vocab -idngram 1.idngram < 1.txt
idngram2lm -vocab_type 0 -idngram 1.idngram -vocab 1.vocab -arpa 1.arpa
sphinx_lm_convert -i 1.arpa -o 1.lm.DMP
sphinx_lm_convert -i 1.lm.DMP -ifmt dmp -o 1.lm -ofmt arpa
#pocketsphinx_continuous -lm 1.lm -dict 1.dic

1.txt:

<s> Children  I want you to draw your bedroom </s>
<s> In my room  there is a big bed next to the window and a picture on the door </s>
.... #more

However, when I run pocketsphinx_continuous -lm 1.lm -dict 1.dic, the result is not correct at all. Is there anything wrong with my steps?

I also noticed that the output with my own lm file is completely wrong compared with the lm file built online:

INFO: ngram_search_fwdflat.c(951): fwdflat 0.01 wall 0.006 xRT
**INFO: ngram_search.c(1214): </s> not found in last frame, using OK.150 instead**
INFO: ngram_search.c(1266): lattice start node <s>.0 end node OK.115
INFO: ngram_search.c(1294): Eliminated 1 nodes before end node
INFO: ngram_search.c(1399): Lattice has 30 nodes, 18 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(OK:115:150) = -1308625
INFO: ps_lattice.c(1403): Joint P(O,S) = -1309458 P(S|O) = -833
INFO: ngram_search.c(888): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(891): bestpath 0.00 wall 0.000 xRT
000000001: TV OK

READY....

I also tried using my own lm with the system dic, and the results were a total mismatch.


Solution

We'd need to know more about what you did. Post a pointer to your 1.txt and 1.dic files, as well as your 1.lm file. Here's a (random) thought: Your .lm looks to be mixed case. Is your .dic uppercase?
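One way to test the case-mismatch guess is to list every vocabulary word that has no exact-case entry in the dictionary. The sketch below is only illustrative: `missing_words` is a hypothetical helper name, and it assumes the vocabulary file has one word per line with comment lines starting with `#`, and that the dictionary puts the word in the first field with alternate pronunciations marked as `WORD(2)`.

```shell
# Hypothetical helper: print vocabulary words that have no exact-case
# match in the pronunciation dictionary.
# Usage: missing_words VOCAB_FILE DICT_FILE
missing_words() {
    awk '
        # First file (the dictionary): remember each headword,
        # stripping alternate-pronunciation markers such as "(2)".
        NR == FNR { sub(/\([0-9]+\)$/, "", $1); known[$1] = 1; next }
        # Second file (the vocabulary): skip comments and blank lines.
        /^#/ || NF == 0 { next }
        # Sentence markers are not expected to appear in the dictionary.
        $1 == "<s>" || $1 == "</s>" { next }
        # Report any word the dictionary does not contain verbatim.
        !($1 in known) { print $1 }
    ' "$2" "$1"
}
```

Running `missing_words 1.vocab 1.dic` on the files from the question would print words such as `Children` if the dictionary only lists `CHILDREN`. An empty result would suggest that case is not the problem.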

I'm not sure what this is about:

sphinx_lm_convert -i 1.arpa -o 1.lm.DMP
sphinx_lm_convert -i 1.lm.DMP -ifmt dmp -o 1.lm -ofmt arpa

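.arpa and .lm are essentially the same file (both are ARPA-format text); .DMP files are a binary form meant to pre-compute some things for the online representation of the language model. Converting to DMP and then back to ARPA should therefore not be needed.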
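If you want to confirm that the round trip left the model unchanged, a rough check is to compare the n-gram counts in the `\data\` header of both text files. This is a minimal sketch: `arpa_counts` is a hypothetical helper name, and it assumes both files are ARPA text with the standard `ngram N=count` header lines.

```shell
# Hypothetical helper: print the "ngram N=count" lines from the
# \data\ header of an ARPA-format language model.
# Usage: arpa_counts MODEL_FILE
arpa_counts() {
    awk '
        # The header starts at the \data\ line.
        /^\\data\\/ { in_header = 1; next }
        # Any later line starting with a backslash (e.g. \1-grams:) ends it.
        in_header && /^\\/ { exit }
        # Inside the header, keep only the count lines.
        in_header && /^ngram / { print }
    ' "$1"
}
```

For example, saving `arpa_counts 1.arpa` and `arpa_counts 1.lm` to two files and running `diff` on them should show no differences if the two models hold the same number of n-grams. Matching counts do not prove the probabilities are identical, so treat this as a sanity check only.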

Licensed under: CC-BY-SA with attribution