Question

I want to build an English acoustic model for children under 14 in China, with a vocabulary of about 800 words, using CMUSphinx.

From my research, some commercial speech engines use thousands of hours of recorded speech to train their acoustic models (Nuance and Google reportedly used 2000+ and 1000+ hours).

I need to achieve an accuracy of about 95%. How many hours of speech do I need in the corpus?

Is it true that the larger the corpus, the better the accuracy?

Solution

300-400 hours is a good amount of data. Less than 100 hours will not work.

Increasing the amount of data will not necessarily increase accuracy if the training data itself has systematic issues. However, if you properly analyze the issues in the training data and fix them, the results can improve.

A general machine learning course will cover these data preparation issues.
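To check whether more data is actually helping, it is useful to measure accuracy the same way after each training run. The sketch below is one way to do that, assuming you keep a held-out set of reference transcripts and the decoder's output for the same recordings. It computes word error rate (WER), and word accuracy is roughly 1 - WER. The function names `edit_distance` and `corpus_wer` are made up for this illustration and are not part of CMUSphinx.

```python
def edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn ref_words into hyp_words (Levenshtein distance)."""
    # prev[j] = distance between the reference words seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) transcript pairs:
    total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("no reference words to score against")
    return total_errors / total_words


if __name__ == "__main__":
    # Toy transcripts for illustration only, not real decoder output.
    pairs = [
        ("the cat sat", "the bat sat"),
        ("hello", "hello"),
    ]
    print(corpus_wer(pairs))
```

If you train on growing subsets of the corpus and score each model on the same held-out set, a WER that stops falling as hours are added suggests the remaining errors come from problems in the data rather than from its size.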

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow