Question

Does anyone have experience with an open source, or relatively cheap, voice recognition API for Java? I'm looking for something that will turn spoken words into text.

Judging from the Java speech recognition page on Sun's site, that API seems rather dead. My minimum requirement is that it runs on Linux.

Can anyone recommend something? Pure Java would be a bonus; otherwise a Linux-based solution could be considered. And since this is a home project, the cheaper the better.

  • Edit

CMU Sphinx
As Amit pointed out, there is CMU Sphinx: http://cmusphinx.sourceforge.net/html/cmusphinx.php My problem is a massive word error rate. Training seems like a project all in itself; I'm hoping to gather the strength to try it this weekend.

IBM ViaVoice
There are news announcements floating around from 2004 about ViaVoice being made open source. It seems the news release was premature and it never happened. ViaVoice was released for Linux at some point, but it seems they stopped. All that seems to be left on IBM's website is ViaVoice embedded.

IBM WebSphere Voice
I imagine this is why ViaVoice (desktop) seems discontinued. IBM created this commercial solution, which will cost a lot more than an arm and a leg. And just using it will take the ones you have left, at least judging by my experience with WebSphere and its IDE.

Nuance
It seems they might still create products for Linux, but I think they got lost and followed IBM into the server market. I'm not that sure about this one; their website does not make it easy to find useful information.

Open Mind / Free Speech
These guys keep changing their project name. Probably some money-hungry company keeps threatening them, but I don't know. The project looks a bit dead.

I might try training Sphinx this weekend to see if it wants to be friends. Otherwise, worst case, I'll be looking at using Microsoft's speech solution. It has worked well for me in the past, but it's not a great Linux solution. I could probably use it through Wine, but then I'll have two separate servers... messy, messy.
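Since the complaint about Sphinx is its word error rate, it may help to put a number on it before and after any training attempt. Below is a minimal sketch, assuming the usual definition of WER as word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The class and method names are made up for illustration and are not part of Sphinx.

```java
import java.util.Locale;

// Hypothetical helper, not part of any speech library.
final class WordErrorRate {

    /** Word-level Levenshtein distance between reference and hypothesis. */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // turning an empty reference into j words takes j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // dropping i reference words takes i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    /** Lower-cases and splits on whitespace; punctuation handling is left out. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** WER = edit distance / number of reference words. Can exceed 1.0. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // One substitution (on -> of) and one deletion (kitchen) over 5 reference words.
        System.out.println(wer("turn on the kitchen light", "turn of the light"));
    }
}
```

Reading a fixed set of sentences into the recognizer and comparing the output against what was actually said gives a rough baseline to compare different models or a trained model against.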

Oh, and SpeechTechMag seems like a good place to visit for voice/speech. They have an 'Annual Reference' that lists companies that somehow relate to voice/speech.


OTHER TIPS

Sphinx is by far the best option available if you are on a budget. However, it makes a huge difference which models you use, how you tune them, and how you tune your audio source. Absolutely everything has to match, otherwise it just won't work. Given the problem you described, I'd be willing to bet a substantial sum that you've got your models mixed up and your mic is not correctly calibrated. Also, if you have an accent it probably will not work. This is not an issue with the decoder but with the acoustic models: if no one with a voice or accent similar to yours was included in the training data, you'll get poor results.

That said, have you looked at their open source models page?

http://www.speech.cs.cmu.edu/sphinx/models/

Depending on what you are trying to do, you should be able to obtain about 90% accuracy on free speech with the 16kHz WSJ models and the gigaword LMs NVP. I caution, however, that ASR is a massive undertaking and hasn't yet reached commodity status.
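One cheap way to rule out the "everything has to match" problem is to check that a test recording actually has the format the acoustic model was trained on. The sketch below uses only the standard `javax.sound.sampled` API. The 16 kHz rate comes from the models named above; the 16-bit, mono, signed PCM part is an assumption on my side, so check the documentation that ships with the model you download. The class name is made up for illustration.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper; the expected format is an assumption, see the text above.
final class AudioFormatCheck {

    /** True when the audio is signed PCM, 16-bit, mono, at the expected sample rate. */
    static boolean matchesModel(AudioFormat format, float expectedRateHz) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && Math.abs(format.getSampleRate() - expectedRateHz) < 0.5f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AudioFormatCheck <file.wav>");
            return;
        }
        // Reads only the file header; throws UnsupportedAudioFileException for unknown formats.
        AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println("Recording format: " + format);
        System.out.println(matchesModel(format, 16000f)
                ? "Format looks compatible with a 16 kHz model."
                : "Format mismatch: resample or re-record before blaming the recognizer.");
    }
}
```

This only catches format mismatches. It says nothing about microphone gain, background noise, or accent coverage, which the answer also lists as likely causes.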

You can download vPass (voice password) from http://www.basic-signalprocessing.com.

For voice to text (vText), I can send the vText.jar file to your email. Please notify enquiry@basic-signalprocessing.com

The components are designed for Java and .NET. The recognition period is 5 seconds. vPass is well tested; vText is not. It is still new, which is why it is not packaged yet.

regards, Andreas

I have been looking for the same thing for a few days now. So far I have found Sphinx4 and FreeTTS. Both are Java implementations, and Sphinx seems to be updated rather frequently, unlike FreeTTS. The only problem is that Sphinx has trouble understanding me in an office environment, and I need a solution for a warehouse environment.

My group finished a mini program in Java to recognize spoken digits using Sphinx.
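The answer does not say how that digit recognizer was built. One common way to handle a small vocabulary like this is to restrict the recognizer to a grammar instead of a full language model, and Sphinx4 can load grammars written in JSGF. The sketch below only generates such a grammar as text; the grammar name, rule name, and word list are my own choices for illustration, and wiring the file into Sphinx is left out.

```java
// Hypothetical helper that writes a JSGF grammar; not taken from the answer's program.
final class DigitGrammar {

    static final String[] DIGITS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    /**
     * Builds a JSGF grammar with one public rule that accepts one or more
     * of the given words in any order.
     */
    static String toJsgf(String grammarName, String ruleName, String[] words) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");                        // mandatory JSGF header
        sb.append("grammar ").append(grammarName).append(";\n\n");
        sb.append("public <").append(ruleName).append("> = (");
        sb.append(String.join(" | ", words));                // alternatives
        sb.append(")+;\n");                                  // "+" means one or more
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toJsgf("digits", "number", DIGITS));
    }
}
```

Every word in the grammar also has to exist in the pronunciation dictionary the recognizer uses, otherwise it cannot be matched.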

Licensed under: CC-BY-SA with attribution