Question

When asking a question, one should break the whole thing down into smaller questions and solve them one by one. But I am asking it the bad way, in order to fully explain my need and technical limitations, hoping someone suggests the right set of technologies to work with.

I am to design something that accepts text as input and converts it into speech. This speech is mouthed by a 3D model in real time.

As you can see, all of this has to happen in real time, so I am thinking of doing it in a game engine, but I am not sure whether what I want to do is possible.

I need guidance, a path, on how I should get started.

Solution

You haven't specified a platform (e.g. Windows or Linux), though it may not really matter.

My initial thought is to combine the Pico TTS library with the Blender Game Engine (BGE), though I'm not sure whether there are Python bindings for the Pico engine.

The espeak project (espeak.sourceforge.net/) converts normal text into phonemes, which could then be used to drive shape-keys (or blend-keys, I forget Blender's name for them; it's been 5+ years since I last played with Blender/Maya/3DSMax).
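To make the phoneme-to-shape-key idea concrete, here is a minimal sketch in plain Python. The phoneme symbols, the shape-key names and the `rest` fallback are all hypothetical; they would have to be matched to whatever phoneme set your TTS front end emits and to the shape keys that exist on your model.

```python
# Hypothetical phoneme-to-shape-key table. Both the phoneme symbols and the
# shape-key names are illustrative and must be adapted to your own setup.
PHONEME_TO_SHAPE_KEY = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_bite", "v": "lip_bite",
    "a": "jaw_open",
    "e": "mouth_wide", "i": "mouth_wide",
    "o": "lips_round", "u": "lips_round",
}
REST = "rest"  # assumed neutral pose for phonemes that are not in the table


def shape_keys_for(phonemes):
    """Return one shape-key name per phoneme, falling back to the rest pose."""
    return [PHONEME_TO_SHAPE_KEY.get(p, REST) for p in phonemes]


def collapse_repeats(keys):
    """Drop consecutive duplicates so the mouth does not re-trigger a pose."""
    collapsed = []
    for key in keys:
        if not collapsed or collapsed[-1] != key:
            collapsed.append(key)
    return collapsed
```

Several phonemes share one mouth shape here, which keeps the number of shape keys you have to model small.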

Presumably, you could implement the espeak engine in Python (or create a module that is accessible from Python) and use that to generate the required phonemes before passing them off to your shape/blend-key controller and to Pico simultaneously.
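A lighter-weight alternative to writing a binding is to shell out to the espeak command-line program. The sketch below assumes the `espeak` executable is installed and on the PATH; it uses the `-q` (no audio) and `-x` (write phoneme mnemonics to stdout) options. The function names are my own, and the returned string would still need to be split into individual phonemes for your controller.

```python
import subprocess


def espeak_command(text, voice="en"):
    """Build the argument list for a silent, phoneme-only espeak run."""
    # -q: produce no speech, -x: print phoneme mnemonics, -v: select the voice
    return ["espeak", "-q", "-x", "-v", voice, text]


def text_to_phoneme_string(text, voice="en"):
    """Run espeak and return its phoneme mnemonics as one stripped string.

    Raises FileNotFoundError if espeak is not installed, and
    subprocess.CalledProcessError if espeak exits with an error.
    """
    result = subprocess.run(
        espeak_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `text_to_phoneme_string("hello world")` on a machine with espeak installed should return the mnemonic string for that phrase.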

After a quick look, it seems that libttspico-dev is the package that supports development of Pico-enabled apps, though it only appears to contain C/C++ files - I suppose it should be possible to create a Python module that leverages the engine, but I'm really not familiar with anything more about Pico than its name and basic function. This may be a foolish and uninformed suggestion.

In any case, that sure is an interesting project. Perhaps the easier route would be to create an app in C/C++ that uses OGRE and Pico. An important factor would be OGRE's ability to blend from one shape-key to the next. It may also be that Pico does everything internally in such a way that you can't get callbacks or monitor its current position in the playing speech.
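Blending from one shape-key to the next usually comes down to cross-fading two weights over a short interval. This is an engine-independent sketch of a linear cross-fade, not OGRE's or Blender's actual API; the function name and the idea of calling it once per frame are assumptions.

```python
def crossfade_weights(t, duration):
    """Return (outgoing_weight, incoming_weight) at time t of a linear blend.

    t is the time elapsed since the blend started and duration is the blend
    length, both in the same unit. The two weights always sum to 1.0.
    """
    if duration <= 0:
        # Degenerate blend: snap straight to the incoming shape key.
        return 0.0, 1.0
    progress = min(max(t / duration, 0.0), 1.0)  # clamp to the range [0, 1]
    return 1.0 - progress, progress
```

Each frame you would assign the first value to the previous shape key and the second to the next one. If the TTS engine gives you no playback position, the elapsed time has to come from your own clock, started when the audio starts.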

Bookmarked.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow