You haven't specified a platform (e.g. Windows or Linux), though it may not really matter.
My initial thought is to combine the Pico TTS library with the Blender Game Engine (BGE), though I'm not sure whether there are Python bindings for the Pico engine.
The espeak project (espeak.sourceforge.net/) converts normal text into phonemes, which could then be used to drive shape-keys (or blend-keys; I forget Blender's name for them, as it's been 5+ years since I last played with Blender/Maya/3DSMax).
Presumably, you could implement the espeak engine in Python (or create a module that is accessible from Python) and use that to generate the required phonemes, then pass them to your shape/blend-key controller and to Pico simultaneously.
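A rough sketch of the phoneme step in Python, assuming the espeak command-line tool is installed and on the PATH. The -q (no audio) and -x (print phoneme mnemonics) flags are espeak's own; the PHONEME_TO_SHAPE table, the shape-key names and both function names are made up for illustration and would have to match whatever your model actually uses. It also walks the mnemonic string one character at a time, which is a simplification, since some espeak mnemonics are more than one character long.

```python
import subprocess

# Hypothetical mapping from espeak phoneme mnemonics to shape-key names.
# Both the grouping and the key names are assumptions for illustration.
PHONEME_TO_SHAPE = {
    "p": "MBP", "b": "MBP", "m": "MBP",
    "f": "FV", "v": "FV",
    "a": "AH", "A": "AH", "@": "AH",
    "e": "EE", "E": "EE", "i": "EE", "I": "EE",
    "o": "OH", "O": "OH", "0": "OH",
    "u": "OO", "U": "OO", "w": "OO",
}
REST_SHAPE = "REST"


def text_to_phonemes(text):
    """Ask the espeak CLI for phoneme mnemonics (-q: no audio, -x: print mnemonics)."""
    result = subprocess.run(
        ["espeak", "-q", "-x", text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def phonemes_to_shapes(phonemes):
    """Map a mnemonic string to a list of shape-key names, one character at a time.

    Stress marks and unmapped symbols are skipped, whitespace becomes REST_SHAPE,
    and consecutive duplicates are collapsed.
    """
    shapes = []
    for ch in phonemes:
        if ch.isspace():
            shape = REST_SHAPE
        else:
            shape = PHONEME_TO_SHAPE.get(ch)
        if shape is None:
            continue  # symbol we have no shape for
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes
```

Something like phonemes_to_shapes(text_to_phonemes("hello world")) would then give you the sequence of keys to hand to the shape/blend-key controller. Timing is not covered here; you would still need some way to line the keys up with the audio.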
After a quick look, it seems that libttspico-dev is the package that supports development of Pico-enabled apps, though it only appears to contain C/C++ files. I suppose it should be possible to create a Python module that leverages the engine, but I'm really not familiar with anything more about Pico than its name and basic function. This may be a foolish and uninformed suggestion.
In any case, that sure is an interesting project. Perhaps the easier route would be to create an app in C/C++ that uses OGRE and Pico. An important factor would be OGRE's ability to blend from one shape-key to the next. It may also be that Pico does everything internally in such a way that you can't get callbacks or monitor its current position in the playing speech.
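For what it's worth, the blend from one shape-key to the next is simple in principle if the engine lets you set a weight per key: you cross-fade the outgoing key down while the incoming one comes up. A minimal sketch, not tied to OGRE or Blender (blend_weights and the key names are hypothetical, and a plain linear fade is just the simplest choice):

```python
def blend_weights(from_shape, to_shape, t):
    """Linear cross-fade between two shape keys.

    t is the progress through the transition, clamped to the range 0.0..1.0.
    Returns a dict of shape-key name -> weight.
    """
    t = max(0.0, min(1.0, t))
    if from_shape == to_shape:
        return {to_shape: 1.0}
    return {from_shape: 1.0 - t, to_shape: t}
```

You would call this every frame with t advancing from 0 to 1 over the length of the transition and push the resulting weights into whatever the engine uses for pose or shape-key animation.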
Bookmarked.