Question

Can you configure the way SAPI.spVoice reads text?

In my case, I read the current clipboard contents aloud using an AutoHotkey script. The script makes a COM call to SAPI.spVoice, passing it the text from the clipboard.

;;;;;;;;;;;;;;;;;;;;TTS;;;;;;;;;;;;;;;;;;;;;;
#^!D:: ; Win + Ctrl + Alt + D
ClipSaved := ClipboardAll   
clipboard = ; Start off empty to allow ClipWait to detect when the text has arrived.
Send ^c
ClipWait  ; Wait for the clipboard to contain text.
ComObjCreate("SAPI.SpVoice").Speak(clipboard)
Clipboard := ClipSaved 
ClipSaved = ; Free the memory 
return 

The problem is that SAPI reads some text incorrectly.

For example:

  • In "Yes it is. Ours is complex.", the "is." is read as "island".
  • "Yes it is. This is complex." is read correctly.

You can reproduce this on Windows 7 as follows:

  • Press the Windows key, type "Change text to speech settings", and pick that option.
  • In this dialog, enter "Yes it is. Ours is complex." in the "Use the following text to preview the voice:" field.
  • Press "Preview Voice".
  • Hear it read the "is." as "island".

So my questions are:

Is it possible to change/configure the way "Microsoft Anna" reads text so it doesn't make these mistakes?

Is this a bug in the Anna voice only, or in all voices?

How can I make it read the text the way I want it read?


Solution 2

"Every problem (except the problem of too many levels of indirection) can be solved with another level of indirection."

The SAPI.spVoice object can be passed text (as I was doing) or SSML.

By converting the text to SSML before it is spoken, you gain control over how words are pronounced. This gives you a chance to pre-process the text and replace misread words with the specific pronunciation you want.

For example: "Yes it is. Ours is complex." becomes "Yes it <sub alias="is">is</sub>. Ours is complex."

The sub and say-as elements seem to work. The phoneme element seems to be ignored, but I may have something configured wrongly.
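
As a minimal sketch of the substitution above, the following Ruby wraps a misread token in a sub element. The rule table and the `apply_spoken_forms` name are hypothetical, made up for illustration; only the `<sub alias="...">` markup comes from the example above.

```ruby
# Hypothetical rule table: pattern for a misread token => how it should be spoken.
SPOKEN_FORMS = {
  /\bis(?=\.)/ => 'is', # the "is" directly before a full stop, which gets read as "island"
}

# Wrap every match in an SSML <sub> element so the engine speaks the alias instead.
def apply_spoken_forms(text, rules = SPOKEN_FORMS)
  rules.reduce(text) do |out, (pattern, spoken)|
    # Block form of gsub, so backslashes in the replacement are not interpreted.
    out.gsub(pattern) { |match| %(<sub alias="#{spoken}">#{match}</sub>) }
  end
end

puts apply_spoken_forms('Yes it is. Ours is complex.')
```

The lookahead keeps the full stop outside the element, so only the word itself is aliased and the second "is" is left alone.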

Note: If the text itself contains XML that you want read aloud, XML-escape it before converting it to SSML; otherwise it will be treated as part of the SSML markup.
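
A small sketch of that escaping step, assuming Ruby's standard `cgi` library. The helper names `escape_for_ssml` and `speak_envelope` are hypothetical, and the `en-US` language tag is only a placeholder:

```ruby
require 'cgi'

# Hypothetical helper: escape markup characters so the text is spoken, not parsed as SSML.
def escape_for_ssml(raw_text)
  CGI.escapeHTML(raw_text)
end

# Hypothetical helper: wrap already-escaped text in a minimal <speak> envelope.
# The xml:lang value is an assumption; use whatever matches your installed voice.
def speak_envelope(escaped_text)
  %(<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">#{escaped_text}</speak>)
end

puts speak_envelope(escape_for_ssml('Use <b> & </b> for bold'))
```

Escaping has to happen before any SSML elements are inserted; otherwise the inserted tags would be escaped as well.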

So, in code:

;;;;;;;;;;;;;;;;;;;;TTS;;;;;;;;;;;;;;;;;;;;;;
#^D:: ; Win + Ctrl + D 
ClipSaved := ClipboardAll   
Clipboard = ; Start off empty to allow ClipWait to detect when the text has arrived.
Send ^c
ClipWait  ; Wait for the clipboard to contain text.
FileDelete , c:\tmp\tmp_ahk_tts_clip.txt
FileAppend , %Clipboard% , c:\tmp\tmp_ahk_tts_clip.txt
RunWait, %comspec% /c ""F:\bin\tools\speakit.rb" c:\tmp\tmp_ahk_tts_clip.txt > c:\tmp\tmp_ahk_clip_tts_out.txt" ,,Hide
FileRead, Clipboard, c:\tmp\tmp_ahk_clip_tts_out.txt
ComObjCreate("SAPI.SpVoice").Speak(Clipboard)
Clipboard := ClipSaved 
ClipSaved = ; Free the memory 
return 

and F:\bin\tools\speakit.rb is something like this:

#!/usr/bin/env ruby
substitutions = [
[/[A-Z][A-Z][A-Z][A-Z]+((?=[^A-Za-z])|(?!.))/, lambda{|x|x.downcase}], #All caps becomes word
[/\.exe(?=[^a-z])/i, " executable "],
[/\.txt(?=[^a-z])/i, " text file "],
[/rebranded/, "re-branded"],
[/App(?=[\s\.])/, " application "],
['GUI' , " gooee "],
[/localhost/, "local host"],
[/(?<word>[A-Z][a-z]*)(?=[A-Z ,\.;:\t\/])/, "'\\k<word>' "], # CamelCaseWords should be split by spaces
['\\', '<sub alias="slash">\\</sub>'],
]


require 'cgi'

puts <<-eos
<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-UK">
<voice xml:lang="en-UK">
   #{substitutions.reduce(CGI::escapeHTML(ARGF.read)){|o, (r,s)| s.is_a?(Proc) ? o.gsub(r, &s) : o.gsub(r,s) }}
</voice>
</speak>
eos
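
The one-line reduce in the script is dense, so here is an unrolled sketch of the same idea: escape first, then apply each rule in order, using the block form of gsub when the replacement is a lambda. The shortened `RULES` list and the `apply_rules` name are hypothetical, and the all-caps pattern is a simplified variant of the one in the script.

```ruby
require 'cgi'

# Hypothetical, shortened rule list: [pattern, replacement string or lambda].
RULES = [
  [/[A-Z]{4,}(?![A-Za-z])/, ->(m) { m.downcase }], # runs of four or more capitals become ordinary words
  [/localhost/, 'local host'],
  ['GUI', ' gooee '],
]

# Escape the raw text, then thread it through every rule in order.
def apply_rules(text, rules = RULES)
  rules.reduce(CGI.escapeHTML(text)) do |out, (pattern, replacement)|
    if replacement.is_a?(Proc)
      out.gsub(pattern, &replacement) # lambda computes the replacement per match
    else
      out.gsub(pattern, replacement)  # plain string replacement
    end
  end
end

puts apply_rules('The GUI on localhost says READY & waiting')
```

Running the rules on a sample string like this is a quick way to check a new substitution before sending anything to the voice.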

Other tips

This is done by SAPI's text normalization code. Unfortunately, that code is quite difficult to modify without building a custom voice, which is probably far more work than you want to take on.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow