I need to check the spelling of Russian words from a Python script. I am piping those words to hunspell through the shell. My hunspell dictionaries are all UTF-8, and I have no problems using them from the command line.
But something funky is happening when I try to send the strings from my Python script.
If I use the German dictionary:
import subprocess

text = "Universitüt"
# Build a shell pipeline that echoes the word into hunspell.
cmd = "echo " + text + " | /usr/local/bin/hunspell -d German_de_DE"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable="/bin/bash")
result, err = p.communicate()
if result:
    # stdout arrives as bytes; split it on whitespace.
    result = result.split()
print(result)
I get the response that I am expecting:
[b'Hunspell', b'1.3.2', b'&', b'Universit', b'4', b'0:', b'Universit\xc3\xa4r,', b'Universit\xc3\xa4t,', b'Universen,', b'Universaler', b'*']
and I can deal with that. But if I send a Russian word to the Russian dictionary, using the same code except for these two lines:
text = "университат"
cmd = "echo " + text + " | /usr/local/bin/hunspell -d Russian_ru_RU"
The response from hunspell is empty:
[b'Hunspell', b'1.3.2']
Directly from bash it works:
echo университат | hunspell -d Russian_ru_RU
Hunspell 1.3.2
& университат 1 0: университет
So I suppose it's some kind of encoding issue. But I am at a loss as to what it could be, considering that my locale is UTF-8 and Python's sys.getdefaultencoding() also reports utf-8.
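For reference, here is a minimal sketch of how the encoding-related settings can be inspected from inside the script. The helper name encoding_report is hypothetical, and which of these values hunspell actually consults is an assumption on my part. Note that sys.getdefaultencoding() is always utf-8 on Python 3, so it says nothing about the locale that the child process inherits.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings visible to this Python process."""
    return {
        # Always 'utf-8' on Python 3, regardless of the locale.
        "default_encoding": sys.getdefaultencoding(),
        # Derived from the locale settings of the running process.
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Locale variables that a child process inherits (None means unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print(name, "=", value)
```

If the locale variables come back as None when the script is launched from an IDE or a launcher rather than a terminal, that would be one difference between the Python run and the bash run.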
I am using Python 3.3.2 on Mac OS X.
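One experiment that might narrow this down is to take echo and the shell out of the picture and write the word to the process's stdin as explicitly encoded bytes. This is only a sketch: run_with_stdin is a hypothetical helper, and that UTF-8 is the encoding hunspell expects on stdin is an assumption based on the dictionaries being UTF-8.

```python
import subprocess


def run_with_stdin(cmd, word, encoding="utf-8"):
    """Run cmd (a list of arguments) and feed it one word on stdin as encoded bytes."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Encode explicitly so no shell or locale setting can alter the bytes.
    out, err = p.communicate((word + "\n").encode(encoding))
    # Decode both streams; undecodable bytes are replaced instead of raising.
    return out.decode(encoding, "replace"), err.decode(encoding, "replace")


# Example call (same binary and dictionary name as in the commands above):
# out, err = run_with_stdin(["/usr/local/bin/hunspell", "-d", "Russian_ru_RU"], "университат")
```

If this variant returns the suggestion while the echo version does not, the bytes are being changed somewhere between Python and the shell rather than inside hunspell.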
Any tips would be greatly appreciated.