Question

I'm trying to use the unidecode library in Python 3 to remove accents from Russian words (written in the Cyrillic alphabet). unidecode works fine for other examples, but not for Russian words. Any help would be greatly appreciated.

Instead of removing the accent on the "e" letter, the Russian word becomes "ND3/4D3/4D+-NDuID1/2D,N", which is not what we want ...

Python 3.3.0 (default, Oct 24 2012, 14:30:03)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: utf-8 -*-
...
>>> from unidecode import unidecode
>>> print(unidecode(u"Cœur"))
CAur
>>> print(unidecode(u"сообще́ния"))
ND3/4D3/4D+-NDuID1/2D,N
>>>

Solution

I tried this on Mac OS X with a UTF-8 locale, and both examples come out as expected:

$ echo $LANG
en_US.utf-8
$ python3
Python 3.3.2 (default, Aug 22 2013, 12:33:42)
[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.60))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from unidecode import unidecode
>>> print(unidecode(u"Cœur"))
Coeur
>>> print(unidecode(u"сообще́ния"))
soobshcheniia

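Your output looks like what unidecode produces when the UTF-8 bytes typed into the terminal are decoded as Latin-1 before they reach it, so the locale on your machine is probably not a UTF-8 one. You may try setting the LANG variable to a UTF-8 locale (for example en_US.utf-8, as above) before starting Python.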
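
To check whether this is what is happening, a small standard-library sketch like the one below may help. It prints the encodings Python picked up from the environment, then simulates the suspected failure by decoding the word's UTF-8 bytes as Latin-1. The variable names are only illustrative, and the Latin-1 mix-up is an assumption about your setup, not something confirmed on your machine.

```python
import locale
import sys

# Encodings Python derived from the environment (LANG / LC_ALL).
# Anything other than UTF-8 here would be consistent with the garbled output.
print("stdin encoding:    ", getattr(sys.stdin, "encoding", None))
print("stdout encoding:   ", getattr(sys.stdout, "encoding", None))
print("preferred encoding:", locale.getpreferredencoding())

# The word from the question; the stress mark is the combining acute U+0301.
word = "сообще\u0301ния"

# Simulate the suspected problem: UTF-8 bytes misread as Latin-1.
# Every Cyrillic letter turns into two unrelated Latin-1 characters.
misread = word.encode("utf-8").decode("latin-1")
print(ascii(misread))

# The misreading is reversible as long as no bytes were lost.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == word)
```

If the reported encodings are not UTF-8, exporting a UTF-8 LANG in the shell before launching python3 is the first thing to try.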

License: CC-BY-SA with attribution
Not affiliated with StackOverflow