Question

I'm trying to use the unidecode library in Python 3 to remove accents from Russian words (in the Cyrillic alphabet). unidecode works fine for other examples, but not for Russian words. Any help would be greatly appreciated.

Instead of the accent on the "е" being removed, the Russian word becomes "ND3/4D3/4D+-NDuID1/2D,N", which is not what I want:

Python 3.3.0 (default, Oct 24 2012, 14:30:03)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: utf-8 -*-
...
>>> from unidecode import unidecode
>>> print(unidecode(u"Cœur"))
CAur
>>> print(unidecode(u"сообще́ния"))
ND3/4D3/4D+-NDuID1/2D,N
>>>

Solution

I tried this on Mac OS X with a UTF-8 locale, and both strings come out correctly:

$ echo $LANG
en_US.utf-8
$ python3
Python 3.3.2 (default, Aug 22 2013, 12:33:42)
[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.60))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from unidecode import unidecode
>>> print(unidecode(u"Cœur"))
Coeur
>>> print(unidecode(u"сообще́ния"))
soobshcheniia

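Your output looks like UTF-8 input being decoded as Latin-1 before it reaches unidecode: "œ" arrives as "Å" plus a control character, which gives "CAur", and each Cyrillic letter arrives as two unrelated Latin-1 characters. Try setting the LANG environment variable to a UTF-8 locale (such as en_US.utf-8, as above) before starting Python.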
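To check whether this is what is happening on your machine, here is a minimal sketch using only the standard library. It prints the encodings Python has picked up from the environment, and reproduces the kind of garbling that would explain your output. The helper names `misdecode` and `repair` are made up for illustration, and Latin-1 as the wrong codec is my assumption based on the output you posted.

```python
import locale
import sys


def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo misdecode(): recover the UTF-8 bytes and decode them correctly."""
    return garbled.encode(wrong_encoding).decode("utf-8")


# Encodings Python derived from the environment (LANG, LC_ALL, ...).
print("preferred encoding:", locale.getpreferredencoding(False))
print("stdin encoding:", getattr(sys.stdin, "encoding", None))

word = "сообще́ния"
garbled = misdecode(word)

# ascii() keeps the output printable whatever the terminal encoding is.
# The first Cyrillic letter "с" (UTF-8 bytes D1 81) becomes two characters.
print(ascii(garbled[:2]))
print("round trip ok:", repair(garbled) == word)
```

If the two encodings printed at the top are not UTF-8, the string typed at the prompt is probably being turned into something like `garbled` before unidecode sees it, and unidecode then transliterates those Latin-1 characters instead of the Cyrillic ones.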

Licensed under: CC-BY-SA with attribution