Dealing with string case and diacritics in Python

https://stackoverflow.com/questions/17551812

python
string
diacritics

02-06-2022

Question

I'm doing some text processing and need to convert all the text to lowercase, but the text is French and I need to keep all the diacritics, so that "È" becomes "è", etc. If it helps, I don't actually need the final output as text, just an identifier (e.g. a number) for each unique character (where "e" and "è" are different characters). Any suggestions?

Solution

Use Unicode strings. In Python 3 every str is already Unicode; in Python 2, prefix the literal with u:

>>> u"É".lower()
'é'
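
Since the question only needs a distinct identifier per character, one way to sketch this in Python 3 is to lowercase the text and use each character's code point as its identifier. The helper name char_ids is hypothetical, and the NFC normalization step is an assumption about the input (it makes a precomposed "è" and an "e" followed by a combining accent count as the same character), not something the answer states.

```python
import unicodedata

def char_ids(text):
    """Lowercase text and return one integer code point per character."""
    # Assumption: NFC composes "e" + combining grave accent into a single "è",
    # so both spellings of the same letter get the same identifier.
    normalized = unicodedata.normalize("NFC", text.lower())
    return [ord(ch) for ch in normalized]

ids = char_ids("È e")  # one id for "è", one for the space, one for "e"
```

Because "e" and "è" have different code points, they get different identifiers, while "È" and "è" map to the same one after lowercasing.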

Other tips

I think your problem is that you are converting to ASCII. If you try something like:

word = u"HÈLLO"
print(word.lower())

That should do it.
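
If the text arrives as bytes (for example, read from a file), the diacritics are lowercased only if the bytes are decoded to a Unicode string first. A minimal Python 3 sketch, assuming the input is UTF-8 encoded; the variable names are illustrative only:

```python
# Stand-in for bytes read from a file or socket (assumed to be UTF-8).
raw = "HÈLLO".encode("utf-8")

# Decode to a Unicode string first; lower() then handles accented letters.
word = raw.decode("utf-8")
lowered = word.lower()

# Lowercasing the raw bytes only affects ASCII letters, so "È" is left untouched.
bytes_lowered = raw.lower().decode("utf-8")
```

Here lowered keeps the accent and is fully lowercase, whereas bytes_lowered still contains the uppercase "È".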

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow