python: lower() german umlauts

https://stackoverflow.com/questions/15052830

11-03-2022

Question

I have a problem with converting uppercase letters with umlauts to lowercase ones.

print("ÄÖÜAOU".lower())

The A, O and U are converted properly, but the Ä, Ö and Ü stay uppercase. Any ideas?

The first problem is fixed with .decode('utf-8'), but I still have a second one:

# -*- coding: utf-8 -*-
original_message="ÄÜ".decode('utf-8')
original_message=original_message.lower()
original_message=original_message.replace("ä", "x")
print(original_message)

Traceback (most recent call last):
  File "Untitled.py", line 4, in <module>
    original_message=original_message.replace("ä", "x")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Solution

You'll need to mark it as a Unicode string unless you're working with plain ASCII:

> print(u"ÄÖÜAOU".lower())

äöüaou

It works the same way with variables; it all depends on the type of the string assigned to the variable in the first place.

> olle = "ÅÄÖABC"
> print(olle.lower())
ÅÄÖabc

> olle = u"ÅÄÖABC"
> print(olle.lower())
åäöabc

OTHER TIPS

You are dealing with encoded strings, not with unicode text.

The .lower() method of byte strings can only deal with ASCII values. Decode your string to Unicode or use a unicode literal (u''), then lowercase:

>>> print u"\xc4AOU".lower()
äaou
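As a rough illustration of the point above, here is a minimal sketch written for Python 3, where bytes play the role of Python 2's plain (byte) strings and str is always Unicode text. The variable names are made up for this example.

```python
# Python 3 sketch: bytes stand in for Python 2's plain (byte) strings.
raw = "ÄÖÜAOU".encode("utf-8")  # UTF-8 encoded bytes, e.g. as read from a file

# bytes.lower() only maps ASCII A-Z, so the umlaut bytes are left unchanged.
lowered_bytes = raw.lower()

# Decoding first yields real text, and str.lower() handles the umlauts.
lowered_text = raw.decode("utf-8").lower()

# lowered_bytes.decode("utf-8") is "ÄÖÜaou"
# lowered_text is "äöüaou"
```

The order matters: decode first, then lowercase. Lowercasing the bytes and decoding afterwards still leaves the umlauts uppercase.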

If you're using Python 2 but don't want to prefix every string with u"", put this at the beginning of your program:

from __future__ import unicode_literals
olle = "ÅÄÖABC"
print(olle.lower())

This will now print:

åäöabc

The encoding declaration (the # -*- coding: utf-8 -*- line) tells Python how to interpret the bytes of the source file read from disk, while the from __future__ import unicode_literals statement tells it to treat the string literals inside the program as Unicode. You will probably need both.
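The second error in the question appears to come from the same mix-up: in Python 2, "ä" passed to replace() is still a byte string, so Python tries to decode it as ASCII before comparing it with the Unicode text, and fails on byte 0xc3. A minimal sketch of one way around it, assuming the input arrives as UTF-8 bytes: decode once at the boundary, then use only text literals. The u"" prefix is accepted by Python 2.7 and by Python 3.3+, and escapes are used so the snippet does not depend on the source file's encoding.

```python
# Sketch: decode bytes once at the input boundary, then use only text literals.
raw = b"\xc3\x84\xc3\x9c"               # UTF-8 bytes for "ÄÜ" (assumed input)
original_message = raw.decode("utf-8")  # now Unicode text, not bytes
original_message = original_message.lower()

# Both arguments to replace() are text, so no implicit ASCII decode is needed.
original_message = original_message.replace(u"\xe4", u"x")  # u"\xe4" is "ä"

# original_message is now u"x\xfc", i.e. "xü"
```

With from __future__ import unicode_literals in effect, plain "ä" and "x" literals would behave the same way as the u"" literals here.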

Licensed under: CC-BY-SA with attribution
