Question

I uploaded a picture to my S3 bucket; the filename is Müller.jpg.

When I dig into the file properties in the web UI, it gives me the following link: https://s3-eu-west-1.amazonaws.com/my_bucket_name/Mu%CC%88ller.jpg

How can I produce the same encoding in Python 2.x? My attempt percent-encodes the ü differently:

>>> import urllib
>>> name = u"Müller.jpg"
>>> urllib.quote(name.encode('utf-8'))
'M%C3%BCller.jpg'

Solution

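In the link, the ü appears as a plain u followed by %CC%88, which is the UTF-8 encoding of U+0308 (COMBINING DIAERESIS). Your attempt produced %C3%BC, the UTF-8 encoding of the precomposed character U+00FC. So it seems the filename is in a decomposed normalization form, NFD or NFKD.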

Use unicodedata.normalize:

>>> import unicodedata
>>> import urllib
>>> name = u"Müller.jpg"
>>> urllib.quote(unicodedata.normalize('NFD', name).encode('utf-8'))
'Mu%CC%88ller.jpg'
>>> urllib.quote(unicodedata.normalize('NFKD', name).encode('utf-8'))
'Mu%CC%88ller.jpg'
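
The same steps can be wrapped in a small helper. This is only a sketch: the function name s3_style_quote is made up for illustration, and the choice of NFD (rather than NFKD) is an assumption based on the single link shown in the question. The import fallback is there so the snippet also runs on Python 3, where quote lives in urllib.parse.

```python
# -*- coding: utf-8 -*-
import unicodedata

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def s3_style_quote(name):
    """Percent-encode a unicode filename after NFD decomposition.

    Hypothetical helper; NFD is assumed from the link in the question.
    """
    # Split precomposed characters such as U+00FC into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", name)
    # Encode to UTF-8 bytes first so quote() behaves the same on Python 2 and 3.
    return quote(decomposed.encode("utf-8"))


print(s3_style_quote(u"M\u00fcller.jpg"))  # Mu%CC%88ller.jpg
```

A name that is already decomposed passes through unchanged, so calling the helper twice on the same input is harmless.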
Licensed under: CC-BY-SA with attribution