Question

I want to convert special characters that I come across while reading web pages to plain ASCII. I've tried a lot, but I can't figure it out. Below are some examples, each stored in a string in Python. I don't know what the web page's original encoding is, but I want the result in ASCII.

Apaydın Ünal > should become Apaydin Unal
Íñigo Martínez > should become Inigo Martinez
Üstünel > should become Ustunel

Who can help me?

EDIT: Thanks, I forgot to mention it: I'm using Python 2.7.


Solution

Give https://pypi.python.org/pypi/Unidecode a try:

>>> from unidecode import unidecode
>>> unidecode(u'ko\u017eu\u0161\u010dek')
'kozuscek'
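If installing a third-party package is not an option, a common standard-library alternative (a different technique from Unidecode, shown here only as a sketch) is to decompose the text with Unicode NFKD normalization and drop the combining accent marks. One caveat matters for these examples: the Turkish dotless ı (U+0131) has no Unicode decomposition, so NFKD alone would simply delete it ("Apaydn"). The sketch below therefore assumes a small, hypothetical fallback table, NO_DECOMPOSITION, for such letters. The function name to_ascii is also illustrative. It expects a unicode string (u'...' in Python 2.7, str in Python 3) and runs on both.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode
# decomposition, so NFKD cannot split off an accent. Extend as needed.
NO_DECOMPOSITION = {
    u'\u0131': u'i',  # LATIN SMALL LETTER DOTLESS I (Turkish)
}


def to_ascii(text):
    """Approximate a unicode string in ASCII by stripping accents.

    Characters that are still not ASCII after the fallback table and
    NFKD decomposition are silently dropped.
    """
    # Map letters that NFKD would leave untouched.
    text = u''.join(NO_DECOMPOSITION.get(ch, ch) for ch in text)
    # NFKD splits e.g. u'\u00dc' into u'U' + COMBINING DIAERESIS (U+0308).
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with 'ignore' discards the combining marks (non-ASCII).
    return decomposed.encode('ascii', 'ignore').decode('ascii')


for name in [u'Apayd\u0131n \u00dcnal',      # Apaydın Ünal
             u'\u00cd\u00f1igo Mart\u00ednez',  # Íñigo Martínez
             u'\u00dcst\u00fcnel']:          # Üstünel
    print(to_ascii(name))
# Apaydin Unal
# Inigo Martinez
# Ustunel
```

The trade-off: this needs no dependency, but every letter without a decomposition (ı, ø, ß, and so on) must be added to the table by hand, whereas a dedicated transliteration table like Unidecode's is designed to cover such characters.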

To detect the encoding, see the question "Determine the encoding of text in Python".
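Both unidecode() and the normalization approach work on decoded (unicode) text, so raw bytes read from a web page have to be decoded first. When the encoding is unknown, one simple heuristic (not true detection, and not what the linked question describes) is to try a short list of candidate encodings in order. The function name decode_page and the default candidates below are assumptions for illustration; a Turkish page, for example, might need 'cp1254' or 'iso-8859-9' instead of 'cp1252'.

```python
def decode_page(raw, candidates=('utf-8', 'cp1252')):
    """Decode page bytes by trying each candidate encoding in order.

    UTF-8 goes first because its strict decoder rejects most byte
    sequences that were not written as UTF-8, so a wrong guess usually
    fails loudly instead of producing garbage.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not this encoding; try the next candidate
    # Last resort: decode anyway, marking bad bytes with U+FFFD.
    return raw.decode('utf-8', 'replace')
```

The decoded result can then be passed to unidecode() (or another ASCII-conversion step). A single-byte encoding such as cp1252 accepts almost any input, so it belongs at the end of the list, and a wrong guess there yields wrong letters rather than an error.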

Licensed under: CC-BY-SA with attribution