Question

I have to accept user input in UTF-8 and feed it to a system that only accepts ISO-8859-15. I'd like to convert every character in a user-supplied Unicode string that cannot be represented in ISO-8859-15 to U+FFFD, so that I can show the problematic characters to the user. What's the easiest* way to accomplish this?

I'm using Python 2.7.

*) With an arbitrary definition of "the easiest" :)


Solution

How about this? Try to encode each character on its own, and replace the ones that fail:

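def is_latin_9(c):
    # True if the single unicode character c exists in ISO-8859-15 (Latin-9).
    try:
        c.encode('iso-8859-15')
        return True
    except UnicodeEncodeError:
        return False

def replace_non_latin_9(s):
    # s must be a unicode object (decode the UTF-8 input first).
    # u''.join keeps the result unicode even when s is empty.
    return u''.join(c if is_latin_9(c) else u'\ufffd' for c in s)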
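For illustration, here is a minimal end-to-end sketch of how the two functions above might be used: decode the incoming UTF-8 bytes, mark the unencodable characters, and list them for the user. The helper `find_non_latin_9`, the sample input, and the choice of `?` as the final substitute are assumptions made for this example, not part of the original answer. The functions are repeated so the snippet runs on its own.

```python
# -*- coding: utf-8 -*-
REPLACEMENT = u'\ufffd'

def is_latin_9(c):
    # True if the single character c can be encoded as ISO-8859-15.
    try:
        c.encode('iso-8859-15')
        return True
    except UnicodeEncodeError:
        return False

def replace_non_latin_9(s):
    return u''.join(c if is_latin_9(c) else REPLACEMENT for c in s)

def find_non_latin_9(s):
    # Hypothetical helper: (index, character) pairs that cannot be encoded.
    return [(i, c) for i, c in enumerate(s) if not is_latin_9(c)]

# Sample input (an assumption): UTF-8 bytes as they might arrive from the user.
raw = u'price: 5\u20ac, \u4e2d\u6587 caf\u00e9'.encode('utf-8')

text = raw.decode('utf-8')          # work on unicode, not on bytes
cleaned = replace_non_latin_9(text) # unencodable characters become U+FFFD
bad = find_non_latin_9(text)        # what to show the user

# U+FFFD itself is not in ISO-8859-15, so swap it for something that is
# before handing the data to the legacy system ('?' is an arbitrary choice).
payload = cleaned.replace(REPLACEMENT, u'?').encode('iso-8859-15')
```

Note that the euro sign survives, because ISO-8859-15 includes it (unlike ISO-8859-1). On a narrow Python 2 build, a character outside the Basic Multilingual Plane is stored as a surrogate pair, so it would probably show up as two U+FFFD characters rather than one.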
Licensed under: CC-BY-SA with attribution