How about this?
def is_latin_9(c):
try:
c.encode('iso-8859-15')
return True
except UnicodeEncodeError:
return False
def replace_non_latin_9(s):
return ''.join(c if is_latin_9(c) else u'\ufffd' for c in s)
Question
I have to accept user input in utf-8 and feed it to a system that only accepts ISO-8859-15. I'd like to convert all non-ISO-8859-15 characters in a user-supplied unicode string to U+FFFD so I could present the problematic characters to the user. What's the easiest* way to accomplish this?
I'm using Python 2.7.
*) With an arbitrary definition of "the easiest" :)
Solution
How about this?
def is_latin_9(c):
try:
c.encode('iso-8859-15')
return True
except UnicodeEncodeError:
return False
def replace_non_latin_9(s):
return ''.join(c if is_latin_9(c) else u'\ufffd' for c in s)