Question

I have a UTF-8 string with non-ASCII characters. I need to write it to an HTML file in the ampersand-hash-digits-semicolon form (for example, &#169;). What is the best way to do this?

Solution

Use the .encode method with 'xmlcharrefreplace' passed as the errors parameter:

In [1]: help(unicode.encode)
Help on method_descriptor:

encode(...)
    S.encode([encoding[,errors]]) -> string or unicode

    Encodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that can handle UnicodeEncodeErrors.

In [2]: ustr = u'\xa9 \u20ac'

In [3]: print ustr
© €

In [4]: print ustr.encode('ascii', 'xmlcharrefreplace')
&#169; &#8364;
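
The session above is Python 2 (it uses the unicode type and the print statement). For Python 3, where str is already Unicode and encode returns bytes, the same error handler should still apply. The following is a minimal sketch under that assumption; the helper name to_char_refs and the file name charref_demo.html are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def to_char_refs(text):
    """Return text with each non-ASCII character replaced by a &#NNN; reference."""
    # encode() yields bytes in Python 3; decode back to get a str again.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


text = '\xa9 \u20ac'  # same sample as above: copyright sign, space, euro sign
print(to_char_refs(text))  # &#169; &#8364;

# open() accepts the same errors handler when writing, so the replacement
# can also happen at write time. The path here is a hypothetical demo file.
path = os.path.join(tempfile.gettempdir(), 'charref_demo.html')
with open(path, 'w', encoding='ascii', errors='xmlcharrefreplace') as f:
    f.write(text)
```

Note that 'xmlcharrefreplace' only replaces characters the target codec cannot encode, so ASCII markup characters such as & and < pass through unchanged.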
Licensed under: CC-BY-SA with attribution