Encoding text to HTML entities (not the tags)

https://stackoverflow.com/questions/14825552

09-03-2022

Question

I've been searching for this for a while without any luck, so maybe I'm missing a concept or don't understand what I really need. Here is the problem:

I'm using pisa to create a PDF, and this is the code I use for it:

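import cgi
import StringIO

from django import http
from django.template import Context, Template
from xhtml2pdf import pisa  # assumption: older installs use "import ho.pisa as pisa"

def write_to_pdf(template_data, context_dict, filename):
    # Render the Django template to a unicode HTML string
    template = Template(template_data)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO()
    # fetch_resources is defined elsewhere in the project (not shown here)
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

    if not pdf.err:
        response = http.HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename
        response.write(result.getvalue())
        return response

    # On failure, show the escaped HTML that could not be converted
    return http.HttpResponse('Problem creating PDF: %s' % cgi.escape(html))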

So if I try to turn this string into a PDF:

template_data = 'tésting á'

It turns into something like this (consider each # to be a black spot instead of a letter):

t##sting á

I tried cgi.escape without any luck: the black spots were still there, and it ended up printing the HTML tags as text. This is Python 2.7, so I can't use html.escape to solve all my problems.

So I need something that converts the normal text into HTML entities without affecting the HTML tags that are already there. Any clues?

Oh, and if I change this line:

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

to

pdf = pisa.pisaDocument(html, result, link_callback=fetch_resources)

it works, but it doesn't create the HTML entities. I need those because I don't know exactly which characters will end up in the text, and some of them might not be supported by pisa.

Solution

Encode named HTML entities with Python

http://beckism.com/2009/03/named_entities_python/

There is also a Django app for both decoding and encoding:

https://github.com/cobrateam/python-htmlentities
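If numeric entities are acceptable instead of named ones, the standard library's built-in xmlcharrefreplace error handler may already be enough, with no custom code. The snippet below is a minimal sketch; the sample string is only an illustration, not taken from the asker's templates. Because every ASCII character encodes cleanly, the markup itself is left alone and only the non-ASCII characters are replaced.

```python
# -*- coding: utf-8 -*-
# Sample input standing in for rendered template output: u'<p>tésting á</p>'
html = u'<p>t\xe9sting \xe1</p>'

# Non-ASCII characters become numeric character references (&#NNN;).
# ASCII characters, including the < and > of the tags, pass through unchanged.
encoded = html.encode('ascii', 'xmlcharrefreplace')

print(encoded)
```

The result is a plain ASCII byte string, so it no longer depends on how the PDF library decodes its input.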

The handler from the first link, for Python 2.x (in Python 3.x, import codepoint2name from html.entities instead of htmlentitydefs):

'''
Registers a special handler for named HTML entities

Usage:
import named_entities
text = u'Some string with Unicode characters'
text = text.encode('ascii', 'named_entities')
'''

import codecs
from htmlentitydefs import codepoint2name

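def named_entities(error):
    # Error handlers receive the exception, not the text. The characters that
    # failed to encode are error.object[error.start:error.end].
    if isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in error.object[error.start:error.end]:
            if ord(c) in codepoint2name:
                # A named entity exists for this code point
                s.append(u'&%s;' % codepoint2name[ord(c)])
            else:
                # No named entity: fall back to a numeric reference
                s.append(u'&#%s;' % ord(c))
        # Return the replacement and the position at which encoding resumes
        return u''.join(s), error.end
    else:
        # An exception instance has no __name__; use its class name
        raise TypeError("Can't handle %s" % type(error).__name__)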

codecs.register_error('named_entities', named_entities)
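As a rough usage sketch, the handler can be applied to a whole HTML string: only characters outside ASCII ever reach the error handler, so existing tags are not touched. The version below is a compact restatement that tries both import locations so it runs on Python 2 and 3. The name rendered is a hypothetical stand-in for the output of template.render(context) in the question.

```python
import codecs

try:
    from htmlentitydefs import codepoint2name   # Python 2
except ImportError:
    from html.entities import codepoint2name    # Python 3


def named_entities(error):
    """Replace unencodable characters with named (or numeric) entities."""
    if not isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        raise TypeError("Can't handle %s" % type(error).__name__)
    parts = []
    for c in error.object[error.start:error.end]:
        name = codepoint2name.get(ord(c))
        # Prefer the named entity; otherwise use a numeric reference
        parts.append(u'&%s;' % name if name else u'&#%d;' % ord(c))
    return u''.join(parts), error.end


codecs.register_error('named_entities', named_entities)

# Hypothetical stand-in for html = template.render(context)
rendered = u'<p>t\xe9sting \xe1 \u2603</p>'

# The tags are ASCII and survive as-is; the accented letters get named
# entities, and the snowman (no named entity) gets a numeric one.
entity_html = rendered.encode('ascii', 'named_entities')

print(entity_html)
```

In the question's write_to_pdf, this call would presumably take the place of html.encode("UTF-8") before the result is wrapped in StringIO.StringIO; whether pisa then renders every entity correctly is not something this sketch verifies.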

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow