UnicodeEncodeError: 'ascii' codec can't encode characters in position 4273-4279: ordinal not in range(128)

https://stackoverflow.com/questions/23469008

15-07-2023

Question

I want to convert an HTML page to PDF. To do that, I read data from Excel and store it in a Python dictionary. Then I format the string as shown below.

Writing the Python variables into the file:

 html_file.write( html_rcc_string%(row["B_6.2OwnerName"],
                           row["B_6.3OwnerNameH"],))

In the code above, html_rcc_string contains HTML code, i.e.

<table>
    <tr>
        <td>Owner name</td>
        <td>Owner name in hindi</td>
    </tr>
    <tr>
         <td>%s</td>
         <td>%s</td>
    </tr>
</table>

When I supply a dictionary value that holds a name in Hindi, it raises the error below.

UnicodeEncodeError: 'ascii' codec can't encode characters in position 4273-4279: ordinal not in range(128)

I googled this but did not find anything. How can I display the owner name in Hindi? Any suggestions?

Solution

Consider this advice from the excellent Pragmatic Unicode -or- How Do I Stop the Pain?: make a "Unicode sandwich", with bytes on the outside and Unicode on the inside. That is, decode all input to Unicode the instant you read it, and encode all output to UTF-8 the instant you write it.
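As a rough sketch of the sandwich on its own, independent of your program: the snippet below decodes at the input edge, works with text in the middle, and encodes at the output edge. The names raw_input_bytes, owner_name and cell are made up for illustration, and the bytes are built in place rather than read from Excel. It also reproduces the kind of failure in your traceback by asking for ASCII explicitly.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for bytes arriving from a file, socket, or spreadsheet export.
raw_input_bytes = u"अभय".encode("utf-8")

# Input edge: decode bytes to Unicode text immediately.
owner_name = raw_input_bytes.decode("utf-8")

# Inside: work only with Unicode text.
cell = u"<td>%s</td>" % owner_name

# ASCII cannot represent Devanagari characters, so this raises UnicodeEncodeError,
# the same error class as in the question.
try:
    cell.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Output edge: encode explicitly as UTF-8 just before writing.
output_bytes = cell.encode("utf-8")
```

The traceback in the question comes from the same step happening implicitly: the text is encoded with the default 'ascii' codec when it is written, instead of with a codec you chose.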

Applying that logic to your program, I have this:

# coding: utf8
row = {
  "B_6.2OwnerName": u'ABHAY',
  "B_6.3OwnerNameH": u'अभय' }

html_rcc_string = u'''
<table>
    <tr>
        <td>Owner name</td>
        <td>Owner name in hindi</td>
    </tr>
    <tr>
         <td>%s</td>
         <td>%s</td>
    </tr>
</table>
'''

# Open in binary mode ('wb'): the data written is already UTF-8 encoded bytes.
with open('/tmp/html_file.html', 'wb') as html_file:
    html_file.write( (html_rcc_string%(row["B_6.2OwnerName"],
                                      row["B_6.3OwnerNameH"],)).encode('utf8') )

There are other ways to invoke the UTF-8 encoder, but the point remains: ensure that all of your in-program data is unicode, not str (in Python 2 terms). Only at the final moment, when you write the output, do you convert to a UTF-8-encoded str.
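One of those other ways, as a minimal sketch: io.open takes an encoding argument and encodes the text as it is written, so the explicit .encode() call goes away. The output path under the system temp directory is a hypothetical choice for this example, the table is shortened, and the read-back at the end is only there to check the result.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

row = {
    "B_6.2OwnerName": u"ABHAY",
    "B_6.3OwnerNameH": u"अभय",
}

html_rcc_string = u"""
<table>
    <tr>
        <td>%s</td>
        <td>%s</td>
    </tr>
</table>
"""

# Hypothetical output location, chosen only for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "html_file_utf8.html")

# The file object encodes Unicode text to UTF-8 on write.
with io.open(out_path, "w", encoding="utf-8") as html_file:
    html_file.write(html_rcc_string % (row["B_6.2OwnerName"],
                                       row["B_6.3OwnerNameH"]))

# Read the file back as text to confirm the Hindi name survived the round trip.
with io.open(out_path, "r", encoding="utf-8") as html_file:
    rendered = html_file.read()
```

Either way the sandwich is the same: the program holds Unicode text, and the encoding to UTF-8 happens exactly once, at the file boundary.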

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow