Question

I'm trying to make a program that:

  • reads a list of Chinese characters from a file, makes a dictionary from them (associating a sign with its meaning).
  • picks a random character and sends it to the browser using the BaseHTTPServer module when it gets a GET request.

I managed to read and store the signs properly (I checked by writing them to another file, and that worked), but I can't figure out how to send them to my browser.

I connect to 127.0.0.1:4321, and the best I've managed is to get a (supposedly) URL-encoded Chinese character with its translation.

Code:

# -*- coding: utf-8 -*-
import codecs
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn
import threading
import random
import urllib

source = codecs.open('./signs_db.txt', 'rb', encoding='utf-16')

# Checking utf-16 works fine with chinese characters and stuff :
#out = codecs.open('./test.txt', 'wb', encoding='utf-16')
#for line in source:
#   out.write(line)

db = {}
next(source)
for line in source:
    if not line.isspace():
        tmp = line.split('\t')
        db[tmp[0]] = tmp[1].strip()

class Handler(BaseHTTPRequestHandler):

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        message =  threading.currentThread().getName()
        rKey = random.choice(db.keys())
        self.wfile.write(urllib.quote(rKey.encode("utf-8")) + ' : ' + db[rKey])
        self.wfile.write('\n')
        return

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle requests in a separate thread."""

if __name__ == '__main__':
    server = ThreadedHTTPServer(('localhost', 4321), Handler)
    print 'Starting server, use <Ctrl-C> to stop'
    server.serve_forever()

If I don't URL-encode the Chinese character, I get an error from Python:

self.wfile.write(rKey + ' : ' + db[rKey])

Which gives me this:

UnicodeEncodeError : 'ascii' codec can't encode character u'\u4e09' in position 0 : ordinal not in range(128)

I've also tried encoding/decoding with 'utf-16', and I still get the same kind of error message.

Here is my test file:

Sign    Translation

一   One
二   Two
三   Three
四   Four
五   Five
六   Six
七   Seven
八   Eight
九   Nine
十   Ten

So, my question is: how can I get the Chinese characters coming from my script to display properly in my browser?


Solution

Declare the encoding of your page by writing a meta tag and make sure to encode the entire Unicode string in UTF-8:

self.wfile.write(u'''\
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8">
    </head>
    <body>
    {} : {}
    </body>
    </html>'''.format(rKey,db[rKey]).encode('utf8'))
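
The error in the question is consistent with Python 2 falling back to the ASCII codec when a unicode object is written to a byte stream. Here is a minimal sketch of the difference, using only built-in string methods. The names `sign`, `line`, `ascii_ok` and `body` are illustrative and do not come from the original code.

```python
# -*- coding: utf-8 -*-
# Illustrative values only; in the question these come from the db dictionary.
sign = u'\u4e09'                      # the character for "three"
line = sign + u' : ' + u'Three'

# Encoding with ASCII fails for any character outside the 0-127 range,
# which is the same failure reported in the question.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 produces plain bytes that can be written
# to the response stream without any implicit conversion.
body = line.encode('utf-8')
```

Passing `body` to `self.wfile.write` avoids the implicit conversion. The browser then only needs to be told that the bytes are UTF-8.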

And/or declare the HTTP content type:

self.send_response(200)
self.send_header('Content-Type','text/html; charset=utf-8')
self.end_headers()
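
Put together, the two suggestions above could look roughly like the sketch below. It is not a definitive implementation. The small `db` literal stands in for the dictionary built from signs_db.txt. The names `PAGE`, `render_page` and `make_server` are hypothetical helpers, and the `Content-Length` header is an optional addition that the answer does not require. The import fallback is there only so that the same sketch runs on Python 3.

```python
# -*- coding: utf-8 -*-
import random

try:  # Python 2, as used in the question
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:  # Python 3 equivalent module
    from http.server import HTTPServer, BaseHTTPRequestHandler

# Stand-in for the dictionary loaded from signs_db.txt (assumption).
db = {u'\u4e00': u'One', u'\u4e8c': u'Two', u'\u4e09': u'Three'}

PAGE = u'''<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8">
</head>
<body>
{} : {}
</body>
</html>'''


def render_page(sign, meaning):
    """Build the whole page as unicode, then encode it once to UTF-8 bytes."""
    return PAGE.format(sign, meaning).encode('utf-8')


class Handler(BaseHTTPRequestHandler):

    def do_GET(self):
        sign = random.choice(list(db.keys()))
        body = render_page(sign, db[sign])
        self.send_response(200)
        # Declare the encoding in the HTTP header as well as in the meta tag.
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        # Optional: the length in bytes of the encoded body, not in characters.
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=4321):
    """Create the server; call .serve_forever() on the result to run it."""
    return HTTPServer(('localhost', port), Handler)
```

To try it, call `make_server().serve_forever()` and open 127.0.0.1:4321 in the browser. The page is encoded once, just before it is written, so everything before that point can keep working with unicode strings.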
Licensed under: CC-BY-SA with attribution