PyMySQL UnicodeEncodeError; Python shell succeeds but cmd fails

https://stackoverflow.com/questions/12814989

06-07-2021

Question

I'm new to the pymysql module and trying to explore it. I have some simple code:

import pymysql

conn=pymysql.connect(host="127.0.0.1",
                         port=8080,user="root",
                         passwd="mysql",
                         db="world",
                         charset="utf8",
                         use_unicode=True)
cur=conn.cursor()
cur.execute("SELECT * FROM world.city")

for line in cur:
    print(line)

cur.close()
conn.close()

I'm using Python Tools for Visual Studio. When I execute the code, it fails with this error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.5\visualstudio_py_debugger.py", line 1788, in write
    self.old_out.write(value)
  File "C:\Python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

The failing line contains the city name: Â´s-Hertogenbosch

I thought it might be a problem with the cmd output, so I switched to the Python shell, and there my script runs without any error.

So what is the problem I'm facing? How can I solve it?

I really want to use Python Tools for Visual Studio, so answers that enable me to use PTVS will be most welcome.

Solution

The problem is probably that the output encoding of the environment is set to cp437, and the Unicode characters cannot be converted to that encoding when print(line) runs, which probably ends up in the self.old_out.write(value) call shown in the traceback.
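To check this guess without touching the database, a minimal sketch like the following may help. The helper name can_encode is made up for illustration, and the city name is written with escapes so that the script itself prints only ASCII:

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding print() will use; some replaced streams may not report one.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# The city name from the question: U+00C2 and U+00B4 followed by ASCII text.
city = "\u00c2\u00b4s-Hertogenbosch"

print(can_encode(city, "cp437"))  # False: cp437 has neither character
print(can_encode(city, "utf-8"))  # True: UTF-8 can encode any character
```

If the first line reports cp437 when run from cmd, that would be consistent with the traceback above.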

Try replacing the print() inside the loop with writing to a file, like:

with open('myoutput.txt', 'w', encoding='utf-8') as f:
    for line in cur:
        f.write(line)

Well, but the cursor does not return a string line. It returns a row (a tuple, I guess) of elements. Because of that, you probably have to do something like this:

with open('myoutput.txt', 'w', encoding='utf-8') as f:
    for row in cur:
        f.write(repr(row) + '\n')  # one row per line

This may be enough for diagnostic purposes. If you need a nicer string, you have to format the row explicitly.
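One possible way to get a nicer string is sketched below. The helper format_row and the sample row are made up for illustration; the real columns depend on your table:

```python
def format_row(row, sep="\t"):
    """Join the columns of one result row into a single line of text."""
    # NULL columns arrive as None; show them as empty fields.
    return sep.join("" if col is None else str(col) for col in row)


# Made-up row, only to show the shape of the output.
sample = (1, "Some City", None, 12345)
line = format_row(sample)
```

Inside the loop you could then call f.write(format_row(row) + '\n') instead of writing repr(row).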

Also, you wrote:

                     charset="utf8",
                     use_unicode=True)

If charset is given, use_unicode=True can be left out (it is implied by the charset). I first thought that charset='utf8' was not a recognized encoding name for Python and that you would have to write 'utf-8', with a dash or underscore between utf and 8. Correction: utf8 probably works, as it is one of Python's aliases for UTF-8, and it is also the name MySQL itself uses for that character set.

UPDATE based on comments...

As the output to a file seems to be OK, the problem is related to the capabilities of the window that displays the output of print(). As the cmd console here uses cp437, you have to either use another window (such as a Unicode-capable window of some GUI) or tell cmd to use another encoding. Basically, you have to tell the console:

chcp 65001

to change the output code page to UTF-8, or you can use another (non-Unicode) code page that supports the characters you need. The console font must also be able to display the characters (i.e. it must contain their glyphs).
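If changing the console is not an option, another approach (not part of the advice above, just a sketch) is to make the text safe before printing it, by replacing whatever the console encoding cannot represent with escape sequences. The helper name console_safe is made up:

```python
import sys


def console_safe(text, encoding=None, errors="backslashreplace"):
    """Return text with characters the encoding lacks replaced by escapes."""
    # Fall back to ASCII if the output stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors).decode(encoding)


# U+00B4 is not in cp437, so it is shown as an escape instead of raising.
print(console_safe("\u00b4s-Hertogenbosch", "cp437"))  # \xb4s-Hertogenbosch
```

In the loop this would be print(console_safe(repr(row))). The output is not pretty, but the script no longer stops at the first character the console cannot show.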

OTHER TIPS

My guess is that the data you're receiving is not in Unicode, despite the fact that your Python script is trying to encode it in Unicode.

I would check the database- and table-specific charset and collation settings. utf8 and utf8_general_ci are your friends.
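A rough sketch of such a check from Python follows. It assumes cur is an open cursor like the one in the question, returning rows as (name, value) tuples; the helper name charset_settings is made up. SHOW VARIABLES is a standard MySQL statement:

```python
# Statements that list the character-set and collation variables.
QUERIES = (
    "SHOW VARIABLES LIKE 'character_set%'",
    "SHOW VARIABLES LIKE 'collation%'",
)


def charset_settings(cur):
    """Return the character-set and collation variables as a dict."""
    settings = {}
    for query in QUERIES:
        cur.execute(query)
        # Each row is expected to be a (variable_name, value) pair.
        for name, value in cur.fetchall():
            settings[name] = value
    return settings
```

Calling charset_settings(conn.cursor()) and printing the result with repr() shows whether the client, connection, and database character sets agree.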

Licensed under: CC-BY-SA with attribution
