sqlalchemy UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 when trying to select all from msssql 2005 table

StackOverflow https://stackoverflow.com/questions/8637894

Question

I'm building an application that works with the db of another application, using mssql 2005 (I can't change that or change the existing table definition). w the mssql table collate is: "hebrew bin", and the application shows perfect hebrew from the table, all the py files are encoded utf-8

notice! there is no problem writing with unicode hebrew strings to the db with mssql. there is no trouble choosing and deleting: DBSession2.query(object).filter(object.LOADED=='Y').delete() but when selecting from the table I get this very annoying error:

  File "D:\Python27\learn\agent\agent\lib\encodings\utf_8.py", line 16, in decode     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

where the exact byte code changes according to the first byte of the first row in the table.

yes, I know that this translates to an hebrew letter - this shouldn't be a problem since all sides play with unicode - at least that is what I thought.

btw - this worked fine on the test mssql 2005 server but doesn't work with the production server.

some code: this is the failing part of the function:

def iterateJson(parser,injson,object):
    '''iterateJson(parser,injson,object):getting a parser method an a json and iterating over the json
    with the parser method, checkes for existing objects in the db table and deletes them before commiting the new one to
    prevent integerityerrors
    writes ValidateJsonError to errorlog for each element in the json
    getting an onject name to check for loaded etc'''

    #first lets erase the table from loaded objects
    DBSession2.query(object).filter(object.LOADED=='Y').delete()
    print "finished deleting loaded"
    #now lets get a list from the table of loaded id
    raw_list = DBSession2.query(object).all() #the failing part!
    print "getting raw list of unloaded" #doesn't get here!
    if object == Activities:
        id_list = [e.EVENTCODE for e in raw_list]
        id = e.EVENTCODE

this is part of the sqlalchemy class:

class Deposit(DeclarativeBase2):
    __tablename__ = 'NOAR_LOADDEPOSIT'
    #LINE = Column(INT(8)) 
    RECDEBNUM = Column(NVARCHAR(9) , primary_key=True)
    CURDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    PAYTYPE = Column(CHAR(1), nullable=False, default=text(u"('')")) 
    BANKCODE = Column(NVARCHAR(8), nullable=False, default=text(u"('')")) 
    CUSTACCNAME = Column(NVARCHAR(16), nullable=False, default=text(u"('')")) 
    PAGENUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')"))
    RECNUM = Column(NVARCHAR(2), nullable=False, default=text(u"('')")) 
    RECDATE = Column(BIGINT, nullable=False, default=text(u'((0))')) 
    FIXNUM = Column(NCHAR(1), nullable=False, default=text(u"('')")) 
    EVENTNUM = Column(NVARCHAR(5), nullable=False, default=text(u"('')")) 
    GROUPCODE = Column(NVARCHAR(7), nullable=False, default=text(u"('')")) 
    IDNUMBER = Column(NVARCHAR(9), nullable=False, default=text(u"('')")) 

and the other class (both give the same problem)

class Activities(DeclarativeBase2):  


    __tablename__ = 'NOAR_LOADEVENTS'

    EVENTCODE = Column(NVARCHAR(8), primary_key=True)
    EVENTDES = Column(Unicode, nullable=False, default=text(u"('')"))
    TYPE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LC = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LD = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LE = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LF = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LG = Column(NCHAR(1), nullable=False, default=text(u"('')"))
    LH = Column(NCHAR(1), nullable=False, default=text(u"('')"))

using: python 2.7 (64bit win) with pyodbc 2.1.11 vs mssql server 2005, sqlalchemy 0.7.3 tg2.1.3

be glad for any help or reference

Was it helpful?

Solution 2

found the answer - a classic wtf. seems the the production server had a slightly different configuration then the test server (although both were supposed to be the same) - so instead of nvarchar my application model was expecting it got varchar. thanks for the help. writing the question helped me clear my thoughts anyway

OTHER TIPS

Based on what you say, it is highly likely that text columns in the database table are encoded in cp1255 (Windows Hebrew) but your code is expecting UTF-8, perhaps in default of being explicitly told anything about the encoding.

You need to find what call into sqlalchemy or pyodbc needs to modified to correct the misinformation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top