Question

I have a problem displaying national characters from an “ENGLISH_UNITED KINGDOM.US7ASCII” Oracle 11 database using Python 3.3, cx_Oracle 5.1.2 and the "NLS_LANG" environment variable. The table column type is "VARCHAR2(2000 BYTE)".

How can I display the string "£aÀÁÂÃÄÅÆÇÈ" from an Oracle US7ASCII database in Python? This will be some sort of hack. The hack works in every other scripting language (Perl, PHP, PL/SQL) and in Python 2.7, but it does not work in Python 3.3.

In the Oracle 11 database I created SECURITY_HINTS.ANSWER="£aÀÁÂÃÄÅÆÇÈ". The ANSWER column type is "VARCHAR2(2000 BYTE)".

Now when using cx_Oracle and default NLS_LANG, I get "¿a¿¿¿¿¿¿¿¿¿"

and when using NLS_LANG="ENGLISH_UNITED KINGDOM.US7ASCII" I get

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)"

Update 1: I made some progress. When switching to Python 2.7 and cx_Oracle 5.1.2 for Python 2.7, the problem goes away (I get all >127 characters from the db). In Python 2 strings are represented as bytes, and in Python 3+ strings are represented as Unicode. I still need the best possible solution for Python 3.3.
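The difference between the two Python versions can be reproduced without a database. The snippet below is only a sketch: it assumes the stored bytes are Latin-1 / Windows-1252 values (0xA3 for "£", 0xC0 for "À"), which is consistent with the error message above but is not confirmed by the server.

```python
# Bytes as they are assumed to sit in the VARCHAR2 column: 0xA3 and 0xC0 are outside 7-bit ASCII.
raw = b"\xa3a\xc0"

# Python 3 has to decode bytes into str; a strict ASCII decode fails on 0xA3.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with a single-byte codec that covers 0x80-0xFF keeps every byte.
latin1_text = raw.decode("latin-1")

print(error)
print(latin1_text)
```

Python 2.7 never performs this decode step for a plain str, which is why the bytes pass through untouched there.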

Update 2: One possible solution to the problem is to use rawtohex(utl_raw.cast_to_raw(...)); see the code below.

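import binascii

cursor.execute("select rawtohex(utl_raw.cast_to_raw(ANSWER)) from security_hints where userid = '...'")
for rawValue in cursor:
    # unhexlify yields bytes; each byte value is mapped straight to the character with that code point
    print (''.join(['%c' % iterating_var for iterating_var in binascii.unhexlify(rawValue[0])]))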

The source code of my script is below, or at GitHub and GitHub Solution.

import os

def test_nls(nls_lang=None):
    print (">>> run test_nls for %s" %(nls_lang))
    if nls_lang:
        os.environ["NLS_LANG"] = nls_lang
    os.environ["ORA_NCHAR_LITERAL_REPLACE"] = "TRUE"

    connection = get_connection()
    cursor = connection.cursor()
    print("version=%s\nencoding=%s\tnencoding=%s\tmaxBytesPerCharacter=%s" %(connection.version, connection.encoding,
            connection.nencoding, connection.maxBytesPerCharacter))

    cursor.execute("SELECT USERENV ('language') FROM DUAL")
    for result in cursor:
        print("%s" %(result))

    cursor.execute("select ANSWER from SECURITY_HINTS where USERID = '...'")
    for rawValue in cursor:
        print("query returned [%s]" % (rawValue))
        answer = rawValue[0]
    # Collect the code point of every returned character (avoids shadowing the built-in str)
    codes = ""
    for iterating_var in answer:
        codes = ("%s [%d]" % (codes, ord(iterating_var)))

    print ("str %s" %(codes))

    cursor.close()
    connection.close()

if __name__ == '__main__':
    test_nls()
    test_nls(".AL32UTF8")
    test_nls("ENGLISH_UNITED KINGDOM.US7ASCII")

See the log output below.

run test_nls for None
version=11.1.0.7.0
encoding=WINDOWS-1252   nencoding=WINDOWS-1252  maxBytesPerCharacter=1
ENGLISH_UNITED KINGDOM.US7ASCII
query returned [¿a¿¿¿¿¿¿¿¿¿]
str  [191] [97] [191] [191] [191] [191] [191] [191] [191] [191] [191]


run test_nls for .AL32UTF8
version=11.1.0.7.0
encoding=UTF-8  nencoding=UTF-8 maxBytesPerCharacter=4
AMERICAN_AMERICA.US7ASCII
query returned [�a���������]
str  [65533] [97] [65533] [65533] [65533] [65533] [65533] [65533] [65533] [65533] [65533]

run test_nls for ENGLISH_UNITED KINGDOM.US7ASCII
version=11.1.0.7.0
encoding=US-ASCII   nencoding=US-ASCII  maxBytesPerCharacter=1
ENGLISH_UNITED KINGDOM.US7ASCII
Traceback (most recent call last):
  File "C:/dev/tmp/Python_US7ASCII_cx_Oracle/showUS7ASCII.py", line 71, in <module>
    test_nls("ENGLISH_UNITED KINGDOM.US7ASCII")
  File "C:/dev/tmp/Python_US7ASCII_cx_Oracle/showUS7ASCII.py", line 55, in test_nls
    for rawValue in cursor:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 0: ordinal not in range(128)

I am trying to display it in a Django web page, but each character comes through as a character with code 191 or 65533.
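The two codes are replacement characters: 191 is "¿" (U+00BF) and 65533 is U+FFFD. The sketch below imitates both outcomes in plain Python; it does not reproduce Oracle's own conversion, and the sample bytes are an assumed Latin-1 rendering of the first three characters.

```python
# Assumed stored bytes for the first three characters of the test string.
raw = bytes.fromhex("a361c0")

# Imitation of a 7-bit conversion: anything outside ASCII becomes an inverted question mark.
ascii_view = "".join(chr(b) if b < 128 else "\u00bf" for b in raw)

# A UTF-8 decode of the same bytes replaces each invalid byte with U+FFFD.
utf8_view = raw.decode("utf-8", errors="replace")

ascii_codes = [ord(ch) for ch in ascii_view]
utf8_codes = [ord(ch) for ch in utf8_view]
print(ascii_codes)
print(utf8_codes)
```

In both cases the original byte values are already gone by the time Python sees the string, so nothing on the Django side can recover them.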

I looked at "Choosing NLS_LANG for Oracle", "Importing from Oracle using the correct encoding with Python" and "Cannot Insert Unicode Using cx-Oracle".


Solution

If you want to get the string unchanged in the client application, the best way is to transfer it from the DB in binary mode. So the first conversion must be done on the server side with the help of the UTL_RAW package and the standard rawtohex function.

Your select in cursor.execute may look like this:

select rawtohex(utl_raw.cast_to_raw(ANSWER)) from SECURITY_HINTS where USERID = '...'

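On the client you get a string of hexadecimal characters, which can be converted to a string representation with the help of the binascii.unhexlify function:

for rawValue in cursor:
    # rawValue is a one-element row; unhexlify returns bytes, decoded here as Latin-1 (an assumption)
    print("query returned [%s]" % (binascii.unhexlify(rawValue[0]).decode("latin-1")))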

P.S. I don't know Python, so the last statement may be incorrect.
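The client-side half of this round trip can be tried without a database. In the sketch below, decode_hex_column is a hypothetical helper name, the hex string stands in for what rawtohex would return for the test value, and Latin-1 is an assumed encoding for the stored bytes.

```python
import binascii

def decode_hex_column(hex_value, encoding="latin-1"):
    """Turn a hex string (as produced by RAWTOHEX) back into text.

    The encoding is an assumption about how the bytes were originally written.
    """
    return binascii.unhexlify(hex_value).decode(encoding)

# Stand-in for one fetched column value: the bytes A3 61 C0 ... C8 written as hex.
sample_hex = "A361C0C1C2C3C4C5C6C7C8"
decoded = decode_hex_column(sample_hex)
print("query returned [%s]" % decoded)
```

Because the value travels as plain hex digits, it survives any NLS_LANG setting; the only decision left is which codec to use on the client.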

Other suggestions

I think you should not revert to such evil trickery. NLS_LANG should simply be set to the client's default encoding. Look at more solid options:

  1. Extend the character set of the database to allow these characters in a VARCHAR column.
  2. Upgrade this particular column to NVARCHAR. You could perhaps use a new name for this column and create a VARCHAR computed column with the old name for the legacy applications to read.
  3. Keep the database as is but check the data when it gets entered and replace all non-ASCII characters with an acceptable ASCII equivalent.

Which option is best depends on how common the non-ASCII characters are. If there are more tables with the same issue, I'd suggest option 1. If this is the only table, option 2. If there are only a couple of non-ASCII characters in the entire table, and their loss is not that big a deal, option 3.
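For option 3, the replacement could happen in the application before the value is written. The following is a minimal sketch, not a complete transliteration scheme: to_ascii and the FALLBACKS table are hypothetical, and the chosen substitutes ("GBP" for "£", "AE" for "Æ") are only examples.

```python
import unicodedata

# Hypothetical substitutes for characters that have no accent-free decomposition.
FALLBACKS = {"\u00a3": "GBP", "\u00c6": "AE", "\u00e6": "ae"}

def to_ascii(text, placeholder="?"):
    """Replace every non-ASCII character with an ASCII approximation."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in FALLBACKS:
            out.append(FALLBACKS[ch])
        else:
            # NFKD splits an accented letter into base letter plus combining mark;
            # encoding to ASCII with "ignore" then drops the mark.
            base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
            out.append(base or placeholder)
    return "".join(out)

cleaned = to_ascii("\u00a3a\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5\u00c6\u00c7\u00c8")
print(cleaned)
```

The result is lossy by design, so it only suits data where the accents carry no meaning.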

After all, one of the tasks of a database is to preserve the quality of your data. If you cheat by forcibly inserting illegal characters into the column, it cannot do its job properly, and each new client, upgrade or export will come with interesting new undefined behavior.


EDIT: See Oracle's comment on an example of a similar setup in the NLS_LANG FAQ (my emphasis):

A database is created on a UNIX system with the US7ASCII character set. A Windows client connecting to the database works with the WE8MSWIN1252 character set (regional settings -> Western Europe /ACP 1252) and the DBA, use the UNIX shell (ROMAN8) to work on the database. The NLS_LANG is set to american_america.US7ASCII on the clients and the server.

Note:

This is an INCORRECT setup to explain character set conversion, don't use it in your environment!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow