Question

I am trying to do a very basic character set conversion, like iconv does, but I cannot figure out why it is not working. I am using Python's decode and encode routines, but it looks like I am missing something very basic.

Code:

#!/usr/bin/python

import sys

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print ("wrong input")
        sys.exit(1)

    fi = open(sys.argv[1], "r")
    buf = fi.read()
    fi.close()

    print ("got input: \n{0}".format(buf))

    buf.decode("big5", "strict").encode("utf8", "strict")

    fo = open(sys.argv[2], "w")
    fo.write(buf)
    fo.close()


    print ("changed: \n{0}".format(buf))

Input files: hello.big5 was obtained by converting the UTF-8 file with iconv.

[workspace] > cat hello.utf8 
hello = 你好

[workspace] > cat hello.big5 
hello = �A�n

When executed:

[workspace] > ./test.py  hello.big5 out
got input: 
hello = �A�n

changed: 
hello = �A�n

Can someone point out where I am tripping up?


Solution

This line does not modify buf, as you appear to think:

buf.decode("big5", "strict").encode("utf8", "strict")

See the docs for encode and decode: those methods return a new str or unicode object; they do not modify the object they are called on. If you want to change buf, assign the result back to it:

buf = buf.decode("big5", "strict").encode("utf8", "strict")
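
For comparison, here is a minimal sketch of the same conversion written so that it also runs on Python 3, where the file has to be opened in binary mode for decode to be available. The helper names convert_encoding and convert_file are made up for this illustration and are not part of the original script; the "big5" and "utf-8" defaults simply mirror the question.

```python
def convert_encoding(data, source="big5", target="utf-8"):
    """Return `data` (bytes) re-encoded from `source` to `target`."""
    # decode() returns a new text string and encode() returns new bytes.
    # Neither call changes `data`, so the result has to be returned.
    return data.decode(source, "strict").encode(target, "strict")


def convert_file(src_path, dst_path):
    """Read src_path as Big5 bytes and write dst_path as UTF-8 bytes."""
    # Binary mode keeps the raw bytes intact (needed on Python 3).
    with open(src_path, "rb") as fi:
        buf = fi.read()
    buf = convert_encoding(buf)  # keep the returned value
    with open(dst_path, "wb") as fo:
        fo.write(buf)
```

Calling convert_file("hello.big5", "out") should then leave UTF-8 text in out, assuming hello.big5 really contains valid Big5 bytes; with "strict" error handling, invalid input raises UnicodeDecodeError instead of being silently copied.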

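Also, if you're on Python 2, it doesn't make sense to use parentheses with print. It can be confusing, because print is a statement there, so print ("a", "b") prints the tuple ('a', 'b') rather than the two strings.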

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow