Question

Take the following Python code, which writes a Unicode string to a text file as UTF-8:

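def writefile():
    out = u'x \u2208 \u22C3A \u2192 \u2203y(x \u2208 y \u2208 A)'
    # Open in binary mode so the encoded bytes are written unchanged
    # (this also keeps the snippet working on Python 3)
    with open("output.txt", 'wb') as fout:
        fout.write(out.encode('UTF-8'))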

# The string is: x ∈ ⋃A → ∃y(x ∈ y ∈ A)

If I open this file in a lightweight text editor, such as nano (in Terminal), TextWrangler, or TextEdit, everything looks fine. But if I open it in MS Word (Word for Mac 2011, v14.3.9), all the Unicode characters come out garbled. When opening the file, Word shows a dialogue asking to "Convert file from:", but every available conversion method seems to produce garbled characters. For example:

x ∈ ⋃A → ∃y(x ∈ y ∈ A) (opening as either UTF-8 or Mac OS (Default))
x ∈ ⋃A → ∃y(x ∈ y ∈ A) (opening as MS-DOS Text)

However, if I open the file in e.g. TextWrangler, then copy the string to the clipboard, and then paste that into MS Word, it displays the string properly. So two questions:

1.) What explains this behavior? I.e. the fact that Word doesn't properly display the file, and the difference between opening the file in Word versus pasting its contents into Word from another program.

2.) How would I write a script (in e.g. Python) that takes the above UTF-8 file and converts it into something that Word can read and display properly?


Solution

  1. This is a problem with Microsoft Word. The file as written is valid UTF-8, but Word reads it as if it were encoded in Mac Roman. Note that the preview doesn't change when you tell Word to import using "Unicode 6.1 UTF-8".

  2. Try writing it out as UTF-16. I've checked that MS Word will correctly read in a big-endian UTF-16 file. I think just changing the UTF-8 to UTF-16 should work, although for best results you might want to experiment with UTF-16BE and UTF-16LE, and also with writing out a BOM (byte order mark) at the beginning of the file.
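
Both points can be sketched in Python. The first part reproduces the kind of garbling described in point 1 by decoding the UTF-8 bytes with Python's `mac_roman` codec. The second part is one possible conversion script for point 2: it writes a big-endian UTF-16 file with an explicit BOM. The function name `utf8_to_utf16be` and the output file name used below are made up for illustration, and whether this exact variant suits a given Word version is something to test rather than assume.

```python
import codecs
import io

# Point 1: UTF-8 bytes misread as Mac Roman.
text = u'x \u2208 \u22C3A \u2192 \u2203y(x \u2208 y \u2208 A)'
utf8_bytes = text.encode('utf-8')

# Each math symbol is three bytes in UTF-8, so decoding the bytes as a
# single-byte encoding turns every symbol into three unrelated characters.
garbled = utf8_bytes.decode('mac_roman')
element_of_garbled = u'\u2208'.encode('utf-8').decode('mac_roman')


# Point 2: re-encode a UTF-8 file as big-endian UTF-16 with a BOM.
def utf8_to_utf16be(src_path, dst_path):
    """Hypothetical helper: read src_path as UTF-8, write dst_path as UTF-16BE."""
    # newline='' keeps the original line endings untouched
    with io.open(src_path, 'r', encoding='utf-8', newline='') as src:
        content = src.read()
    with open(dst_path, 'wb') as dst:
        # The 'utf-16-be' codec does not emit a BOM, so write it explicitly
        dst.write(codecs.BOM_UTF16_BE)
        dst.write(content.encode('utf-16-be'))
```

Calling `utf8_to_utf16be('output.txt', 'output_utf16.txt')` would produce the file to open in Word. Python's plain `'utf-16'` codec is a shorter alternative: it writes a BOM on its own, but in the machine's native byte order, which is little-endian on most current hardware.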

Licensed under: CC-BY-SA with attribution