Question

I have Unicode data that I want to write to a file. I am using Python 2.6. I can print the encoded values, but I cannot write them to a file in the encoding I want. The default encoding for the environment is UTF-8. I also tried codecs, with no luck. Here is a sample snippet of what I am using:

#!/usr/bin/python
import sys
import codecs
import csv

sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']
print sys.stdout.encoding
f = codecs.open('listwrite.txt', 'w', encoding='latin-1')
for item in sh:
  f.write(item)
f.close()

for i in sh:
  print i.encode('latin-1')

Output:

UTF-8
Télévista S.A.
Télévista S.A.
Python

Contents of listwrite.txt:
Télévista S.A.Télévista S.A.Python

As seen above, the file is being written in UTF-8 and not Latin-1. How do I change this and override the default encoding for the file?

Edit 2:

Also, writing with a csv writer raises UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Code below:

#!/usr/bin/python
import sys
import codecs
import csv

sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']
print sys.stdout.encoding
f = codecs.open('listwrite.txt', 'w', encoding='latin-1')
c = csv.writer(f, quoting=csv.QUOTE_NONE)
c.writerow(sh)  # raises the UnicodeEncodeError quoted above
f.close()

for i in sh:
  print i.encode('latin-1')

Solution

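I think you're attacking the problem from the wrong angle. The csv module in Python 2 does not support Unicode input; it works with byte strings, so unicode items are converted with the default ascii codec, which fails on u'\xe9' no matter how the underlying file was opened. Encode each item before writing the row instead: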

import csv
sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']

f = open('listwrite.txt', 'wb') # binary mode
writer = csv.writer(f)
writer.writerow([item.encode('latin-1') for item in sh])
f.close()
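
One caveat with the list comprehension: in Python 2, calling .encode() on a plain byte string first decodes it with the ascii codec, so it only works while such items (like 'Python') are pure ASCII. A minimal sketch of a more defensive variant follows. The name encode_row is hypothetical, not part of the csv module, and the sketch assumes that byte strings in the row are already in the target encoding.

```python
def encode_row(row, encoding='latin-1'):
    """Return a copy of row with every text item encoded to bytes.

    Hypothetical helper, not part of the csv module. Items that are
    already byte strings are passed through untouched (assumption: they
    are already in the target encoding).
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [item.encode(encoding) if isinstance(item, text_type) else item
            for item in row]


sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', 'Python']
encoded = encode_row(sh)
```

In the code above, the call would then read writer.writerow(encode_row(sh)).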

Now you have a proper latin1-encoded file:

$ cat listwrite.txt | iconv -f latin1
Télévista S.A.,Télévista S.A.,Python
$ file listwrite.txt 
listwrite.txt: ISO-8859 text, with CRLF line terminators
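
The same kind of check can be done from Python for the non-CSV case in the question, by reading the raw bytes back instead of trusting how a terminal or editor renders them. This is only a sketch under a few assumptions: it uses io.open (available since Python 2.6) in place of codecs.open, it writes to a temporary directory rather than the working directory, and every item is a unicode object, because io.open in text mode rejects byte strings on Python 2. In Latin-1 the character é is the single byte 0xE9, while in UTF-8 it is the two bytes 0xC3 0xA9.

```python
import io
import os
import tempfile

# Assumption: all items are unicode objects (note the u prefix on 'Python').
sh = [u'T\xe9l\xe9vista S.A.', u'T\xe9l\xe9vista S.A.', u'Python']

# Assumption: a throwaway path is used so the sketch has no side effects.
path = os.path.join(tempfile.mkdtemp(), 'listwrite.txt')

# Text mode with an explicit encoding: items are encoded as they are written.
f = io.open(path, 'w', encoding='latin-1')
try:
    for item in sh:
        f.write(item)
finally:
    f.close()

# Binary mode: read the bytes exactly as they are stored on disk.
f = open(path, 'rb')
try:
    raw = f.read()
finally:
    f.close()

# Latin-1 stores e-acute as 0xE9; UTF-8 would store it as 0xC3 0xA9.
is_latin1 = b'\xe9' in raw and b'\xc3\xa9' not in raw
```

If is_latin1 is true but the file still looks wrong on screen, the viewer is probably guessing the wrong encoding rather than the file being written incorrectly.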
Licensed under: CC-BY-SA with attribution