Python: How to get StringIO.writelines to accept unicode string?
Question
I'm getting a
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128)
on a string stored in 'a.desc' below, because it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string, so that part is fine. The cStringIO.StringIO.writelines function seems to be trying to encode it as ASCII:
result.writelines(['blahblah',a.desc,'blahblahblah'])
How do I instruct it to treat the encoding as unicode if that's the correct phrasing?
App Engine runs on Python 2.5.
Solution
Unlike the memory files implemented by the StringIO module, those provided by cStringIO cannot accept Unicode strings that cannot be encoded as plain ASCII strings.
If possible, use StringIO instead of cStringIO.
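A minimal sketch of that change, assuming `a.desc` is a unicode string as in the question (the value below is a made-up stand-in). The `try`/`except` fallback to `io.StringIO` is an addition so the same snippet also runs on Python 3, where the `StringIO` module no longer exists.

```python
try:
    from StringIO import StringIO  # Python 2: pure-Python module, accepts unicode
except ImportError:
    from io import StringIO  # Python 3: the StringIO module was removed

desc = u"Costs \xa35"  # hypothetical stand-in for a.desc; u'\xa3' is the pound sign

result = StringIO()
result.writelines([u"blahblah", desc, u"blahblahblah"])
text = result.getvalue()  # a unicode string; encode it yourself if you need bytes
```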
OTHER TIPS
You can wrap the StringIO object in a codecs.StreamReaderWriter object to automatically encode and decode unicode, like this:
import cStringIO, codecs
buffer = cStringIO.StringIO()  # holds encoded bytes
codecinfo = codecs.lookup("utf8")
# The wrapper encodes unicode on write and decodes bytes on read
wrapper = codecs.StreamReaderWriter(buffer,
    codecinfo.streamreader, codecinfo.streamwriter)
wrapper.writelines([u"list of", u"unicode strings"])
buffer will be filled with UTF-8-encoded bytes.
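The round trip can be checked with a small sketch. It swaps in `io.BytesIO` for `cStringIO.StringIO` (an assumption so that it also runs on Python 3) and uses the wrapper's `seek` to rewind before reading the text back out.

```python
import codecs
import io

buffer = io.BytesIO()  # byte buffer standing in for cStringIO.StringIO()
codecinfo = codecs.lookup("utf-8")
wrapper = codecs.StreamReaderWriter(buffer, codecinfo.streamreader, codecinfo.streamwriter)

# Writing unicode through the wrapper stores UTF-8 bytes in the buffer.
wrapper.writelines([u"price: ", u"\xa3", u"10"])
encoded = buffer.getvalue()

# Rewind, then read through the wrapper to decode the bytes back to unicode.
wrapper.seek(0)
decoded = wrapper.read()
```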
If I understand your case correctly, you will only need to write, so you could also do:
import cStringIO, codecs
buffer = cStringIO.StringIO()
# Write-only wrapper: unicode passed to wrapper.write()/writelines() lands in buffer as UTF-8
wrapper = codecs.getwriter("utf8")(buffer)
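A sketch of the write-only variant in use, again with `io.BytesIO` substituted for `cStringIO.StringIO` so it also runs on Python 3. Reading back through `codecs.getreader` is included only to confirm that the buffer holds valid UTF-8.

```python
import codecs
import io

buffer = io.BytesIO()  # stands in for cStringIO.StringIO()
writer = codecs.getwriter("utf-8")(buffer)  # encodes unicode to UTF-8 on write

writer.writelines([u"blahblah", u"\xa3", u"blahblahblah"])
raw = buffer.getvalue()  # UTF-8 bytes

# Decode the same bytes with the matching reader to check the round trip.
reader = codecs.getreader("utf-8")(io.BytesIO(raw))
text = reader.read()
```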
You can also encode your strings as UTF-8 manually before adding them to the cStringIO object:
# Replace each unicode item in rows with its UTF-8-encoded byte string
for i, val in enumerate(rows):
    if isinstance(val, unicode):
        rows[i] = val.encode('utf-8')
result.writelines(rows)
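The same idea as a small reusable helper. `encode_rows` is a hypothetical name, and testing `isinstance(v, bytes)` instead of `unicode` is a substitution so the code runs on Python 2.6+ (where `bytes` is an alias for `str`) as well as on Python 3.

```python
import io

def encode_rows(rows, encoding="utf-8"):
    """Return a new list where every text item is encoded; byte strings pass through."""
    return [v if isinstance(v, bytes) else v.encode(encoding) for v in rows]

result = io.BytesIO()  # byte buffer standing in for cStringIO.StringIO()
result.writelines(encode_rows([u"blahblah", u"\xa3", b"blahblahblah"]))
data = result.getvalue()
```

Building a new list avoids mutating the caller's `rows` in place.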
Python 2.6 introduced the io module, and you should consider using io.StringIO(), "An in-memory stream for unicode text."
In the first Python versions that include it (such as 2.6), io is implemented in pure Python and is not optimized; later versions reimplement it in (fast) C code.
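A minimal sketch with `io.StringIO`. Because it holds text rather than bytes, the string is encoded explicitly at the point where bytes are needed (UTF-8 is an assumption here). The `u""` literals matter on Python 2, where `io.StringIO` expects unicode; on Python 3 they are ordinary strings.

```python
import io

buf = io.StringIO()  # in-memory text stream
buf.writelines([u"blahblah", u"\xa3", u"blahblahblah"])

text = buf.getvalue()  # unicode text; no encoding has happened yet
payload = text.encode("utf-8")  # encode only when bytes are required
```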