Question

I'm getting a

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128)

on a string stored in 'a.desc' below, because it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string, so that part is fine. The cStringIO.StringIO.writelines function is seemingly trying to encode it as ASCII:

result.writelines(['blahblah',a.desc,'blahblahblah'])

How do I instruct it to treat the encoding as unicode, if that's the correct phrasing?

App Engine runs on Python 2.5.


Solution

StringIO documentation:

Unlike the memory files implemented by the StringIO module, those provided by [cStringIO] are not able to accept Unicode strings that cannot be encoded as plain ASCII strings.

If possible, use StringIO instead of cStringIO.
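
A minimal sketch of that swap, assuming the rest of the code only needs writelines and getvalue. The desc value is a hypothetical stand-in for a.desc, and the import fallback is there only so the snippet also runs on Python 3, where the StringIO module no longer exists:

```python
try:
    from StringIO import StringIO  # Python 2: pure-Python class, accepts unicode
except ImportError:
    from io import StringIO  # Python 3 fallback so the sketch still runs

desc = u"costs \xa35"  # hypothetical stand-in for a.desc, contains the pound sign

result = StringIO()
# Every piece is a unicode string, so nothing has to be encoded to ASCII.
result.writelines([u"blahblah", desc, u"blahblahblah"])

text = result.getvalue()        # still unicode text
encoded = text.encode("utf-8")  # encode explicitly when bytes are needed
```

Keeping every piece unicode matters here: on Python 2, mixing unicode with byte strings that contain non-ASCII bytes can still raise a UnicodeError when getvalue() joins them.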

OTHER TIPS

You can wrap the StringIO object in a codecs.StreamReaderWriter object to automatically encode and decode unicode.

Like this:

import cStringIO, codecs
buffer = cStringIO.StringIO()
codecinfo = codecs.lookup("utf8")
wrapper = codecs.StreamReaderWriter(buffer, 
        codecinfo.streamreader, codecinfo.streamwriter)

wrapper.writelines([u"list of", u"unicode strings"])

buffer will be filled with utf-8 encoded bytes.
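
To check what actually lands in the buffer, the round trip can be sketched as below. This assumes io.BytesIO as the byte buffer so that the snippet runs on Python 2.6+ and Python 3; on Python 2.5 cStringIO.StringIO() would take its place. The sample strings are made up for illustration:

```python
import codecs
import io

raw = io.BytesIO()  # stand-in for cStringIO.StringIO(); holds bytes only
info = codecs.lookup("utf-8")
wrapper = codecs.StreamReaderWriter(raw, info.streamreader, info.streamwriter)

# Unicode goes in; the wrapper encodes it before it reaches the buffer.
wrapper.writelines([u"cost: ", u"\xa3", u"5"])
stored = raw.getvalue()  # UTF-8 encoded bytes

# Rewind and read back through the wrapper to get decoded text again.
wrapper.seek(0)
text = wrapper.read()
```

The pound sign takes two bytes in UTF-8, so stored is one byte longer than text is in characters.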

If I understand your case correctly, you will only need to write, so you could also do:

import cStringIO, codecs
buffer = cStringIO.StringIO()
wrapper = codecs.getwriter("utf8")(buffer)
wrapper.writelines([u"list of", u"unicode strings"])

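You can also encode your strings as UTF-8 manually before adding them to the StringIO:

# Rebinding val inside the loop does not change rows, so collect the results.
encoded = []
for val in rows:
    if isinstance(val, unicode):
        val = val.encode('utf-8')
    encoded.append(val)
result.writelines(encoded)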

Python 2.6 introduced the io module, and you should consider using io.StringIO(), "An in-memory stream for unicode text."

In Python 2.6 the io module is implemented in pure Python and is comparatively slow; from Python 2.7 and 3.1 onward it is implemented in C and is fast.
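
A short sketch of the io.StringIO route, assuming Python 2.6 or later (so not the Python 2.5 runtime from the question). The build_text helper and the sample strings are hypothetical:

```python
import io


def build_text(parts):
    """Join unicode pieces through an in-memory text stream."""
    buf = io.StringIO()
    # On Python 2, io.StringIO accepts only unicode; a plain str raises TypeError.
    buf.writelines(parts)
    return buf.getvalue()


page = build_text([u"blahblah", u"\xa3", u"blahblahblah"])
payload = page.encode("utf-8")  # encode once, at the output boundary
```

The buffer stays text-only and encoding happens in one place, which avoids implicit ASCII conversions altogether.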

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow