Question

I'm running python 2.6 with latest IPython on Windows XP SP3, and I have two questions. First one of my problems is, when under IPython, I cannot input Unicode strings directly, and, as a result, cannot open files with non-latin names. Let me demonstrate. Under usual python this works:

>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
>>> fd = open(u'm:/Блокнот/home.tdl')
>>> print u'm:/Блокнот/home.tdl'
m:/Блокнот/home.tdl
>>>

It's cyrillic in there, by the way. And under the IPython I get:

In [49]: sys.getdefaultencoding()
Out[49]: 'ascii'

In [50]: sys.getfilesystemencoding()
Out[50]: 'mbcs'

In [52]: fd = open(u'm:/Блокнот/home.tdl')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)

C:\Documents and Settings\andrey\<ipython console> in <module>()

IOError: [Errno 2] No such file or directory: u'm:/\x81\xab\xae\xaa\xad\xae\xe2/home.tdl'

In [53]: print u'm:/Блокнот/home.tdl'
-------------->print(u'm:/Блокнот/home.tdl')
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (15, 0))

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

C:\Documents and Settings\andrey\<ipython console> in <module>()

C:\Program Files\Python26\lib\encodings\cp866.pyc in encode(self, input, errors)
     10
     11     def encode(self,input,errors='strict'):
---> 12         return codecs.charmap_encode(input,errors,encoding_map)
     13
     14     def decode(self,input,errors='strict'):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-9: character maps to <und

In [54]:

The second problem is less frustrating, but still. When I try to open a file, and specify file name argument as non-unicode string, it does not open. I have to forcibly decode string from OEM charset, before I could open files, which is pretty inconvenient:

>>> fd2 = open('m:/Блокнот/home.tdl'.decode('cp866'))
>>>

Maybe it has something to with my regional settings, I don't know, because I can't even cut-and-paste cyrillic text from console. I've put "Russian" everywhere in regional settings, but it does not seem to work.

Was it helpful?

Solution

Yes. Typing Unicode at the console is always problematic and generally best avoided, but IPython is particularly broke. It converts characters you type on its console as if they were encoded in ISO-8859-1, regardless of the actual encoding you're giving it.

For now, you'll have to say u'm:/\u0411\u043b\u043e\u043a\u043d\u043e\u0442/home.tdl'.

OTHER TIPS

Perversely enough, this will work:

fd = open('m:/Блокнот/home.tdl')

Or:

fd = open('m:/Блокнот/home.tdl'.encode('utf-8'))

This gets around ipython's bug by inputting the string as a raw UTF-8 encoded byte-string. ipython doesn't try any funny business with it. You're then free to encode it into a unicode string if you like, and get on with your life.

I had the same problem with Greek input, this patch from launchpad works for me too.

Thanks.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top