Question

After a few days of digging through Stack Overflow and the Python 2.7 docs, I have come to no conclusion about this.

Basically, I'm running a Python script on a Windows server that must take a block of text as input. This block of text (unfortunately) has to be passed through a pipe. Something like:

PS > [something_that_outputs_text] | python .\my_script.py

So the problem is:

The server uses cp1252 encoding, and I really cannot change it due to administrative regulations and whatnot. When I pipe the text to my Python script and read it, characters like \xe1 have already been replaced with ?.

What I have done so far:

Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", as in Python gets the text perfectly and I can then decode it to unicode, etc. But apparently they don't let me do that on the server /sadface.

A little script to test what the hell is happening:

import codecs
import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv

    # Echo any command-line arguments, decoded as cp1252.
    for arg in argv[1:]:
        print arg.decode('cp1252')

    # Wrap stdin so the piped bytes are decoded as cp1252.
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0

if __name__ == "__main__":
    sys.exit(main())

Tried it with both the codecs wrapping and without it.

My input & output:

PS > echo "Blá" | python .\testinput.py blé
blé
Bl?

--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(

I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
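
A check along these lines can be sketched as follows. The helper name hex_bytes is hypothetical, and the two sample byte strings stand in for what sys.stdin.read() would return in the real script; this is only an illustration of the hex comparison, not the original test code.

```python
import binascii

def hex_bytes(data):
    """Return the bytes in `data` as space-separated two-digit hex values."""
    hexed = binascii.hexlify(data).decode("ascii")
    return " ".join(hexed[i:i + 2] for i in range(0, len(hexed), 2))

# "Bla" with an accented a, encoded as cp1252, ends in e1.
intact = u"Bl\u00e1".encode("cp1252")
# What arrives when the character was replaced before Python saw it.
mangled = b"Bl?"

print(hex_bytes(intact))   # 42 6c e1
print(hex_bytes(mangled))  # 42 6c 3f
```

Seeing 3f in the raw bytes means the replacement happened before Python read the data, so no decoding choice on the Python side can recover the original character.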

[Also: it's my first question here... feel free to ask any more info about what I did]

EDIT

I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None

Update: So... I have no problems with cmd. Checked sys.stdin.encoding while running the program on cmd and everything went fine. I think my head just exploded.


Solution

How about saving the data into a file and piping it to Python in a CMD session? Invoke both PowerShell and Python from CMD, like so:

c:\>powershell -command "c:\genrateDataForPython.ps1 -output c:\data.txt"
c:\>type c:\data.txt | python .\myscript.py
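
Once the data is in a file, the Python script could also read that file directly instead of receiving it through a pipe. Here is a minimal sketch, assuming the encoding the PowerShell script writes is known. The cp1252 value is only an assumption (as far as I know, Out-File in Windows PowerShell defaults to UTF-16, so the real value may differ), and read_text_file and the temporary file are hypothetical stand-ins.

```python
import io
import os
import tempfile

def read_text_file(path, encoding):
    """Read a whole text file, decoding it with an explicitly chosen encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read().strip()

# Demo: a temporary file stands in for the data file written by PowerShell.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(demo_path, "w", encoding="cp1252") as handle:
        handle.write(u"Bl\u00e1\n")
    text = read_text_file(demo_path, "cp1252")
finally:
    os.remove(demo_path)
```

io.open behaves the same in Python 2.7 and Python 3, and it returns unicode text, so no separate decode step is needed afterwards.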

Edit

Another idea: convert the data to Base64 in PowerShell and decode it in Python. Base64 is simple in PowerShell, and I guess it isn't hard in Python either. Like so:

# Convert some accent chars to base64
$s  = [Text.Encoding]::UTF8.GetBytes("éêèë")
[System.Convert]::ToBase64String($s)
# Output:
w6nDqsOow6s=

# Decode:
$d  = [System.Convert]::FromBase64String("w6nDqsOow6s=")
[Text.Encoding]::UTF8.GetString($d)
# Output
éêèë
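
The Python side of this could look roughly like the sketch below. It assumes the PowerShell side encoded the string as UTF-8 before calling ToBase64String, as in the example above. The function name decode_piped_base64 is hypothetical, and the literal passed in stands for what sys.stdin.read() would return.

```python
import base64

def decode_piped_base64(text):
    """Decode Base64 text (as read from stdin) into a unicode string.

    Assumes the sender encoded the original string as UTF-8.
    """
    # strip() drops the trailing line break that the pipe appends.
    raw = base64.b64decode(text.strip())
    return raw.decode("utf-8")

# The Base64 string from the PowerShell example, with a trailing CRLF.
decoded = decode_piped_base64("w6nDqsOow6s=\r\n")
```

Because Base64 output contains only ASCII characters, it should pass through the pipe unchanged regardless of the code page in use. In the real script the call would be decode_piped_base64(sys.stdin.read()).
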
Licensed under: CC-BY-SA with attribution