How can I obtain correct non-ASCII command-line arguments in ActiveState Perl?

https://stackoverflow.com/questions/7824335

27-10-2019

Question

Running the following command

perl -e "for (my $i = 0; $i < length($ARGV[0]); $i++) {print ord(substr($ARGV[0], $i, 1)), qq{\n}; }" αβγδεζ

in a Windows 7 cmd window with ActiveState Perl v5.14.2 produces the following result:

The above values are nonsensical and don't correspond to any known encoding, so trying to decode them with the approach recommended in "How can I treat command-line arguments as UTF-8 in Perl?" doesn't help. Changing the command window's active code page doesn't change the results.

Solution

Your system, like every Windows system I know, uses the 1252 ANSI code page by default, so you could try

use Encode qw( decode );
@ARGV = map { decode('cp1252', $_) } @ARGV;

Note that cp1252 cannot represent all of those characters, which is why the console, and thus Perl, actually receives

a 97
ß 223
? 63
d 100
e 101
? 63
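To see why no decoding step can help at this point, here is a minimal sketch that decodes the six byte values listed above as cp1252. The byte list is hard-coded from that table, and the helper name code_points is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: decode a byte string and return its code points.
sub code_points {
    my ($encoding, $bytes) = @_;
    return map { ord($_) } split //, decode($encoding, $bytes);
}

# The bytes Perl received for the Greek argument under the cp1252 ANSI code page.
my $received = pack 'C*', 97, 223, 63, 100, 101, 63;

# Prints: U+0061 U+00DF U+003F U+0064 U+0065 U+003F
# That is "a", "ß", "?", "d", "e", "?": the Greek letters were already
# replaced before Perl saw them, so decoding cannot bring them back.
print join(' ', map { sprintf('U+%04X', $_) } code_points('cp1252', $received)), "\n";
```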

There is a "Wide" interface for passing (almost) any Unicode code point to a program, but

The Wide interface is not used when you type in a command at the prompt.
Perl uses the ANSI interface to fetch the parameters, so even if you started Perl using the Wide interface, the parameters would get downgraded to ANSI when Perl fetches them.

Sorry, but this is a "you can't" type of situation. You need a different approach. Diomidis Spinellis suggests changing your system's ANSI code page as follows in Win7:

Control Panel
Region and Language
Administrative
Language for non-Unicode programs
Set the Current language for non-Unicode programs to the language associated with the specific characters (Greek in your case).

At this point, instead of cp1252 you'd use the encoding of the ANSI code page associated with the newly selected language (cp1253 for Greek).

use Encode qw( decode );
@ARGV = map { decode('cp1253', $_) } @ARGV;
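Rather than hard-coding the code page, a script could ask Windows which ANSI code page is active. The following is only a sketch under that assumption: it relies on Win32::GetACP() from the Win32 module, and the helper name ansi_encoding and its fallback argument are made up here so that the snippet also runs on systems without that module.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: name of the Encode encoding for the ANSI code page.
# Win32::GetACP() is only available on Windows, so the caller supplies a
# fallback name to use when the Win32 module cannot be loaded.
sub ansi_encoding {
    my ($fallback) = @_;
    my $acp = eval { require Win32; Win32::GetACP() };
    return defined $acp ? "cp$acp" : $fallback;
}

my $encoding = ansi_encoding('cp1253');

# Decode the arguments, then print each character's code point.
@ARGV = map { decode($encoding, $_) } @ARGV;
for my $arg (@ARGV) {
    printf "U+%04X\n", ord($_) for split //, $arg;
}
```

This still only recovers characters that the ANSI code page can represent; anything else has already been replaced by the time Perl receives the arguments.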

Note that using chcp to change the code page used within the console window does not affect the code page in which Perl receives its arguments, which is always an ANSI code page. See the examples below: cp737 is the Greek OEM code page, and cp1253 is the Greek ANSI code page. (You can find the encodings labeled as 37 and M7 in this document.)

C:\>chcp 737
Active code page: 737

C:\>echo αβγδεζ | od -t x1
0000000 98 99 9a 9b 9c 9d 20 0d 0a

C:\>perl -e "print map sprintf('%x ', ord($_)), split(//, $ARGV[0])" αβγδεζ
e1 e2 e3 e4 e5 e6

C:\>chcp 1253
Active code page: 1253

C:\>echo αβγδεζ | od -t x1
0000000 e1 e2 e3 e4 e5 e6 20 0d 0a

C:\>perl -e "print map sprintf('%x ', ord($_)), split(//, $ARGV[0])" αβγδεζ
e1 e2 e3 e4 e5 e6
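The two byte sequences in the transcript above are the same text in two different encodings. As a small check, this sketch decodes the bytes copied from the od output as cp737 and the bytes Perl printed as cp1253; the helper name hex_points is hypothetical.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Bytes copied from the transcript: od output under chcp 737 (OEM)
# and the argument bytes Perl printed (ANSI, cp1253).
my $oem_bytes  = pack 'H*', '98999a9b9c9d';
my $ansi_bytes = pack 'H*', 'e1e2e3e4e5e6';

# Hypothetical helper: decode bytes and format the code points as hex.
sub hex_points {
    my ($encoding, $bytes) = @_;
    return join ' ', map { sprintf('%x', ord($_)) } split //, decode($encoding, $bytes);
}

# Both lines print: 3b1 3b2 3b3 3b4 3b5 3b6 (U+03B1 to U+03B6, alpha to zeta)
print hex_points('cp737',  $oem_bytes),  "\n";
print hex_points('cp1253', $ansi_bytes), "\n";
```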

Other tips

This worked for me (on OS-X, but should be portable):

echo  αβγδεζ |perl -CI -e "chomp($in=<STDIN>);for (my $i = 0; $i < length($in); $i++) {print ord(substr($in, $i, 1)), qq{\n}; }"

That was for STDIN; for ARGV:

perl -CA -e "for (my $i = 0; $i < length($ARGV[0]); $i++) {print ord(substr($ARGV[0], $i, 1)), qq{\n}; }" αβγδεζ

See the -C option in perlrun: http://perldoc.perl.org/perlrun.html#Command-Switches
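For reference, what -CA arranges is roughly a UTF-8 decode of each element of @ARGV. The sketch below does the same decode explicitly on a hard-coded byte string; the helper name utf8_code_points is made up, and the bytes are the UTF-8 encoding of the six Greek letters as a UTF-8 terminal would pass them.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: decode UTF-8 bytes and return the code points,
# approximating what -CA does for each command-line argument.
sub utf8_code_points {
    my ($bytes) = @_;
    return map { ord($_) } split //, decode('UTF-8', $bytes);
}

# UTF-8 bytes for the six Greek letters alpha through zeta.
my $arg = "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4\xCE\xB5\xCE\xB6";

# Prints 945, 946, 947, 948, 949, 950, one per line.
print "$_\n" for utf8_code_points($arg);
```

This only helps when the arguments really arrive as UTF-8 bytes, which is not the case for the ANSI code pages discussed in the accepted answer.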

If I place the characters in a file (from OS-X), copy it to a windows box (as file.txt), then run:

perl -CI -e "chomp($_=<STDIN>); map{print ord, qq{\n}} split(//)" < file.txt

Then I get the expected:

But if I copy the contents of file.txt to the command line, I get gibberish.

As @ikegami was saying, I don't think it's possible to do from command line since you don't have a UTF-8 locale.

You could try using https://metacpan.org/pod/Win32::Unicode::Native. It should have what you need.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow