Is it correct to switch Perl's default IO to UTF-8 when using Plack and middlewares?
Question
Two starting points:
- In his answer to "Why does modern Perl avoid UTF-8 by default?", tchrist pointed out 52 things needed to ensure correct Unicode handling in Perl. The answer shows the boilerplate code with some use statements. A similar question about the use of Unicode is How to make "use My::defaults" with modern perl & utf8 defaults?
- The PSGI spec is byte-oriented by design. It is my responsibility to encode/decode everything, so for Plack apps the correct way is to decode input and encode output, e.g.:
use Encode;
my $app = sub {
    my $output = encode_utf8( myapp() );
    return [ 200, [ 'Content-Type' => 'text/plain; charset=utf-8' ], [ $output ] ];
};
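A fuller sketch of the "decode input, encode output" rule, using only core modules so it runs without Plack installed. The reply text and the choice to read the whole body at once are illustrative assumptions; the key point is that decoding happens once where bytes enter the app and encoding happens once where they leave, while psgi.input and the body array stay byte-oriented as the PSGI spec requires.

```perl
use strict;
use warnings;
use IO::Handle;                      # makes ->read work on plain filehandles
use Encode qw(decode_utf8 encode_utf8);

# Minimal PSGI app: bytes in, Perl characters inside, bytes out.
my $app = sub {
    my $env = shift;

    # Read the raw request body (bytes) from the PSGI input stream.
    my $body_bytes = '';
    my $len = $env->{CONTENT_LENGTH} || 0;
    $env->{'psgi.input'}->read($body_bytes, $len) if $len;

    # Decode once at the boundary; work with characters internally.
    my $text  = decode_utf8($body_bytes);
    my $reply = "You sent " . length($text) . " characters: $text";

    # Encode once on the way out; the response body must contain bytes.
    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=utf-8' ],
        [ encode_utf8($reply) ],
    ];
};
```

Called with a hand-built environment whose body is the 5 UTF-8 bytes of "café", the app counts 4 characters, because it measured the decoded string rather than the raw bytes.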
Is it correct to use
use uni::perl; # or any similar module
in the PSGI application and/or in my modules?
uni::perl changes Perl's default IO to UTF-8, i.e. it does roughly:
use open qw(:std :utf8);
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");
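To see what these lines actually change, one can ask PerlIO which layers are active on the standard handles. This is a small sketch using the built-in PerlIO::get_layers; a handle with the UTF-8 flag set reports a "utf8" pseudo-layer, which is exactly what a byte-oriented consumer of that handle would then have to cope with.

```perl
use strict;
use warnings;
use open qw(:std :utf8);

# PerlIO::get_layers is built into perl and lists the layers on a handle.
# A handle with the UTF-8 flag set shows a "utf8" pseudo-layer.
my %handles = (STDIN => \*STDIN, STDOUT => \*STDOUT, STDERR => \*STDERR);
for my $name (sort keys %handles) {
    my @layers = PerlIO::get_layers($handles{$name});
    printf "%-6s %s\n", $name, join(' ', @layers);
}
```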
Will doing so break something in Plack or its middlewares? Or is the only correct way to write apps for Plack explicitely encoding/decoding at open, so without the open
pragma?
Solution
You really don't want to set STDIN/STDOUT to UTF-8 mode by default under Plack, because you don't know, for instance, whether they will be used as binary data transports. E.g. if those filehandles are the FastCGI protocol connector, they will be carrying encoded binary structures, not UTF-8 text. They therefore must not have an encoding layer defined, or those binary structures will be mangled or rejected as invalid.
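To make the mangling concrete, here is a sketch that writes the same 8-byte FastCGI-style record header (version, type, request id, content length, padding length, reserved) to two in-memory filehandles, one raw and one with a :utf8 layer. The content length 200 is an arbitrary example value; it is encoded as the byte 0xC8, which the :utf8 layer re-encodes as two bytes, so the peer would misread every field after it.

```perl
use strict;
use warnings;

# FastCGI-style record header: version 1, type 6 (FCGI_STDOUT),
# request id 1, content length 200, padding 0, reserved byte.
my $header = pack('CCnnCx', 1, 6, 1, 200, 0);

# Write the same bytes through a raw handle and through a :utf8 handle.
open my $raw_fh,  '>:raw',  \my $raw_buf  or die $!;
open my $utf8_fh, '>:utf8', \my $utf8_buf or die $!;
print {$raw_fh}  $header;
print {$utf8_fh} $header;
close $raw_fh;
close $utf8_fh;

printf "raw:  %d bytes\n", length $raw_buf;    # 8
printf "utf8: %d bytes\n", length $utf8_buf;   # 9: byte 0xC8 became 0xC3 0x88
```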
OTHER TIPS
On modern GNU/Linux systems you should completely switch to UTF-8 globally. This means setting
LANG="xx_YY.UTF-8"
PERL_UNICODE=SDAL
PERL5OPT=-Mutf8
in your /etc/environment, /etc/sysconfig/i18n, /etc/default/locale, or whatever your system's configuration file is. Because of a RHEL/CentOS bug, I symlinked /etc/environment to /etc/sysconfig/i18n.
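A quick way to check that such a global setting actually reached perl is to inspect ${^UNICODE}, which holds the -C / PERL_UNICODE bitmask the interpreter started with. The bit values below are the ones documented in perlrun (I=1, O=2, E=4, i=8, o=16, A=32, L=64, so SDAL is 127); the helper name unicode_flags is hypothetical.

```perl
use strict;
use warnings;

# Bit values of the PERL_UNICODE / -C letters, as documented in perlrun.
my @FLAGS = ([I => 1], [O => 2], [E => 4], [i => 8], [o => 16], [A => 32], [L => 64]);

# Hypothetical helper: turn a numeric setting into the letters that are on.
sub unicode_flags {
    my ($bits) = @_;
    return join '', map { $_->[0] } grep { $bits & $_->[1] } @FLAGS;
}

# ${^UNICODE} holds the setting this perl was started with.
printf "PERL_UNICODE=%s -> %s\n",
    $ENV{PERL_UNICODE} // '(unset)', unicode_flags(${^UNICODE});
printf "UTF-8 locale: %s\n", ${^UTF8LOCALE} ? 'yes' : 'no';
```

For example, unicode_flags(127) returns "IOEioAL" and unicode_flags(7), the S shorthand, returns "IOE".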
Scripts that rely on binary input should set binmode on STDIN/STDOUT (and possibly STDERR), use the open pragma, or be called with the -C0 option.
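A minimal sketch of the binmode option: a hypothetical copy_binary helper that resets both handles to :raw before copying, so bytes pass through unchanged even if PERL_UNICODE=SD or a use open pragma has put a :utf8 layer on them. In a real script it would be called as copy_binary(\*STDIN, \*STDOUT).

```perl
use strict;
use warnings;

# Hypothetical helper: copy a binary stream unchanged.
sub copy_binary {
    my ($in, $out) = @_;
    # :raw clears the utf8 flag (and any :crlf), leaving a pure byte stream.
    binmode $in,  ':raw' or die "binmode input: $!";
    binmode $out, ':raw' or die "binmode output: $!";
    while (read($in, my $chunk, 65536)) {
        print {$out} $chunk or die "write: $!";
    }
    return;
}
```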
The problem is that some DBD drivers, e.g. DBD::JDBC, are buggy and return undecoded strings, so you must set the utf8 flag by hand.
use Encode qw(_utf8_on);
# Marks each string as UTF-8 in place, without validating the bytes
_utf8_on($_) for @strings;
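Because _utf8_on trusts the bytes blindly, malformed data from the driver would silently become a corrupt Perl string. A stricter alternative is Encode's decode with FB_CROAK, which dies on invalid UTF-8. The sketch applies it to a row hashref such as DBI's fetchrow_hashref returns; decode_row is a hypothetical helper, and whether every column should be decoded depends on your schema.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: decode every defined value in a row hashref
# when the driver hands back raw UTF-8 bytes.
sub decode_row {
    my ($row) = @_;
    # values() returns aliases, so assigning to $value updates the hash.
    for my $value (values %$row) {
        next unless defined $value;
        # FB_CROAK: die on malformed input instead of substituting U+FFFD.
        $value = decode('UTF-8', $value, FB_CROAK);
    }
    return $row;
}
```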