I think it's best if you don't use `:raw`. You are processing text, so you should properly decode your input and encode your output. That will be far less error-prone, and it will allow your parser to use predefined character classes if you so desire.

You parse as if you expect backslashes in the literal, but then you completely ignore them when you escape. Because of that, you could end up with `"...\\xC3\xA3..."`, in which `\\` reads as an escaped backslash, so `xC3` becomes plain text instead of part of an escape sequence. Working with decoded text will also help here.
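A minimal sketch of why decoding matters, assuming UTF-8 input (the fixed script itself decodes according to the locale): read through `:raw`, the character `ã` arrives as the two bytes `C3 A3`, so a byte-level pattern sees two "unsafe" characters where there is really one, and predefined classes such as `\w` no longer describe characters. The variable names here are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What :raw hands you: the two UTF-8 bytes of "ã" (U+00E3).
my $bytes = "\xC3\xA3";

# The same data after decoding: a single character.
my $chars = decode('UTF-8', $bytes);

printf "raw: %d bytes, decoded: %d character(s)\n", length $bytes, length $chars;

# A "not printable ASCII" class matches each byte separately in the raw data,
# but matches the decoded character exactly once.
my @raw_hits     = $bytes =~ /[^\x20-\x7E]/g;
my @decoded_hits = $chars =~ /[^\x20-\x7E]/g;
printf "raw matches: %d, decoded matches: %d\n", scalar @raw_hits, scalar @decoded_hits;

# Predefined classes work on characters: with Unicode rules, "ã" is a word char.
print "decoded \\w match: ", ($chars =~ /^\w\z/u ? "yes" : "no"), "\n";
```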
So forget "perlish"; let's actually fix the bugs.
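```perl
use strict;
use warnings;
use open ':std', ':locale';    # decode input and encode output per the locale

# Turn one character into the \xHH escapes of its UTF-8 bytes.
sub convert_char {
    my ($s) = @_;
    utf8::encode($s);            # character -> UTF-8 bytes
    $s = uc unpack 'H*', $s;     # bytes -> uppercase hex digits
    $s =~ s/\G(..)/\\x$1/sg;     # "C3A3" -> "\xC3\xA3"
    return $s;
}

# Escape every character of a "..." literal that isn't printable ASCII.
sub convert_literal {
    my $orig = my $s = substr($_[0], 1, -1);    # drop the surrounding quotes
    my $safe          = '\x20-\x7E';            # ASCII printables and space
    my $safe_no_slash = '\x20-\x5B\x5D-\x7E';   # ASCII printables and space, no \
    my $changed = 0;    # count only the characters that actually get escaped
    $s =~ s{
        (?: \\? ( [^$safe] )                         # unsafe char, any preceding backslash dropped
        | ( (?: [$safe_no_slash] | \\[$safe] )+ )    # safe chars and escapes, kept as they are
        )
    }{
        defined($1) ? do { ++$changed; convert_char($1) } : $2
    }egx;
    # XXX Assumes $orig doesn't contain "*/"
    return qq{"$s"} . ( $changed ? " /* $orig */" : '' );
}

# Rewrite each double-quoted literal on the line, honouring backslash escapes.
while (<>) {
    s/(" (?:[^"\\]++|\\.)*+ ")/ convert_literal($1) /segx;
    print;
}
```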
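One more pitfall worth checking, assuming the output is C or C++ source (the `/* */` comments suggest so, but that is an assumption): a C `\x` escape consumes every hex digit that follows it, so `"\xC3\xA3b"` is read as `\xC3` followed by the single escape `\xA3b`, not `\xA3` then `b`. A minimal sketch of one guard is to close and reopen the literal with `""` whenever a generated escape is followed by a hex digit, relying on C's concatenation of adjacent string literals. `convert_literal_c` is a hypothetical variant written for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Same helper as above: one character -> \xHH escapes of its UTF-8 bytes.
sub convert_char {
    my ($s) = @_;
    utf8::encode($s);
    $s = uc unpack 'H*', $s;
    $s =~ s/\G(..)/\\x$1/sg;
    return $s;
}

# Hypothetical variant of convert_literal; assumes the output is C/C++ source.
# If a safe run starting with a hex digit follows a generated \xHH escape,
# insert "" so the compiler ends the escape there. Adjacent literals are
# concatenated, so the string's value is unchanged.
sub convert_literal_c {
    my $orig = my $s = substr($_[0], 1, -1);
    my $safe          = '\x20-\x7E';
    my $safe_no_slash = '\x20-\x5B\x5D-\x7E';
    my $changed   = 0;    # characters actually escaped
    my $after_hex = 0;    # was the previous piece a generated \xHH escape?
    $s =~ s{
        (?: \\? ( [^$safe] )
        | ( (?: [$safe_no_slash] | \\[$safe] )+ )
        )
    }{
        my $out;
        if (defined $1) {
            ++$changed;
            $out = convert_char($1);
            $after_hex = 1;
        }
        else {
            my $run = $2;
            $out = ($after_hex && $run =~ /^[0-9A-Fa-f]/ ? '""' : '') . $run;
            $after_hex = 0;
        }
        $out;
    }egx;
    # XXX Assumes $orig doesn't contain "*/"
    return qq{"$s"} . ( $changed ? " /* $orig */" : '' );
}
```

For example, `convert_literal_c(qq{"\x{E3}b"})` returns `"\xC3\xA3""b" /* ãb */`, while a literal with no unsafe characters, such as `"hello"`, comes back unchanged and without a comment.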