Question

Hi StackOverflow community,

I want to wrap Perl's substitution operator in a script. That is, I want a script that receives the input and output characters as arguments, plus the text file in which to do the replacement. I have everything else covered: how to handle the file, how to iterate through each line, how to handle the arguments, and so on. The only thing that is not working is that the second argument gets printed literally into the resulting file instead of being interpreted as the character represented by the octal code I pass as an argument.

Maybe it's clearer to explain with an example. Consider the following file:

Ross1 0    HP  01/11/2014^M
St J1 0    PA  01/15/2014^M
Gree1 0    GT  01/15/2014^M
UNKN1 0    HPHP01/13/2014^M
Wayn1 0    GT  01/15/2014^M

A specific example would be to replace the carriage return at the end of each line (^M, CR, octal code 15) with what I took to be an empty character (DEL, octal code 177). In other words, I will be using Perl's s/// operator to strip the carriage returns.

A general example would be to replace any character with any other character, just by passing the octal codes as arguments to the script. I have pretty much everything in place, as mentioned above, but I'm facing an issue that I guess is caused by how the codes are escaped. I'm fairly new to Perl, so I'm probably missing something really simple...

I made this script, which reads its arguments like this:

my ($parm1, $parm2, $filename) = @ARGV;

And substitutes the characters using the s function, like this:

 $_ =~ s/$parm1/$parm2/g;

Of course, line by line, using something like:

while (<INPUT>)
{
    chomp($_);    
    $_ =~ s/$parm1/$parm2/g;
    print OUTPUT $_."\n";
}

So, let's see what happens when I test the script:

Bad:

$ script.pl "\15" "\177" text

$ cat -v text
Ross1 0    HP  01/11/2014\177
St J1 0    PA  01/15/2014\177
Gree1 0    GT  01/15/2014\177
UNKN1 0    HPHP01/13/2014\177
Wayn1 0    GT  01/15/2014\177

Passing an octal code works for the first argument, but I have no idea why the second one is inserted as-is instead of being replaced with the corresponding character represented by the octal code (\177).

Good:

$ script.pl "\15" "" text

$ cat -v text
Ross1 0    HP  01/11/2014
St J1 0    PA  01/15/2014
Gree1 0    GT  01/15/2014
UNKN1 0    HPHP01/13/2014
Wayn1 0    GT  01/15/2014

If instead of passing the octal code as the second argument I pass the actual character (empty, or nothing, as I want to strip off the ^M from there), the script works as intended.

The same happens regardless of which character I send as the second argument using its octal code.

Am I missing something? For sure... but what?

Thank you for reading - I appreciate any ideas or suggestions from you guys.

Best regards


Edit: Just in case, perl -v: This is perl, v5.8.8 built for aix-thread-multi...


Edit: I found info regarding the octal escapes in here: http://perldoc.perl.org/perlrebackslash.html

It surely has something to do with this. However, even after changing the code to something like $_ =~ s/\o{$parm1}/\o{$parm2}/g; and passing just the numbers to the script, it's still not working.


Solution

I found the problem while reading through Perl's documentation... specifically, this page: http://docstore.mik.ua/orelly/perl2/prog/ch05_02.htm

Under 5.2.3. The s/// Operator (Substitution), you can see the following paragraph:

s/PATTERN/REPLACEMENT/egimosx

...

This operator searches a string for PATTERN and, if found, replaces the matched substring with the REPLACEMENT text.

...

The replacement portion is treated as a double-quoted string.

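So that explains it: the replacement is interpolated like a double-quoted string, but only once. $parm2 is expanded to its contents, the four characters \177, and those contents are not scanned again for escape sequences, so they end up in the file as-is. The pattern side behaves differently because, after interpolation, the regex engine compiles the result and interprets \15 itself as an octal escape.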
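A minimal sketch of that difference, using a sample line modeled on the file in the question. The variable names are illustrative and not from the original script, and single quotes stand in for what the shell hands over in @ARGV:

```perl
use strict;
use warnings;

# Sample data modeled on the question: one line ending in a carriage return.
my $line = "Ross1 0    HP  01/11/2014\r";

# Single quotes mimic the command-line arguments: the characters
# backslash, 1, 7, 7 rather than a single DEL byte.
my $from = '\15';
my $to   = '\177';

# Pattern side: after interpolation the regex engine compiles \15 and
# reads it as an octal escape, so it matches the carriage return.
# Replacement side: $to is interpolated once and inserted as-is.
(my $via_variable = $line) =~ s/$from/$to/g;

# Written directly in the source, \177 is processed as an escape sequence.
(my $via_literal = $line) =~ s/$from/\177/g;

print length($via_variable) - length($line), "\n";  # 3: one char became four
print length($via_literal)  - length($line), "\n";  # 0: one char became one
```

The first result is the "Bad" output from the question: the carriage return is matched, but the literal text \177 is written in its place.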

To work around the problem, I passed the decimal values of the characters I wanted to replace and converted them in the script like this:

$char_parm1 = chr($parm1);

So, when running the script to replace @ with !, I do:

script.pl "64" "33" text

And the substitution operator was defined like this:

$_ =~ s/$char_parm1/$char_parm2/g;
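If octal codes are preferred on the command line, as in the question, Perl's built-in oct can convert the digits to a number before chr is applied. A small sketch, assuming the arguments are plain digit strings (the sample text and variable names are made up for illustration):

```perl
use strict;
use warnings;

# chr() takes a number and returns the character with that code.
my $at   = chr(64);        # decimal 64 is '@'
my $bang = chr(33);        # decimal 33 is '!'

# oct() interprets a digit string as octal, so "15" becomes 13,
# which is the code of the carriage return.
my $cr = chr(oct("15"));

(my $text = 'user@example') =~ s/$at/$bang/g;
print "$text\n";                          # user!example
print $cr eq "\r" ? "CR\n" : "other\n";   # CR
```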

Some characters need special handling, such as \ and ., because they have a special meaning in regular expressions. Apart from that, the general form of the script is:

$char_parm1 = chr($parm1);
$char_parm2 = chr($parm2);

while (<INPUT>)
{
    chomp($_);
    $_ =~ s/$char_parm1/$char_parm2/g;

    print OUTPUT $_."\n";
}
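For the special cases mentioned above, one option is to quote the search character with \Q...\E so that regex metacharacters such as . or \ match literally. A sketch under that assumption; the helper name replace_char is hypothetical and not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of one character with
# another. Codes are decimal, as in the script above.
sub replace_char {
    my ($line, $from_code, $to_code) = @_;
    my $from = chr($from_code);
    my $to   = chr($to_code);

    # \Q...\E quotes regex metacharacters in $from, so '.' (46) or
    # '\' (92) match only themselves instead of acting as regex syntax.
    $line =~ s/\Q$from\E/$to/g;
    return $line;
}

print replace_char('a.b.c', 46, 45), "\n";   # a-b-c
print replace_char('a\\b',  92, 47), "\n";   # a/b
```

Without the quoting, a pattern consisting of a lone dot would match any character, and a lone backslash would not compile as a regex at all.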

I hope this helps - it helped me to learn something new, indeed :)

Licensed under: CC-BY-SA with attribution