Question

I'm trying to clean up form input using the following Perl transliteration:

sub ValidateInput {
 my $input = shift;
 $input =~ tr/a-zA-Z0-9_@.:;',#$%&()\/\\{}[]?! -//cd;
 return $input;
}

The problem is that this transliteration is removing embedded newline characters that users may enter into a textarea field which I want to keep as part of the string. Any ideas on how I can update this to stop it from removing embedded newline characters? Thanks in advance for your help!


Solution 4

Thanks for the help guys! Ultimately I decided to process all the data in our database to remove the character that was causing the issue so that any text that was submitted via our update form (and not changed by the user) would match what was in the database. Per your suggestions I also added a few additional allowed characters to the validation regex.

OTHER TIPS

I'm not sure what you are doing, but I suspect you are trying to keep all the characters between the space and the tilde in the ASCII table, along with some of the whitespace characters. I think most of your list condenses to a single range \x20-\x7e:

$string =~ tr/\x0a\x0d\x20-\x7e//cd;

If you want to knock out a character like " (although I suspect you really want it since you allow the single quote), just adjust your range:

$string =~ tr/\x0a\x0d\x20-\x21\x23-\x7e//cd;

That's a bit of a byzantine way of doing it! If you add \012 it should keep the newlines.

$input =~ tr/a-zA-Z0-9_@.:;',#$%&()\/\\{}[]?! \012-//cd;
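For reference, here is a minimal sketch of the sub from the question with \012 added to the list, so you can check the behaviour yourself. The sample strings in the usage note are made up for illustration. Note that the hyphen has to stay last in the list so that it is taken literally rather than as a range operator.

```perl
use strict;
use warnings;

# Same whitelist as in the question, plus \012 (LF).
# tr/// does not interpolate variables, so $ % @ are plain characters here.
# The hyphen is the last character in the list, so it is a literal hyphen.
sub ValidateInput {
    my $input = shift;
    $input =~ tr/a-zA-Z0-9_@.:;',#$%&()\/\\{}[]?! \012-//cd;
    return $input;
}
```

With this version, ValidateInput("one\ntwo") comes back unchanged, while characters outside the list, such as < > or a tab, are still deleted.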

See Form content types.

application/x-www-form-urlencoded: Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).

...

multipart/form-data: As with all MIME transmissions, "CR LF" (i.e., %0D%0A) is used to separate lines of data.

I do not know what you have in the database. Now you know what your script sees.
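Since the browser sends CR LF pairs, one option is to normalize line endings to a bare LF before filtering, so that submitted text can compare equal to what is stored. This is only a sketch under that assumption: normalize_newlines and clean_textarea are hypothetical names, and whether LF is the right target depends on what the database actually holds.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CR LF pairs and lone CRs into a single LF.
sub normalize_newlines {
    my $text = shift;
    $text =~ s/\015\012?/\012/g;
    return $text;
}

# Normalize first, then keep only LF and printable ASCII (space through tilde).
sub clean_textarea {
    my $text = normalize_newlines(shift);
    $text =~ tr/\012\x20-\x7e//cd;
    return $text;
}
```

If the stored data uses CR LF instead, the substitution could go the other way and \015 would need to be added to the allowed list.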

You are using CGI.pm, right?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow