How do I omit lines that contain Unicode NULL (U+0000)?
Question
I am reading a file and am wondering how to skip lines that contain Unicode NULL, U+0000. I have tried everything below, but none of it works:
if($line)
chomp($line)
$line =~ s/\s*$//g;
Solution
Your list of "everything" does not seem to include the obvious: $line =~ m/\000/.
Other suggestions
Because you asked about Unicode NULL (identical to ASCII NUL when encoded in UTF-8), let’s use the \N{U+...} form, described in the perlunicode documentation:
Unicode characters can also be added to a string by using the \N{U+...} notation. The Unicode code for the desired character, in hexadecimal, should be placed in the braces, after the U. For instance, a smiley face is \N{U+263A}.
You can also match against \N{U+...} in regexes. See below.
#! /usr/bin/env perl

use strict;
use warnings;

my $contents =
  "line 1\n" .
  "\N{U+0000}\n" .
  "foo\N{U+0000}bar\n" .
  "baz\N{U+0000}\n" .
  "\N{U+0000}quux\n" .
  "last\n";

open my $fh, "<", \$contents or die "$0: open: $!";

while (defined(my $line = <$fh>)) {
  next if $line =~ /\N{U+0000}/;
  print $line;
}
Output:
$ ./filter-nulls
line 1
last
Perl strings can contain arbitrary data, including NUL characters. Your if only checks for true or false (where "" and "0" are the two false strings, everything else being true, including a string containing a single NUL, "\x00"). Your chomp only removes the line separator, not NULs. A NUL character is not whitespace, so it doesn't match \s.
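To make the three failed attempts concrete, here is a minimal sketch that runs each of them on a made-up line holding one NUL and a newline. The variable names are illustrative only and do not come from the question.

```perl
use strict;
use warnings;

# Illustrative sample: one NUL character followed by a newline.
my $line = "\x00\n";

# 1. Truth test: only "" and "0" are false strings, so this line is true.
my $is_true = $line ? 1 : 0;

# 2. chomp removes the trailing line separator and nothing else.
my $chomped = $line;
chomp $chomped;
my $len_after_chomp = length $chomped;

# 3. \s does not match NUL, so stripping trailing whitespace leaves it alone.
(my $stripped = $line) =~ s/\s*$//g;
my $still_has_nul = $stripped =~ /\x00/ ? 1 : 0;

printf "true: %d, length after chomp: %d, NUL still present: %d\n",
    $is_true, $len_after_chomp, $still_has_nul;
```

In each case the NUL survives, which is why none of the attempts skips the line.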
You can explicitly match a NUL character by specifying it in a regex using octal or hex notation ("\000" or "\x00", respectively).
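As a rough sketch of that approach, the snippet below checks that the octal, hex, and \N{U+...} spellings denote the same character and then filters a list of lines. The helper has_nul and the sample lines are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): returns 1 if the
# string contains at least one NUL character, otherwise 0.
sub has_nul {
    my ($line) = @_;
    return $line =~ /\x00/ ? 1 : 0;
}

# The octal, hex, and \N{U+...} spellings all denote the same character.
my $same = ("\000" eq "\x00" && "\x00" eq "\N{U+0000}") ? 1 : 0;

# Made-up sample lines for illustration.
my @lines = ("line 1\n", "foo\000bar\n", "\x00\n", "last\n");

# Keep only the lines that do not contain a NUL.
my @kept = grep { !has_nul($_) } @lines;

print @kept;
```

Which spelling to use is mostly a matter of taste; \x00 and \000 are the most compact, while \N{U+0000} makes the Unicode code point explicit.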