Question

I am reading a file and wondering how to skip lines that contain the Unicode NULL character, U+0000. I have tried each of the following, but none of them works:

  • if($line)
  • chomp($line)
  • $line =~ s/\s*$//g;

Solution

Your list of "everything" does not seem to include the obvious $line =~ m/\000/.

Other suggestions

Because you asked about Unicode NULL (identical to ASCII NUL when encoded in UTF-8), let’s use the \N{U+...} form, described in the perlunicode documentation.

Unicode characters can also be added to a string by using the \N{U+...} notation. The Unicode code for the desired character, in hexadecimal, should be placed in the braces, after the U. For instance, a smiley face is \N{U+263A}.

You can also match against \N{U+...} in regexes. See below.

#! /usr/bin/env perl

use strict;
use warnings;

my $contents =
  "line 1\n" .
  "\N{U+0000}\n" .
  "foo\N{U+0000}bar\n" .
  "baz\N{U+0000}\n" .
  "\N{U+0000}quux\n" .
  "last\n";

open my $fh, "<", \$contents or die "$0: open: $!";

while (defined(my $line = <$fh>)) {
  next if $line =~ /\N{U+0000}/;
  print $line;
}

Output:

$ ./filter-nulls
line 1
last

Perl strings can contain arbitrary data, including NUL characters. Your if only checks for true or false (where "" and "0" are the two false strings, everything else being true, including a string containing a single NUL, "\x00"). Your chomp only removes the line separator, not NULs. A NUL character is not whitespace, so it doesn't match \s.
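
To make the three failures concrete, here is a minimal sketch that applies each attempt from the question to a line holding a single NUL and a newline. The variable names ($is_true, $chomped, $stripped) are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# A line consisting of a single NUL followed by a newline.
my $line = "\x00\n";

# 1. Truthiness: only "" and "0" are false strings, so this line is true.
my $is_true = $line ? 1 : 0;

# 2. chomp removes the trailing line separator (the value of $/), not the NUL.
my $chomped = $line;
chomp $chomped;

# 3. NUL is not whitespace, so \s*$ strips only the newline.
my $stripped = $line;
$stripped =~ s/\s*$//;

printf "true: %d, chomped length: %d, stripped length: %d\n",
  $is_true, length $chomped, length $stripped;
# prints "true: 1, chomped length: 1, stripped length: 1"
```

In every case the NUL survives, which is why none of the three attempts skips the line.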

You can explicitly match a NUL character by specifying it in a regex using octal or hex notation ("\000" or "\x00", respectively).
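
As a quick check that the two notations behave the same, the sketch below tests a few sample lines against both patterns. The sample data and the @results array are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Sample lines: one without a NUL, two with a NUL.
my @lines = ("plain\n", "foo\000bar\n", "\x00\n");

my @results;
for my $line (@lines) {
  my $octal = $line =~ /\000/ ? 1 : 0;  # octal escape
  my $hex   = $line =~ /\x00/ ? 1 : 0;  # hex escape
  push @results, "octal=$octal hex=$hex";
}

print "$_\n" for @results;
# prints:
# octal=0 hex=0
# octal=1 hex=1
# octal=1 hex=1
```

Either pattern can be used in a `next if ...` test inside the read loop to skip the affected lines.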

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow