What do $dummy and a parameterless split mean in Perl?
18-03-2021
Question
I need some help decoding this Perl script. $dummy is not initialized anywhere else in the script. What does the following line mean, and what does split do when it is called without any parameters?
($dummy, $class) = split;
The program tries to decide whether a statement is the truth or a lie using a statistical classification method. Say it calculates a "truth-sity" score and a "falsity" score for a statement; it then checks whether the lie detector was correct.
# some code, some code...
$_ = "truth";
# more code ...
$Truthsity = 9999;
$Falsity = 2134123;
if ($Truthsity > $Falsity) {
$newClass = "truth";
} else {
$newClass = "lie";
}
($dummy, $class) = split;
if ($class eq $newClass) {
print "correct";
} elsif ($class eq "true") {
print "false neg";
} else {
print "false pos";
}
Solution
($dummy, $class) = split;
split returns a list of values. The first is put into $dummy, the second into $class, and any further values are ignored. The first variable is likely named dummy because the author plans to ignore that value. A better option is to use undef to ignore a returned entry: ( undef, $class ) = split;
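A minimal sketch of those list-assignment rules, assuming a hypothetical input line with an ID in the first field and a class label in the second (the real format of $_ in the asker's script is not shown):

```perl
use strict;
use warnings;

# Hypothetical input: "<id> <class> <anything else>"
my $line = "stmt42 truth extra words";

# The first field goes to undef (discarded); "extra words" is ignored too.
my (undef, $class) = split ' ', $line;
print "class: $class\n";

# With only one field there is nothing left for the second slot, so it is undef.
my (undef, $missing) = split ' ', "truth";
print defined $missing ? "defined\n" : "undef\n";
```

This is also what happens with the question's sample $_ = "truth": there is only one field, so $class ends up undef, and under use warnings the later eq comparisons emit "Use of uninitialized value" warnings.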
perldoc can show you how split works. When called without arguments, split operates on $_ and splits on whitespace. $_ is the default variable in Perl; think of it as an implied "it", defined by context.
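A short sketch of the no-argument form reading $_; the sample string is made up. local keeps the assignment to the global $_ from leaking out of the enclosing block.

```perl
use strict;
use warnings;

my @fields;
{
    # local restores the previous value of $_ when the block exits
    local $_ = "   first  second\tthird";
    @fields = split;    # same as: split ' ', $_
}
print join('|', @fields), "\n";    # first|second|third
```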
Using an implied $_ can make short code more concise, but it's poor form to use it inside larger blocks. You don't want the reader to get confused about which 'it' you want to work with.
split;                        # split it
for (@list) { foo($_) }       # look at each element of list, foo it
@new = map { $_ + 2 } @list;  # look at each element of list,
                              # add 2 to it, put it in new list
while (<>) { foo($_) }        # grab each line of input, foo it
perldoc -f split
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)
I'm a big fan of the ternary operator ?: for setting string values, and of pushing logic into blocks and subroutines.
my $Truthsity = 9999;
my $Falsity = 2134123;
print test_truthsity( $Truthsity, $Falsity, $_ );
sub test_truthsity {
my ($truthsity, $falsity, $line ) = @_;
my $newClass = $truthsity > $falsity ? 'truth' : 'lie';
my (undef, $class) = split /\s+/, $line ;
my $output = $class eq $newClass ? 'correct'
: $class eq 'true' ? 'false neg'
: 'false pos';
return $output;
}
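One way to try the subroutine above on its own: the sub is copied unchanged (including its split /\s+/ and its 'true' comparison), and the "<id> <class>" sample lines are hypothetical.

```perl
use strict;
use warnings;

sub test_truthsity {
    my ($truthsity, $falsity, $line) = @_;
    my $newClass = $truthsity > $falsity ? 'truth' : 'lie';
    my (undef, $class) = split /\s+/, $line;
    my $output = $class eq $newClass ? 'correct'
               : $class eq 'true'    ? 'false neg'
               :                       'false pos';
    return $output;
}

my $Truthsity = 9999;
my $Falsity   = 2134123;

# 9999 > 2134123 is false, so every line is judged against 'lie'
my @results = map { test_truthsity($Truthsity, $Falsity, $_) }
              ("s1 lie", "s2 truth", "s3 true");
print "$_\n" for @results;
```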
There may be a subtle bug in this version: split with no arguments is not exactly the same as split(/\s+/, $_). They behave differently if the line starts with whitespace. The explicit /\s+/ form returns an empty leading field (so $class would receive the first real field instead of the second), while split with no arguments skips the leading whitespace.
$_ = " ab cd";
my @a = split;           # @a contains ( 'ab', 'cd' )
my @b = split /\s+/, $_; # @b contains ( '', 'ab', 'cd' )
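To see the difference side by side, here is a small comparison on a made-up string with a leading space and a doubled inner space. The string ' ' is special-cased by split to behave like the no-argument form; a regex containing one space, / /, is not.

```perl
use strict;
use warnings;

my $s = " ab  cd";

my @bare;
{ local $_ = $s; @bare = split; }    # ('ab', 'cd'): leading whitespace skipped
my @space = split ' ',   $s;         # ('ab', 'cd'): same special case as bare split
my @ws    = split /\s+/, $s;         # ('', 'ab', 'cd'): empty leading field kept
my @one   = split / /,   $s;         # ('', 'ab', '', 'cd'): every single space splits

for my $pair (['bare', \@bare], ['space', \@space], ['ws', \@ws], ['one', \@one]) {
    my ($name, $fields) = @$pair;
    print "$name: ", join('|', map { "'$_'" } @$fields), "\n";
}
```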
Other tips
From the documentation for split:
split /PATTERN/,EXPR
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)
So since both the pattern and the expression are omitted, we are splitting the default variable $_ on whitespace.
The purpose of the $dummy variable is to capture the first element of the list returned from split and ignore it, because the code is only interested in the second element, which is put into $class.
You'll have to look at the surrounding code to find out what $_ is in this context; it may be a loop variable, the current item in a map block, or something else.
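Two common places where $_ is set implicitly, sketched with hypothetical two-column lines: a for loop and a map block. In both, the bare split works on the current element.

```perl
use strict;
use warnings;

my @lines = ("id1 truth", "id2 lie", "id3 truth");

# for: $_ is aliased to each element of @lines in turn
my @from_loop;
for (@lines) {
    my (undef, $class) = split;
    push @from_loop, $class;
}

# map: $_ is aliased to each element inside the block
my @from_map = map { (split)[1] } @lines;

print "@from_loop\n";
print "@from_map\n";
```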
If you read the documentation, you'll find that:
- The default for the first operand is " ".
- The default for the second operand is $_.
- The default for the third operand is 0.
So split with no arguments is short for
split " ", $_, 0
and it means: take $_ and split its value on whitespace, ignoring leading and trailing whitespace. The first resulting field is placed in $dummy, and the second in $class.
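A small check of the "ignoring leading and trailing whitespace" part and of the third operand: with the default limit of 0, trailing empty fields are stripped, while a negative limit keeps them. The strings are only illustrations.

```perl
use strict;
use warnings;

my $padded = "  truth lie  ";
my @words = split ' ', $padded, 0;    # ('truth', 'lie'): no empty fields at either end

my $csv = "a,b,,,";
my @default = split /,/, $csv;        # limit 0: ('a', 'b'), trailing empties dropped
my @keep    = split /,/, $csv, -1;    # negative limit: ('a', 'b', '', '', '')

print scalar(@words), " ", scalar(@default), " ", scalar(@keep), "\n";    # 2 2 5
```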
Based on its name, I presume $dummy is never used again, so it is simply acting as a placeholder. You can get rid of it:
my ($dummy, $class) = split;
can be written as
my (undef, $class) = split; # Use undef as a placeholder
or
my $class = ( split )[1]; # Use a list slice to get second item
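A quick sanity check that the three spellings above pick the same field, using hypothetical sample lines (one with leading whitespace and an extra field). The $dummy form is kept only for comparison.

```perl
use strict;
use warnings;

my @picked;
for my $line ("id1 truth", "   id2 lie  extra") {
    local $_ = $line;

    my ($dummy, $c1) = split;    # named placeholder
    my (undef,  $c2) = split;    # undef placeholder
    my $c3 = ( split )[1];       # list slice

    die "mismatch on '$line'" unless $c1 eq $c2 && $c2 eq $c3;
    push @picked, $c3;
}
print "@picked\n";
```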