Question

I need some help decoding this Perl script. $dummy is not initialized anywhere else in the script. What does the following line mean, and what does it mean when the split function is called without any arguments?

($dummy, $class) = split;

The program is trying to check whether a statement is a truth or a lie using a statistical classification method. Let's say it calculates the following numbers for "truth-sity" and "falsity"; it then checks whether the lie detector is correct or not.

# some code, some code...
$_ = "truth";
# more some code, some code ...

$Truthsity = 9999;
$Falsity = 2134123;

if ($Truthsity > $Falsity) {   
    $newClass = "truth";      
} else {
    $newClass = "lie";     
}

($dummy, $class) = split;

if ($class eq $newClass) {
    print "correct";
} elsif ($class eq "true") {
    print "false neg";
} else {
    print "false pos"
}

Solution

($dummy, $class) = split;

split returns a list of values. The first is put into $dummy, the second into $class, and any further values are ignored. The first variable is likely named dummy because the author plans to ignore that value. A better option is to use undef to ignore a returned entry: ( undef, $class ) = split;
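As a minimal sketch of that list assignment: the input line and its three-field layout below are assumptions made up for illustration, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical input line: an ID, a class label, and an extra field.
my $line = "stmt42 truth 0.87";

# undef on the left-hand side skips the first field; the third field
# has no variable to land in, so it is silently discarded.
my (undef, $class) = split ' ', $line;

print "$class\n";   # prints "truth"
```

Only the second field survives, which is all the comparison against $newClass needs.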

Perldoc can show you how split works. When called without arguments, split operates on $_ and splits on whitespace. $_ is the default variable in Perl; think of it as an implied "it," as defined by context.

Using an implied $_ can make short code more concise, but it's poor form to use it inside larger blocks. You don't want the reader to get confused about which 'it' you want to work with.

split ;                      # split it
for (@list) { foo($_) }      # look at each element of list, foo it.
@new = map { $_ + 2 } @list ;# look at each element of list, 
                             # add 2 to it, put it in new list
while(<>){ foo($_)}          # grab each line of input, foo it.

perldoc -f split

If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

I'm a big fan of the ternary operator ? : for setting string values and of pushing logic into blocks and subroutines.

my $Truthsity = 9999;
my $Falsity   = 2134123;

print test_truthsity( $Truthsity, $Falsity, $_ );

sub test_truthsity {
  my ($truthsity, $falsity, $line ) = @_;
  my $newClass = $truthsity > $falsity ? 'truth' : 'lie';
  my (undef, $class) = split /\s+/, $line ;

  my $output = $class eq $newClass ? 'correct' 
             : $class eq 'true'    ? 'false neg'
             :                       'false pos';
  return $output;
}

There may be a subtle bug in this version. split with no arguments is not exactly the same as split(/\s+/, $_); they behave differently if the line starts with spaces. With an explicit /\s+/ pattern, an empty leading field is returned. split with no arguments skips the leading whitespace.

$_ = "  ab cd";
my @a = split;            # @a contains ( 'ab', 'cd' )
my @b = split /\s+/, $_;  # @b contains ( '', 'ab', 'cd')

OTHER TIPS

From the documentation for split:

split /PATTERN/,EXPR

If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

So since both the pattern and the expression are omitted, we are splitting the default variable $_ on whitespace.

The purpose of the $dummy variable is to capture the first element of the list returned from split and ignore it, because the code is only interested in the second element, which gets put into $class.

You'll have to look at the surrounding code to find out what $_ is in this context; it may be a loop variable or a list item in a map block, or something else.
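For instance, if the surrounding code happens to be a loop, $_ is set on each iteration. The following is only a sketch under that assumption; the sample records and their "<id> <class>" layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records in an assumed "<id> <class>" format.
my @lines = ("s1 truth", "s2 lie");

my @classes;
for (@lines) {                    # each element is aliased to $_ in turn
    my (undef, $class) = split;   # bare split reads $_, splits on whitespace
    push @classes, $class;
}

print "@classes\n";   # prints "truth lie"
```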

If you read the documentation, you'll find that:

  • The default for the first operand is " ".
  • The default for the second operand is $_.
  • The default for the third operand is 0.

so

split

is short for

split " ", $_, 0

and it means:

Take $_, split its value on whitespace, ignoring leading and trailing whitespace.
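A small sketch of that equivalence, using a made-up string with whitespace at both ends. The negative limit in the last call is added here only for contrast, to show what the default limit of 0 removes.

```perl
use strict;
use warnings;

$_ = "  ab cd  ";   # sample value with leading and trailing whitespace

my @bare     = split;               # ('ab', 'cd')
my @explicit = split ' ', $_, 0;    # same call with the defaults spelled out
my @keep     = split ' ', $_, -1;   # negative limit keeps the trailing empty field

print scalar(@bare), " ", scalar(@explicit), " ", scalar(@keep), "\n";   # prints "2 2 3"
```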

The first resulting field is placed in $dummy, and the second in $class.

Based on its name, I presume you proceed to never use $dummy again, so it's simply acting as a placeholder. You can get rid of it, though.

my ($dummy, $class) = split;

can be written as

my (undef, $class) = split;   # Use undef as a placeholder

or

my $class = ( split )[1];     # Use a list slice to get second item
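
To check that the three forms agree, here is a short sketch run against a hypothetical "<id> <class>" line; the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

$_ = "s1 truth";   # hypothetical "<id> <class>" line

my ($dummy, $with_dummy) = split;          # named placeholder
my (undef,  $with_undef) = split;          # undef placeholder
my $with_slice           = ( split )[1];   # list slice, index 1

print "$with_dummy $with_undef $with_slice\n";   # prints "truth truth truth"
```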
Licensed under: CC-BY-SA with attribution