How to convert an array into a hash, with variable names mapped as keys in Perl?

https://stackoverflow.com/questions/9913894

27-05-2021

Question

I find myself using this pattern a lot in Perl:

sub fun {
    my $line = $_[0];
    my ( $this, $that, $the_other_thing ) = split /\t/, $line;
    return { 'this' => $this, 'that' => $that, 'the_other_thing' => $the_other_thing};
}

Obviously I could simplify this pattern by returning the output of a function that transforms a given list of variables into a hash whose keys are the names of those variables, e.g.:

sub fun {
    my $line = $_[0];
    my ( $this, $that, $the_other_thing ) = split /\t/, $line;
    return &to_hash( $this, $that, $the_other_thing );
}

It helps as the number of elements gets larger. How do I do this? It looks like I could combine PadWalker and closures, but I would like a way to do this using only the core language.

EDIT: thb provided a clever solution to this problem, but I haven't accepted it because it bypasses a lot of the hard parts (tm). How would you do it if you wanted to rely on the core language's destructuring semantics and drive your reflection off the actual variables?

EDIT2: Here's the solution I hinted at using PadWalker & closures:

use PadWalker qw( var_name );

# Given two arrays, we build a hash by treating the first set as keys and
# the second as values
sub to_hash {
    my $keys = $_[0];
    my $vals = $_[1];
    my %hash;
    @hash{@$keys} = @$vals;
    return \%hash;
}

# Given a list of variables, and a callback function, retrieves the
# symbols for the variables in the list.  It calls the function with
# the generated syms, followed by the original variables, and returns
# that output.
# Input is: Function, var1, var2, var3, etc....
sub with_syms {
    my $fun = shift @_;
    my @syms = map substr( var_name(1, \$_), 1 ), @_;
    $fun->(\@syms, \@_);
}

sub fun {
    my $line = $_[0];
    my ( $this, $that, $other) = split /\t/, $line;
    return &with_syms(\&to_hash, $this, $that, $other);
}

Solution

You could use PadWalker to try to get the name of the variables, but that's really not something you should do. It's fragile and/or limiting.

Instead, you could use a hash slice:

sub fun {
   my ($line) = @_;
   my %hash;
   @hash{qw( this that the_other_thing )} = split /\t/, $line;
   return \%hash;
}
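
As a rough sketch of how the hash slice above behaves when the line has fewer or more fields than there are names: the sub name parse_line and the sample lines are made up for illustration, and only core Perl is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: the same hash-slice technique as fun() above.
sub parse_line {
    my ($line) = @_;
    my %hash;
    # Keys and values are paired by position.
    @hash{qw( this that the_other_thing )} = split /\t/, $line;
    return \%hash;
}

my $full  = parse_line("a\tb\tc");
my $short = parse_line("a\tb");        # the_other_thing exists but is undef
my $long  = parse_line("a\tb\tc\td");  # the extra field "d" is dropped

printf "%s %s %s\n", @{$full}{qw( this that the_other_thing )};   # a b c
```

If a missing or extra field should be an error rather than silently tolerated, the field count would need to be checked before the slice assignment.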

You can hide the slice in a function to_hash if that's what you desire.

sub to_hash {
   my $var_names = shift;
   return { map { $_ => shift } @$var_names };
}

sub fun_long {
   my ($line) = @_;
   my @fields = split /\t/, $line;
   return to_hash [qw( this that the_other_thing )], @fields;
}

sub fun_short {
   my ($line) = @_;
   return to_hash [qw( this that the_other_thing )], split /\t/, $line;
}

But if you insist, here's the PadWalker version:

use Carp      qw( croak );
use PadWalker qw( var_name );

sub to_hash {
   my %hash;
   for (0..$#_) {
      my $var_name = var_name(1, \$_[$_])
         or croak("Can't determine name of \$_[$_]");
      $hash{ substr($var_name, 1) } = $_[$_];
   }
   return \%hash;
}

sub fun {
   my ($line) = @_;
   my ($this, $that, $the_other_thing) = split /\t/, $line;
   return to_hash($this, $that, $the_other_thing);
}

OTHER TIPS

This does it:

my @part_label = qw( part1 part2 part3 );

sub fun {
    my $line = $_[0];
    my @part = split /\t/, $line;
    my $no_part = $#part_label <= $#part ? $#part_label : $#part;
    return map { $part_label[$_] => $part[$_] } (0 .. $no_part);
}

Of course, your code must name the parts somewhere. The code above does it by qw(), but you can have your code autogenerate the names if you like.

[If you anticipate a very large list of part labels, then you should probably avoid the (0 .. $no_part) idiom, but for lists of moderate size it works fine.]
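
One possible way to avoid building the index list, shown only as a sketch: the name fun_loop is hypothetical, and the counter loop is a swapped-in alternative to the map over (0 .. $no_part), not part of the original answer.

```perl
use strict;
use warnings;

my @part_label = qw( part1 part2 part3 );

# Hypothetical variant of fun(): walks the labels with a counter instead of
# first building the (0 .. $no_part) index list.
sub fun_loop {
    my ($line) = @_;
    my @part = split /\t/, $line;
    my %hash;
    # Stop at whichever runs out first, labels or fields.
    for (my $i = 0; $i < @part_label && $i < @part; $i++) {
        $hash{ $part_label[$i] } = $part[$i];
    }
    return %hash;    # flat key/value list, like the map version
}

my %h = fun_loop("x\ty");
print join(",", map { $_ . "=" . $h{$_} } sort keys %h), "\n";   # part1=x,part2=y
```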

Update in response to OP's comment below: You pose an interesting challenge. I like it. How close does the following get to what you want?

sub to_hash ($$) {
    my @var_name = @{shift()};
    my @value    = @{shift()};
    $#var_name == $#value or die "$0: wrong number of elements in to_hash()\n";
    return map { $var_name[$_] => $value[$_] } (0 .. $#var_name);
}

sub fun {
    my $line = $_[0];
    return to_hash [qw( this that the_other_thing )], [split /\t/, $line];
}

If I understand you properly you want to build a hash by assigning a given sequence of keys to values split from a data record.

This code seems to do the trick. Please explain if I have misunderstood you.

use strict;
use warnings;

use Data::Dumper;
$Data::Dumper::Terse++;

my $line = "1111 2222 3333 4444 5555 6666 7777 8888 9999\n";

print Dumper to_hash($line, qw/ class division grade group kind level rank section tier  /);

sub to_hash {
  my @fields = split ' ', shift;
  my %fields = map {$_ => shift @fields} @_;
  return \%fields;
}

Output (the order of the keys may differ from run to run):

{
  'division' => '2222',
  'grade' => '3333',
  'section' => '8888',
  'tier' => '9999',
  'group' => '4444',
  'kind' => '5555',
  'level' => '6666',
  'class' => '1111',
  'rank' => '7777'
}

For a more general solution which will build a hash from any two lists, I suggest the zip_by function from List::UtilsBy:

use strict;
use warnings;

use List::UtilsBy qw/zip_by/;
use Data::Dumper;
$Data::Dumper::Terse++;

my $line = "1111 2222 3333 4444 5555 6666 7777 8888 9999\n";

my %fields = zip_by { $_[0] => $_[1] }
    [qw/ class division grade group kind level rank section tier  /],
    [split ' ', $line];

print Dumper \%fields;

The output is identical to that of my initial solution.

See also the pairwise function from List::MoreUtils, which takes a pair of arrays instead of a list of array references.

Aside from parsing the Perl code yourself, a to_hash function isn't feasible using just the core language. The function being called doesn't know whether those args are variables, return values from other functions, string literals, or what have you...much less what their names are. And it doesn't, and shouldn't, care.
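
To make that point concrete, here is a small sketch using only core Perl; describe_args and answer are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: shows what a sub can learn about its arguments from
# @_ alone. It receives the values (as aliases), never the names.
sub describe_args {
    return join ', ', map { defined $_ ? $_ : 'undef' } @_;
}

sub answer { return 42 }

my $this = 'foo';

# A variable, a string literal, and another function's return value all
# arrive the same way, so the callee cannot tell them apart.
print describe_args($this, 'foo', answer()), "\n";   # foo, foo, 42
```

Nothing in @_ records that the first argument came from a variable called $this, which is why the PadWalker approach has to inspect the caller's lexical pad instead.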

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow