Question

Continuing on from my previous question, I ran into another problem down the road. I realized not only should there be hashes within hashes, there can also be arrays within hashes. So the paths would be something like

one/two/three
one/two[]/three
one/two/four
one/two[]/four 

i.e. the hash is supposed to contain the array will always have a [] as its suffix. According to the script I'm using (slightly modified version of the answer of my previous question), the above paths would result in:

one => {
     two => {
         three => "",
         four => "",
     }
     two[] => [
         {
             three => "",
             four => "",
         }
     ]
}

The script I'm using is:

# !/usr/bin/perl

use Data::Dumper; 

sub insert {
  my ($ref, $head, @tail) = @_;
  if ( @tail ) { 
    if( $head !~ /^(.*)(\[\])$/ ) {
        insert( \%{$ref->{$head}}, @tail );
    } else {
        my %newhash = ();
        unshift(@{$ref->{$1 . $2}}, %newhash);
        insert( \%{$ref->{$1 . $2}[0]}, @tail );
    }
  } else {
    $ref->{$head} = '';
  }
}

my %hash;
chomp and insert \%hash, split( '/', $_ ) while <>;

print Dumper %hash;

What I'd like to do, is once I find two[], I'd like to delete two and add it to the array of two[] (if two exists) and then rename key two[] to two.

So the end result would look something like:

one => {
    two => [
        {
            three => "",
            four => "",
        },
        {
            three => "",
            four => "",
        }
    ]
}

So I tried adding checks within the if else for checking keys with or without [] suffixes but I got a range or errors like [$variable] is not a valid HASH reference, etc. How would one go about checking the type of the variable (eg $ref->{$head} is array?) and deleting and renaming keys of the hash efficiently?

Thanks.

Was it helpful?

Solution

Okay, by all rights, this should suck AND not do what you want - But I spent the last hour trying to get it somewhat right, so I'll be damned. Each 'anything[]' is an array of two elements, each a hashref: One for elements that appear after a bare 'anything', and the second for elements appearing after a 'anything[]'. I probably should have used a closure instead of relying on that crappy $is_non_bracket variable -- I'll take another look in the morning when I'm less retarded and more ashamed of writing this.

I think that it's tail-call optimized (the goto &SUB part). It also makes (small) use of named captures.

use strict;
use warnings;
use 5.010;
use Data::Dumper;

sub construct {
    my $node = shift;
    return unless @_;
    my $next           = shift;
    my $is_non_bracket = 1;

    $next .= '[]' and $is_non_bracket-- if exists $node->{ $next . '[]' };
    if ( $next =~ / (?<node>[^\[\]]+) \Q[]/x ) {
        if ( exists $node->{ $+{node} } or not defined( $node->{$next} ) ) {
            push @{ $node->{$next} }, (delete $node->{ $+{node} } // {}); #/
         }
         unshift @_, $node->{$next}->[$is_non_bracket] ||= {};
    }
    else {
        $node->{$next} ||= @_ ? {} : $node->{$next};
        unshift @_, $node->{$next} //= @_ ? {} : ''; #/
    }
    goto &construct;
}


my %hash;

while (<DATA>) {
    chomp;
    construct( \%hash, split m!/! );
}

say Dumper \%hash;

__DATA__
one/two/three
one/two[]/three
one/two[]/three/four
one/two[]/three/four/five[]
one/two[]/three/four/whatever
one/two/ELEVAN
one/three/sixteen
one/three[]/whygodwhy
one/three/mrtest/mruho
one/three/mrtest/mruho[]/GAHAHAH

EDIT: Regex had an extra space after the quotemeta that made it break down; My bad.

EDIT2: Okay, it's the morning, edited in a version that isn't so stupid. No need for the ref, as we always pass a hashref; The #/ are there to stop the //'s from borking the highlighting.

EDIT3: Just noticed you DON'T want those [] to show up in the data structure, so here's a version that doesn't show them:

sub construct {
    my $node = shift;
    return unless @_;
    my $is_bracket = (my $next = shift) =~ s/\Q[]// || 0; 

    if (ref $node->{$next} eq 'ARRAY' or $is_bracket) {
        if ( ref $node->{ $next } ne 'ARRAY' ) {
            my $temp = delete $node->{ $next } || {};
            push @{ $node->{$next} = [] }, $temp;
         }
         unshift @_, $node->{$next}->[$is_bracket] ||= {};
    }
    else {
        $node->{$next} ||= @_ ? {} : $node->{$next};
        unshift @_, $node->{$next} //= @_ ? {} : ''; #/
    }
    goto &construct;
}

EDITNaN: Here's the gist of what it does: If there are enough arguments, we shift for a second time and put the value in the $next, which is promptly pulled into a substitution, which takes away its [], should it have any: If it does, the substitution returns 1, otherwise, s/// returns undef (or the empty string, I forget), so we use the logical-or to set the return value to 0; Either way, we set $is_bracket to this.

Afterwards, if $node->{$next} is an arrayref or $next had brackets: If $node->{$next} wasn't an arrayref (so we got here because $next had brackets, and it was the first time this has happened), it' either undef, the empty string, or a hashref; We delete whatever it is, and store it in $temp. We then set the now-empty $node->{$next} to an arrayref, and set(push) $temp as its first element - Meaning that, for instance, if 'two' had existed previous, and $next was originally 'two[]', then 'two' will now point to an arrayref, and its old value will be stored in [0]. Once $node->{$next} is an arrayref (or if it already was), we unshift the hashref in the index pointed by $is_backet - 0 if $next didn't have brackets, and 1 if it did - to @_. If the hashref doesn't exist (either because it's undef, for both, or possibly the empty string, for 0), we assign it an brand new hashref with the logical-or.

If it wasn't an arrayref, it's a hashref, so we do the same thing as before, and unshift the resulting value to @_.

We do all this unshifting because the magic goto passes our current @_ to the function that will replace us.

OTHER TIPS

I'm not sure by what logic you reach your expected output, but I can clarify the following:

  • You can check the type of the reference by using the ref function.
  • In your current code, $ref is treated as a hash reference in every single case. I can tell because you dereference it as a hash using the $ref->{...} syntax in every single clause of your if statements.
  • If I read you correctly, $1 . $2 should be the same thing as $head. I find it more clear as just $head.
  • Your unshift line vivifies $ref->{$1 . $2} as an array reference (to an explicit empty array). And the next line vivifies its first element as a hash reference. The first line seems superfluous to me, you'd get the same results with the sole insert line; so I'm not sure of the intent.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top