Question

I'm trying to compare the contents of a file with a hash of hashes. For this I am using map, if and exists, but so far without success.

Basically, I want to know whether any of columns 0 to 2 of the file exists as a key in the hash. If so, I want to find out whether column 3 exists as a key in the inner hash. My "old file.txt" is a tab-separated file from which I produce the following hash:

old file.txt:

A    s.av    u
B    s.bv    u
C    s.av    u
C    s.cv    m

Hash:

my %oldhash = (
  'A' => {'s.av' => ['u']},
  'B' => {'s.bv' => ['u']},
  'C' => {'s.av' => ['u'], 's.cv' => ['m']},
);

I want to check whether the following tab-separated columns from "new file.txt" exist in the hash:

D    Db    Dc    s.av   #cols 0 - 2 do not exist in hash
E    A     Ab    d.ef   #column 1 exists, but column 3 doesn't, so nothing is done
E    A     Ac    s.av   #col 1 and 3 exist, so the new file will have the value of $oldhash{A}{s.av}
B    Bb    B     s.bv   #col0 and 3 exist, so I'll include the value of $oldhash{B}{s.bv}

Note that on the last line cols 0 and 2 both exist in the hash, but this is not important, since I only need one of the columns to exist.

The output can be exactly the same as the test file, with an added column that takes u or m from the old file. Example:

D    Db    Dc    s.av       #inserts empty column
E    A     Ab    d.ef       #inserts empty column
E    A     Ac    s.av   u   #with u inserted
B    Bb    B     s.bv   u   #with u inserted

This is how far I've got, but I'm getting the error "exists argument is not a HASH or ARRAY element or a subroutine at myfile.pl line 24":

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $dir='D:\\';

open my $oldfile, "<", "$dir\\old file.txt";
open my $newfile, "<", "$dir\\new file.txt";

my (%oldhash);

# creates the hash of hashes
while (my $odata = <$oldfile>){
    chomp $odata;
    my @oline = split /\t/, $odata;
    push @{$oldhash{$oline[0]}{$oline[1]}}, $oline[2];
}

# does the comparison between the file and the hash
while (my $newlines = <$newfile>){
    chomp $newlines;
    my @line = split /\t/, $newlines;
    if (exists map {$oldhash{$_}} @line[0..2]) {
        print $oldhash{$_};
    }
}

close $oldfile;
close $newfile;

I appreciate all the help that you can give me! Thank you in advance


Solution

exists requires a single hash or array element as its argument. You have passed it a list of scalar values, and once those values have gone through map their origin has been lost.
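The difference is easier to see with a tiny example. This is a minimal sketch using a small hypothetical hash %h: exists is applied to one element expression at a time, whereas map only hands back values, so a missing key and a key holding undef become indistinguishable.

```perl
use strict;
use warnings;

# A small hypothetical hash; note that key Z exists but holds undef
my %h = (
    A => { 's.av' => ['u'] },
    Z => undef,
);

my @candidates = ('E', 'A', 'Z');

# exists receives the element expression $h{$_}, so it tests the key itself
my @status = map { exists $h{$_} ? 'present' : 'absent' } @candidates;
print "@status\n";    # prints "absent present present"

# map { $h{$_} } returns only the values: the missing key E and the
# existing-but-undef key Z both come back as undef, so the information
# that exists needs has already been lost
my @values = map { $h{$_} } @candidates;
print defined($_) ? "defined\n" : "undef\n" for @values;
```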

You could write your test as

if ( grep { exists $oldhash{$_} } @line[0..2] ) { ... }

but I think there are better ways to write a solution.
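If you would rather keep your original key order, the same idea can be taken one step further to find which of columns 0 to 2 matched, so that you can read the value from it. This is a minimal sketch under a few assumptions: the in-memory rows stand in for "new file.txt", only the first value of each array is wanted, and lookup is a hypothetical helper name. first comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(first);

# %oldhash in the layout from the question: letter => { code => [values] }
my %oldhash = (
    A => { 's.av' => ['u'] },
    B => { 's.bv' => ['u'] },
    C => { 's.av' => ['u'], 's.cv' => ['m'] },
);

# Returns the value to append for one row, or '' if there is no match
sub lookup {
    my @line = @_;
    # First key among columns 0..2 that exists AND has column 3 as an inner key.
    # Checking the outer key first avoids autovivifying $oldhash{$_}.
    my $key = first {
        exists $oldhash{$_} && exists $oldhash{$_}{ $line[3] }
    } @line[0 .. 2];
    return defined $key ? $oldhash{$key}{ $line[3] }[0] : '';
}

# Rows from "new file.txt" in the question
my @rows = (
    [qw(D Db Dc s.av)],
    [qw(E A  Ab d.ef)],
    [qw(E A  Ac s.av)],
    [qw(B Bb B  s.bv)],
);

for my $row (@rows) {
    print join("\t", @$row, lookup(@$row)), "\n";
}
```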

I think the program below does what you want. With the data you've given it outputs u on only two lines, the third and fourth, which matches the example output in your question.

I've swapped the order of the keys that you chose for your own %oldhash, so that a line can be rejected immediately just by checking whether its fourth column (s.av etc.) exists as a key in the hash.
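To make the swap concrete, here is a minimal sketch that converts a hash in the question's layout into the swapped layout. It assumes only the first value of each array matters, since the swapped layout stores a single scalar per entry.

```perl
use strict;
use warnings;

# Hash in the question's layout: letter => { code => [values] }
my %oldhash = (
    A => { 's.av' => ['u'] },
    B => { 's.bv' => ['u'] },
    C => { 's.av' => ['u'], 's.cv' => ['m'] },
);

# Swap the two key levels: code => { letter => value }
my %old_data;
for my $letter (keys %oldhash) {
    for my $code (keys %{ $oldhash{$letter} }) {
        # Keep only the first value of each list (assumption, see above)
        $old_data{$code}{$letter} = $oldhash{$letter}{$code}[0];
    }
}

# A row is rejected at once if its column 3 is not a top-level key
print( (exists $old_data{'d.ef'}) ? "keep\n" : "reject\n" );   # prints "reject"
```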

I've also added use autodie: it's essential to check whether an open has succeeded before you go ahead and read from the file handle, and autodie does that for you without your having to check every call explicitly.
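For comparison, this is roughly what autodie changes: a failed open throws an exception instead of quietly returning false, and you can trap it with eval if you ever need to recover. A small sketch; the file name is a hypothetical path assumed not to exist.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical path, assumed not to exist
my $missing = 'no/such/dir/missing_file.txt';

# With autodie in effect there is no need for "or die" after open
my $ok = eval {
    open my $fh, '<', $missing;
    1;
};

if (!$ok) {
    # $@ holds an autodie::exception object describing the failure
    print "open failed: $@";
}
```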

Finally I've added chdir 'D:\\' so that you don't have to prefix the file names with the path for every open.

Each output line also includes the final "comment" column from the line of new_file.txt that produced it. I'm sure you can alter the print statement to give the output that you want.

use strict;
use warnings;
use autodie;

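# use Data::Dump;   # optional CPAN module; uncomment with the dd line below to inspect the hash

chdir 'D:\\';

open my $old_fh, '<', 'old_file.txt';

my %old_data;
while (<$old_fh>) {
  chomp;
  my @fields = split /\t/;
  # key on the code (column 1) first, then on the letter (column 0)
  $old_data{$fields[1]}{$fields[0]} = $fields[2];
}
close $old_fh;
# dd \%old_data;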

open my $new_fh, '<', 'new_file.txt';

while (<$new_fh>) {

  chomp;
  my @fields = split /\t/;

  my $new = '';
  if (my $list = $old_data{$fields[3]}) {
    my @possible = grep defined, @{$list}{@fields[0,1,2]};
    $new = $possible[0] if @possible;
  }

  # column 4 is the optional comment; // '' avoids a warning when it is missing
  print join("\t", @fields[0..3], $new, $fields[4] // ''), "\n";
}

The contents of %old_data after reading the file look like this:

(
  "s.av" => { A => "u", C => "u" },
  "s.bv" => { B => "u" },
  "s.cv" => { C => "m" },
)
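That dump comes from Data::Dump, which is a CPAN module rather than part of core Perl. If it isn't installed, the core Data::Dumper module gives a similar view. This sketch simply rebuilds the hash shown above by hand in order to dump it.

```perl
use strict;
use warnings;
use Data::Dumper;

# The hash shown above, rebuilt by hand
my %old_data = (
    's.av' => { A => 'u', C => 'u' },
    's.bv' => { B => 'u' },
    's.cv' => { C => 'm' },
);

# Sorted keys and compact indentation make the dump stable and readable
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

my $dump = Dumper(\%old_data);
print $dump;
```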

Output:

D    Db    Dc    s.av        #cols 0 - 2 do not exist in hash
E    A     Ab    d.ef        #column 1 exists, but column 3 doesn't, so nothing is done
E    A     Ac    s.av    u   #col 1 and 3 exist, so the new file will have the value of $oldhash{A}{s.av}
B    Bb    B     s.bv    u   #col0 and 3 exist, so I'll include the value of $oldhash{B}{s.bv}
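To try this approach without creating files on D:, Perl can open a filehandle on a string. The following is a minimal sketch that runs the same logic on the sample data from the question, assuming tab-separated lines without the comment column; fill_column is a hypothetical helper name.

```perl
use strict;
use warnings;

# Sample data from the question, tab-separated, held in strings
my $old_text = "A\ts.av\tu\nB\ts.bv\tu\nC\ts.av\tu\nC\ts.cv\tm\n";
my $new_text = "D\tDb\tDc\ts.av\nE\tA\tAb\td.ef\nE\tA\tAc\ts.av\nB\tBb\tB\ts.bv\n";

sub fill_column {
    my ($old_ref, $new_ref) = @_;

    # Build code => { letter => value }, as in the program above
    open my $old_fh, '<', $old_ref or die "Cannot read old data: $!";
    my %old_data;
    while (<$old_fh>) {
        chomp;
        my @fields = split /\t/;
        $old_data{ $fields[1] }{ $fields[0] } = $fields[2];
    }
    close $old_fh;

    # Append the looked-up value (or an empty string) to each new row
    open my $new_fh, '<', $new_ref or die "Cannot read new data: $!";
    my @out;
    while (<$new_fh>) {
        chomp;
        my @fields = split /\t/;
        my $new = '';
        if (my $list = $old_data{ $fields[3] }) {
            # Hash slice: values for columns 0..2, undef where absent
            my @possible = grep defined, @{$list}{ @fields[0 .. 2] };
            $new = $possible[0] if @possible;
        }
        push @out, join("\t", @fields[0 .. 3], $new);
    }
    close $new_fh;
    return @out;
}

# In-memory filehandles are opened on references to the strings
print "$_\n" for fill_column(\$old_text, \$new_text);
```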
Licensed under: CC-BY-SA with attribution