Question

I have two data files that associate codes with named objects. The first is a very long reference list containing every named object with its code A and code B. The second is a shorter list of the items that are useful to me, with only code A next to each name. I would like to create a new file that has both codes for the objects I am interested in.

datafile1:

12  apple   18
62  orange  26
114 banana  8

datafile2:

12  apple
62  orange
114 banana

I am a new programmer. The following is what I have tried. I get no matches and an empty output file.

open( INF, 'data.txt' );
my @array = <INF>;
my %codes;
our $Sep = "\t";

foreach my $line (@array) {
    chomp $line;
    my @temp = split /\t/, $line;
    my ($A_ID) = ( split( $Sep, $line ) )[0];
    push @{ $codes{$A_ID} }, $line;
}
close INF;

my %match;
open( IN, 'data2.txt' );
my @array_next = <IN>;
foreach my $line (@array_next) {
    chomp $line;
    ( my $bA_ID, my $name ) = split /\t/, $line;
    foreach my $codes_line ( @{ $codes{$A_ID} } ) {
        my ( $Name, $B_ID ) = ( split( $Sep, $line ) )[ 1, 2 ];
        my $new_array;
        if ( $bA_ID eq $A_ID ) {
            $new_array = $bA_ID . "\t" . $B_ID . "\t" . $Name;
            $match{new_array}++;
        }
    }
}

close IN;
print "Number of matched : " . keys(%match) . "\n";
open( OUT, ">Code_Match.txt" );
print OUT "$key\n";

Solution

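The following is a cleanup of your code. There are cleaner ways of doing this, but it stays close to your structure. Your version finds no matches because $A_ID is declared with my inside the first loop and no longer exists in the second, so the lookup $codes{$A_ID} never finds anything. The inner loop also splits $line instead of $codes_line, $match{new_array} is missing its $ sigil, and $key is never set. Once you start using use strict and use warnings, Perl will report the undeclared $A_ID and $key at compile time instead of silently producing an empty result.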

use strict;
use warnings;

open (my $INF, '<', 'data.txt') or die "Cannot open data.txt: $!";
my %codes;
my $Sep = qr/\t/;

while( my $line = <$INF>){
  chomp $line;
  my $A_ID = (split($Sep, $line ))[0]; 
  push @{$codes{$A_ID}},$line ; 
}
close $INF;

my %match;
open (my $IN, '<', 'data2.txt') or die "Cannot open data2.txt: $!";
while( my $line = <$IN>){
  chomp $line;
  my ($bA_ID, $name) = split $Sep, $line;
  if (exists($codes{$bA_ID})) {
    foreach my $codes_line (@{$codes{$bA_ID}}) {
      my ($Name,$B_ID) = (split($Sep, $codes_line))[1,2];
      my $new_array = join("\t", $bA_ID, $B_ID, $Name);
      $match{$new_array}++;
    }
  }
}

close $IN;
print "Number of matched : ".keys(%match)."\n";
open (my $OUT, '>', 'Code_Match.txt') or die "Cannot open Code_Match.txt: $!";
print $OUT "$_\n" foreach keys(%match);
close $OUT;
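
As for the "cleaner ways" mentioned above, one possibility is to read the short list first, remember its code A values in a hash, and then stream the long reference file once. This is only a sketch: match_codes is a made-up helper name, the input is assumed to be tab-separated as in the code above, and the demo uses in-memory filehandles with a few values borrowed from the question in place of data2.txt and data.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): reads the short list
# first, then streams the long reference list and returns matching lines.
sub match_codes {
    my ($wanted_fh, $reference_fh) = @_;

    # Remember every code A that appears in the short list.
    my %wanted;
    while (my $line = <$wanted_fh>) {
        chomp $line;
        next unless length $line;
        my ($code_a) = split /\t/, $line;
        $wanted{$code_a} = 1;
    }

    # Keep reference lines whose code A is wanted, in reference-file order.
    my @matches;
    while (my $line = <$reference_fh>) {
        chomp $line;
        next unless length $line;
        my ($code_a, $name, $code_b) = split /\t/, $line;
        push @matches, join("\t", $code_a, $code_b, $name)
            if exists $wanted{$code_a};
    }
    return @matches;
}

# Demo data (assumed, tab-separated) standing in for data2.txt and data.txt.
my $short_list = "12\tapple\n114\tbanana\n";
my $reference  = "12\tapple\t18\n62\torange\t26\n114\tbanana\t8\n";
open(my $wanted_fh, '<', \$short_list)
    or die "Cannot open in-memory short list: $!";
open(my $reference_fh, '<', \$reference)
    or die "Cannot open in-memory reference list: $!";

my @matched = match_codes($wanted_fh, $reference_fh);
print "Number of matched : ", scalar(@matched), "\n";
print "$_\n" for @matched;
```

Unlike the hash-of-matches version, this keeps the output in the order of the reference file and only holds the short list in memory, which may matter when the reference list is very long. It does not remove duplicate lines, so that would need adding if the reference file can repeat entries.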

OTHER TIPS

@imran beat me to it, but here is my attempt at cleaning up the code. As he mentioned, add use strict and use warnings; I also recommend clearer variable names.
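
To show why use strict matters here, the small demo below compiles a cut-down fragment modelled on the question's second loop, where $A_ID is used without being declared in scope. The fragment is my own reduction, not the asker's full code, and it is kept in a string and run through eval only so that the demo script itself still compiles.

```perl
use strict;
use warnings;

# Cut-down fragment in the style of the question's second loop: $A_ID is
# used here with no declaration in scope. The heredoc is single-quoted,
# so nothing inside it is interpolated.
my $fragment = <<'END_FRAGMENT';
use strict;
my %codes;
foreach my $codes_line ( @{ $codes{$A_ID} } ) { }
1;
END_FRAGMENT

# String eval compiles the fragment; on failure it returns undef and
# leaves the compiler's message in $@.
my $compiled = eval $fragment;
my $error    = $@;
print $compiled ? "fragment compiled\n" : "strict reported: $error";
```

With strict in effect the fragment is rejected at compile time with a message naming $A_ID. Without strict, the same lookup would quietly use an undefined key and the loop body would never run, which matches the "no matches" symptom in the question.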

For what it's worth, here's my cleaned-up code:

#!perl

use strict;
use warnings;

open (IN, '<', 'data.txt') or die "Cannot open data.txt: $!";
my %codes;
my $Sep = "\t";

while (<IN>){
  chomp;
  my $codeA1 = (split(/$Sep/, $_))[0];
  push(@{$codes{$codeA1}}, $_);
}
close(IN);
my %match;
open (IN, '<', 'data2.txt') or die "Cannot open data2.txt: $!";
while(<IN>){
  chomp;
  # Only code A is needed for the lookup; the name in this file is not used.
  my $codeA2 = (split(/$Sep/, $_))[0];
  if (exists $codes{$codeA2}) {
    foreach (@{$codes{$codeA2}}) {
      $match{$_}++;
    }
  }
}
close(IN);

print "Number of matched : ". scalar(keys(%match)) ."\n";
open (OUT, '>', 'Code_Match.txt') or die "Cannot open Code_Match.txt: $!";
foreach (keys %match) {
  print OUT "$_\n";  
}
close(OUT);
Licensed under: CC-BY-SA with attribution