Question

I basically want to do an out-of-order diff between two text files (in CSV style), comparing the fields in the first two columns (I don't care about the 3rd column's value). I then print out the lines that file1.txt has but file2.txt doesn't, and vice versa for file2.txt compared to file1.txt.

file1.txt:

cat,val 1,43432
cat,val 2,4342
dog,value,23
cat2,value,2222
hedgehog,input,233

file2.txt:

cat2,value,312
cat,val 2,11
cat,val 3,22
dog,value,23
hedgehog,input,2145
bird,output,9999

Output would be something like this:

file1.txt:
cat,val 1,43432

file2.txt:
cat,val 3,22
bird,output,9999

I'm new to Perl, so some of the better, less ugly ways of doing this are currently outside my knowledge. Thanks for any help.

current code:

#!/usr/bin/perl -w

use Cwd;
use strict;
use Data::Dumper;
use Getopt::Long;

my $myName = 'MyDiff.pl';
my $usage = "$myName is blah blah blah";

#retrieve the command line options, set up the environment
use vars qw($file1 $file2);

#grab the specified values or exit program
GetOptions("file1=s" => \$file1,
           "file2=s" => \$file2)
    or die $usage;
($file1 and $file2) or die $usage;

open (FH, "< $file1") or die "Can't open $file1 for read: $!";
my @array1 = <FH>;
close FH or die "Cannot close $file1: $!";
open (FH, "< $file2") or die "Can't open $file2 for read: $!";
my @array2 = <FH>;
close FH or die "Cannot close $file2: $!";

 #...do a sort and match

Solution 2

Perhaps the following will be helpful:

use strict;
use warnings;

my @files = @ARGV;
pop;    # with no argument, pop removes the last element of @ARGV,
        # so only the first file is left for the <> below
my %file1 = map { chomp; /(.+),/; $1 => $_ } <>;    # key: everything before the last comma

push @ARGV, $files[1];    # put the second file back so the next <> reads it
my %file2 = map { chomp; /(.+),/; $1 => $_ } <>;

print "$files[0]:\n";
print $file1{$_}, "\n" for grep !exists $file2{$_}, keys %file1;

print "\n$files[1]:\n";
print $file2{$_}, "\n" for grep !exists $file1{$_}, keys %file2;

Usage: perl script.pl file1.txt file2.txt

Output on your datasets:

file1.txt:
cat,val 1,43432

file2.txt:
cat,val 3,22
bird,output,9999

This builds a hash for each file. The keys are the first two columns (everything before the last comma) and the associated values are the full lines. grep is then used to keep only the keys that don't exist in the other file's hash.
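
For the sample file1.txt above, the first hash would end up roughly like this (a sketch; each key is whatever precedes the last comma on its line):

my %file1 = (
    'cat,val 1'      => 'cat,val 1,43432',
    'cat,val 2'      => 'cat,val 2,4342',
    'dog,value'      => 'dog,value,23',
    'cat2,value'     => 'cat2,value,2222',
    'hedgehog,input' => 'hedgehog,input,233',
);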

Edit: On relatively small files, using map as above to process the file's lines works fine. However, a list of all the file's lines is created first and then passed to map. On larger files it may be better to use a while (<>) { ... } construct to read one line at a time. The code below does this (generating the same output as above) and uses a hash of hashes (HoH). Because it uses a HoH, you'll notice some dereferencing:

use strict;
use warnings;

my %hash;
my @files = @ARGV;

while (<>) {
    chomp;
    # $ARGV holds the name of the file <> is currently reading
    $hash{$ARGV}{$1} = $_ if /(.+),/;
}

print "$files[0]:\n";
print $hash{ $files[0] }{$_}, "\n"
  for grep !exists $hash{ $files[1] }{$_}, keys %{ $hash{ $files[0] } };

print "\n$files[1]:\n";
print $hash{ $files[1] }{$_}, "\n"
  for grep !exists $hash{ $files[0] }{$_}, keys %{ $hash{ $files[1] } };

OTHER TIPS

Use a hash for this, with the first two columns as the key. Once you have the two hashes, you can iterate over them and delete the common entries; whatever remains in the respective hashes is what you are looking for.

Initialize,

my %hash1 = ();
my %hash2 = ();

Read in the first file, join the first two columns to form the key, and save the full line in the hash. This assumes the fields are comma separated; you could also use a CSV module for this (a Text::CSV sketch follows the loop below).

open( my $fh1, "<", $file1 ) || die "Can't open $file1: $!";
while(my $line = <$fh1>) {
    chomp $line;

    # join first two columns for key
    my $key = join ",", (split ",", $line)[0,1];

    # create hash entry for file1
    $hash1{$key} = $line;
}
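
If the fields could themselves contain commas (quoted CSV), the plain split above would mis-split them. Here is a minimal sketch of the same loop using the Text::CSV module from CPAN, assuming it is installed and reusing the $file1 and %hash1 declarations above:

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open( my $fh1, "<", $file1 ) || die "Can't open $file1: $!";
while (my $line = <$fh1>) {
    chomp $line;

    # let Text::CSV split the line, then build the key from the first two fields
    $csv->parse($line) or die "Bad CSV line: $line";
    my $key = join ",", ( $csv->fields() )[0, 1];

    # keep the original line as the value, as before
    $hash1{$key} = $line;
}
close $fh1;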

Do the same for file2 and create %hash2

open( my $fh2, "<", $file2 ) || die "Can't open $file2: $!";
while(my $line = <$fh2>) {
    chomp $line;

    # join first two columns for key
    my $key = join ",", (split ",", $line)[0,1];

    # create hash entry for file2
    $hash2{$key} = $line;
}

Now go over the entries and delete the common ones,

foreach my $key (keys %hash1) {
    if (exists $hash2{$key}) {
        # common entry, delete from both hashes
        delete $hash1{$key};
        delete $hash2{$key};
    }
}

%hash1 will now contain only the lines unique to file1, and %hash2 only the lines unique to file2.

You could print them as,

foreach my $key (keys %hash1) {
    print "$hash1{$key}\n";
}

foreach my $key (keys %hash2) {
    print "$hash2{$key}\n";
}

I think the above problem can be solved with either of the following algorithms:

a) Use a hash, as mentioned above.

b) Sort both files on key1 and key2 (using sort), then walk through them in a merge-like fashion (a Perl sketch follows the pseudocode):

Iterate through FILE1

  Compare the key1/key2 entry of the current FILE1 line with the current FILE2 line
      If they match
        take action by printing the common line to the desired file as required
        move to the next row in FILE1 (continue with the loop)
      If they do not match
        iterate through FILE2, starting from the saved position POS-FILE2, until a match is found
            compare the key1/key2 entry of FILE1 with the current FILE2 line
            if they match
              take action by printing the common line to the desired file as required
              exit the inner loop, noting the new position of FILE2
            if they do not match
              take action by printing the unmatched FILE2 line to the desired file as required
              move to the next row in FILE2
            if FILE2 runs out of rows, set FILE2-END to true
  If FILE2-END is true
     the rest of the lines in FILE1 do not exist in FILE2
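
A minimal Perl sketch of approach (b), under a few assumptions: both files fit in memory, the third column never contains a comma, and the key (the first two columns) is unique within each file. Note that the output comes out in sorted key order rather than in the files' original order:

use strict;
use warnings;

my ($file1, $file2) = @ARGV;

# read a file into a reference to a list of [key, line] pairs,
# sorted by key (key = first two comma-separated columns)
sub read_sorted {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        my $key = join ",", (split ",", $line)[0, 1];
        push @rows, [ $key, $line ];
    }
    close $fh;
    return [ sort { $a->[0] cmp $b->[0] } @rows ];
}

my $rows1 = read_sorted($file1);
my $rows2 = read_sorted($file2);

# merge-style scan over the two sorted lists
my (@only1, @only2);
my ($i, $j) = (0, 0);
while ($i < @$rows1 and $j < @$rows2) {
    my $cmp = $rows1->[$i][0] cmp $rows2->[$j][0];
    if    ($cmp == 0) { $i++; $j++; }                       # common key: skip both
    elsif ($cmp < 0)  { push @only1, $rows1->[$i++][1]; }   # key only in file1
    else              { push @only2, $rows2->[$j++][1]; }   # key only in file2
}
push @only1, map { $_->[1] } @{$rows1}[ $i .. $#{$rows1} ]; # leftover file1 rows
push @only2, map { $_->[1] } @{$rows2}[ $j .. $#{$rows2} ]; # leftover file2 rows

print "$file1:\n",   map { "$_\n" } @only1;
print "\n$file2:\n", map { "$_\n" } @only2;

Usage is the same as before: perl script.pl file1.txt file2.txt
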
Licensed under: CC-BY-SA with attribution