Look at your output. It's clearly produced by
- for each row of the second file,
  - for each row of the first file with the same id,
    - print out the combined rows
So the question is: How do you find the rows of the first file with the same id as a row of the second file?
The answer is: You store the rows of the first file in a hash indexed by the row's id.
my %file1;
while (<$file1_fh>) {
    my ($id, $rest) = /^(\S++)(.*)/s;
    push @{ $file1{$id} }, $rest;
}
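To make the shape of that hash concrete, here is a small sketch that builds it from an in-memory string instead of a file. The sample ids and values (`id1`, `GO:0001` and so on) are made up for illustration, and `build_index` is a hypothetical helper name, not something from your script. It also skips blank lines, which the loop above doesn't do.

```perl
use strict;
use warnings;

# Hypothetical helper: index the rows of a filehandle by their first field.
sub build_index {
    my ($fh) = @_;
    my %index;
    while (<$fh>) {
        # $id is the first run of non-whitespace characters; $rest keeps
        # the separator and the trailing newline of the row.
        my ($id, $rest) = /^(\S++)(.*)/s
            or next;   # skip blank lines (an addition to the loop above)
        push @{ $index{$id} }, $rest;
    }
    return \%index;
}

# Made-up sample data standing in for the first file.
my $sample = "id1\tGO:0001\nid1\tGO:0002\nid2\tGO:0003\n";
open(my $fh, '<', \$sample)
    or die("Can't open in-memory file: $!\n");

my $index = build_index($fh);
for my $id (sort keys %$index) {
    printf("%s has %d row(s)\n", $id, scalar @{ $index->{$id} });
}
```

Each hash value is a reference to an array, so an id that appears on several rows of the first file keeps all of them.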
So the earlier pseudocode resolves to
while (my $row2 = <$file2_fh>) {
    chomp($row2);
    my ($id) = $row2 =~ /^(\S+)/;
    for my $rest (@{ $file1{$id} }) {
        print("$row2$rest");
    }
}
Complete script:
#!/usr/bin/env perl
use strict;
use warnings;
# Three-argument open, so a file name can't be misread as an open mode.
open(my $GOTERMS, '<', $ARGV[0])
    or die("Error opening GO terms file \"$ARGV[0]\": $!\n");
open(my $SNPS, '<', $ARGV[1])
    or die("Error opening SNP file \"$ARGV[1]\": $!\n");
my %goterm;
while (<$GOTERMS>) {
    my ($id, $rest) = /^(\S++)(.*)/s;
    push @{ $goterm{$id} }, $rest;
}
while (my $row2 = <$SNPS>) {
    chomp($row2);
    my ($id) = $row2 =~ /^(\S+)/;
    for my $rest (@{ $goterm{$id} }) {
        print("$row2$rest");
    }
}
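To check the join on a small input without creating files, the same two loops can be run over in-memory filehandles. This is only a sketch: the sample rows are invented, and `join_rows` is a hypothetical wrapper that returns the combined rows instead of printing them. It also skips blank lines and tolerates ids missing from the first file, which are small additions to the loops above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two loops above.
sub join_rows {
    my ($file1_fh, $file2_fh) = @_;

    # Pass 1: index the rows of the first file by id.
    my %file1;
    while (<$file1_fh>) {
        my ($id, $rest) = /^(\S++)(.*)/s
            or next;   # skip blank lines
        push @{ $file1{$id} }, $rest;
    }

    # Pass 2: for each row of the second file, build one combined row
    # per matching row of the first file.
    my @out;
    while (my $row2 = <$file2_fh>) {
        chomp($row2);
        my ($id) = $row2 =~ /^(\S+)/
            or next;   # skip blank lines
        for my $rest (@{ $file1{$id} // [] }) {
            push @out, "$row2$rest";
        }
    }
    return @out;
}

# Made-up sample data: id3 has no GO term, so it produces no output.
my $goterms = "id1\tGO:0001\nid1\tGO:0002\nid2\tGO:0003\n";
my $snps    = "id1\tsnpA\nid3\tsnpB\nid2\tsnpC\n";
open(my $fh1, '<', \$goterms) or die("Can't open in-memory file: $!\n");
open(my $fh2, '<', \$snps)    or die("Can't open in-memory file: $!\n");

print join_rows($fh1, $fh2);
```

With this sample data, the `id1` row of the second file is printed twice (once per GO term), the `id2` row once, and the `id3` row not at all.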