Whenever you hear "I need to look up something", think hashes.
Create a hash that contains the elements you want to pull out of file #1. Then use a second hash to track whether you have already printed each one:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
use autodie; # This way, I don't have to check my open for failures
use constant {
    TABLE_FILE  => "file1.txt",
    LOOKUP_FILE => "file2.txt",
};
open my $lookup_fh, "<", LOOKUP_FILE;
my %lookup_table;
while ( my $symbol = <$lookup_fh> ) {
    chomp $symbol;
    $lookup_table{$symbol} = 1;
}
close $lookup_fh;
open my $table_file, "<", TABLE_FILE;
my %is_printed;
while ( my $line = <$table_file> ) {
    chomp $line;
    my @line_array = split /\s+/, $line;
    my $symbol = $line_array[1];
    if ( exists $lookup_table{$symbol} and not exists $is_printed{$symbol} ) {
        say $line;
        $is_printed{$symbol} = 1;
    }
}
close $table_file;
Two loops, but much more efficient. In yours, if you had 100 items in the first file and 1,000 items in the second file, you would have to loop 100 * 1000 times, or 100,000. In this, you only loop over the total number of lines in both files, or 1,100.
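To make the difference concrete, here is a minimal sketch that counts loop iterations for both approaches. The arrays and the SYM names are made-up stand-ins for the two files, not your real data:

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical data standing in for the two files.
my @wanted = map { "SYM$_" } 1 .. 100;     # 100 lookup symbols
my @table  = map { "SYM$_" } 1 .. 1000;    # 1,000 table symbols

# Nested loops: every table row is visited once per wanted symbol.
my $nested_steps = 0;
for my $row (@table) {
    for my $symbol (@wanted) {
        $nested_steps++;
    }
}

# Hash approach: one pass to build the hash, one pass to probe it.
my $hash_steps = 0;
my %lookup_table;
for my $symbol (@wanted) {
    $lookup_table{$symbol} = 1;
    $hash_steps++;
}
my $matches = 0;
for my $row (@table) {
    $hash_steps++;
    $matches++ if exists $lookup_table{$row};
}

say "nested: $nested_steps";    # nested: 100000
say "hash: $hash_steps";        # hash: 1100
say "matches: $matches";        # matches: 100
```

Both approaches find the same 100 matches; only the amount of work differs.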
I use the three-parameter form of the open command, which allows you to handle files with names that start with | or <, etc. Also, I use variables for my file handles, which makes it easier to pass the file handle to a subroutine if so desired.
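As a rough illustration of passing a file handle around, here is a sketch with a hypothetical read_symbols subroutine (not part of the program above). It opens an in-memory string instead of file2.txt so it runs on its own:

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper: reads one symbol per line from any open file handle.
sub read_symbols {
    my ($fh) = @_;
    my %symbols;
    while ( my $line = <$fh> ) {
        chomp $line;
        $symbols{$line} = 1;
    }
    return %symbols;
}

# An in-memory file stands in for file2.txt (made-up contents).
my $data = "AAA\nBBB\nAAA\n";
open my $lookup_fh, "<", \$data or die "Can't open in-memory file: $!";
my %lookup_table = read_symbols($lookup_fh);
close $lookup_fh;

say join ",", sort keys %lookup_table;    # AAA,BBB
```

The subroutine neither knows nor cares where the handle came from, which is the main payoff of keeping file handles in ordinary variables.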
I use use autodie; which handles issues such as what happens if my file doesn't open. In your program, the program would continue on its merry way. If you don't want to use autodie, you need to do this:
open my $fh, "<", $my_file or die qq(Couldn't open "$my_file" for reading: $!);
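Here is a small sketch of the difference, assuming the file name below does not exist in your current directory (it is made up for the demonstration):

```perl
use strict;
use warnings;
use feature qw(say);

my $missing = "no-such-file-for-demo.txt";    # assumed not to exist

# Without autodie: open just returns false, and nothing stops the program.
my $ok = open my $fh1, "<", $missing;
say "plain open returned ", ( $ok ? "true" : "false" );

# With autodie (lexically scoped to this block): a failed open throws.
my $error;
{
    use autodie;
    eval { open my $fh2, "<", $missing; 1 } or $error = $@;
}
say "autodie threw an exception" if $error;
```

The eval is only there so the demonstration can carry on after the failure. In a normal script you would let autodie end the program with its error message.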
I use two hashes. The first is %lookup_table, which stores the Symbols you want to print. When I go through the first file, I can simply check whether $lookup_table{$symbol} exists. If it doesn't, I don't print the line; if it does, I print it.
The second hash, %is_printed, keeps track of Symbols I've already printed. If $is_printed{$symbol} exists, I know I've already printed that line.
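To see the two hashes working together, here is a sketch on made-up rows. It uses the common $is_printed{$symbol}++ shortcut rather than the exists test in the program above; the effect is the same:

```perl
use strict;
use warnings;
use feature qw(say);

my %lookup_table = map { $_ => 1 } qw(AAA CCC);    # symbols we want
my %is_printed;                                    # symbols already printed

# Hypothetical table rows: id, symbol, note
my @rows = (
    "1\tAAA\tfirst",
    "2\tBBB\tskipped",
    "3\tAAA\tduplicate",
    "4\tCCC\tsecond",
);

my @printed;
for my $line (@rows) {
    my $symbol = ( split /\s+/, $line )[1];
    next unless exists $lookup_table{$symbol};    # not a symbol we want
    next if $is_printed{$symbol}++;               # already printed once
    push @printed, $line;
    say $line;
}
```

Only rows 1 and 4 are printed: row 2 is not in %lookup_table, and row 3 repeats a Symbol that was already printed.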
Even though you said the second table is tab separated, I use /\s+/ as the split regular expression. This will catch a tab, but it will also catch the case where someone used two tabs (to keep things looking nice) or accidentally typed a space before that tab.
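A quick sketch of why that matters, using a made-up line with a stray space followed by two tabs:

```perl
use strict;
use warnings;
use feature qw(say);

my $line = "1 \t\tAAA\tvalue";    # stray space, then two tabs (made-up row)

my @by_tab   = split /\t/,  $line;
my @by_space = split /\s+/, $line;

say "tab split: [$by_tab[1]]";           # tab split: []
say "whitespace split: [$by_space[1]]";  # whitespace split: [AAA]
```

The trade-off is that /\s+/ also splits on spaces inside a field, and a line that starts with whitespace yields an empty first field, so this only suits data whose columns never contain spaces.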