HTML::TableExtract was created to extract information from HTML tables. Use it as follows:
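#!/usr/bin/perl
use warnings;
use strict;

use HTML::TableExtract;

my $file = 'input.html';

my $te = 'HTML::TableExtract'->new;
# parse_file returns undef if the file cannot be opened.
$te->parse_file($file) or die "Cannot parse $file: $!\n";
my $t = $te->first_table_found or die "No table found in $file\n";

# Transpose the rows into columns. The table's first row is skipped;
# the row after it supplies the column names.
my @columns;
my $first = 1;
for my $row ($t->rows) {
    $first = 0, next if $first;
    push @{ $columns[$_] }, $row->[$_] for 0 .. $#$row;
}

# The first cell of each column is its name; the rest are its values.
for my $column (@columns) {
    print "$column->[0] = ", join(', ', @{ $column }[1 .. $#$column]), "\n";
}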
Output:
PO Number = 0000000000, 0000000000
Invoice Number = 000000118608965, 000000122865088
DC Number = 0, 0
Store Number = 1860, 2286
Division = 1, 1
Invoice Amount = $-21.02, $-42.04
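
To try the row-to-column step without an HTML file, here is a minimal sketch that uses only core Perl. The helpers transpose_rows and format_column are hypothetical names introduced for illustration, not part of HTML::TableExtract, and the sample rows are three of the six columns copied from the output above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: turn a list of row array refs into a list of
# column array refs (cell N of every row ends up in column N).
sub transpose_rows {
    my @rows = @_;
    my @columns;
    for my $row (@rows) {
        push @{ $columns[$_] }, $row->[$_] for 0 .. $#$row;
    }
    return @columns;
}

# Hypothetical helper: format one column as "name = value, value, ...".
sub format_column {
    my ($column) = @_;
    my ($name, @values) = @$column;
    return "$name = " . join(', ', @values);
}

# Sample data taken from the output above; single quotes keep the
# dollar signs from being interpolated.
my @rows = (
    [ 'PO Number',  'Store Number', 'Invoice Amount' ],
    [ '0000000000', '1860',         '$-21.02' ],
    [ '0000000000', '2286',         '$-42.04' ],
);

print format_column($_), "\n" for transpose_rows(@rows);
```

Keeping the transposition in its own subroutine means the same code should work on whatever $t->rows returns, as long as the row that is skipped in the script above is removed first.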