How to get the name of the input file in a Perl one-liner?

https://stackoverflow.com/questions/3948393

08-10-2019

Question

cat monday.csv

223.22;1256.4
227.08;1244.8
228.08;1244.7
229.13;1255.0
227.89;1243.2
224.77;1277.8

cat tuesday.csv

227.02;1266.3
227.09;1234.9
225.18;1244.7
224.13;1255.3
228.59;1263.2
224.70;1247.6

This Perl one-liner gives me the row with the highest value in the second column, taken from the rows of "monday.csv" whose first column starts with the digits 227 or 228:

$ perl -F\; -ane '$hash{$_} = $F[1] if /22[78]/; END{ print and exit for sort{ $hash{$b} <=> $hash{$a} } keys %hash }' monday.csv

This Perl one-liner gives me the row with the highest value in the second column, taken from the rows of all *day.csv files whose first column starts with the digits 227 or 228:

$ perl -F\; -ane '$hash{$_} = $F[1] if /22[78]/; END{ print and exit for sort{ $hash{$b} <=> $hash{$a} } keys %hash }' *day.csv

How could I rewrite this one-liner to get an output:

filename : "row with the highest value in the second column, taken from the rows of 'filename.csv' whose first column starts with the digits 227 or 228"

for each *day.csv file?

Solution

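You can use $ARGV, which holds the name of the file currently being read by <>. If you're only interested in the max, there is no need to store all the values and then sort them; instead, just store the max for each file. Also, your regex should probably be anchored to the start of the line, so that it only matches the beginning of the first column. Note that a max that has not been set yet compares as 0, so this assumes the values in the second column are positive.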

# Line breaks added for display purposes.
perl -F\; -ane '
    $max{$ARGV} = $F[1] if /^22[78]/ and $F[1] > $max{$ARGV};
    END{ print "$_\t$max{$_}" for sort keys %max}
' *day.csv

Or, if you want to store the entire line where the max occurs:

perl -F\; -ane '
    ($max{$ARGV}{ln}, $max{$ARGV}{mx}) = ($_, $F[1])
        if /^22[78]/ and $F[1] > $max{$ARGV}{mx};
    END{ print "$_\t$max{$_}{ln}" for sort keys %max}
' *day.csv

OTHER TIPS

The filename is contained in the $ARGV variable:

$ARGV contains the name of the current file when reading from <>.

However, the one-liners presented have an issue; what if you have repeated values of your first column?

A better one-liner would be:

$ perl -F\; -MList::Util=max -lane 'push @{ $wanted{$ARGV} }, $F[1] if $F[0] =~ /^22[78]/; } END { print "$_ : ", max(@{ $wanted{$_} }) for keys %wanted;' *.csv

Based on the comment:

$ perl -F\; -lane '$wanted{$ARGV} = [@F] if $F[0] =~ /^22[78]/ && $F[1] >= $wanted{$ARGV}[1]; } END { print "$_ : @{ $wanted{$_} }" for keys %wanted;' *.csv

It seems that you can use $ARGV. See "current filename".

If I would like the whole row, I could do this (based on FM's answer):

perl -F\; -ane '$max{$ARGV} = $_ if /^22[78]/ and $F[1] >= (split /;/, $max{$ARGV})[1];  END{ print "$_\t$max{$_}" for sort keys %max}' *day.csv

I found a shorter solution. For all files:

perl -F\; -anE '$max{$ARGV} = [@F] if /^22[78]/ and $F[1] >= $max{$ARGV}->[1];  END{ print "$_\t@{$max{$_}}" for sort keys %max}' *day.csv

For one file:

perl -F\; -anE '$max = [@F] if /^22[78]/ and $F[1] >= $max->[1]; END{ print "@$max" }' monday.csv

Or, if there is not much space available:

perl -F\; -anE'$m{$ARGV}=[@F]if/^22[78]/&&$F[1]>=$m{$ARGV}[1]}print"$_\t@{$m{$_}}"for sort keys%m;{' *day.csv

perl -F\; -anE'$m=[@F]if/^22[78]/&&$F[1]>=$$m[1]}print"@$m";{' monday.csv

As Zaid pointed out, the comparison operator decides which row is reported when the highest value occurs more than once in a file. To get the last such row, I changed the "$F[1] > $max..." part to "$F[1] >= $max...".

Licensed under: CC-BY-SA with attribution