Question

I have a number of CSV files, all with the same structure, in one directory. I would like to parse them and count how many lines have a value in column 5 that appears in a predefined array of values, a = [A, B, C, D].

I'm fairly inexperienced with shell scripting, so is awk the right tool for this, or should I write a Python script?

Solution

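Use this awk one-liner:

awk -F, '$5 ~ /^(A|B|C|D)$/' *.csv

It iterates over all lines of all .csv files in the current directory and checks whether the 5th column ($5) matches (~) the pattern ^(A|B|C|D)$. The -F, option sets the field separator to a comma; without it awk splits on whitespace, and $5 would not be the fifth CSV column. If a line matches the pattern, awk prints the whole line: no action is specified, and printing the whole line is awk's default action. To get a count instead of the lines themselves, pipe the output to wc -l.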
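
Because the question asks for a count rather than the matching lines, a counting variant may be closer to what is needed. The following is a minimal sketch, assuming plain comma-separated files with no quoted fields that contain commas; the function name count_matching is made up for illustration.

```shell
# Hypothetical helper: count lines whose 5th comma-separated field is
# exactly A, B, C or D, summed over all files given as arguments.
# Assumes simple CSV (no quoted fields containing commas).
count_matching() {
    # n+0 forces a numeric 0 when no line matched.
    awk -F, '$5 ~ /^(A|B|C|D)$/ { n++ } END { print n+0 }' "$@"
}
```

Calling count_matching *.csv prints one total for all files. If the files use Windows line endings and column 5 is the last column, the trailing carriage return would keep the anchored pattern from matching.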

The pattern:

^ matches the beginning of the string and $ matches the end of the string, so the pattern has to match the entire field, from beginning to end. (A|B|C|D) represents a choice of possible values, like a logical OR operation in other programming languages. I've used the single characters A, B, C, D as in your question, but you are free to use something like (foo|bar|hello|world).
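
To see why the anchors matter, the sketch below compares the anchored pattern with an unanchored one. The sample rows and variable names are made up for illustration.

```shell
# Made-up sample rows; only the first has exactly "A" in column 5.
sample='1,2,3,4,A
1,2,3,4,AB
1,2,3,4,xDy'

# Anchored: the whole field must be A, B, C or D.
anchored=$(printf '%s\n' "$sample" |
    awk -F, '$5 ~ /^(A|B|C|D)$/ { n++ } END { print n+0 }')

# Unanchored: a field that merely contains one of the letters also matches.
unanchored=$(printf '%s\n' "$sample" |
    awk -F, '$5 ~ /(A|B|C|D)/ { n++ } END { print n+0 }')

echo "anchored=$anchored unanchored=$unanchored"
```

With these rows the anchored pattern counts one line, while the unanchored pattern counts all three, because AB and xDy each contain one of the letters.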

Other tips

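The other currently posted answer does a regular-expression comparison, which is almost certainly not what you are really looking for: any regex metacharacters in a desired value are interpreted instead of being matched literally (try it with .* as one of your desired values).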
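
A small sketch of the difference, using made-up rows and .* as the desired value (the variable names are illustrative only):

```shell
# Made-up rows; only the first has the literal text ".*" in column 5.
rows='1,2,3,4,.*
1,2,3,4,foo
1,2,3,4,bar'

# Desired value used as a regular expression: ".*" matches every field.
as_regex=$(printf '%s\n' "$rows" |
    awk -F, -v want='.*' '$5 ~ ("^(" want ")$") { n++ } END { print n+0 }')

# Desired value used as a plain string: only the literal ".*" field counts.
as_string=$(printf '%s\n' "$rows" |
    awk -F, -v want='.*' '$5 == want { n++ } END { print n+0 }')

echo "regex=$as_regex string=$as_string"
```

The regex version counts all three rows, whereas the string version counts only the one row whose fifth field is literally .*.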

This does a string comparison:

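awk -F, '
# Build a lookup table whose keys are the accepted values.
# The explicit " " separator is needed because -F, changes the default used by split().
BEGIN { split("A B C D", tmp, " "); for (i in tmp) a[tmp[i]] }
# Count records whose 5th field is exactly one of those keys.
$5 in a { cnt++ }
# cnt+0 prints 0 instead of an empty line when nothing matched.
END { print cnt+0 }
' *.csv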
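
If the list of accepted values changes often, it may be convenient to pass it in rather than hard-code it. The following is a sketch under a few assumptions: the helper name count_in_set is hypothetical, values are separated by spaces (so a value cannot itself contain a space or a backslash escape), and the files are simple CSV without quoted commas.

```shell
# Hypothetical helper: the first argument is a space-separated list of
# accepted values; the remaining arguments are the CSV files to scan.
count_in_set() {
    values=$1
    shift
    awk -F, -v list="$values" '
    # Split on whitespace explicitly, since FS is a comma here.
    BEGIN { n = split(list, tmp, " "); for (i = 1; i <= n; i++) a[tmp[i]] }
    # String lookup: regex metacharacters in a value have no special meaning.
    $5 in a { cnt++ }
    END { print cnt+0 }
    ' "$@"
}
```

For example, count_in_set "A B C D" *.csv prints the total number of lines whose fifth field is one of the four values.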
License: CC-BY-SA with attribution
Not affiliated with StackOverflow