Problem

I have data (data.csv):

"1",5.1,"s"
"2",3.3,"s"
"3",2.7,"c"

and I want to count the number of lines whose third field is "s" or "c" with AWK (count.awk):

BEGIN{FS=","; s_count=0; c_count=0}
($3=="s"){s_count++}
($3=="c"){c_count++}
END{print s_count; print c_count}

Then I run:

$ awk -f count.awk data.csv

But this does not work. The output is:

0
0

This is not what I expected. Why?

$ awk -V
GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.2, GNU MP 5.1.2)

Note: I am using Awk on Cygwin.

Solution

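The problem is that awk does not interpret CSV quoting: with FS=",", the double quotes stay in the field, so the value of $3 is "s" including the quotes, not the bare letter s. You need to match the quotes too, by including them, \-escaped, in the string you compare against: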

awk '
  BEGIN{FS=","; s_count=0; c_count=0}
  ($3=="\"s\"") {s_count++}
  ($3=="\"c\"") {c_count++}
  END{ print s_count; print c_count }
  ' data.csv
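
A quick way to confirm this diagnosis is to print the third field between brackets, so that any embedded quotes become visible. The following is a minimal sketch: it feeds the three sample rows from the question through a pipe instead of reading data.csv, and the variable name fields is only illustrative.

```shell
# Feed the sample rows from the question to awk through a pipe.
# Printing $3 between brackets shows exactly what awk sees as the field.
fields=$(printf '%s\n' '"1",5.1,"s"' '"2",3.3,"s"' '"3",2.7,"c"' |
  awk -F, '{ printf "[%s]\n", $3 }')
echo "$fields"
# prints:
# ["s"]
# ["s"]
# ["c"]
```

The quotes inside the brackets are part of the field value, which is why comparing against a bare s never matches.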

As an aside, you can simplify your awk program somewhat:

  • the parentheses around the patterns are not needed (I have not verified this on Cygwin, but since it is awk that interprets the program text, I would not expect the platform to matter)
  • you don't strictly need to initialize the counters, because awk treats uninitialized variables as 0 in numeric contexts.

BEGIN{FS=","}
$3 == "\"s\"" {s_count++}
$3 == "\"c\"" {c_count++}
END{ print s_count; print c_count }
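
One caveat when dropping the initialization: an uninitialized awk variable is 0 in numeric contexts but the empty string in string contexts, and print uses the string value. A counter that is never incremented therefore prints as an empty line rather than 0. The sketch below illustrates this with a hypothetical variable n; adding 0 forces a numeric result.

```shell
# n is never assigned: its string value is empty, its numeric value is 0.
as_string=$(awk 'BEGIN { print n }')
as_number=$(awk 'BEGIN { print n + 0 }')
printf 'string: [%s]\nnumber: [%s]\n' "$as_string" "$as_number"
# prints:
# string: []
# number: [0]
```

If a count of zero should always be shown as 0, either keep the explicit initialization from the original program or print s_count + 0.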

Other tips

This is a job for an array. Here is an awk command:

awk -F, '{gsub(/\"/,"",$3);a[$3]++} END {for (i in a) print i,a[i]}' file
c 1
s 2

This counts the occurrences of c and s, and of any other value that appears in the third field.
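
Note that awk does not guarantee the order in which for (i in a) visits array keys, so the lines may come out in a different order on another system. If a stable order matters, one option is to pipe the result through sort. This is a sketch using the sample rows from the question; the variable name counts is only illustrative.

```shell
# Strip the quotes from the third field, count each value in an array,
# then sort the output because for-in traversal order is unspecified.
counts=$(printf '%s\n' '"1",5.1,"s"' '"2",3.3,"s"' '"3",2.7,"c"' |
  awk -F, '{ gsub(/"/, "", $3); a[$3]++ } END { for (i in a) print i, a[i] }' |
  sort)
echo "$counts"
# prints:
# c 1
# s 2
```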

License: CC-BY-SA with attribution
Not affiliated with StackOverflow