Question

Still a beginner here.

How do I count the unique second-column values for each unique first-column value in a table, using bash, awk, sed, etc.? For example, given a two (or more) column table such as this:

AA a
AA b
AA c
AA a
AB b
AA a
AB c
AA a
AC b
AC b
XX z

how would I get an output of

AA 3
AB 2
AC 1
XX 1

where the second column of the output is the number of unique second-column values for each first-column value.

So there are 6 AAs but only 3 unique AA values, and so on.

I don't want to use SQL, just bash command-line programs such as awk. Any help is much appreciated :)


Solution

This will do the trick:

$ awk '!a[$0]++{u[$1]++}END{for (k in u) print k,u[k]}' file
AA 3
AB 2
AC 1
XX 1

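awk's for (k in u) loop visits keys in an unspecified order, so the output above is not guaranteed to be sorted. To get sorted output, pipe to sort -rnk2 for a reverse numerical sort on the second field: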

$ awk '!a[$0]++{u[$1]++}END{for (k in u) print k,u[k]}' file | sort -rnk2
AA 3
AB 2
XX 1
AC 1

Explanation:

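The associative array a counts how many times each whole line has been seen. The test !a[$0]++ is true only the first time a line appears, so the secondary array u, keyed on the first field, is incremented once per unique line. The END block then prints each key with its count.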
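
To make the explanation concrete, here is a commented, multi-line variation of the same idea. It is only a sketch: the function name count_unique and the array names seen and distinct are made up for illustration, and it keys on the first two fields rather than on $0, on the assumption that only column 2 matters when the table has extra columns.

```shell
# count_unique: for each distinct value in column 1, count distinct values in column 2.
# (Hypothetical helper name; reads the named files, or stdin if none are given.)
count_unique() {
    awk '
        # seen[] is keyed on the (column 1, column 2) pair, so extra columns are ignored.
        # The test is true only the first time a given pair appears.
        !seen[$1, $2]++ { distinct[$1]++ }

        # for-in visits keys in an unspecified order, so the result is sorted afterwards.
        END { for (key in distinct) print key, distinct[key] }
    ' "$@" | sort
}

# Sample data from the question, fed on stdin.
printf '%s\n' 'AA a' 'AA b' 'AA c' 'AA a' 'AB b' 'AA a' \
              'AB c' 'AA a' 'AC b' 'AC b' 'XX z' | count_unique
```

On the sample table this prints AA 3, AB 2, AC 1 and XX 1, one per line. For a strictly two-column table it behaves the same as the one-liner keyed on $0.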

OTHER TIPS

I was wondering whether this could be done another way. Here is an approach that does the counting with sort and uniq instead of awk:

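sort file | uniq | cut -f1 -d' ' | uniq -c | awk '{print $2,$1}'

(The final step swaps the two columns. rev is not suitable here because it reverses every character on the line, turning "2 AB" into "BA 2".)

Or, more briefly, using the -w option of GNU uniq and assuming the first column is always exactly two characters wide:

sort -u file | uniq -cw2 | awk '{print $2,$1}'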
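
If awk is to be avoided entirely, the column swap at the end of the pipeline can be done with sed instead. A minimal sketch, assuming exactly two space-separated columns with no spaces inside a value, and a sed that supports -E; the function name count_unique_noawk is made up for illustration:

```shell
# count_unique_noawk: per-key count of distinct lines using only sort, cut, uniq and sed.
# (Hypothetical helper name; reads the named files, or stdin if none are given.)
count_unique_noawk() {
    LC_ALL=C sort -u "$@" |   # drop duplicate lines; byte-order sorting keeps equal keys adjacent
        cut -d' ' -f1 |       # keep only the first column
        uniq -c |             # count how many distinct lines each key had
        sed -E 's/^ *([0-9]+) (.*)$/\2 \1/'   # turn "      3 AA" into "AA 3"
}

# Example call on a small sample.
printf '%s\n' 'AB b' 'AA a' 'AA b' 'AA a' 'AB b' | count_unique_noawk
```

LC_ALL=C is set so that sort compares raw bytes; some locales ignore spaces when collating, which could interleave lines whose first columns differ.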
Licensed under: CC-BY-SA with attribution