Your question isn't quite clear so I just reverse engineered your output to your input (assuming there is a typo in your output since you mention to count number of 1's, 2's and 3's from column 2 and show 2 R,2
). You'll probably need to explain your question a little better -
sort -t "," -k2 < p1.csv |
awk -F, '!z[$2]++{ a[$2]=$0; } END {for (i in a) print z[i], a[i]}' |
sort -k1
Explaination:
- !z[$2]++ removes the duplicates based on column 2 as awk progresses thru
each line.
- a[$2]=$0 stores the non-duplicates lines in an array
- END {..} looks at all the keys in array and pulls up values. For array a
it pulls up the first line it sees with unique column 2 (as your desired
output). For array z it pulls up number of lines seen with same column 2.
Test:
[jaypal:~/temp] cat file
R,3
R,4
S,1
S,2
S,3
R,2
T,4
R,3
ST,4
RST,2
RSTR,4
[jaypal:~/temp] sort -t "," -k2 < t |
awk -F, '!z[$2]++{ a[$2]=$0; } END {for (i in a) print z[i], a[i]}' |
sort -k1
1 S,1
3 R,2
3 R,3
4 R,4
Sort with -u option
For finding unique entries based on column, you can try sort with -u option (but it won't give you the counts though).
From the man
page:
-u, --unique
with -c, check for strict ordering;
without -c, output only the first of an equal run
You can try something like this -
sort -t, -k2 p1.csv | sort -u -t, -k2
Using Uniq
I am not sure Uniq can be performed on a column separated by a delimiter other than a blank. Atleast on my mac it doesn't. Here is the man page reference
-f num Ignore the first num fields in each input line when doing comparisons.
A field is a string of non-blank characters separated
from adjacent fields by blanks. Field numbers are one based,
i.e., the first field is field one.
So if you can remove the ,
delimiter and run the following command you should get your desired result.
sort -k2 test | uniq -c -f1
Test:
[jaypal:~/temp] cat test
R 3
R 4
S 1
S 2
S 3
R 2
T 4
R 3
ST 4
RST 2
RSTR 4
[jaypal:~/temp] sort -k2 test | uniq -c -f1
1 S 1
3 R 2
3 R 3
4 R 4