As I understand it, you want to specify one or more columns to use as a key and get output showing how many times each distinct key occurs. In that case, suppose your data is in a file called "data" and we want column 17 as the key:
$ awk '{print $17}' data | sort -n | uniq -c
4 234
4 235
3 236
Thus, the value 236 appears in column 17 a total of 3 times in your test data.

Or, suppose you wanted columns 6, 8, 1, and 3 as the key (in that order):
$ awk '{print $6,$8,$1,$3}' data | sort -n | uniq -c
11 1116 532275 4549 22656489
For this key, all 11 lines are dups.
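As a self-contained sketch of the same idea (the rows and key choice here are invented for illustration: three 3-column rows, with column 3 followed by column 2 as the key):

```shell
# Invented 3-column rows; key = column 3, then column 2
printf '%s\n' 'u 5 9' 'v 5 9' 'w 6 8' |
  awk '{print $3,$2}' |   # project the key columns in the requested order
  sort -n | uniq -c       # group equal keys, then count each group
# rows u and v share the key "9 5", so that key is reported with count 2
```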
This approach has three steps. First, awk selects the columns you want, in the order you want. Second, sort -n sorts the keys numerically, which puts duplicate keys on adjacent lines. Lastly, uniq -c counts the duplicates; the sort is essential because uniq only collapses adjacent lines.
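To see what each stage contributes, the pipeline can be run one stage at a time (a made-up two-column file, with column 2 as the key):

```shell
rows=$(printf '%s\n' 'x 236' 'y 234' 'z 236')        # made-up rows, key = column 2
echo "$rows" | awk '{print $2}'                      # step 1: project the key column
echo "$rows" | awk '{print $2}' | sort -n            # step 2: make equal keys adjacent
echo "$rows" | awk '{print $2}' | sort -n | uniq -c  # step 3: count each adjacent run
```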
UPDATE: Suppose, as above, we want to use columns 6, 8, 1, and 3 as the key but, as per your comment, we also want to keep one of the original lines from each group. In this case we instruct awk to append the key after the original 17 columns, we tell sort to sort on the key only (columns 18 onward), and then we instruct uniq to ignore the first 17 fields when comparing:
awk '{print $0,$6,$8,$1,$3}' data | sort -k18 -n | uniq -f 17 -c
For your sample data, this results in:
11 4549 10 22656489 63452166 3050 1116 621 532275 6010025 534075 6012488 477375 5995731 533175 6011257 8388615 234 1116 532275 4549 22656489
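A scaled-down sketch of the same trick, with made-up 3-column rows and columns 2 and 3 as the key: the appended key lands in columns 4-5, so sort starts at -k4 and uniq skips 3 fields:

```shell
# Made-up 3-column analogue of the 17-column case above
printf '%s\n' 'a 5 1' 'b 5 1' 'c 6 2' |
  awk '{print $0,$2,$3}' |  # append the key after the original columns
  sort -k4 -n |             # sort on the appended key only
  uniq -f 3 -c              # skip the 3 original fields when comparing
# rows a and b share the key "5 1", so they collapse into one line with count 2,
# and the first original line of the group (a's) is the one kept
```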
If you only want the count and the original 17 columns printed, then we can use perl to keep just the first 18 fields (the count plus the 17 columns) and crop off the key:
awk '{print $0,$6,$8,$1,$3}' data | sort -k18 -n | uniq -f 17 -c | perl -nle '@a=split;print join " ", @a[0..17]'
which results in:
11 4549 10 22656489 63452166 3050 1116 621 532275 6010025 534075 6012488 477375 5995731 533175 6011257 8388615 234