Question

I have a multi-field text file. I'd like a command that combines the behavior of both sort -n -u -k and uniq -c: that is, sort the file on a certain key field and prepend (or append) the number of duplicates to the original line. At the moment, I can either sort on the key field and obtain the first of the duplicated lines, without the number of duplicates, with sort -n -u -k, or count the number of duplicates with uniq -c after extracting the key field.

Can you suggest a command that implements both behaviors?

An example of the file (the key column can be any of the columns shown):

       4549              1       22656489       63452157           3235           1116            612         532275        6009800         534075        6012488         477375        5995844         533175        6011144        8388615            236
       4549              2       22656489       63452158           3214           1116            613         532275        6009825         534075        6012488         477375        5995831         533175        6011157        8388615            236
       4549              3       22656489       63452159           3193           1116            614         532275        6009850         534075        6012488         477375        5995819         533175        6011169        8388615            236
       4549              4       22656489       63452160           3173           1116            615         532275        6009875         534075        6012488         477375        5995806         533175        6011182        8388615            235
       4549              5       22656489       63452161           3152           1116            616         532275        6009900         534075        6012488         477375        5995794         533175        6011194        8388615            235
       4549              6       22656489       63452162           3131           1116            617         532275        6009925         534075        6012488         477375        5995781         533175        6011207        8388615            235
       4549              7       22656489       63452163           3111           1116            618         532275        6009950         534075        6012488         477375        5995769         533175        6011219        8388615            235
       4549              8       22656489       63452164           3091           1116            619         532275        6009975         534075        6012488         477375        5995756         533175        6011232        8388615            234
       4549              9       22656489       63452165           3070           1116            620         532275        6010000         534075        6012488         477375        5995744         533175        6011244        8388615            234
       4549             10       22656489       63452166           3050           1116            621         532275        6010025         534075        6012488         477375        5995731         533175        6011257        8388615            234
       4549             11       22656489       63452167           3030           1116            622         532275        6010050         534075        6012488         477375        5995719         533175        6011269        8388615            234
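
For reference, the two separate commands described above might look like this (just a sketch, assuming the key is column 17 and the file is named data):

sort -n -u -k17,17 data
awk '{print $17}' data | sort -n | uniq -c

The first keeps one line per key value but loses the counts; the second gives the counts but loses the rest of the line.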

Solution

Using decorate-sort-undecorate, you can append to the data the fields you want to base your processing on, do the processing, and then remove the extra fields. For example, to sort on fields 17 and 5:

awk '{print $0 OFS $17 OFS $5}' test_s  | sort -n -k18 -k19  | uniq -c -f17 | awk '{NF=18;print}'

You first append the key fields, then sort and uniq on them, and finally keep only the count added by uniq plus the original fields.
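
For instance, applying the same decorate-sort-undecorate pattern with just column 17 as the key (a sketch based on the command above and the sample data):

awk '{print $0 OFS $17}' test_s | sort -n -k18 | uniq -c -f17 | awk '{NF=18;print}'

This should prepend the counts 4, 4 and 3 to the first surviving line for the key values 234, 235 and 236, respectively.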

OTHER TIPS

As I currently understand it, you want to specify one or more columns to use as a key and obtain a result with each output line showing the multiplicity for that key. In that case, suppose your data is in a file called "data" and we want column 17 as the key:

$ awk '{print $17}' data  | sort -n | uniq -c
  4 234
  4 235
  3 236

Thus, the value of 236 appears in column 17 a total of 3 times in your test data. Or, suppose you wanted columns 6, 8, 1, and 3 as the key (and in that order):

$ awk '{print $6,$8,$1,$3}' data  | sort -n | uniq -c
 11 1116 532275 4549 22656489

For this key, all 11 lines are duplicates.

This approach has three steps. First, awk selects the columns you want, in the order you want. Second, sort -n sorts the keys numerically. Finally, uniq -c counts the duplicates.
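
As a side note, the same counting can be done in a single awk pass, without sort and uniq (a sketch; the groups come out in no particular order, since awk arrays are unordered):

awk '{count[$6 OFS $8 OFS $1 OFS $3]++} END {for (k in count) print count[k], k}' data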

UPDATE: Suppose, as above, we want to use columns 6, 8, 1, and 3 as the key but, as per your comment, we want to keep one of the original lines. In this case, we instruct awk to put the original 17 columns before the key, tell sort to sort on the key (columns 18 and beyond), and then instruct uniq to ignore those first 17 columns:

awk '{print $0,$6,$8,$1,$3}' data  | sort -k18 -n | uniq -f 17 -c

For your sample data, this results in:

     11        4549             10       22656489       63452166           3050           1116            621         532275        6010025         534075        6012488         477375        5995731         533175        6011257        8388615            234 1116 532275 4549 22656489

If you only want the count and the original 17 columns printed, then we can use perl to keep just those and crop off the key:

awk '{print $0,$6,$8,$1,$3}' data  | sort -k18 -n | uniq -f 17 -c | perl -nle '@a=split;print join " ", @a[0..17]'

which results in:

11 4549 10 22656489 63452166 3050 1116 621 532275 6010025 534075 6012488 477375 5995731 533175 6011257 8388615 234
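
Alternatively, the trailing key columns can be cropped with awk, as in the solution above (a sketch; like the perl version, it collapses the original spacing to single spaces):

awk '{print $0,$6,$8,$1,$3}' data | sort -k18 -n | uniq -f 17 -c | awk '{NF=18;print}'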
Licensed under: CC-BY-SA with attribution