Output whole line once for each unique value of a column according to the value of a second column

StackOverflow https://stackoverflow.com/questions/22215619

  •  10-06-2023

Question

My question is very similar to this previously asked question:

Output whole line once for each unique value of a column (Bash)

but with one major difference. In that question's example input:

pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK        1   genes ADUm.1999,ADUm.3560
pep> AIQLTGK        8   genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR  5   genes ADUm.367
pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
pep> AIQLTGK        10  genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR   2   genes ADUm.2146,ADUm.5750

The goal was to "print a line for each distinct value of the peptides in column 2, meaning the above input would become:"

pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK        1   genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR  5   genes ADUm.367
pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750

What I would like instead is to print one line for each unique entry in column 2, choosing the line with the highest value in column 3. The output would then look like this:

pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK        10  genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR  5   genes ADUm.367
pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750

Thanks in advance.

Solution

Here is one way of doing it:

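awk '
# Key already seen: keep the new line only if its column 3 is larger,
# and remember the new maximum so later lines compare against it.
($2 in seen) {
    if ($3 + 0 > seen[$2]) {
        seen[$2] = $3 + 0
        line[$2] = $0
    }
    next
}
# First occurrence of this key: store its value and its line.
{
    seen[$2] = $3 + 0
    line[$2] = $0
}
END {
    for (x in line) print line[x]
}' file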

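Output (awk does not define the iteration order of for (x in line), so the order may differ from the input):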

pep> AIQLTGK        10  genes ADUm.1999,ADUm.3560
pep> AEYTCVAETK     2   genes ADUm.1024,ADUm.5198,ADUm.750
pep> VSSILEDKILSR   3   genes ADUm.2146,ADUm.5750
pep> VSSILEDKTT     9   genes ADUm.1192,ADUm.2731
pep> KHEPPTEVDIEGR  5   genes ADUm.367
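
If you need the lines in the order in which each peptide first appears in the input (as in the desired output in the question), one option is to record each key the first time it is seen and loop over that list in END. This is only a minimal sketch, assuming whitespace-separated fields and numeric values in column 3; the function name max_per_key is a placeholder and not part of any tool:

```shell
# max_per_key is a hypothetical helper name used for illustration.
# For every distinct value in column 2 it prints the line whose column 3
# is largest, in the order the column-2 values first appear in the input.
max_per_key() {
    awk '
    {
        val = $3 + 0                 # force a numeric comparison
        if (!($2 in max)) {          # first time this key is seen
            order[++n] = $2          # remember first-seen order
            max[$2] = val
            line[$2] = $0
        } else if (val > max[$2]) {  # strictly larger: replace stored line
            max[$2] = val
            line[$2] = $0
        }
    }
    END {
        for (i = 1; i <= n; i++) print line[order[i]]
    }' "$@"
}
```

Called as max_per_key file, this should print the five lines in the same order as the desired output in the question. When two lines for the same peptide tie on column 3, the first of them is kept.
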
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow