extracting data from a text file based on some conditions [closed]

https://stackoverflow.com/questions/21200329

awk
gawk

29-09-2022
|

Frage

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.

Closed 8 years ago.

Improve this question

I have a text file as follows

ID     Name position_start position_end
ID01    P889      290       298
ID01    P889      290       299
ID02    O991      400       405
ID02    O991      355       373
ID02    O991      403       404
ID05    Q151      428       429
ID05    Q151      428       428
ID05    Q151      24        24
ID05    Q151      14        25

I would like to extract the longest starting and ending positions of each ID. My desired output is shown below.

    ID      Name  position_start position_end
    ID01    P889      290       299
    ID02    O991      400       405
    ID02    O991      355       373
    ID05    Q151      428       429
    ID05    Q151      14        25

Lösung

sort -k1,1 -k2,2 -k3,3n -k4,4n file > temp

awk 'NR==1{print;next}
NR==2{start=$3;end=$4;id=$1 OFS $2;next}
{ if ($1 OFS $2 == id &&$3<=end) 
      {end=end>$4?end:$4;next}
   print id,start,end;start=$3;end=$4;id=$1 OFS $2
}END{print id,start,end}' OFS="\t" temp

ID     Name position_start position_end
ID01    P889    290     299
ID02    O991    355     373
ID02    O991    400     405
ID05    Q151    14      25
ID05    Q151    428     429

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow