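# sort by ID and Name, then numerically by start and end, so overlapping ranges are adjacent
sort -k1,1 -k2,2 -k3,3n -k4,4n file > temp
awk 'NR==1 { print; next }                           # header: print unchanged
NR==2 { id=$1 OFS $2; start=$3+0; end=$4+0; next }  # open the first range (+0 forces numbers)
{ if (($1 OFS $2) == id && $3 <= end) {             # same ID/Name, overlaps the open range
    if ($4+0 > end) end = $4+0                      # extend to the larger end
    next
  }
  print id, start, end                              # no overlap: emit the merged range
  id=$1 OFS $2; start=$3+0; end=$4+0                # and open a new one
}
END { print id, start, end }                        # emit the last range
' OFS="\t" temp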
Output:
ID Name position_start position_end
ID01 P889 290 299
ID02 O991 355 373
ID02 O991 400 405
ID05 Q151 14 25
ID05 Q151 428 429
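The answer above writes an intermediate `temp` file and relies on the header line sorting ahead of the data rows (it does here, because `ID` sorts before `ID01`). A minimal sketch of the same sort-then-merge approach that avoids both, assuming exactly one header line and whitespace-separated `ID Name start end` columns. `merge_sorted` is a hypothetical helper name, the script overwrites a file called `file` with the question's sample data, and the output uses single spaces instead of the answer's tabs.

```shell
#!/bin/sh
# merge_sorted is a hypothetical helper: it merges overlapping ranges per
# ID/Name pair. Assumes one header line and columns: ID Name start end.
merge_sorted() {
  head -n 1 "$1"                                  # print the header unchanged
  tail -n +2 "$1" |
    sort -k1,1 -k2,2 -k3,3n -k4,4n |              # group by ID/Name, order by start, end
    awk '
      ($1 " " $2) == id && $3 <= end {            # same ID/Name, overlaps the open range
        if ($4 + 0 > end) end = $4 + 0            # extend to the larger end
        next
      }
      {
        if (id != "") print id, start, end        # emit the previous merged range
        id = $1 " " $2; start = $3 + 0; end = $4 + 0
      }
      END { if (id != "") print id, start, end }  # emit the last merged range
    '
}

# Sample input from the question.
cat > file <<'EOF'
ID Name position_start position_end
ID01 P889 290 298
ID01 P889 290 299
ID02 O991 400 405
ID02 O991 355 373
ID02 O991 403 404
ID05 Q151 428 429
ID05 Q151 428 428
ID05 Q151 24 24
ID05 Q151 14 25
EOF

merge_sorted file
```

Because the header never goes through `sort`, the result no longer depends on how the header compares with the IDs, and the `id != ""` guards keep an input with only a header from printing an empty line.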
Extracting data from a text file based on some conditions [closed]
Question
I have a text file as follows:
ID Name position_start position_end
ID01 P889 290 298
ID01 P889 290 299
ID02 O991 400 405
ID02 O991 355 373
ID02 O991 403 404
ID05 Q151 428 429
ID05 Q151 428 428
ID05 Q151 24 24
ID05 Q151 14 25
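For each ID, I would like to extract the longest ranges: ranges of the same ID that overlap (such as 400 405 and 403 404) should be collapsed into a single range from the smallest start to the largest end, while ranges that do not overlap should stay as separate rows. My desired output is shown below.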
ID Name position_start position_end
ID01 P889 290 299
ID02 O991 400 405
ID02 O991 355 373
ID05 Q151 428 429
ID05 Q151 14 25
Solution
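The desired output in the question lists the merged ranges in the order in which they first appear in the input (for example ID02 400 405 before ID02 355 373), whereas a sort-based merge prints them by ascending start position. If that order matters, one option is a decorate-sort-undecorate pipeline: tag each data row with its line number, merge as usual while remembering the smallest tag in each merged range, then sort on that tag and strip it. This is a sketch, not part of the original answer; `merge_in_order` is a hypothetical helper name, and it assumes one header line, single-space-separated `ID Name start end` columns, and overwrites a file called `file` with the question's sample data.

```shell
#!/bin/sh
# merge_in_order is a hypothetical helper: merge overlapping ranges per ID/Name,
# then print the merged ranges in order of first appearance in the input.
merge_in_order() {
  head -n 1 "$1"                                # pass the header through
  tail -n +2 "$1" |
    awk '{ print NR, $0 }' |                    # 1. tag each row with its input position
    sort -k2,2 -k3,3 -k4,4n -k5,5n |            # 2. group by ID/Name, order by start, end
    awk '
      function emit() { if (id != "") print first, id, start, end }
      ($2 " " $3) == id && $4 <= end {          # same ID/Name, overlaps the open range
        if ($5 + 0 > end) end = $5 + 0          # extend to the larger end
        if ($1 + 0 < first) first = $1 + 0      # keep the earliest input position
        next
      }
      { emit(); first = $1 + 0; id = $2 " " $3; start = $4 + 0; end = $5 + 0 }
      END { emit() }' |
    sort -k1,1n |                               # 3. restore first-appearance order
    cut -d " " -f 2-                            # 4. drop the position tag
}

# Sample input from the question.
cat > file <<'EOF'
ID Name position_start position_end
ID01 P889 290 298
ID01 P889 290 299
ID02 O991 400 405
ID02 O991 355 373
ID02 O991 403 404
ID05 Q151 428 429
ID05 Q151 428 428
ID05 Q151 24 24
ID05 Q151 14 25
EOF

merge_in_order file
```

The final `sort -k1,1n` cannot hit ties, because each input row belongs to exactly one merged range, so every merged range carries a distinct smallest tag.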
Licensed under: CC-BY-SA with attribution