Question

I have this kind of input:

rs10000004 C T 4 rs10000004 0 75625312 C C C C T 0 C T 
rs10000005 G A 4 rs10000005 0 75625355 G 0 A A A G A A 

I want to replace every column from the 8th to the last with "A" if its value is identical to the second field ($2), or with "B" if it is identical to the third field ($3). Any other value should be printed unchanged (zeros are expected in some columns).

Expected output

rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B 
rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B 

I tried the following, but it prints only empty lines instead of any results. I would prefer an improvement to my own code over a new solution that uses something other than awk.

cat input | awk '{ for(i=8; i<=NF; i++) { if($i == $2) $i="A"; else if($i == $3) $i="B"; else $i == 0; } print $i }'

Thanks in advance


Solution

Code:

awk '
{
    # Recode fields 8..NF; values matching neither $2 nor $3 (e.g. 0) stay unchanged
    for (i=8; i<=NF; i++) {
       if ($i == $2) {
           $i = "A";
       }
       else if ($i == $3) {
           $i = "B";
       }
    }
    print;    # print the whole modified record
}' input

Or shorter:

awk '
{
    for (i=8; i<=NF; i++) {
       if ($i == $2)
           $i="A";
       else if ($i == $3)
           $i="B";
    }
}
# a bare 1 is an always-true pattern; its default action prints the record
1' input

Output:

rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B 
rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B 
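
Why the attempt in the question prints only empty lines: when the loop finishes, i equals NF+1, so print $i prints a field that does not exist, which is empty. In addition, $i == 0 in the last else is a comparison rather than an assignment, so it has no effect. Below is a minimal sketch of the corrected one-liner, wrapped in a shell function so it can be tried directly on the sample rows. The function name recode_alleles is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: recode fields 8..NF of each line read from stdin or files.
recode_alleles() {
    awk '{
        for (i = 8; i <= NF; i++) {
            if ($i == $2)      $i = "A"   # same value as the 2nd field
            else if ($i == $3) $i = "B"   # same value as the 3rd field
            # any other value (e.g. 0) is left as it is
        }
        print   # print the whole record, not $i
    }' "$@"
}

# Try it on the two sample rows from the question.
printf '%s\n' \
    'rs10000004 C T 4 rs10000004 0 75625312 C C C C T 0 C T' \
    'rs10000005 G A 4 rs10000005 0 75625355 G 0 A A A G A A' |
    recode_alleles
```

Because assigning to a field makes awk rebuild the record with the output field separator (a single space by default), any extra or trailing whitespace in the input is normalized in the output.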
Licensed under: CC-BY-SA with attribution