Question

I am trying to extract a subset of my tab-delimited data by filtering on information stored inside a column. For example, col2 holds three scores separated by ";":

col1 col2
1    a=2;b=1.1;c=0    
1    a=0.2;b=0.2;c=0.5  
1    a=1.5;b=1.9;c=3.5  

I would like to extract the rows whose b value is greater than 1. In this case the desired output is:

col1 col2
1    a=2;b=1.1;c=0    
1    a=1.5;b=1.9;c=3.5  

I tried awk, but I could not get it to look at the values inside the column. Also, the order of the keys is not always the same (a, b, c, etc.), so it would be best to express the criterion directly as 'b > 1'. Any suggestions?


Solution

Since the keys in col2 can appear in any order, split the column into key/value tokens and check the value that follows b:

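awk -F'\t' '
NR > 1 {
    # split col2 into tokens: key, value, key, value, ...
    n = split($2, ary, /[;=]/)
    for (i = 1; i < n; i += 2) {
        if (ary[i] == "b" && ary[i+1] + 0 > 1) {
            print
            break
        }
    }
    next
}
# the header (NR == 1) falls through to the trailing 1, which prints it
1' file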

Test:

$ cat f
col1    col2
1       a=2;b=1.1;c=0    
1       a=0.2;b=0.2;c=0.5  
1       a=1.5;b=1.9;c=3.5  

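$ awk -F'\t' '
NR > 1 {
    # split col2 into tokens: key, value, key, value, ...
    n = split($2, ary, /[;=]/)
    for (i = 1; i < n; i += 2) {
        if (ary[i] == "b" && ary[i+1] + 0 > 1) {
            print
            break
        }
    }
    next
}
# the header (NR == 1) falls through to the trailing 1, which prints it
1' f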
col1    col2
1       a=2;b=1.1;c=0    
1       a=1.5;b=1.9;c=3.5  
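If you would rather pass the key and the threshold on the command line, so that the criterion reads literally as "b > 1", one option is to load col2 into an associative array first. This is a minimal sketch, assuming tab-separated input with one header line and col2 written as key=value pairs joined by ";"; the function name filter_kv is hypothetical:

```shell
# Hypothetical helper: filter_kv KEY MIN FILE
# Prints the header line plus every row whose col2 contains KEY=VALUE with VALUE > MIN.
# Assumes tab-separated input and col2 written as key=value pairs joined by ";".
filter_kv() {
    awk -F'\t' -v key="$1" -v min="$2" '
    NR == 1 { print; next }                  # keep the header line
    {
        split("", kv)                        # portable way to empty the map for each row
        n = split($2, pairs, ";")
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")        # split each pair on its first "=" only
            if (eq > 0)
                kv[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
        }
        # adding 0 to both sides forces a numeric comparison
        if ((key in kv) && kv[key] + 0 > min + 0)
            print
    }' "$3"
}
```

For example, filter_kv b 1 file should print the header and the rows with b=1.1 and b=1.9, whatever position b has in col2. Rows that lack the key entirely are skipped by the (key in kv) test.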

Other tips

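GNU sed

You can also delete the rows whose b value is at most 1. A regex cannot compare numbers in general, so this only works for non-negative plain decimals such as 0.2, 1 or 1.9:

sed -E '/(^|[;[:space:]])b=(0*(\.[0-9]*)?|0*1(\.0*)?)([;[:space:]]|$)/d' file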
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow