Question
I have some files containing the following data.
160-68 160 68 B-A OD-CA 3.80247
160-68 160 68 B-A OG-C 3.73454
160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OG-CA 3.54446
160-69 160 69 B-A OE-NZ2 4.24609
160-69 160 69 B-A OG-O 3.97644
160-69 160 69 B-A OG-H 1.82292
I need to extract the lines whose 5th column contains any of the following pairs: OD-NZ, NZ-OD, OE-NZ, NZ-OE, OE-NH, NH-OE. The atom names may carry a numeric suffix (e.g. OD1-NZ or OE-NZ2). What is an easy way to do this?
Desired Output
160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OE-NZ2 4.24609
Solution
awk '$5 ~ /OD[0-9]*-NZ[0-9]*|NZ[0-9]*-OD[0-9]*|OE[0-9]*-NZ[0-9]*|NZ[0-9]*-OE[0-9]*|OE[0-9]*-NH[0-9]*|NH[0-9]*-OE[0-9]*/' input.txt
OTHER TIPS
This might be the easiest approach if you later need to add, delete, or modify any of the pairs you're interested in. With GNU awk for gensub():
awk -v pairs="OD-NZ NZ-OD OE-NZ NZ-OE OE-NH NH-OE" '
BEGIN{ split(pairs, tmp); for (i in tmp) pairsArr[tmp[i]] }
gensub(/[[:digit:]]/,"","g",$5) in pairsArr
' file
160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OE-NZ2 4.24609
You can do the same with gsub() and a variable in any awk.
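A minimal sketch of that gsub() variant, assuming any POSIX awk. Because gsub() modifies its target in place, the 5th field is first copied into a variable (named key here purely for illustration) so the printed line is left unchanged. The sample rows come from the question; the shell variables data and result are likewise only illustrative.

```shell
# Sample rows taken from the question.
data='160-68 160 68 B-A OD-CA 3.80247
160-68 160 68 B-A OG-C 3.73454
160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OG-CA 3.54446
160-69 160 69 B-A OE-NZ2 4.24609
160-69 160 69 B-A OG-O 3.97644
160-69 160 69 B-A OG-H 1.82292'

result=$(printf '%s\n' "$data" |
awk -v pairs="OD-NZ NZ-OD OE-NZ NZ-OE OE-NH NH-OE" '
BEGIN {
    # Build a lookup set from the space-separated pair list.
    n = split(pairs, tmp, " ")
    for (i = 1; i <= n; i++) pairsArr[tmp[i]]
}
{
    key = $5                 # work on a copy so the record stays untouched
    gsub(/[0-9]/, "", key)   # drop digits: OD1-NZ becomes OD-NZ
}
key in pairsArr              # print lines whose stripped pair is in the set
')

printf '%s\n' "$result"
```

To change which pairs are selected, only the pairs string needs editing.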
Or if you prefer:
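awk -v pairs="OD-NZ|NZ-OD|OE-NZ|NZ-OE|OE-NH|NH-OE" '
# Allow optional digits after each atom name; the parentheses make ^ and $ apply to every alternative.
BEGIN { pairs = "^(" gensub(/([-|]|$)/,"[[:digit:]]*\\1","g",pairs) ")$" }
$5 ~ pairs
' file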
160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OE-NZ2 4.24609
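To see what kind of pattern such a BEGIN block builds, here is a small sketch that prints it for a shortened, two-pair list. It is an assumption-laden illustration rather than the answer's exact code: it uses gsub() with & instead of gensub() so it runs in any POSIX awk, it wraps the alternation in parentheses so the anchors cover every pair, and the variable names pattern, matched, and re are hypothetical.

```shell
# Expand a short pair list into an anchored regex with optional digits.
pattern=$(awk -v pairs="OD-NZ|NZ-OD" 'BEGIN {
    # Insert [0-9]* before every "-" and "|" (& stands for the matched character).
    gsub(/[-|]/, "[0-9]*&", pairs)
    # Add the trailing [0-9]* and anchor the whole alternation.
    print "^(" pairs "[0-9]*)$"
}')
printf '%s\n' "$pattern"

# Apply the generated pattern to the 5th field of two sample rows.
matched=$(printf '%s\n' \
    '160-69 160 69 B-A OD1-NZ 2.76641' \
    '160-69 160 69 B-A OG-CA 3.54446' |
    awk -v re="$pattern" '$5 ~ re')
printf '%s\n' "$matched"
```

Printing the pattern before using it is a cheap way to check that the generated regex is what you intended.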
grep -P 'OD\d?-NZ\d?|NZ\d?-OD\d?|....' file
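If you prefer awk, test $5 against the same alternation, written with [0-9] instead of \d, since awk regexes do not support \d. Note that \d? allows at most one digit after each atom name, and that grep matches against the whole line rather than only the 5th column.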
Licensed under: CC-BY-SA with attribution