extract lines based on a column

https://stackoverflow.com/questions/23569231

19-07-2023

Question

I have some files containing the following data.

 160-68 160 68 B-A OD-CA 3.80247
 160-68 160 68 B-A OG-C 3.73454
 160-69 160 69 B-A OD1-NZ 2.76641
 160-69 160 69 B-A OG-CA 3.54446
 160-69 160 69 B-A OE-NZ2 4.24609
 160-69 160 69 B-A OG-O 3.97644
 160-69 160 69 B-A OG-H 1.82292

I need to extract the lines whose 5th column contains any of the following pairs: OD-NZ, NZ-OD, OE-NZ, NZ-OE, OE-NH, NH-OE. Either name in a pair may be followed by digits, as in OD1-NZ. What is an easy way to do this?

Desired Output

160-69 160 69 B-A OD1-NZ 2.76641
160-69 160 69 B-A OE-NZ2 4.24609

Solution

awk '$5 ~ /OD[0-9]*-NZ[0-9]*|NZ[0-9]*-OD[0-9]*|OE[0-9]*-NZ[0-9]*|NZ[0-9]*-OE[0-9]*|OE[0-9]*-NH[0-9]*|NH[0-9]*-OE[0-9]*/' input.txt

OTHER TIPS

This might be the easiest approach if you later need to add, delete, or modify any of the pairs you're interested in. With GNU awk for gensub():

awk -v pairs="OD-NZ NZ-OD OE-NZ NZ-OE OE-NH NH-OE" '
  BEGIN{ split(pairs, tmp); for (i in tmp) pairsArr[tmp[i]] }
  gensub(/[[:digit:]]/,"","g",$5) in pairsArr
' file
 160-69 160 69 B-A OD1-NZ 2.76641
 160-69 160 69 B-A OE-NZ2 4.24609

You can do the same with gsub() and a variable in any awk.
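
A minimal sketch of that portable variant, assuming a POSIX awk: since gsub() modifies its target in place, the 5th field is first copied into a variable so the printed line stays untouched. The file name sample.txt and the names key and wanted are made up for this illustration.

```shell
# Sample data copied from the question, written to a scratch file for the demo.
cat > sample.txt <<'EOF'
 160-68 160 68 B-A OD-CA 3.80247
 160-68 160 68 B-A OG-C 3.73454
 160-69 160 69 B-A OD1-NZ 2.76641
 160-69 160 69 B-A OG-CA 3.54446
 160-69 160 69 B-A OE-NZ2 4.24609
 160-69 160 69 B-A OG-O 3.97644
 160-69 160 69 B-A OG-H 1.82292
EOF

# Copy $5 into "key", strip the digits from the copy, then look it up.
result=$(awk -v pairs="OD-NZ NZ-OD OE-NZ NZ-OE OE-NH NH-OE" '
  # Referencing wanted[...] is enough to create the array element.
  BEGIN { n = split(pairs, tmp, " "); for (i = 1; i <= n; i++) wanted[tmp[i]] }
  { key = $5; gsub(/[0-9]/, "", key) }
  key in wanted
' sample.txt)

printf '%s\n' "$result"
```

Working on a copy matters here: calling gsub() directly on $5 would make awk rebuild the record and print it with the digits removed.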

Or if you prefer:

awk -v pairs="OD-NZ|NZ-OD|OE-NZ|NZ-OE|OE-NH|NH-OE" '
# Group the alternatives so that ^ and $ anchor every pair, not just the first and last.
BEGIN { pairs = "^(" gensub(/([-|]|$)/,"[[:digit:]]*\\1","g",pairs) ")$" }
$5 ~ pairs
' file
 160-69 160 69 B-A OD1-NZ 2.76641
 160-69 160 69 B-A OE-NZ2 4.24609
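
The pattern-building idea can also be sketched without gensub(), assuming a POSIX awk. Here gsub() inserts an optional-digits piece before every "-" and "|" (the "&" in the replacement stands for the matched character), and the final piece plus the anchoring group are added by string concatenation. The file name sample.txt is made up for this illustration.

```shell
# Sample data copied from the question, written to a scratch file for the demo.
cat > sample.txt <<'EOF'
 160-68 160 68 B-A OD-CA 3.80247
 160-68 160 68 B-A OG-C 3.73454
 160-69 160 69 B-A OD1-NZ 2.76641
 160-69 160 69 B-A OG-CA 3.54446
 160-69 160 69 B-A OE-NZ2 4.24609
 160-69 160 69 B-A OG-O 3.97644
 160-69 160 69 B-A OG-H 1.82292
EOF

result=$(awk -v pairs="OD-NZ|NZ-OD|OE-NZ|NZ-OE|OE-NH|NH-OE" '
  BEGIN {
    # "OD-NZ|NZ-OD" becomes "OD[0-9]*-NZ[0-9]*|NZ[0-9]*-OD".
    gsub(/[-|]/, "[0-9]*&", pairs)
    # Add the trailing digits piece and anchor the whole alternation.
    pairs = "^(" pairs "[0-9]*)$"
  }
  $5 ~ pairs
' sample.txt)

printf '%s\n' "$result"
```

The parentheses keep ^ and $ applied to every alternative; without them the anchors would bind only to the first and last pair.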

grep -P 'OD\d?-NZ\d?|NZ\d?-OD\d?|....' file

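Note that grep tests the whole line rather than only the 5th column. If you prefer awk, take $5 and test it against the same pattern, writing [0-9] in place of \d, which awk does not support.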

Licensed under: CC-BY-SA with attribution
