Question

I have this PDB file and I want to calculate the distances between atoms 7 and 8 (atom serial numbers, field $2) and atoms 12, 14, 15, 17 and 18. If a distance is lower than 5 ångströms, it should be printed.

ATOM      1  N   ASN p 140      38.455  18.232  -3.207  1.00  7.39           N  
ATOM      2  CA  ASN p 140      37.856  18.151  -4.534  1.00  7.91           C  
ATOM      3  C   ASN p 140      38.700  18.848  -5.595  1.00 10.75           C  
ATOM      4  O   ASN p 140      39.797  19.271  -5.313  1.00  9.25           O  
ATOM      5  CB  ASN p 140      36.435  18.715  -4.446  1.00  7.62           C  
ATOM      6  CG  ASN p 140      35.556  17.898  -3.501  1.00  6.82           C  
ATOM      7  OD1 ASN p 140      35.269  18.315  -2.323  1.00  8.53           O  
ATOM      8  ND2 ASN p 140      35.197  16.691  -3.945  1.00  5.41           N  
TER       9      ASN   140 
HETATM   10  C   08H p   1      29.121  15.727  -1.182  1.00  5.89           C  
HETATM   11  C   08H p   1      29.763  16.230  -0.040  1.00  5.86           C  
HETATM   12  N   08H p   1      31.023  16.810  -0.046  1.00  6.15           N  
HETATM   13  C   08H p   1      31.533  17.872   0.633  1.00  6.24           C  
HETATM   14  N   08H p   1      32.815  18.037   0.299  1.00  6.83           N  
HETATM   15  N   08H p   1      33.151  17.112  -0.526  1.00  7.37           C  
HETATM   16  C   08H p   1      32.058  16.349  -0.758  1.00  7.06           C  
HETATM   17  O   08H p   1      31.956  15.215  -1.730  1.00  8.15           O  
HETATM   18  N   08H p   1      30.979  15.691  -2.746  1.00 10.31           N  
HETATM   19  C   08H p   1      29.651  15.777  -2.509  1.00  6.71           C  
HETATM   20  O   HOH p 170      34.699  19.032   2.134  1.00  6.42           O
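For reference, both scripts test the ordinary Euclidean distance between the coordinates in fields $7, $8 and $9 (x, y, z). Worked through for atoms 7 and 14 of the file above, it reproduces the 3.60198 Å that the question's output reports for the 7-14 pair, which is below the 5 Å cutoff:

$$
d_{7,14} = \sqrt{(x_7 - x_{14})^2 + (y_7 - y_{14})^2 + (z_7 - z_{14})^2}
         = \sqrt{(35.269 - 32.815)^2 + (18.315 - 18.037)^2 + (-2.323 - 0.299)^2}
$$

$$
= \sqrt{2.454^2 + 0.278^2 + (-2.622)^2}
= \sqrt{6.022116 + 0.077284 + 6.874884}
= \sqrt{12.974284} \approx 3.602\ \text{\AA}
$$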

Based on a similar script, I wrote this code:

# usage: awk -f test.awk structure.pdb
BEGIN { print "asparagine and ligand in the structure..."; ORS = "" }

# keep the ASN side-chain OD1/ND2 atoms and the N/O atoms (element column $12)
# of every non-water HETATM residue
$1=="ATOM" && $3~"ND2|OD1" && $4=="ASN" || $1=="HETATM" && $12~"N|O" && $4!~"HOH" {
    print $2, $3, $4, $6 "\n"
    atm_x[$2] = $7; atm_y[$2] = $8; atm_z[$2] = $9
}

END {
    ORS = "\n"
    for (key1 in atm_x) {
        list = list " " key1
        for (key2 in atm_x) {
            # skip key2 if it already appeared as key1 (this includes key1 itself);
            # index() is a substring test, so key 7 also counts as seen once 17 is in list
            if (index(list, key2) != 0) continue
            dx = atm_x[key1] - atm_x[key2]
            dy = atm_y[key1] - atm_y[key2]
            dz = atm_z[key1] - atm_z[key2]
            distance = sqrt(dx^2 + dy^2 + dz^2)
            if (distance < 5 && distance != 0) {
                i++
                candidate[i] = key1 "-" key2 ": " distance
            }
        }
    }
    print "\nCandidates ..."
    for (keys in candidate) { print candidate[keys] }
}

When I run this script, I get the following output:

asparagine and ligand in the structure...
7 OD1 ASN 140
8 ND2 ASN 140
12 N 08H 1
14 N 08H 1
17 O 08H 1
18 N 08H 1

Candidates ...
7-8: 2.2964
7-14: 3.60198
7-17: 4.57576
8-17: 4.19391
8-18: 4.49768
12-14: 2.19905
12-17: 2.50007
12-18: 2.92303
14-17: 3.58028
14-18: 4.25989
17-18: 1.48774

The problem is that I don't want to print the distances when both atoms have the same residue name ($4). I'm new to awk and was wondering what the best way to handle this is. Any suggestions would be appreciated!
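One way to apply this inside the question's own approach is sketched below, assuming a POSIX awk and whitespace-separated fields as in the question: store the residue name ($4) with each atom and skip pairs whose names match. The sketch also numbers the kept atoms 1..n in input order and compares each pair (i, j) with i < j exactly once, which replaces the `index(list, key2)` check; that check is a substring test, so key 7 already counts as seen once 17 is in `list`. The function name `asn_ligand_contacts` is hypothetical. Note that atom 15 is named N but its element column ($12) reads C, so this filter, like the question's, does not select it.

```shell
# Sketch only: assumes a POSIX awk and whitespace-separated PDB fields,
# as in the question. asn_ligand_contacts is a hypothetical name.
asn_ligand_contacts() {
    awk '
        # ASN side-chain OD1/ND2 atoms and N/O atoms of non-water HETATM residues
        ($1 == "ATOM" && ($3 == "ND2" || $3 == "OD1") && $4 == "ASN") || \
        ($1 == "HETATM" && ($12 == "N" || $12 == "O") && $4 != "HOH") {
            n++                        # number the kept atoms 1..n in input order
            id[n] = $2; res[n] = $4    # serial number and residue name
            x[n] = $7; y[n] = $8; z[n] = $9
        }
        END {
            # visit each pair (i, j) with i < j exactly once
            for (i = 1; i <= n; i++) {
                for (j = i + 1; j <= n; j++) {
                    if (res[i] == res[j]) continue    # same residue name: skip
                    d = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2)
                    if (d < 5)
                        printf "%s-%s: %.4f\n", id[i], id[j], d
                }
            }
        }
    ' "$@"
}
```

Run it as `asn_ligand_contacts structure.pdb`. On the sample file this sketch should print the four cross-residue pairs 7-14, 7-17, 8-17 and 8-18, in input order rather than in the unspecified order of a `for (key in array)` loop.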


Solution

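awk '
    # ASN side-chain OD1/ND2 atoms, plus the N/O atoms (element column $12)
    # of every HETATM residue except water
    ($1=="ATOM" && ($3=="ND2" || $3=="OD1") && $4=="ASN") || \
    ($1=="HETATM" && ($12=="N" || $12 =="O") && $4!="HOH") {
        atom[$2] = 1
        x[$2] = $7
        y[$2] = $8
        z[$2] = $9
        name[$2] = $4    # residue name, used to skip same-residue pairs
    }
    END {
        for (a in atom) {
            for (b in atom) {
                # a > b keeps one ordering of each pair (and drops a == b);
                # pairs with the same residue name are skipped
                if (a > b && name[a] != name[b]) {
                    dist = sqrt((x[a]-x[b])^2 + (y[a]-y[b])^2 + (z[a]-z[b])^2)
                    if (dist < 5)
                        printf "%s-%s: %.4f\n", a, b, dist
                }
            }
        }
    }
' pdbfile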
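Output (the line order may vary, since awk does not define the iteration order of a for (a in atom) loop):
7-17: 4.5758
7-14: 3.6020
8-17: 4.1939
8-18: 4.4977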
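Both scripts split records on whitespace, which works for this sample but is fragile: PDB coordinate records are fixed-width, so a blank chain ID shifts every later $n by one, and a coordinate that fills all 8 columns of its field (for example a hypothetical y of -100.123) fuses with its neighbour into a single field. Below is a minimal sketch of column-based extraction with substr(), assuming the wwPDB ATOM/HETATM column layout (serial 7-11, atom name 13-16, residue name 18-20, x/y/z in 31-38, 39-46 and 47-54, element 77-78). The function name `pdb_fields` is hypothetical.

```shell
# Sketch only: column ranges assume the wwPDB ATOM/HETATM record layout
# (1-based, inclusive). pdb_fields is a hypothetical name.
pdb_fields() {
    awk '
        # strip leading and trailing blanks from a fixed-width field
        function trim(s) {
            sub(/^ +/, "", s)
            sub(/ +$/, "", s)
            return s
        }
        {
            rec = substr($0, 1, 6)                 # record name, columns 1-6
            if (rec != "ATOM  " && rec != "HETATM") next
            serial = trim(substr($0, 7, 5))        # columns 7-11
            aname  = trim(substr($0, 13, 4))       # columns 13-16
            resn   = trim(substr($0, 18, 3))       # columns 18-20
            x      = trim(substr($0, 31, 8))       # columns 31-38
            y      = trim(substr($0, 39, 8))       # columns 39-46
            z      = trim(substr($0, 47, 8))       # columns 47-54
            elem   = trim(substr($0, 77, 2))       # columns 77-78
            print serial, aname, resn, x, y, z, elem
        }
    ' "$@"
}
```

For example, `pdb_fields structure.pdb` prints one line per ATOM/HETATM record in the order serial, atom name, residue name, x, y, z, element, even when two coordinate fields touch.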
Licensed under: CC-BY-SA with attribution