Question

I have a list of residue numbers saved in centerResidueList = [100, 140, 170, 53] and I am trying to get all the neighboring residues from this set of residues.
Currently I am using the script below, were I process the whole PDB file and generate a list of atom pairs with a distances of 10.0 and then iterate through the list and check if the residues number in the all_neighbors list correspond to a residue number in centerResidueList.

from Bio.PDB import *

centerResidueList = [100, 140, 170, 53]
neighbours_resi_number = []
structure = PDBParser().get_structure('X', "1xxx.pdb") 
atom_list = Selection.unfold_entities(structure, 'A') 
ns = NeighborSearch(atom_list)
all_neighbors = ns.search_all(10.0, "R") 
for residuepair in all_neighbors:
    resi_number = residuepair[0].id[1]
    if resi_number in centerResidueList:
        resi_number_partner = residuepair[1].id[1]
        neighbours_resi_number.append(resi_number_partner)

First, how can I just create the atom_list using CA atoms only?

Second, is residuepair[0].id[1] the correct way of generating the residue numbers (it works but is there a method to get this)?

Finally, are there any better solutions to achieve this?

Was it helpful?

Solution

Using NeighborSearch is definitely the right idea - it constructs a k-d tree, which performs very fast lookups on nearest neighbors.

If you have just a few residues to search around, I would use the search() method on those residues' atoms (perhaps just their CA atoms for speed). This will be more efficient than using search_all() then filtering. I'll answer your two questions, then provide a complete solution at the bottom.


How can I just create the atom_list using CA atoms only?

You can either use filter, or a list comprehension (I think list comprehensions are more readable):

atom_list = [atom for atom in structure.get_atoms() if atom.name == 'CA']

Second, is residuepair[0].id[1] the correct way of generating the residue numbers (it works but is there a method to get this)?

This is definitely the right approach. However (and this is a significant caveat), note that this will not handle residues with insertion codes. Why not deal with the Residue objects themselves?


My code:

from Bio.PDB import NeighborSearch, PDBParser, Selection


structure = PDBParser().get_structure('X', "1xxx.pdb")

chain = structure[0]['A']  # Supply chain name for "center residues"
center_residues = [chain[resi] for resi in [100, 140, 170, 53]]
center_atoms = Selection.unfold_entities(center_residues, 'A')

atom_list = [atom for atom in structure.get_atoms() if atom.name == 'CA']
ns = NeighborSearch(atom_list)

# Set comprehension (Python 2.7+, use `set()` on a generator/list for < 2.7)
nearby_residues = {res for center_atom in center_atoms
                   for res in ns.search(center_atom.coord, 10, 'R')}

# Print just the residue number (WARNING: does not account for icodes)
print sorted(res.id[1] for res in nearby_residues)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top