As the PQR format is no longer standard PDB format, you'd need to modify the source of the Biopython PDB parser to fit your needs. Thankfully, Biopython is open source, and PDB.PDBParser is quite readable and easy to modify.
Extracting data
From the PQR description you gave:
"This format can deviate wildly from PDB due to the use of whitespaces rather than specific column widths and alignments."
Biopython's PDB parser expects each value to sit in a fixed column range. (It's perfectly valid for PDB files to have no whitespace between values.) I'd think your best bet would be to modify how line data is extracted in PDB.PDBParser, but keep most of its other error-checking and Structure-creation logic. As the PQR fields are whitespace-delimited, you can simply use line.split() to create a list of values, which you then give meaningful names.
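As a minimal sketch of the split-and-name step, assuming the commonly documented PQR field order (record name, atom serial, atom name, residue name, optional chain ID, residue number, x, y, z, charge, radius), the extraction could look like this. The function name parse_pqr_atom_line is hypothetical, and the sketch assumes residue numbers carry no insertion codes.

```python
def parse_pqr_atom_line(line):
    """Split a whitespace-delimited PQR ATOM/HETATM record into named fields.

    Assumed field order (the chain ID may be absent):
        record serial name resname [chain] resseq x y z charge radius
    Hypothetical helper; residue numbers are assumed to have no insertion codes.
    """
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {line!r}")
    if len(fields) == 11:
        record, serial, name, resname, chain, resseq, x, y, z, charge, radius = fields
    elif len(fields) == 10:
        record, serial, name, resname, resseq, x, y, z, charge, radius = fields
        chain = " "  # no chain ID present; use a blank, as PDB does
    else:
        raise ValueError(f"expected 10 or 11 fields, got {len(fields)}: {line!r}")
    return {
        "record": record,
        "serial": int(serial),
        "name": name,
        "resname": resname,
        "chain": chain,
        "resseq": int(resseq),
        "coord": (float(x), float(y), float(z)),
        "charge": float(charge),
        "radius": float(radius),
    }


record = parse_pqr_atom_line(
    "ATOM      1  N   MET A   1     -41.211  -0.437 -18.436 -0.3000 1.8500"
)
print(record["coord"], record["charge"], record["radius"])
```

Checking the field count is a cheap way to handle the optional chain ID; anything else is rejected rather than silently misassigned.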
Once you parse the data from a given line, you'll probably want to store it as fields in an Atom object. Atoms are added to the structure with the structure_builder. Perhaps you could modify init_atom() to add charge and radius as fields to the PDB.Atom object.
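To make the idea of an atom that carries charge and radius concrete without depending on Biopython's internals, here is a minimal sketch using hypothetical stand-in classes. Atom, PQRAtom and ToyStructureBuilder below are illustrative only, not Biopython's real classes or signatures; the sketch just shows the subclassing approach and an init_pqr_atom() method modelled on init_atom().

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    """Hypothetical stand-in for PDB.Atom, keeping only a few core fields."""
    name: str
    coord: tuple
    serial_number: int


@dataclass
class PQRAtom(Atom):
    """Hypothetical subclass that adds the two PQR-specific fields."""
    charge: float = 0.0
    radius: float = 0.0


class ToyStructureBuilder:
    """Stand-in for a structure builder; it simply collects atoms in a list."""

    def __init__(self):
        self.atoms: List[Atom] = []

    def init_atom(self, name, coord, serial_number):
        # Plain PDB atom: no charge or radius available.
        self.atoms.append(Atom(name, coord, serial_number))

    def init_pqr_atom(self, name, coord, serial_number, charge, radius):
        # Same as init_atom(), but also records the PQR charge and radius.
        self.atoms.append(PQRAtom(name, coord, serial_number, charge, radius))


builder = ToyStructureBuilder()
builder.init_pqr_atom("N", (1.0, 2.0, 3.0), 1, -0.3, 1.85)
atom = builder.atoms[0]
print(atom.charge, atom.radius)  # prints: -0.3 1.85
```

Because PQRAtom inherits from Atom, code that only expects an Atom keeps working, while PQR-aware code can read the extra fields.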
Where to start
Here's the approximate location in the source code you'd want to modify.
Outline
So, start to finish, here's what I'd do:
- Create a new StructureBuilder method init_pqr_atom() (modelled after init_atom()) which creates a new Atom object with charge and radius as additional fields. (Perhaps you'd want to create a PDB.PQRAtom class that inherits from PDB.Atom?)
- Create an optional parameter in the __init__() method of PDBParser that tells the parser it's a PQR file (not a standard PDB file):
    def __init__(self, PERMISSIVE=True, get_header=False, structure_builder=None, QUIET=False, is_pqr=False):
- Pass is_pqr to _parse(), which passes it to _parse_coordinates().
- Within _parse_coordinates(), parse data as normal if it's not a PQR file (i.e. use the default PDB column specifications). If it is PQR, parse the data based on the whitespace-delimited format (again, Python's str.split() will return a list of the whitespace-delimited items in a string).
- Build the appropriate Atom or PQRAtom object in the structure, passing in the parsed values.
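The branching described in the outline can be sketched outside Biopython as a single function that picks an extraction strategy based on an is_pqr flag. All names here (parse_coordinates, _pdb_columns, _pqr_fields) are hypothetical, and this is a simplification of what _parse_coordinates() would do, not its actual code. The PDB branch slices the standard fixed columns, which is why it copes with coordinates that run together (e.g. "-141.211-120.437"), where str.split() would merge them into one token; the PQR branch splits on whitespace and reads the last five fields from the right, treating the chain ID as optional.

```python
def _pdb_columns(line):
    # Standard fixed-column PDB ATOM/HETATM layout (0-based slices).
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }


def _pqr_fields(line):
    # Whitespace-delimited PQR layout; assumes 10 or 11 fields (chain optional).
    fields = line.split()
    x, y, z, charge, radius = (float(v) for v in fields[-5:])
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "resname": fields[3],
        "chain": fields[4] if len(fields) == 11 else " ",
        "resseq": int(fields[-6]),
        "coord": (x, y, z),
        "charge": charge,
        "radius": radius,
    }


def parse_coordinates(lines, is_pqr=False):
    """Return one dict per ATOM/HETATM line, choosing the extractor by is_pqr."""
    extract = _pqr_fields if is_pqr else _pdb_columns
    return [extract(line) for line in lines if line.startswith(("ATOM", "HETATM"))]
```

In the real parser, each resulting dict would then be handed to the structure builder (init_atom() for PDB, a PQR-aware method for PQR) instead of being returned.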