How to parse PQR files with Biopython

Question

Because PQR is not standard PDB format, you'd need to modify the source of the Biopython PDB parser to fit your needs. Thankfully, Biopython is open source, and PDB.PDBParser is quite readable and easy to modify.

Extracting data

From the PQR description you gave:

"This format can deviate wildly from PDB due to the use of whitespaces rather than specific column widths and alignments."

Biopython's PDB parser reads each value from fixed column positions. (It's perfectly valid for a PDB file to have no whitespace between values.) I think your best bet is to change how line data is extracted in PDB.PDBParser while keeping most of its error checking and Structure creation. Because PQR fields are whitespace-delimited, you can simply use line.split() to get a list of values, which you then give meaningful names.
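As a rough illustration of the line.split() approach, here is a minimal standalone sketch that does not touch Biopython at all. The names PQRRecord and parse_pqr_line are hypothetical, the example line is made up for illustration, and the sketch assumes the usual PQR field order (record type, atom number, atom name, residue name, optional chain ID, residue number, x, y, z, charge, radius) with a purely numeric residue number.

```python
from typing import NamedTuple, Optional


class PQRRecord(NamedTuple):
    """Named fields for one whitespace-delimited PQR ATOM/HETATM line."""
    record_type: str
    serial: int
    atom_name: str
    res_name: str
    chain_id: Optional[str]
    res_seq: int
    x: float
    y: float
    z: float
    charge: float
    radius: float


def parse_pqr_line(line: str) -> PQRRecord:
    """Split a PQR line on whitespace and give each value a name."""
    fields = line.split()
    if len(fields) == 10:
        # Assumption: 10 tokens means the optional chain ID was omitted.
        fields.insert(4, None)
    elif len(fields) != 11:
        raise ValueError(f"expected 10 or 11 fields, got {len(fields)}")

    record_type, serial, atom_name, res_name, chain_id, res_seq = fields[:6]
    x, y, z, charge, radius = (float(value) for value in fields[6:])
    return PQRRecord(record_type, int(serial), atom_name, res_name,
                     chain_id, int(res_seq), x, y, z, charge, radius)


# Illustrative line only; the values are not taken from a real structure.
example = "ATOM 1 N MET A 1 27.340 24.430 2.614 -0.3000 1.8240"
record = parse_pqr_line(example)
print(record.atom_name, record.charge, record.radius)
```

Counting tokens to detect a missing chain ID is a simplification; real files with insertion codes or fused columns would need more careful handling.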

Once you've parsed the data from a given line, you'll probably want to store it as fields in an Atom object. Atoms are added to the structure by the structure_builder. Perhaps you could modify init_atom() to add charge and radius as fields on the PDB.Atom object.
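To show the idea of carrying charge and radius on the atom, here is a small sketch that uses a stand-in Atom class rather than Biopython's own (the real Bio.PDB Atom constructor takes more arguments, such as B-factor, occupancy and altloc). Both the stand-in and the PQRAtom subclass are hypothetical names used only for illustration.

```python
class Atom:
    """Stand-in for Biopython's Atom class, reduced to a few fields."""

    def __init__(self, name, coord, element=None):
        self.name = name
        self.coord = coord
        self.element = element


class PQRAtom(Atom):
    """Atom that additionally stores the PQR charge and radius."""

    def __init__(self, name, coord, charge, radius, element=None):
        super().__init__(name, coord, element=element)
        self.charge = charge  # partial charge from the PQR line
        self.radius = radius  # atomic radius from the PQR line

    def __repr__(self):
        return f"<PQRAtom {self.name} charge={self.charge} radius={self.radius}>"


# Illustrative values only.
atom = PQRAtom("N", (27.340, 24.430, 2.614), charge=-0.3, radius=1.824)
print(atom)
```

Subclassing keeps the plain PDB code path untouched, whereas adding optional fields directly to the existing Atom class would avoid a second class at the cost of every atom carrying two extra attributes.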

Where to start

Here's the approximate location in the source code you'd want to modify.

Outline

So, start to finish, here's what I'd do:

1. Create a new StructureBuilder method, init_pqr_atom() (modelled after init_atom()), that creates a new Atom object with charge and radius as additional fields. (Perhaps you'd want to create a PDB.PQRAtom class that inherits from PDB.Atom?)
2. Add an optional parameter to the __init__() method of PDBParser that tells the parser it's a PQR file (not a standard PDB):
```
def __init__(self, PERMISSIVE=True, get_header=False,
             structure_builder=None, QUIET=False, is_pqr=False):
```
3. Pass is_pqr to _parse(), which passes it on to _parse_coordinates().
4. Within _parse_coordinates(), parse the data as normal if it is not a PQR file (i.e. use the default PDB column specifications). If it is a PQR file, parse the data based on the whitespace-delimited format (again, Python's str.split() returns a list of whitespace-delimited items from a string).
5. Build the appropriate Atom or PQRAtom object in the structure, passing in the parsed values.
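
The is_pqr branching described in this outline might look roughly like the following standalone sketch. It is not Biopython's _parse_coordinates(); the function name parse_atom_fields and the returned dictionary are hypothetical, and both example lines are made up. The fixed-column slices follow the standard PDB ATOM record layout.

```python
def parse_atom_fields(line, is_pqr=False):
    """Extract a few atom fields from a PDB or PQR ATOM line."""
    if is_pqr:
        # PQR: whitespace-delimited. The last five tokens are
        # x, y, z, charge, radius whether or not a chain ID is present.
        fields = line.split()
        x, y, z, charge, radius = (float(value) for value in fields[-5:])
        return {
            "name": fields[2],
            "resname": fields[3],
            "coord": (x, y, z),
            "charge": charge,
            "radius": radius,
        }

    # PDB: fixed columns (0-based slices of the standard ATOM record).
    return {
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "charge": None,  # a plain PDB line has no PQR charge
        "radius": None,  # ... and no PQR radius
    }


# Illustrative lines only; the values are not from a real structure.
pdb_line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  0.00"
pqr_line = "ATOM 1 N MET A 1 27.340 24.430 2.614 -0.3000 1.8240"

print(parse_atom_fields(pdb_line))
print(parse_atom_fields(pqr_line, is_pqr=True))
```

Keeping both branches behind one flag means the rest of the parser (error handling, structure building) can stay the same and only receives two extra values when is_pqr is set.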