Question

I would like to enable Biopython to read PQR files (modified PDB files with occupancy and B factor replaced by atom charge and radius).

The Biopython PDB parser fails to read the B factor because it retrieves the value by fixed PDB column indices, which the PQR format does not honor.

Example of a standard PDB atom record:

ATOM      1  N   LEU     1       3.469  24.678   1.940  1.00 48.46           N

Here, 1.00 is the occupancy and 48.46 is the B factor.

And the equivalent PQR record:

ATOM      1  N   LEU     1       3.469  24.678   1.940  0.1010 1.8240

Here, 0.1010 is the charge and 1.8240 is the radius.

So, how can I avoid "PDBConstructionException: Invalid or missing B factor" and properly parse the charge/radius values?


Solution

Because PQR is not standard PDB format, you'd need to modify the source of the Biopython PDB parser to fit your needs. Thankfully, Biopython is open source, and PDB.PDBParser is quite readable and easy to modify.

Extracting data

From the PQR description you gave:

"This format can deviate wildly from PDB due to the use of whitespaces rather than specific column widths and alignments."

Biopython's PDB parser expects every value at a fixed column position. (It's perfectly valid for PDB files to have no whitespace between adjacent values, which is why the parser slices by column instead of splitting.) I think your best bet is to change how line data is extracted in PDB.PDBParser while keeping most of its other error checking and Structure creation. Because the PQR fields are whitespace-delimited, you can simply use line.split() to get a list of values, which you then give meaningful names.
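
To make the line.split() idea concrete, here is a minimal sketch in plain Python, independent of Biopython. The names PQRRecord and parse_pqr_atom_line are hypothetical, and the sketch assumes each atom line carries either 10 fields (no chain ID) or 11 fields (with chain ID), ending in x, y, z, charge and radius. Reading the numeric fields from the end of the token list sidesteps the optional chain ID.

```python
from typing import NamedTuple


class PQRRecord(NamedTuple):
    """Hypothetical container for one whitespace-delimited PQR atom line."""
    record_type: str
    serial: int
    atom_name: str
    res_name: str
    chain_id: str  # empty string when the line has no chain ID
    res_seq: int
    x: float
    y: float
    z: float
    charge: float
    radius: float


def parse_pqr_atom_line(line: str) -> PQRRecord:
    """Split a PQR ATOM/HETATM line on whitespace and name the fields.

    Assumption: 10 fields without a chain ID, 11 fields with one.
    """
    tokens = line.split()
    if len(tokens) not in (10, 11):
        raise ValueError(f"expected 10 or 11 fields, got {len(tokens)}: {line!r}")
    # The last five tokens are taken to be x, y, z, charge, radius.
    x, y, z, charge, radius = (float(t) for t in tokens[-5:])
    head = tokens[:-5]
    record_type, serial, atom_name, res_name = head[:4]
    chain_id = head[4] if len(head) == 6 else ""
    res_seq = int(head[-1])
    return PQRRecord(record_type, int(serial), atom_name, res_name,
                     chain_id, res_seq, x, y, z, charge, radius)


# The PQR line from the question.
record = parse_pqr_atom_line(
    "ATOM      1  N   LEU     1       3.469  24.678   1.940  0.1010 1.8240"
)
print(record.atom_name, record.charge, record.radius)
```

This does not handle residue numbers with insertion codes or fields that run together without whitespace; a real parser would need to decide how to treat those.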

Once you parse the data from a given line, you'll probably want to store it as fields in an Atom object. Atoms are added to the structure with the structure_builder. Perhaps you could modify init_atom() to add charge and radius as fields to the PDB.Atom object.
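
As a rough illustration of carrying the two extra values on an atom, here is a sketch using a simplified stand-in class. The Atom class below is NOT Biopython's PDB.Atom (whose constructor takes more arguments); it and the PQRAtom subclass are hypothetical and only show the shape of the idea.

```python
class Atom:
    """Simplified stand-in for an atom class; not Biopython's PDB.Atom."""

    def __init__(self, name, coord, bfactor=None, occupancy=None):
        self.name = name
        self.coord = coord
        self.bfactor = bfactor
        self.occupancy = occupancy


class PQRAtom(Atom):
    """Hypothetical subclass that adds PQR charge and radius fields."""

    def __init__(self, name, coord, charge, radius):
        # A PQR line has no occupancy or B factor, so both stay None.
        super().__init__(name, coord)
        self.charge = charge
        self.radius = radius


atom = PQRAtom("N", (3.469, 24.678, 1.940), charge=0.1010, radius=1.8240)
print(atom.name, atom.charge, atom.radius)
```

Subclassing keeps existing code that only expects name and coordinates working, while PQR-aware code can read the extra attributes.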

Where to start

Here's the approximate location in the source code you'd want to modify.

Outline

So, start to finish, here's what I'd do:

  1. Create a new StructureBuilder method init_pqr_atom() (modelled after init_atom()) which creates a new Atom object with charge and radius as additional fields. (Perhaps you'd want to create a PDB.PQRAtom object that inherits from PDB.Atom?)
  2. Add an optional parameter to the __init__() method of PDBParser that tells the parser it's a PQR file (not a standard PDB):

    def __init__(self, PERMISSIVE=True, get_header=False,
                 structure_builder=None, QUIET=False, is_pqr=False):
    
  3. Pass is_pqr to _parse(), which passes it on to _parse_coordinates().
  4. Within _parse_coordinates(), parse data as normal if it is not a PQR file (i.e. use the default PDB column specifications). If it is a PQR file, parse the data based on the whitespace-delimited format (again, Python's str.split() returns a list of whitespace-delimited items from a string).
  5. Build the appropriate Atom or PQRAtom object in the structure, passing in the parsed values.
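
The branching in steps 4 and 5 could look roughly like the sketch below. It is plain Python rather than a patch to Biopython: parse_atom_record is a hypothetical helper, the PDB branch uses the standard fixed-column positions for name, coordinates, occupancy and B factor, and the PQR branch assumes the last five whitespace-delimited fields are x, y, z, charge and radius.

```python
def parse_atom_record(line, is_pqr=False):
    """Return a dict of fields from one ATOM/HETATM line.

    Hypothetical helper, not part of Biopython.
    """
    if is_pqr:
        # PQR: whitespace-delimited; read the numeric fields from the end.
        tokens = line.split()
        x, y, z, charge, radius = (float(t) for t in tokens[-5:])
        return {
            "name": tokens[2],
            "coord": (x, y, z),
            "charge": charge,
            "radius": radius,
        }
    # Standard PDB: fixed columns (0-based slices of the 80-column record).
    return {
        "name": line[12:16].strip(),
        "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }


pdb_line = "ATOM      1  N   LEU     1       3.469  24.678   1.940  1.00 48.46           N"
pqr_line = "ATOM      1  N   LEU     1       3.469  24.678   1.940  0.1010 1.8240"

print(parse_atom_record(pdb_line))
print(parse_atom_record(pqr_line, is_pqr=True))
```

Keeping both branches behind a single flag means the existing PDB path is untouched when is_pqr is False, which is the point of making the parameter optional.
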
License: CC-BY-SA with attribution