Question

I am using the suds package to query an API from a website. The data returned from the website looks like the sample below.

(1) Can anyone tell me what kind of format this is?

(2) If it is a known format, what is the easiest way to parse data like this? I have dealt quite a lot with HTML/XML using BeautifulSoup, but before I lift a finger to write regular expressions for this format, I am curious whether it is some kind of popular format for which a good parser has already been written. Thanks.

    # Below are the header and tail of the response.
    (DetailResult)
    {
        status = (Status){ message = None code = "0" }
        searchArgument = (DetailSearchArgument){ reqPartNumber = "BQ" reqMfg = "T" reqCpn = None }
        detailsDto[] = (DetailsDto){
            summaryDto = (SummaryDto){ PartNumber = "BQ" seMfg = "T" description = "Fast" }
            packageDto[] =
                (PackageDto){ fetName = "a" fetValue = "b" },
                (PackageDto){ fetName = "c" fetValue = "d" },
                (PackageDto){ fetName = "d" fetValue = "z" },
                (PackageDto){ fetName = "f" fetValue = "Sq" },
                (PackageDto){ fetName = "g" fetValue = "p" },
            additionalDetailsDto = (AdditionalDetailsDto){ cr = None pOptions = None inv = None pcns = None }
            partImageDto = None
            riskDto = (RiskDto){ life= "Low" lStage = "Mature" yteol = "10" Date = "2023"}
            partOptionsDto[] = (ReplacementDto){ partNumber = "BQ2" manufacturer = "T" type = "Reel" },
            inventoryDto[] =
                (InventoryDto){ distributor = "V" quantity = "88" buyNowLink = "https://www..." },
                (InventoryDto){ distributor = "R" quantity = "7" buyNowLink = "http://www.r." },
                (InventoryDto){ distributor = "RS" quantity = "2" buyNowLink = "http://www.rs.." },
        },
    }

Solution

This looks like some kind of nested repr output, similar to JSON but with structure or object-name information ("a Status contains a message and a code"). Because the objects nest inside one another, regexes alone won't do the job. Here is a rough pass at a parser using pyparsing:

    import pprint

    from pyparsing import (Combine, Forward, Group, Keyword, OneOrMore, Optional,
                           Suppress, Word, ZeroOrMore, alphanums, alphas, nums,
                           quotedString, removeQuotes, replaceWith)

    sample = """
    ... given sample text ...
    """

    # punctuation: matched, but suppressed from the results
    LPAR, RPAR, LBRACE, RBRACE, LBRACK, RBRACK, COMMA, EQ = map(Suppress, "(){}[],=")

    identifier = Word(alphas, alphanums + "_")

    # define some types that can get converted to Python types
    # (parse actions do the conversion at parse time)
    NONE = Keyword("None").setParseAction(replaceWith(None))
    integer = Word(nums).setParseAction(lambda t: int(t[0]))
    # copy() so that the global quotedString expression is not modified
    qstring = quotedString.copy().setParseAction(removeQuotes)

    # define a placeholder for a nested object definition (since objDefn
    # will be referenced within its own definition)
    objDefn = Forward()

    # "(TypeName)" -> "TypeName"; Combine requires the tokens to be adjacent
    objType = Combine(LPAR + identifier + RPAR)

    objval = qstring | NONE | integer | Group(objDefn)
    # name = value
    objattr = Group(identifier + EQ + objval)
    # name[] = (Type){...}, (Type){...}, ...   (every element is followed by a comma)
    arrayattr = Group(identifier + LBRACK + RBRACK + EQ + Group(OneOrMore(Group(objDefn) + COMMA)))

    # use the '<<=' operator to assign content to the previously declared Forward
    objDefn <<= objType + LBRACE + ZeroOrMore((arrayattr | objattr) + Optional(COMMA)) + RBRACE

    # parse the sample text
    result = objDefn.parseString(sample)

    # use pprint to list out the indented parsed data
    pprint.pprint(result.asList())

Prints:

    ['DetailResult',
     ['status', ['Status', ['message', None], ['code', '0']]],
     ['searchArgument',
      ['DetailSearchArgument',
       ['reqPartNumber', 'BQ'],
       ['reqMfg', 'T'],
       ['reqCpn', None]]],
     ['detailsDto',
      [['DetailsDto',
        ['summaryDto',
         ['SummaryDto',
          ['PartNumber', 'BQ'],
          ['seMfg', 'T'],
          ['description', 'Fast']]],
        ['packageDto',
         [['PackageDto', ['fetName', 'a'], ['fetValue', 'b']],
          ['PackageDto', ['fetName', 'c'], ['fetValue', 'd']],
          ['PackageDto', ['fetName', 'd'], ['fetValue', 'z']],
          ['PackageDto', ['fetName', 'f'], ['fetValue', 'Sq']],
          ['PackageDto', ['fetName', 'g'], ['fetValue', 'p']]]],
        ['additionalDetailsDto',
         ['AdditionalDetailsDto',
          ['cr', None],
          ['pOptions', None],
          ['inv', None],
          ['pcns', None]]],
        ['partImageDto', None],
        ['riskDto',
         ['RiskDto',
          ['life', 'Low'],
          ['lStage', 'Mature'],
          ['yteol', '10'],
          ['Date', '2023']]],
        ['partOptionsDto',
         [['ReplacementDto',
           ['partNumber', 'BQ2'],
           ['manufacturer', 'T'],
           ['type', 'Reel']]]],
        ['inventoryDto',
         [['InventoryDto',
           ['distributor', 'V'],
           ['quantity', '88'],
           ['buyNowLink', 'https://www...']],
          ['InventoryDto',
           ['distributor', 'R'],
           ['quantity', '7'],
           ['buyNowLink', 'http://www.r.']],
          ['InventoryDto',
           ['distributor', 'RS'],
           ['quantity', '2'],
           ['buyNowLink', 'http://www.rs..']]]]]]]]
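The nested lists are handy for inspection but awkward for lookups. Below is a minimal sketch, assuming the list shape printed above (an object is `[typeName, [name, value], ...]` and an array value is a list of such objects), that turns them into plain dicts. The function names `to_dict`/`convert_value` and the `_type` key are hypothetical choices for illustration, not part of pyparsing or suds.

```python
def convert_value(value):
    """Convert one parsed attribute value into plain Python data."""
    if isinstance(value, list):
        if value and isinstance(value[0], list):
            # array attribute (name[] = ...): a list of objects
            return [to_dict(item) for item in value]
        # single nested object: [typeName, [name, value], ...]
        return to_dict(value)
    # None, str or int: already converted by the parse actions
    return value


def to_dict(obj):
    """Convert [typeName, [name, value], ...] into a dict.

    The type name is kept under a hypothetical '_type' key.
    """
    type_name, *attrs = obj
    converted = {"_type": type_name}
    for name, value in attrs:
        converted[name] = convert_value(value)
    return converted


# a trimmed copy of the printed output above
parsed = [
    "DetailResult",
    ["status", ["Status", ["message", None], ["code", "0"]]],
    ["detailsDto", [
        ["DetailsDto",
         ["partImageDto", None],
         ["packageDto", [
             ["PackageDto", ["fetName", "a"], ["fetValue", "b"]],
             ["PackageDto", ["fetName", "c"], ["fetValue", "d"]],
         ]],
        ],
    ]],
]

d = to_dict(parsed)
print(d["status"]["code"])                             # 0
print(d["detailsDto"][0]["packageDto"][1]["fetValue"])  # d
```

With the full parse from the answer, `to_dict(result.asList())["detailsDto"][0]["inventoryDto"][0]["quantity"]` would give `'88'`.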
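If adding a pyparsing dependency is not an option, the same idea (regexes only for the individual tokens, recursion for the nesting) can be sketched with the standard library alone. This is a hand-written recursive-descent parser, not the pyparsing grammar above, and it assumes that strings are double-quoted with no escaped quotes, that names and bare integers match `\w+`, and that every array element is followed by a comma, as in the sample. `tokenize_text`, `Parser`, `parse` and the `_type` key are hypothetical names.

```python
import re

# One token per match; the \s* prefix skips leading whitespace.
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<type>\(\w+\))      # object type tag, e.g. (Status)
      | (?P<str>"[^"]*")       # double-quoted string (no escaped quotes assumed)
      | (?P<arr>\[\])          # array marker, e.g. the [] in packageDto[]
      | (?P<punct>[{}=,])      # structural punctuation
      | (?P<word>\w+)          # attribute names, None, bare integers
    )''', re.VERBOSE)


def tokenize_text(text):
    """Split the repr-like text into (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected text at {pos}: {text[pos:pos + 20]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: parse_object calls itself for nested objects."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        """Consume the next token, checking its kind (and value, if given)."""
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise ValueError(f"expected {value or kind!r}, got {tok_value!r}")
        self.i += 1
        return tok_value

    def parse_object(self):
        """(TypeName){ name = value ... name[] = (T){...}, ... }"""
        obj = {"_type": self.take("type")[1:-1]}
        self.take("punct", "{")
        while self.peek() != ("punct", "}"):
            name = self.take("word")
            if self.peek()[0] == "arr":
                self.take("arr")
                self.take("punct", "=")
                items = []
                while self.peek()[0] == "type":
                    items.append(self.parse_object())
                    self.take("punct", ",")
                obj[name] = items
            else:
                self.take("punct", "=")
                obj[name] = self.parse_value()
            if self.peek() == ("punct", ","):
                self.take("punct", ",")
        self.take("punct", "}")
        return obj

    def parse_value(self):
        kind, value = self.peek()
        if kind == "type":
            return self.parse_object()
        self.i += 1
        if kind == "str":
            return value[1:-1]
        if kind == "word" and value == "None":
            return None
        if kind == "word" and value.isdigit():
            return int(value)
        raise ValueError(f"unexpected value {value!r}")


def parse(text):
    """Parse one top-level object and make sure nothing is left over."""
    parser = Parser(tokenize_text(text))
    obj = parser.parse_object()
    if parser.i != len(parser.tokens):
        raise ValueError(f"trailing tokens: {parser.tokens[parser.i:]!r}")
    return obj


sample = '''
(DetailResult)
{
status = (Status){ message = None code = "0" }
packageDto[] =
(PackageDto){ fetName = "a" fetValue = "b" },
(PackageDto){ fetName = "c" fetValue = "d" },
partImageDto = None
}
'''
print(parse(sample)["status"])  # {'_type': 'Status', 'message': None, 'code': '0'}
```

The regex never has to understand nesting; it only recognises one token at a time, and the call stack of `parse_object` keeps track of which object is open.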
Licensed under: CC-BY-SA with attribution