How to associate the values of tags with the label of the tag using ElementTree in a Pythonic way

StackOverflow https://stackoverflow.com/questions/22885068

  •  28-06-2023

Question

I have some xml files I am trying to process.

Here is a derived sample from one of the files

fileAsString = """<?xml version="1.0" encoding="utf-8"?>
<eventDocument>
<schemaVersion>X2</schemaVersion>

  <eventTable>
       <eventTransaction>
           <eventTitle>
               <value>Some Event</value>
           </eventTitle>
           <eventDate>
               <value>2003-12-31</value>
           </eventDate>
           <eventCoding>
               <eventType>47</eventType>
               <eventCode>A</eventCode>
               <footnoteId id="F1"/>
               <footnoteId id="F2"/>
           </eventCoding>
           <eventCycled>
               <value></value>
           </eventCycled>
           <eventAmounts>
                <eventVoltage>
                <value>40000</value>
                </eventVoltage>
           </eventAmounts>
      </eventTransaction>
   </eventTable>
</eventDocument>"""

Note that there can be many eventTables in each document, and events can have more details than just the ones I have isolated.

My goal is to create a dictionary in the following form

{'eventTitle':'Some Event', 'eventDate':'2003-12-31','eventType':'47',\
 'eventCode':'A', 'eventCoding_FTNT_1':'F1','eventCoding_FTNT_2':'F2',\
  'eventCycled':'', 'eventVoltage':'40000'}

I am actually reading these in from files, but assuming I have a string, my code to get the text of the elements directly below the eventTransaction element, where the text is inside a value tag, is as follows:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
myXML = ET.fromstring(fileAsString)
eventTransactions = [ e for e in myXML.iter() if e.tag == 'eventTransaction']
testTransaction = eventTransactions[0]
my_dict = {}
for child_of in testTransaction:
    grand_children_tags = [e.tag for e in child_of]
    if grand_children_tags == ['value']:
        my_dict[child_of.tag] = [e.text for e in child_of][0]

>>> my_dict
{'eventTitle': 'Some Event', 'eventCycled': None, 'eventDate': '2003-12-31'}

This seems wrong because I am not really taking advantage of XML; instead I am using brute force, but I have not been able to find an example.

Is there a clearer and more pythonic way to create the output I am looking for?


Solution

Use XPath to pull out the elements you're interested in.
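
To illustrate what ElementTree's limited XPath support can do here: the predicate `[value]` selects only elements that have a direct `value` child. This is a minimal sketch on a trimmed-down, hypothetical sample (not the full document from the question):

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample for illustration only, not the full document.
sample = """<eventTransaction>
    <eventTitle><value>Some Event</value></eventTitle>
    <eventCoding><eventType>47</eventType></eventCoding>
    <eventAmounts>
        <eventVoltage><value>40000</value></eventVoltage>
    </eventAmounts>
</eventTransaction>"""

transaction = ET.fromstring(sample)

# './/*[value]' matches any descendant that has a direct <value> child,
# at any depth, so eventVoltage is found even though it is nested.
with_value = transaction.findall('.//*[value]')
tags = [element.tag for element in with_value]
print(tags)  # ['eventTitle', 'eventVoltage']
```

Note that eventCoding and eventAmounts are skipped because neither has a `value` element as a direct child.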

The following code creates a list of lists of dicts (i.e. tables/transactions/info):

import xml.etree.ElementTree as ET

tables = []
myXML = ET.fromstring(fileAsString)
for table in myXML.findall('./eventTable'):
    transactions = []
    tables.append(transactions)
    for transaction in table.findall('./eventTransaction'):
        info = {}
        # Elements whose text is held in a direct <value> child.
        # Search within this transaction only, not the whole table.
        for element in transaction.findall('.//*[value]'):
            info[element.tag] = element.find('./value').text or ''
        coding = transaction.find('./eventCoding')
        if coding is not None:
            for tag in 'eventType', 'eventCode':
                element = coding.find('./%s' % tag)
                if element is not None:
                    info[tag] = element.text or ''
            # Footnote ids are stored as attributes, numbered from 0 here.
            for index, element in enumerate(coding.findall('./footnoteId')):
                info['eventCoding_FTNT_%d' % index] = element.get('id', '')
        if info:
            transactions.append(info)

Output:

[[{'eventCode': 'A',
   'eventCoding_FTNT_0': 'F1',
   'eventCoding_FTNT_1': 'F2',
   'eventCycled': '',
   'eventDate': '2003-12-31',
   'eventTitle': 'Some Event',
   'eventType': '47',
   'eventVoltage': '40000'}]]
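
If you would rather have the footnote keys numbered from 1, as in the dictionary shown in the question, and one flat list of dicts instead of a list per table, the same idea could be wrapped in a couple of helper functions. This is only a sketch: the names `transaction_to_dict` and `document_to_dicts` are made up for illustration, and it assumes the element names from the sample document.

```python
import xml.etree.ElementTree as ET


def transaction_to_dict(transaction):
    """Flatten one <eventTransaction> element into a plain dict."""
    info = {}
    # Elements whose text lives in a direct <value> child.
    for element in transaction.findall('.//*[value]'):
        info[element.tag] = element.findtext('value', default='')
    coding = transaction.find('eventCoding')
    if coding is not None:
        for tag in ('eventType', 'eventCode'):
            text = coding.findtext(tag)  # None if the tag is absent
            if text is not None:
                info[tag] = text
        # start=1 yields eventCoding_FTNT_1, eventCoding_FTNT_2, ...
        for index, note in enumerate(coding.findall('footnoteId'), start=1):
            info['eventCoding_FTNT_%d' % index] = note.get('id', '')
    return info


def document_to_dicts(xml_text):
    """Return one dict per <eventTransaction> found in the document."""
    # strip() because the XML declaration must be the very first thing
    # in the string, with no leading whitespace or newline.
    root = ET.fromstring(xml_text.strip())
    return [transaction_to_dict(t) for t in root.iter('eventTransaction')]
```

Calling `document_to_dicts(fileAsString)` on the sample from the question should then give a list with a single dict whose footnote keys are `eventCoding_FTNT_1` and `eventCoding_FTNT_2`. Whether a flat list or the nested tables/transactions structure is more convenient depends on whether you need to know which table a transaction came from.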
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow