Creating an XML tree from a text file with Python

https://stackoverflow.com/questions/3759200

04-10-2019

Question

I need to avoid creating duplicate branches in an XML tree when parsing a text file. Let's say the text file is as follows (the order of the lines is random):

Branch1: branch11: message11
Branch1: branch12: message12
Branch2: branch21: message21
Branch2: branch22: message22

So the resulting XML tree should have a root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is as follows:

import string
fh = open ('xmlbasic.txt', 'r')
allLines = fh.readlines()
fh.close()
import xml.etree.ElementTree as ET
root = ET.Element('root')

for line in allLines:
   tempv = line.split(':')
   branch1 = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

The problem with this code is that a new branch is created in the XML tree for every line of the text file.

Any suggestions on how to avoid creating another branch in the XML tree if a branch with that name already exists?

Solution

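import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # Read the file as a list of lines, not as one long string
    lines = lines_file.read().splitlines()

root = ET.Element('root')

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # Strip each field so that tag names carry no surrounding whitespace
    head, subhead, tail = (part.strip() for part in line.split(":"))

    # Compare with None: find() returns None when nothing matches,
    # and an existing element without children does not test as true
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)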

The logic is simple, and you already stated it in your question! You just need to check whether a branch already exists in the tree before creating it.
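The check-before-create step can also be factored into a small helper. The following is only a sketch: `get_or_create` and `build_tree` are hypothetical names that do not come from the original answer, and the input is assumed to be a list of `head: subhead: message` strings rather than a file.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the first direct child of `parent` named `tag`, creating it if absent."""
    child = parent.find(tag)
    # Compare with None explicitly: find() returns None when nothing matches.
    if child is None:
        child = ET.SubElement(parent, tag)
    return child


def build_tree(lines):
    """Build a two-level tree from 'head: subhead: message' lines."""
    root = ET.Element("root")
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        head, subhead, tail = (part.strip() for part in line.split(":"))
        get_or_create(get_or_create(root, head), subhead).text = tail
    return root


sample = [
    "Branch1: branch11: message11",
    "Branch1: branch12: message12",
    "Branch2: branch21: message21",
    "Branch2: branch22: message22",
]
root = build_tree(sample)
print(ET.tostring(root, encoding="unicode"))
```

Feeding the same lines twice should still yield a single `Branch1` and a single `Branch2`, since every lookup happens before any element is created.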

Note that this is likely inefficient, since the existing branches are searched again for every line. This is because ElementTree is not designed to enforce uniqueness.
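As a small illustration of that last point, the sketch below shows that `SubElement` accepts a second sibling with the same tag without complaint, and that `find` returns only the first match (or `None`) while `findall` returns every match. The tag names are just the ones from the question's sample data.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "Branch1")
ET.SubElement(root, "Branch1")  # a second sibling with the same tag is accepted

duplicates = root.findall("Branch1")  # every direct child named Branch1
first = root.find("Branch1")          # only the first match
missing = root.find("Branch3")        # None, because no such child exists

print(len(duplicates), first is duplicates[0], missing)
```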

If you need speed (which you may not, especially for smallish trees!), a more efficient approach may be to store the tree structure in a defaultdict first and then convert it to an ElementTree.

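import collections
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # Read the file as a list of lines, not as one long string
    lines = lines_file.read().splitlines()

# Collect the structure first: dictionary keys are unique by construction
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # Strip each field so that tag names carry no surrounding whitespace
    head, subhead, tail = (part.strip() for part in line.split(":"))
    root_dict[head][subhead] = tail

# Convert the nested dictionaries into elements
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)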

Other suggestions

Something along these lines? You keep the first-level branches in a dict so that they can be reused.

b1map = {}

for line in allLines:
   tempv = line.split(':')
   branch1 = b1map.get(tempv[0])
   if branch1 is None:
       branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]
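The snippet above relies on `allLines`, `root` and `ET` from the question and only caches the first level. A self-contained variant that caches both levels might look like the sketch below. The names `build_tree_cached` and `cache` are hypothetical, and keying the dict by a tuple of tag names is one possible choice, not something the answer prescribes.

```python
import xml.etree.ElementTree as ET


def build_tree_cached(lines):
    """Build the tree while remembering created elements in a dict."""
    root = ET.Element("root")
    cache = {}  # maps a path tuple such as ("Branch1", "branch11") to its Element
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        head, subhead, tail = (part.strip() for part in line.split(":"))
        branch1 = cache.get((head,))
        if branch1 is None:
            branch1 = cache[(head,)] = ET.SubElement(root, head)
        branch2 = cache.get((head, subhead))
        if branch2 is None:
            branch2 = cache[(head, subhead)] = ET.SubElement(branch1, subhead)
        branch2.text = tail
    return root


lines = [
    "Branch2: branch21: message21",
    "Branch1: branch11: message11",
    "Branch2: branch22: message22",
    "Branch1: branch12: message12",
    "Branch1: branch11: updated11",
]
root = build_tree_cached(lines)
print(ET.tostring(root, encoding="unicode"))
```

With this approach each lookup is a dictionary access instead of a scan over sibling elements, and a repeated `head: subhead` pair overwrites the earlier message rather than adding a new element.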

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow