Question

I have the following code in an XML file, and I want to extract the table from it using Python:

"....

<Test_input_data>
    <web_pins type="list" orientation="column">
        <web_pin_id type="column_definition" data_type="int" index="1"/>
        <pin_center_node type="column_definition" data_type="int" index="2"/>
        <journal_center_node type="column_definition" data_type="int" index="3"/>
         1  1012  5011
         2  2012  5012
         3  3012  5013
    </web_pins>
</Test_input_data>

...."

That is, I want to extract these lines separately, along with each individual value:

  1  1012  5011
  2  2012  5012
  3  3012  5013

But I don't know how to access these values. Can anyone help?


Solution

You can use BeautifulSoup. Here is an example:

from bs4 import BeautifulSoup as bs

xml = """<Test_input_data>
    <web_pins type="list" orientation="column">
        <web_pin_id type="column_definition" data_type="int" index="1"/>
        <pin_center_node type="column_definition" data_type="int" index="2"/>
        <journal_center_node type="column_definition" data_type="int" index="3"/>
         1  1012  5011
         2  2012  5012
         3  3012  5013
    </web_pins>
</Test_input_data>"""
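soup = bs(xml, "html.parser")  # name the parser explicitly; bs(xml) alone guesses one and warns
tag = soup.find("web_pins")
text = tag.text  # the text inside <web_pins>, including the three table rows
# P.S. attribute access gives the same result:
text = soup.web_pins.text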
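
The question also asks for each value separately. A minimal sketch of that step, assuming the extracted text looks like the literal below (the names text and rows are only illustrative):

```python
# Text as it might come back from tag.text; written as a literal here so the
# snippet runs on its own (an assumption, not output copied from a real run).
text = """
         1  1012  5011
         2  2012  5012
         3  3012  5013
    """

# One list of ints per non-empty line. split() with no argument splits on
# any run of whitespace, so the column alignment does not matter.
rows = [[int(value) for value in line.split()]
        for line in text.splitlines() if line.strip()]

for web_pin_id, pin_center_node, journal_center_node in rows:
    print(web_pin_id, pin_center_node, journal_center_node)
```

Blank and whitespace-only lines are skipped, so any leading newlines in the extracted text do no harm.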


Alternatively, you may want to look at the standard library's HTMLParser. Again, here is an example:

from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

class MyHTMLParser(HTMLParser):
    """Collects the text found inside one named tag."""

    data = ""
    search_tag = None
    grab_data = False

    def feed(self, data, tag_to_search_for):
        # Remember which tag to look for, then run the normal parse.
        self.search_tag = tag_to_search_for
        HTMLParser.feed(self, data)

    def handle_starttag(self, tag, attrs):
        # Start collecting once the target tag opens.
        if tag == self.search_tag:
            self.grab_data = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self.grab_data:
            self.data += data

    def handle_endtag(self, tag):
        # Stop collecting when the target tag closes.
        if tag == self.search_tag:
            self.grab_data = False

xml = """<Test_input_data>
    <web_pins type="list" orientation="column">
        <web_pin_id type="column_definition" data_type="int" index="1"/>
        <pin_center_node type="column_definition" data_type="int" index="2"/>
        <journal_center_node type="column_definition" data_type="int" index="3"/>
         1  1012  5011
         2  2012  5012
         3  3012  5013
    </web_pins>
</Test_input_data>"""

parser = MyHTMLParser()
parser.feed(xml, "web_pins")
print(parser.data)  # Ta-daa!

OTHER TIPS

I suggest using ElementTree (other parsers, such as lxml, exist as well).

It is simple to use: create an Element instance from the XML with the XML function, and you are ready to go.
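
A rough sketch of the ElementTree approach, assuming the fragment from the question is the whole document (the real file has more around it, so the lookup path may differ). One thing to watch for: the table comes after the last child element, so ElementTree keeps it in that child's tail, not in the text of web_pins. The names columns and rows are only illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<Test_input_data>
    <web_pins type="list" orientation="column">
        <web_pin_id type="column_definition" data_type="int" index="1"/>
        <pin_center_node type="column_definition" data_type="int" index="2"/>
        <journal_center_node type="column_definition" data_type="int" index="3"/>
         1  1012  5011
         2  2012  5012
         3  3012  5013
    </web_pins>
</Test_input_data>"""

root = ET.XML(xml)  # root is the <Test_input_data> element
web_pins = root.find("web_pins")

# itertext() yields the element's own text plus the text and tail of every
# descendant, so it picks up the rows stored in the last child's tail.
text = "".join(web_pins.itertext())

# Column names taken from the column_definition children, in document order.
columns = [child.tag for child in web_pins]

# One list of ints per non-empty line.
rows = [[int(value) for value in line.split()]
        for line in text.splitlines() if line.strip()]

print(columns)
for row in rows:
    print(row)
```

Reading web_pins.text directly would return only the whitespace before the first child, which is why the sketch joins itertext() instead.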

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow