Parsing Node Value of XML in Python with ElementTree

https://stackoverflow.com/questions/22378876

14-06-2023

Question

I have the following XML which I have parsed from a webpage:

<!--
Parts from the iGEM Registry of Standard Biological Parts
-->
<rsbpml>
 <part_list>
  <part>
   <part_id>151</part_id>
   <part_name>BBa_B0034</part_name>
   <part_short_name>B0034</part_short_name>
   <part_short_desc>RBS (Elowitz 1999) -- defines RBS efficiency</part_short_desc>
   <part_type>RBS</part_type>
   <release_status>Released HQ 2013</release_status>
   <sample_status>In stock</sample_status>

I want to extract some of the values.

For example, I want to output the value RBS from <part_type>.

I've tried the following:

import urllib2
import xml.etree.ElementTree as ET

bb_xml_raw = urllib2.urlopen("http://parts.igem.org/cgi/xml/part.cgi?part=BBa_B0034")
self.parse = ET.parse(bb_xml_raw)
self.root = self.parse.getroot()

for part in self.root.findall('part_list'):
    print part.find('part_type').text

But it doesn't work. I get: AttributeError: 'NoneType' object has no attribute 'text'

What am I doing wrong?

Solution

Try changing

for part in self.root.findall('part_list'):

to

for part in self.root.find('part_list'):

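findall returns a list of all the nodes that match, so the first line gives you a list of all the part_list nodes, and each part in your loop is actually a <part_list> element. find only searches direct children, and <part_list> has no direct child tagged part_type (that tag sits one level deeper, inside <part>). So part.find('part_type') returns None, and reading .text from None raises your error.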
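A minimal sketch of that behavior, using a trimmed copy of the question's XML parsed from a string with ET.fromstring. The closing tags are an assumption, since the snippet in the question is cut off. It is written for Python 3, but these ElementTree calls behave the same way in Python 2:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML; the closing tags are assumed,
# since the snippet in the question is cut off.
XML = """<rsbpml>
 <part_list>
  <part>
   <part_id>151</part_id>
   <part_type>RBS</part_type>
  </part>
 </part_list>
</rsbpml>"""

root = ET.fromstring(XML)
part_list = root.find('part_list')

# find() only looks at direct children, and <part_type> is a grandchild
# of <part_list>, so this returns None.
print(part_list.find('part_type'))            # None

# A relative path through <part> (or './/' for any depth) does reach it.
print(part_list.find('part/part_type').text)  # RBS
print(part_list.find('.//part_type').text)    # RBS
```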

If you have a single part_list node, find returns that node itself (not a list), and the normal for part in ... syntax walks over all of its direct children, which are the <part> elements.
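Sketched on the same kind of trimmed, assumed-well-formed excerpt of the question's XML (Python 3), the single-part_list version collects the values like this:

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed-well-formed copy of the question's XML.
XML = """<rsbpml>
 <part_list>
  <part>
   <part_name>BBa_B0034</part_name>
   <part_type>RBS</part_type>
  </part>
 </part_list>
</rsbpml>"""

root = ET.fromstring(XML)

# find() returns the single <part_list> element itself;
# iterating over an Element yields its direct children, the <part> nodes.
part_types = []
for part in root.find('part_list'):
    part_types.append(part.find('part_type').text)

print(part_types)  # ['RBS']
```

If the document has no <part_list> at all, find returns None and the loop raises TypeError, so real code may want to check for that first.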

If you have multiple part_list tags, then you just need a nested for loop:

for part_list in self.root.findall('part_list'):
    for part in part_list:
        print(part.find('part_type').text)
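To see the nested loop handle more than one list, here is a Python 3 sketch with a hypothetical document holding two part_list elements. Only the first part resembles the question's data; the second part and its Terminator type are invented purely for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two <part_list> elements; the second
# <part> is invented for illustration and is not from the registry.
XML = """<rsbpml>
 <part_list>
  <part><part_type>RBS</part_type></part>
 </part_list>
 <part_list>
  <part><part_type>Terminator</part_type></part>
 </part_list>
</rsbpml>"""

root = ET.fromstring(XML)

part_types = []
# Outer loop: every <part_list> directly under the root.
for part_list in root.findall('part_list'):
    # Inner loop: the <part> children of this particular list.
    for part in part_list:
        part_types.append(part.find('part_type').text)

print(part_types)  # ['RBS', 'Terminator']
```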

Edit: Given that this was something of an XY problem, if what you are really looking for is a particular subpath, you can select it all at once, like this:

all_parts = self.root.findall('part_list/part')
print(all_parts[0].find('part_type').text)

and so on for the other elements of all_parts.
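Taking the path idea one step further, the path can end at part_type itself, and findtext() can fetch a single value directly. The sketch below is Python 3. part_types is a hypothetical helper name, and it takes any file-like object, so the same code works on an in-memory buffer or on an HTTP response:

```python
import io
import xml.etree.ElementTree as ET


def part_types(source):
    """Return the text of every <part_type> under part_list/part.

    part_types is a hypothetical helper, not part of ElementTree.
    `source` can be a filename or any binary file-like object.
    """
    root = ET.parse(source).getroot()
    # The path goes straight to the wanted element, so no nested loops.
    return [el.text for el in root.findall('part_list/part/part_type')]


# Trimmed, assumed-well-formed copy of the question's XML, as bytes.
SAMPLE = b"""<rsbpml>
 <part_list>
  <part>
   <part_id>151</part_id>
   <part_type>RBS</part_type>
  </part>
 </part_list>
</rsbpml>"""

print(part_types(io.BytesIO(SAMPLE)))  # ['RBS']

# For a single value, findtext() returns the text directly, or None
# (or the supplied default) when no element matches the path.
root = ET.fromstring(SAMPLE)
print(root.findtext('part_list/part/part_type'))       # RBS
print(root.findtext('part_list/part/missing', 'n/a'))  # n/a
```

On Python 3, urllib2.urlopen from the question corresponds to urllib.request.urlopen, so fetching the live record would presumably look like with urllib.request.urlopen(url) as resp: print(part_types(resp)), where url is the registry address from the question. That request is not exercised here.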

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow