XML namespace parse search w/Element Tree and Python

https://stackoverflow.com/questions/19911085

30-07-2022

Question

I've searched all over SO (including here) and elsewhere, but I'm still stuck on pulling specific information out of XML that uses namespace prefixes. I'm trying to pull the URL of the "Instance Document" from the XML below using ElementTree. Here is the line containing the URL:

<edgar:xbrlFile edgar:sequence="2" edgar:file="qcom-20090927.xml" edgar:type="EX-101.INS" edgar:size="1479637" edgar:description="EX-101 INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />

I've tried many different methods, but I keep getting an empty list back from .findall. I've also tried moving down the tree before searching. Can someone help me get this URL into a variable? Thanks so much for any assistance. Ethan

<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="/rss/styles/shared_xsl_stylesheet_v2.xml"?>
<rss version="2.0">
  <channel>
    <title>All XBRL Data Submitted to the SEC for 2009-12</title>
    <link>http://www.sec.gov/spotlight/xbrl/filings-and-feeds.shtml</link>
    <atom:link href="http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2009-12.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
    <description>This is a list all of the filings containing XBRL for 2009-12</description>
    <language>en-us</language>
    <pubDate>Tue, 25 Jun 2013 00:00:00 EDT</pubDate>
    <lastBuildDate>Tue, 25 Jun 2013 00:00:00 EDT</lastBuildDate>
    <item>
      <title>QUALCOMM INC/DE (0000804328) (Filer)</title>
      <link>http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-index.htm</link>
      <guid>http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-xbrl.zip</guid>
      <enclosure url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-xbrl.zip" length="126771" type="application/zip" />
      <description>10-K/A</description>
      <pubDate>Tue, 22 Dec 2009 17:23:59 EST</pubDate>
      <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
        <edgar:companyName>QUALCOMM INC/DE</edgar:companyName>
        <edgar:formType>10-K/A</edgar:formType>
        <edgar:filingDate>12/22/2009</edgar:filingDate>
        <edgar:cikNumber>0000804328</edgar:cikNumber>
        <edgar:accessionNumber>0000950123-09-072780</edgar:accessionNumber>
        <edgar:fileNumber>000-19528</edgar:fileNumber>
        <edgar:acceptanceDatetime>20091222172359</edgar:acceptanceDatetime>
        <edgar:period>20090927</edgar:period>
        <edgar:assistantDirector>11</edgar:assistantDirector>
        <edgar:assignedSic>3663</edgar:assignedSic>
        <edgar:fiscalYearEnd>0930</edgar:fiscalYearEnd>
        <edgar:xbrlFiles>
          <edgar:xbrlFile edgar:sequence="1" edgar:file="a54714e10vkza.htm" edgar:type="10-K/A" edgar:size="19974" edgar:description="10-K/A" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/a54714e10vkza.htm" />
          <edgar:xbrlFile edgar:sequence="2" edgar:file="qcom-20090927.xml" edgar:type="EX-101.INS" edgar:size="1479637" edgar:description="EX-101 INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />
          <edgar:xbrlFile edgar:sequence="3" edgar:file="qcom-20090927.xsd" edgar:type="EX-101.SCH" edgar:size="18628" edgar:description="EX-101 SCHEMA DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xsd" />
          <edgar:xbrlFile edgar:sequence="4" edgar:file="qcom-20090927_cal.xml" edgar:type="EX-101.CAL" edgar:size="50670" edgar:description="EX-101 CALCULATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_cal.xml" />
          <edgar:xbrlFile edgar:sequence="5" edgar:file="qcom-20090927_lab.xml" edgar:type="EX-101.LAB" edgar:size="258068" edgar:description="EX-101 LABELS LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_lab.xml" />
          <edgar:xbrlFile edgar:sequence="6" edgar:file="qcom-20090927_pre.xml" edgar:type="EX-101.PRE" edgar:size="133865" edgar:description="EX-101 PRESENTATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_pre.xml" />
          <edgar:xbrlFile edgar:sequence="7" edgar:file="qcom-20090927_def.xml" edgar:type="EX-101.DEF" edgar:size="21223" edgar:description="EX-101 DEFINITION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_def.xml" />
        </edgar:xbrlFiles>
      </edgar:xbrlFiling>
    </item>
    <item>

Solution

Suppose root is the root node of your ElementTree.

The namespace is read from the 'edgar:xbrlFiling' node's attribute 'xmlns:edgar':

xmlns:edgar="http://www.sec.gov/Archives/edgar"

ElementTree encodes edgar:any_tag as the Python string:

ns + 'any_tag'

where ns is the Python string below (the namespace URI wrapped in curly braces):

ns = '{http://www.sec.gov/Archives/edgar}'
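To see this encoding for yourself, here is a minimal sketch that parses a trimmed-down stand-in for the feed and inspects the tag ElementTree stores. The `sample` string is an assumption made for illustration (a shortened fragment using the same namespace URI), not the full SEC feed.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question (illustrative only).
sample = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:formType>10-K/A</edgar:formType>
  </edgar:xbrlFiling>
</item>"""

ns = '{http://www.sec.gov/Archives/edgar}'
root = ET.fromstring(sample)
filing = root[0]

# The prefix 'edgar:' is gone; the tag is '{namespace-uri}local-name'.
print(filing.tag)

# Children are looked up with the same '{uri}name' form.
form_type = filing.find(ns + 'formType').text
print(form_type)
```

This is also why a search for the literal string 'edgar:xbrlFile' or a bare 'xbrlFile' comes back empty: neither matches the name ElementTree stores.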

So to find all the xbrlFile nodes you can use the following XPath expression:

xbrlFiles = root.findall('.//'+ns+'xbrlFile')
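As an alternative to concatenating the ns string, `findall` also accepts a prefix-to-URI mapping as its second argument, which lets you keep the `edgar:` prefix in the path. The sketch below compares both forms on a shortened stand-in for the feed; the `sample` string and the `NAMESPACES` name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question (illustrative only).
sample = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="10-K/A" />
      <edgar:xbrlFile edgar:sequence="2" edgar:type="EX-101.INS" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item>"""

root = ET.fromstring(sample)

# Form 1: the '{uri}name' string used in this answer.
ns = '{http://www.sec.gov/Archives/edgar}'
files_by_uri = root.findall('.//' + ns + 'xbrlFile')

# Form 2: a prefix map; the prefix here only has to match the one in the path.
NAMESPACES = {'edgar': 'http://www.sec.gov/Archives/edgar'}
files_by_prefix = root.findall('.//edgar:xbrlFile', NAMESPACES)

print(len(files_by_uri), len(files_by_prefix))
```

Both calls return the same elements. Note that the prefix map only applies to the search path; keys in `attrib` still use the '{uri}name' form.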

Attribute names are encoded the same way, so to get the URL you look up ns + 'url' in the element's attrib (here for the second file, which is index 1):

myurl = xbrlFiles[1].attrib[ns + 'url']
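Indexing with [1] relies on the instance document always being the second file. A slightly more defensive sketch, assuming the `edgar:type` value "EX-101.INS" identifies the instance document as it does in the question's sample, is to filter on that attribute instead. The `sample` string is a shortened stand-in for the feed, and `instance_url` is a name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question (illustrative only).
sample = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="10-K/A" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/a54714e10vkza.htm" />
      <edgar:xbrlFile edgar:sequence="2" edgar:type="EX-101.INS" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item>"""

ns = '{http://www.sec.gov/Archives/edgar}'
root = ET.fromstring(sample)

# Take the first xbrlFile whose edgar:type marks it as the instance document;
# fall back to None if the filing has no such file.
instance_url = next(
    (f.get(ns + 'url')
     for f in root.iter(ns + 'xbrlFile')
     if f.get(ns + 'type') == 'EX-101.INS'),
    None,
)
print(instance_url)
```

Using `.get()` rather than `attrib[...]` returns None for a missing attribute instead of raising KeyError, which may be preferable when looping over many filings.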

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow