ElementTree, Python - find element with sub-element containing certain text and add another sub-element to a list

https://stackoverflow.com/questions/22729802

23-06-2023

Question

This code is meant to find the first element in a list that has a sub-element containing one of three strings: 'fanart', 'graphical' or 'poster'. When I find one of these strings, I add the text of the element's tag containing the URL to a list, and move on to the next string. The end result should be a list of 3 strings holding the URLs of 3 images, which I will then download. However, this function returns a list with 1 item rather than 3, and I have no idea why. Could someone point out what I am doing wrong? For clarity's sake, I have also included the XML file structure.

def get_banner(target):
    #Finds urls of show images
    urls = []
    types = ['fanart', 'graphical', 'poster']
    tree = et.parse(target)
    root = tree.getroot()
    for banner in root.findall('Banner'):
        url = 'http://thetvdb.com/banners/' + banner.find('BannerPath').text
        type_ = banner.find('BannerType').text
        print url
        if type_ == types[0]:
            urls.append(url)
            break
    for banner in root.findall('banner'):
        url = 'http://thetvdb.com/banners/' + banner.find('BannerPath').text
        type_ = banner.find('BannerType').text
        print url
        if type_ == types[1]:
            urls.append(url)
            break
    for banner in root.findall('banner'):
        url = 'http://thetvdb.com/banners/' + banner.find('BannerPath').text
        type_ = banner.find('BannerType').text
        if type_ == types[2]:
            urls.append(url)
            break
    return urls

<Banners>
    <Banner>
      <id>406321</id>
      <BannerPath>fanart/original/81189-21.jpg</BannerPath>
      <BannerType>fanart</BannerType>
      <BannerType2>1920x1080</BannerType2>
      <Colors>|234,222,110|0,0,0|103,103,103|</Colors>
      <Language>en</Language>
      <Rating>7.0930</Rating>
      <RatingCount>43</RatingCount>
      <SeriesName>false</SeriesName>
      <ThumbnailPath>_cache/fanart/original/81189-21.jpg</ThumbnailPath>
      <VignettePath>fanart/vignette/81189-21.jpg</VignettePath>
    </Banner>
</Banners>

Solution

Simple answer:

root.findall('banner') != root.findall('Banner')

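XML tag names are case sensitive. Only your first loop searches for 'Banner'; the second and third loops search for 'banner', which matches nothing in this file, so at most the fanart URL is ever appended.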
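To see the difference directly, here is a minimal sketch that runs both lookups against a small in-memory document. It assumes `et` is the standard-library `xml.etree.ElementTree`; the two-banner sample and its poster path are made up for illustration and only mirror the structure shown in the question.

```python
import xml.etree.ElementTree as et

# Hypothetical sample modelled on the question's XML; the poster path is invented.
xml_text = """
<Banners>
    <Banner>
        <BannerPath>fanart/original/81189-21.jpg</BannerPath>
        <BannerType>fanart</BannerType>
    </Banner>
    <Banner>
        <BannerPath>posters/81189-1.jpg</BannerPath>
        <BannerType>poster</BannerType>
    </Banner>
</Banners>
"""

root = et.fromstring(xml_text)

# Tag matching is exact, so only the capitalised name finds anything.
upper = root.findall('Banner')
lower = root.findall('banner')

print(len(upper), len(lower))
```

With this sample the capitalised lookup returns both elements and the lowercase one returns an empty list, which is why the second and third loops in the question never run their bodies.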

I think you can also compress your code somewhat, by putting everything in a single loop:

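# Assumes `et` is the standard-library ElementTree, as the question's usage suggests
import xml.etree.ElementTree as et

def get_banner(target):
    # Finds URLs of show images: the first banner of each wanted type
    urls = []
    types = ['fanart', 'graphical', 'poster']
    tree = et.parse(target)
    root = tree.getroot()
    for banner in root.findall('Banner'):
        if not types:
            # Every wanted type has been matched; stop scanning
            break
        url = 'http://thetvdb.com/banners/' + banner.find('BannerPath').text
        type_ = banner.find('BannerType').text
        if type_ in types:
            urls.append(url)
            # Drop the matched type so later banners of that type are skipped
            types.remove(type_)
    return urls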

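Because types.remove(type_) drops each type as soon as it is matched, only the first banner of each of your three types is added. Note that the URLs come back in the order the banners appear in the file, not in the order of the types list.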
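If the download code expects the URLs in a fixed order (fanart, then graphical, then poster), one possible variant collects the first match per type in a dict and builds the list at the end. This is only a sketch under assumptions: the name `first_banner_urls`, the `BASE_URL` constant, and the sample paths are hypothetical, and it takes an already-parsed root element rather than a file path.

```python
import xml.etree.ElementTree as et

BASE_URL = 'http://thetvdb.com/banners/'

def first_banner_urls(root, types=('fanart', 'graphical', 'poster')):
    """Return the first URL found for each type, ordered as in `types`."""
    found = {}
    for banner in root.findall('Banner'):
        # findtext returns None when the sub-element is missing
        type_ = banner.findtext('BannerType')
        path = banner.findtext('BannerPath')
        if type_ in types and type_ not in found and path:
            found[type_] = BASE_URL + path
        if len(found) == len(types):
            # Every wanted type has a URL; no need to scan further
            break
    return [found[t] for t in types if t in found]

# Hypothetical sample: the poster comes first and fanart appears twice.
sample = et.fromstring("""
<Banners>
    <Banner><BannerPath>posters/a.jpg</BannerPath><BannerType>poster</BannerType></Banner>
    <Banner><BannerPath>fanart/b.jpg</BannerPath><BannerType>fanart</BannerType></Banner>
    <Banner><BannerPath>fanart/c.jpg</BannerPath><BannerType>fanart</BannerType></Banner>
    <Banner><BannerPath>graphical/d.jpg</BannerPath><BannerType>graphical</BannerType></Banner>
</Banners>
""")

urls = first_banner_urls(sample)
print(urls)
```

Using findtext also means a banner with a missing BannerPath or BannerType is skipped instead of raising an AttributeError on `.text`.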

Licensed under: CC-BY-SA with attribution
