
I have an XML feed, say:

I want to get the list of hrefs for the videos:

 ['', 'ht....', ... ]


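from xml.etree import cElementTree as ET  # Python 2; on Python 3 use xml.etree.ElementTree
import urllib

def get_bass_fishing_URLs():
  results = []
  # The feed URL is elided in the source.
  data = urllib.urlopen('')
  tree = ET.parse(data)
  # The namespace URI inside the braces is elided in the source.
  ns = '{}'
  for entry in tree.findall(ns + 'entry'):
    for link in entry.findall(ns + 'link'):
      # Keep only the "alternate" links.
      if link.get('rel') == 'alternate':
        results.append(link.get('href'))
  return results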

This collects the so-called "alternate" links, which appear to be what you are after. If you want something slightly different, the small variations should be clear from the code above, plus the standard Python library docs for ElementTree.
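
For readers on Python 3, where `cElementTree` and `urllib.urlopen` no longer exist, the same loop can be sketched roughly as follows. This is only an illustration, not the original answer's code: the helper name `alternate_hrefs`, the inline `SAMPLE_FEED`, and its example.com URLs are made up, and it assumes the feed is Atom, so the namespace is taken to be `http://www.w3.org/2005/Atom`.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed is Atom, so its elements live in the Atom namespace.
ATOM_NS = '{http://www.w3.org/2005/Atom}'

# Made-up sample standing in for the real feed, whose URL is elided above.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link rel="alternate" type="text/html" href="http://example.com/watch?v=aaa"/>
    <link rel="self" href="http://example.com/feeds/aaa"/>
  </entry>
  <entry>
    <link rel="alternate" type="text/html" href="http://example.com/watch?v=bbb"/>
  </entry>
</feed>
"""

def alternate_hrefs(xml_text):
    """Return the href of every rel="alternate" link found inside an entry."""
    root = ET.fromstring(xml_text)
    hrefs = []
    for entry in root.findall(ATOM_NS + 'entry'):
        for link in entry.findall(ATOM_NS + 'link'):
            if link.get('rel') == 'alternate':
                hrefs.append(link.get('href'))
    return hrefs

print(alternate_hrefs(SAMPLE_FEED))
```

To run it against a live feed, the bytes returned by `urllib.request.urlopen(url).read()` can be passed straight to `alternate_hrefs`, since `ET.fromstring` accepts bytes as well as text.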


Have a look at Universal Feed Parser, which is an open source RSS and Atom feed parser for Python.

In such a simple case, this should be enough:

import re, urllib2
# The feed URL is elided in the source.
request = urllib2.urlopen("")
text = request.read()
videos = re.findall(r"http://www\.youtube\.com/watch\?v=[\w-]+", text)
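
The same regular expression can be tried without a network connection. The sketch below is Python 3 and runs the pattern against a made-up snippet; `SAMPLE_TEXT`, `VIDEO_URL`, and the video ids are invented for illustration. With a real feed in Python 3, the bytes from `urllib.request.urlopen` would need decoding before a `str` pattern is applied.

```python
import re

# Made-up markup standing in for the downloaded feed text.
SAMPLE_TEXT = (
    '<link rel="alternate" href="http://www.youtube.com/watch?v=abc-123_X"/>'
    '<link rel="self" href="http://gdata.example.com/feeds/abc"/>'
    '<link rel="alternate" href="http://www.youtube.com/watch?v=ZZ_top-9&amp;feature=x"/>'
)

# [\w-]+ matches the video id: letters, digits, underscores and hyphens.
VIDEO_URL = re.compile(r"http://www\.youtube\.com/watch\?v=[\w-]+")

videos = VIDEO_URL.findall(SAMPLE_TEXT)
print(videos)
```

The match stops at the first character that cannot be part of an id, so the trailing `&amp;feature=x` in the last link is not included.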

If you want to do anything more complicated, parsing the XML is better suited than regular expressions:

import urllib
from xml.dom import minidom
xmldoc = minidom.parse(urllib.urlopen(''))

links = xmldoc.getElementsByTagName('link')
hrefs = []
for link in links:
    if link.getAttribute('rel') == 'alternate':
        hrefs.append(link.getAttribute('href'))
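
A Python 3 sketch of the minidom approach, run on an inline document so it works offline. The function name `alternate_hrefs_dom`, the `SAMPLE_FEED` string, and its example.com URLs are hypothetical and only there for illustration.

```python
from xml.dom import minidom

# Made-up sample standing in for the real feed, whose URL is elided above.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.com/channel"/>
  <entry>
    <link rel="alternate" href="http://example.com/watch?v=aaa"/>
    <link rel="related" href="http://example.com/related/aaa"/>
  </entry>
  <entry>
    <link rel="alternate" href="http://example.com/watch?v=bbb"/>
  </entry>
</feed>
"""

def alternate_hrefs_dom(xml_text):
    """Return the href of every rel="alternate" link anywhere in the document."""
    doc = minidom.parseString(xml_text)
    return [
        link.getAttribute('href')
        for link in doc.getElementsByTagName('link')
        if link.getAttribute('rel') == 'alternate'
    ]

print(alternate_hrefs_dom(SAMPLE_FEED))
```

Note that `getElementsByTagName` searches the whole document, so a feed-level "alternate" link (here the made-up channel URL) is picked up along with the per-entry ones. If only the entry links are wanted, filter on the parent element or iterate over the entries first.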

Licensed under: CC-BY-SA with attribution