Question

I was trying out the bit.ly API for URL shortening and got it to work. It returns an XML document to my script. I want to extract a tag from it, but I can't seem to parse it properly.

import urllib2

askfor = urllib2.Request(full_url)
response = urllib2.urlopen(askfor)
the_page = response.read()

So the_page contains the XML document as a string. I tried:

from xml.dom.minidom import parse
doc = parse(the_page)

This causes an error. What am I doing wrong?


Solution

You don't provide an error message, so I can't be sure this is the only error. However, xml.dom.minidom.parse does not take a string of XML; it treats a string argument as a filename. From the docstring for parse:

Parse a file into a DOM by filename or file object.

You should try:

response = urllib2.urlopen(askfor)
# Pass the response object itself; do not call read() on it first.
doc = parse(response)

since response behaves like a file object. Alternatively, use the parseString function in minidom and pass the_page as the argument.
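
As a rough sketch of the difference, the snippet below parses the same document both ways without touching the network. The sample XML is a made-up stand-in, not bit.ly's actual response format, and io.BytesIO is only an assumption used here to play the role of the file-like object that urlopen returns.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical stand-in for the_page; the real bit.ly response will differ.
sample_page = b"<response><url>http://example.com/abc</url></response>"

# parseString accepts the XML document itself.
doc_from_string = parseString(sample_page)

# parse accepts a filename or a file-like object; BytesIO plays the role
# of the response object returned by urlopen.
doc_from_file = parse(io.BytesIO(sample_page))

root_from_string = doc_from_string.documentElement.tagName
root_from_file = doc_from_file.documentElement.tagName
print(root_from_string)
print(root_from_file)
```

Both calls should yield an equivalent DOM, so the choice mostly depends on whether you have already called read() on the response.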

EDIT: to extract the URL, you'll need to do:

url_nodes = doc.getElementsByTagName('url')
url = url_nodes[0]
print url.childNodes[0].data

The result of getElementsByTagName is a list of all nodes matching (just one in this case). url is an Element as you noticed, which contains a child Text node, which contains the data you need.
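
If the tag might be missing, indexing with [0] raises an IndexError. One possible way to guard against that is a small helper like the one below. The name first_tag_text and the sample XML are hypothetical, introduced only for illustration; the real bit.ly response may use different tag names.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the_page; the real bit.ly response will differ.
sample_page = b"<response><url>http://example.com/abc</url></response>"


def first_tag_text(doc, tag_name):
    """Return the text of the first <tag_name> element, or None if absent."""
    nodes = doc.getElementsByTagName(tag_name)
    if not nodes:
        return None
    # Join every Text child in case the text is split across several nodes.
    return "".join(
        child.data for child in nodes[0].childNodes
        if child.nodeType == child.TEXT_NODE
    )


doc = parseString(sample_page)
short_url = first_tag_text(doc, "url")
missing = first_tag_text(doc, "nosuchtag")
print(short_url)
```

Returning None for a missing tag is just one design choice; raising a descriptive exception would work equally well if a missing URL should be treated as an error.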

OTHER TIPS

from xml.dom.minidom import parseString
doc = parseString(the_page)

See the documentation for xml.dom.minidom.

Licensed under: CC-BY-SA with attribution