Parsing Gracenote's API XML return

https://stackoverflow.com/questions/22545051

18-06-2023

Question

I'm fairly new to Python and having problems with my code. I am trying to parse an XML response returned from Gracenote, but parsing keeps failing. This is my attempt at extracting just the artist's name.

import urllib.request

from lxml import etree

queryXML=b'QUERIES><LANG>eng</LANG><AUTH>/
    +<CLIENT>a_client_id</CLIENT>/
    +<USER>a_user_id</USER>/
    +</AUTH><QUERY CMD="ALBUM_SEARCH"><TEXT TYPE="ARTIST">oasis</TEXT>/
    +<TEXT TYPE="ALBUM_TITLE"></TEXT>/
    +<TEXT TYPE="TRACK_TITLE">wonderwall</TEXT></QUERY></QUERIES>'

response = urllib.request.urlopen("https://c3172608.web.cddbp.net/webapi/xml/1.0/", queryXML)

root = etree.parse(response).getroot()


artist = item.find('ARTIST').text

print(artist)

The error I receive is

    Traceback (most recent call last):

  File "C:\Users\Aidan Howie\Documents\University\First Year\EE106 Group  
   Project\frankocean.py", line 8, in <module>
    root = etree.parse(response).getroot()
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src\lxml\lxml.etree.c:69955)
  File "parser.pxi", line 1769, in lxml.etree._parseDocument 
(src\lxml\lxml.etree.c:102257)
  File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument 
(src\lxml\lxml.etree.c:102516)
  File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike
 (src\lxml\lxml.etree.c:101442)
  File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike 
(src\lxml\lxml.etree.c:97069)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc
 (src\lxml\lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult 
(src\lxml\lxml.etree.c:92461)
  File "parser.pxi", line 622, in lxml.etree._raiseParseError 
(src\lxml\lxml.etree.c:91757)
  File "<string>", line None
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

Can anyone please help as I've been struggling with this for a while now.

Solution

Below is a quick and dirty fix based on your code. It runs on Python 2.7. I hope it helps.

import urllib2
import StringIO
from lxml import etree

queryXML='''
<QUERIES>
    <LANG>eng</LANG>
    <AUTH>
        <CLIENT>a_client_id</CLIENT>
        <USER>a_user_id</USER>
    </AUTH>
    <QUERY CMD="ALBUM_SEARCH">
        <TEXT TYPE="ARTIST">oasis</TEXT>
        <TEXT TYPE="ALBUM_TITLE"></TEXT>
        <TEXT TYPE="TRACK_TITLE">wonderwall</TEXT>
    </QUERY>
</QUERIES>
'''.strip()

request = urllib2.Request("https://cxxxxxxx.web.cddbp.net/webapi/xml/1.0/", queryXML)
response = urllib2.urlopen(request)

response_page = response.read()
tree = etree.parse(StringIO.StringIO(response_page))
root = tree.getroot()

artist = root.find('.//ARTIST').text

print artist

Note that the response contains multiple results and this code prints only the first one. Error handling is also missing, so please treat it as a reference to get you started.
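To go beyond the first result and add basic error handling, something like the following Python 3 sketch might help. It is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree in place of lxml, the helper name extract_artists is made up for this example, and the SAMPLE document is an invented stand-in rather than a real Gracenote response (only the ARTIST element name comes from the code above).

```python
import xml.etree.ElementTree as ET


def extract_artists(xml_bytes):
    """Return the text of every ARTIST element in the response body.

    Returns an empty list when the body is empty or is not well-formed XML,
    which is the situation behind the "Document is empty" error in the question.
    """
    if not xml_bytes or not xml_bytes.strip():
        return []
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return []
    # iter() walks the whole tree, so every match is collected, not just the first.
    return [el.text for el in root.iter("ARTIST") if el.text]


# Invented sample data for illustration only; a real response may be shaped differently.
SAMPLE = b"""<RESPONSES>
  <RESPONSE STATUS="OK">
    <ALBUM><ARTIST>Oasis</ARTIST><TITLE>First album</TITLE></ALBUM>
    <ALBUM><ARTIST>Oasis</ARTIST><TITLE>Second album</TITLE></ALBUM>
  </RESPONSE>
</RESPONSES>"""

if __name__ == "__main__":
    for artist in extract_artists(SAMPLE):
        print(artist)
```

With a live request you would pass the raw body instead of SAMPLE, for example extract_artists(urllib.request.urlopen(url, queryXML).read()), where url and queryXML are the endpoint and query from the code above.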

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow