Yahoo BOSS Python Library, ExpatError

https://stackoverflow.com/questions/1421099

07-07-2019
|

Question

I tried to install the Yahoo BOSS mashup framework, but am having trouble running the examples provided. Examples 1, 2, 5, and 6 work, but 3 & 4 give Expat errors. Here is the output from ex3.py:

gpython examples/ex3.py
    examples/ex3.py:33: Warning: 'as' will become a reserved keyword in Python 2.6
Traceback (most recent call last):
  File "examples/ex3.py", line 27, in <module>
    digg = db.select(name="dg", udf=titlef, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")
  File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 214, in select
    tb = create(name, data=data, url=url, keep_standards_prefix=keep_standards_prefix)
  File "/usr/lib/python2.5/site-packages/yos/yql/db.py", line 201, in create
    return WebTable(name, d=rest.load(url), keep_standards_prefix=keep_standards_prefix)
  File "/usr/lib/python2.5/site-packages/yos/crawl/rest.py", line 38, in load
    return xml2dict.fromstring(dl)
  File "/usr/lib/python2.5/site-packages/yos/crawl/xml2dict.py", line 41, in fromstring
    t = ET.fromstring(s)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 963, in XML
    parser.feed(text)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
    xml.parsers.expat.ExpatError: syntax error: line 1, column 0

It looks like both examples are failing when trying to query Digg.com. Here is the query that is constructed in ex3.py's code:

diggf = lambda r: {"title": r["title"]["value"], "diggs": int(r["diggCount"]["value"])}
digg = db.select(name="dg", udf=diggf, url="http://digg.com/rss_search?search=google+android&area=dig&type=both&section=news")

Solution

The problem is the digg search string. It should be "s=". Not "search="

OTHER TIPS

I believe that must be an error in the example: it's getting a JSON result (indeed if you copy and paste that URL in your browser, you'll download a file names search.json which starts with

{"results":[{"profile_image_url":
"http://a3.twimg.com/profile_images/255524395/KEN_OMALLEY_REVISED_normal.jpg",
"created_at":"Mon, 14 Sep 2009 14:52:07 +0000","from_user":"twilightlords",

i.e. perfectly normal JSON; but then instead of parsing it with modules such as json or simplejson, it tries to parse it as XML -- and obviously this attempt fails.

I believe the fix (which probably needs to be brought to the attention of whoever maintains that code so they can incorporate it) is either to ask for XML instead of JSON output, OR to parse the resulting JSON with appropriate means instead of trying to look at it as XML (not sure how to best implement either change, as I'm not familiar with that code).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow