Question

I have a .gpx file that is cut off in the middle. When I try to parse it with the gpxpy library, I get the following error.

Parsing points in track.gpx
ERROR:root:expected '>', line 3125, column 29
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/parser.py", line 209, in parse
    self.xml_parser = LXMLParser(self.xml)
  File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/parser.py", line 107, in __init__
    self.dom = mod_etree.XML(self.xml)
  File "lxml.etree.pyx", line 2734, in lxml.etree.XML (src/lxml/lxml.etree.c:54411)
  File "parser.pxi", line 1578, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82748)
  File "parser.pxi", line 1457, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:81546)
  File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:78216)
  File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74472)
  File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75363)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74696)
XMLSyntaxError: expected '>', line 3125, column 29

  File "gpxscript.py", line 370, in extractpoints
    gpx = gpxpy.parse(file)
  File "/usr/local/lib/python2.7/dist-packages/gpxpy-0.8.7-py2.7.egg/gpxpy/__init__.py", line 28, in parse
    raise mod_gpx.GPXException('Error parsing {0}: {1}'.format(xml_or_file[0 : 100], parser.get_error()))
TypeError: 'file' object has no attribute '__getitem__'

These are the relevant lines of the script that produces the error.

368  file = open(filepath)
369  try:
370      gpx = gpxpy.parse(file)
371  except gpxpy.gpx.GPXException:
372      print "GPXException for %s." % filepath
373      return 1

I filed a bug for the library as suggested. I added a sample file to the bug report which produces the syntax error.


Solution

This appears to be a bug in gpxpy's error handling.

Looking at the source of gpxpy.parse, when the parser fails without raising an exception, the function tries to raise one itself with this:

raise mod_gpx.GPXException('Error parsing {0}: {1}'.format(xml_or_file[0 : 100], parser.get_error()))

This assumes that xml_or_file is an XML string, but, as the name implies, it is allowed to be either a string or a file object. A file object does not support slicing, so xml_or_file[0 : 100] raises the TypeError you see instead of the intended GPXException. What you're doing (giving it a file object) is perfectly legal and should work; it doesn't, so it's a bug.

So, you should file an issue. The correct patch should be something like:

if not parser.is_valid():
    try:
        fragment = xml_or_file[0 : 100]
    except TypeError:
        xml_or_file.seek(0)
        fragment = xml_or_file.read(100)
    raise mod_gpx.GPXException('Error parsing {0}: {1}'.format(fragment, parser.get_error()))
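
The fallback in that patch can be tried on its own, without gpxpy. The following is only a minimal sketch: leading_fragment is a hypothetical helper name (it is not part of gpxpy), and it assumes the file-like object is seekable.

```python
import io


def leading_fragment(xml_or_file, limit=100):
    """Return up to `limit` leading characters of a string or seekable file.

    Hypothetical helper for illustration; not part of gpxpy.
    """
    try:
        # Strings support slicing directly.
        return xml_or_file[:limit]
    except TypeError:
        # File objects are not subscriptable, so rewind and read instead.
        xml_or_file.seek(0)
        return xml_or_file.read(limit)


sample = "<gpx>" + "x" * 200
from_string = leading_fragment(sample)
from_file = leading_fragment(io.StringIO(sample))
```

Both calls should yield the same 100-character fragment, which is all the error message needs.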

So, how do you work around this? A few options:

  1. Since it only happens with invalid files anyway, you can just use except Exception or except (gpxpy.gpx.GPXException, TypeError).

  2. Since it only happens when you give it a file object, give it a string instead: gpx = gpxpy.parse(file.read()). This is a bad idea if the file is very large, of course.

  3. Since the buggy function is only 12 lines of trivial code wrapping the real function, just use the real function directly. Or, if you like the wrapper, copy it, fix it, and use your own copy instead.
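
The first two workarounds can be combined in one small wrapper. This is a rough sketch, not a definitive implementation: parse_or_none is a hypothetical name, and the parse function and the tolerated exception types are passed in as arguments so that the sketch itself does not depend on gpxpy being installed.

```python
def parse_or_none(parse_func, path, expected_errors):
    """Parse the file at `path`, returning None if an expected error occurs.

    Hypothetical helper for illustration; `parse_func` and
    `expected_errors` are supplied by the caller.
    """
    with open(path) as handle:
        # Workaround 2: hand the parser a string rather than a file object.
        text = handle.read()
    try:
        return parse_func(text)
    except expected_errors:
        # Workaround 1: tolerate every exception type the caller listed.
        return None
```

With gpxpy this would presumably be called as parse_or_none(gpxpy.parse, filepath, (gpxpy.gpx.GPXException, TypeError)). Reading the whole file into memory has the same drawback as option 2 for very large files.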


Meanwhile, the very first bit of code I looked at in this library has some obvious red flags. Why xml_or_file[0 : 100] instead of just xml_or_file[:100]? Why catch exceptions, throw them away and just set a flag, and then use that flag to raise a new exception with all the information missing? Given that, if you're not able to debug libraries on your own, I don't think this one is ready for you to use.

Licensed under: CC-BY-SA with attribution