I am trying to parse an XML file in which a few attribute values are unquoted integers. In such cases, Python's ElementTree raises a ParseError:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 120, column 32

Here is the relevant part of my XML:

<Rectangle
  leftTopX = 0
  leftTopY = 0
  rightBottomX = 20
  rightBottomY = 40
/>

Any suggestions on how to avoid this ParseError?

Quoting the attribute values as shown below would solve my problem, but I have many XML files to parse, and editing the attribute values by hand would take too long.

<Rectangle
  leftTopX = "0"
  leftTopY = "0"
  rightBottomX = "20"
  rightBottomY = "40"
/>

Solution

You can switch to the BeautifulSoup parser, which is more forgiving about well-formedness. Example:

from bs4 import BeautifulSoup


data = """
<Rectangle
  leftTopX = 0
  leftTopY = 0
  rightBottomX = 20
  rightBottomY = 40 />
"""

# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(data, "html.parser")
print(soup.rectangle)

prints (note that the tag and attribute names are lowercased):

<rectangle lefttopx="0" lefttopy="0" rightbottomx="20" rightbottomy="40"></rectangle>

You may also use it with the lxml parser (you'll need lxml installed):

soup = BeautifulSoup(data, "lxml")
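
If you would rather keep ElementTree (for example, to preserve the original case of the attribute names), another option is to quote the bare values with a regular expression before parsing. This is a different technique from the BeautifulSoup approach above and only a rough sketch: quote_attributes and UNQUOTED_ATTR are hypothetical names, and the pattern assumes that "name = value" never appears in element text or inside an already quoted value.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: an attribute name, "=", then a bare (unquoted) value.
# Assumption: "name = value" never occurs in element text or inside quoted values.
UNQUOTED_ATTR = re.compile(r"""(\w[\w.:-]*\s*=\s*)([^\s"'<>/=]+)""")


def quote_attributes(xml_text):
    """Wrap bare attribute values in double quotes so the XML is well-formed."""
    return UNQUOTED_ATTR.sub(r'\1"\2"', xml_text)


data = """
<Rectangle
  leftTopX = 0
  leftTopY = 0
  rightBottomX = 20
  rightBottomY = 40
/>
"""

# Fix the text first, then hand it to the strict parser.
root = ET.fromstring(quote_attributes(data).strip())

# Attribute values are always strings after parsing; convert them as needed.
coords = {name: int(value) for name, value in root.attrib.items()}
print(root.tag, coords)
```

For many files, you could read each file's text, pass it through quote_attributes, and parse the result the same way, leaving the files on disk untouched.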

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow