This appears to be a bug in libxml2, the C library that lxml wraps, rather than in lxml itself, and it has been there since 2006. It apparently can't easily be fixed because of backwards-compatibility issues.
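As a rough illustration of the symptom (a simplified model, not libxml2's actual code): libxml2 keeps each node's line number in a 16-bit unsigned field, so any line past 65535 is reported as 65535. Widening that field would change the layout of a public struct, which is presumably the compatibility problem. The names `LIBXML2_MAX_LINE` and `reported_line` are hypothetical and exist only for this sketch.

```python
# Simplified model of the cap; names are hypothetical, not part of lxml or libxml2.
LIBXML2_MAX_LINE = 0xFFFF  # 65535, the largest value a 16-bit unsigned field can hold


def reported_line(actual_line: int) -> int:
    """Return the line number an error message would show for a real line number."""
    return min(actual_line, LIBXML2_MAX_LINE)


if __name__ == "__main__":
    for line in (100, 65535, 65536, 1_000_000):
        print(line, "->", reported_line(line))
```

Every real line number above the cap collapses to the same value, which is why the error log cannot tell those lines apart.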
Is it possible to show line numbers greater than 65535 when you validate an XML file using lxml?
05-07-2022
Question
I am trying to validate an XML file with around 1 million lines using lxml. This is my code:
import lxml.etree as ET

# Parse both files straight from their paths so lxml handles the encoding itself
xmlschema_doc = ET.parse(r'test.xsd')
xmlschema = ET.XMLSchema(xmlschema_doc)
doc = ET.parse(r'test.xml')

# The error log stays empty until validation has actually run
xmlschema.validate(doc)
print(xmlschema.error_log.filter_from_errors())
But I found that the output always says "65535:0" when the line containing the error is beyond line 65535. For example:
file:///C:test.xml:65535:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'word': Missing child element(s).
file:///C:test.xml:65535:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'word': Missing child element(s).
file:///C:test.xml:65535:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'word': Missing child element(s).
So I can't tell which lines the errors above actually refer to. Is there any solution?
Solution
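One possible way to recover real positions, offered here as a sketch and not taken from the original answer: Python's standard-library expat parser reports line numbers that are not capped at 65535, so a separate pass over the file can record where each element of interest starts. The helper name `element_lines` is hypothetical, and matching its output to individual lxml error entries is left open, since the log lines in the question only name the element type.

```python
import xml.parsers.expat


def element_lines(xml_bytes: bytes, tag: str) -> list:
    """Return the 1-based line number of every start tag named `tag`.

    Hypothetical helper for illustration; it does not validate anything.
    """
    lines = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # CurrentLineNumber is the line where the current start tag begins
        if name == tag:
            lines.append(parser.CurrentLineNumber)

    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return lines


if __name__ == "__main__":
    # Synthetic document: <root> on line 1, then 70000 <word/> lines
    sample = b"<root>\n" + b"<word/>\n" * 70000 + b"</root>\n"
    found = element_lines(sample, "word")
    print(len(found), found[0], found[-1])
```

For a file on disk, the parser's `ParseFile` method can be given a file object opened in binary mode instead of loading the whole document into memory.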
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow