How can I transform input stream when I'm using stax2 XMLInputFactory2?

https://stackoverflow.com/questions/16515438

21-04-2022
Question

I am parsing large XML files (500-800 MB) with StAX2 like this:

XMLStreamReader2 reader = (XMLStreamReader2) xmlif2.createXMLStreamReader(fileName, new FileInputStream(fileName));

The goal is to convert them into a specific CSV format, but I have run into a problem. Some text nodes contain a character reference to code point 0x1 (a sequence such as "&#x1;"). In the output file it has to be replaced with the Cyrillic letter "Ё". But when the parser encounters that sequence, it throws an exception:

[com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x1 at [row,col,system-id]

With plain StAX I get the same exception.

Can I set up some transformation for the XML stream reader so that this sequence is replaced with Ё automatically while parsing? I could create an intermediate file in which everything is already replaced and then parse that, but that is not a good idea.

Solution

The error message indicates that your XML is not well-formed (or that its encoding is broken). It sounds like it contains a character reference for the Unicode character with value 0x1. This is not allowed in XML 1.0, although it would be legal in XML 1.1. But perhaps the document's XML declaration does not declare version 1.1 (it declares version='1.0', or no version at all)?
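If the document cannot be changed to XML 1.1, one way to avoid an intermediate file is to rewrite the characters on the fly before the parser sees them. The following is only a minimal sketch, not part of Woodstox or StAX2: `ReplacingReader` is a hypothetical helper written for this example, and it assumes the offending reference is spelled exactly "&#x1;" and that the file is UTF-8. It keeps a lookahead window as long as the token, so memory use stays constant regardless of file size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: replaces every occurrence of a fixed token in a character stream. */
class ReplacingReader extends Reader {
    private final Reader in;
    private final String token;        // e.g. "&#x1;" (assumed spelling of the reference)
    private final String replacement;  // e.g. "\u0401" (Cyrillic capital letter IO)
    private final StringBuilder window = new StringBuilder(); // lookahead, at most token.length() chars
    private String pending = "";       // replacement chars not yet handed out
    private int pendingPos = 0;

    ReplacingReader(Reader in, String token, String replacement) {
        if (token.isEmpty()) throw new IllegalArgumentException("token must not be empty");
        this.in = in;
        this.token = token;
        this.replacement = replacement;
    }

    /** Returns the next output character, or -1 at end of input. */
    private int nextChar() throws IOException {
        while (true) {
            if (pendingPos < pending.length()) {
                return pending.charAt(pendingPos++);
            }
            // Top up the window to the token length (it stays shorter near end of input).
            while (window.length() < token.length()) {
                int c = in.read();
                if (c == -1) break;
                window.append((char) c);
            }
            if (window.length() == 0) {
                return -1;
            }
            if (token.contentEquals(window)) {
                // Full match: drop the token and emit the replacement instead.
                window.setLength(0);
                pending = replacement;
                pendingPos = 0;
                continue;
            }
            // No match at this position: pass the first character through unchanged.
            char first = window.charAt(0);
            window.deleteCharAt(0);
            return first;
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int c = nextChar();
            if (c == -1) break;
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class ReplacingReaderDemo {
    public static void main(String[] args) throws IOException {
        // Small in-memory input; for a real file, wrap a BufferedReader instead.
        Reader r = new ReplacingReader(new StringReader("<a>x&#x1;y</a>"), "&#x1;", "\u0401");
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) out.append((char) c);
        System.out.println(out);
    }
}
```

To use it with the parser, wrap the file in a buffered reader (for example `new BufferedReader(new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))`), pass that to `ReplacingReader`, and hand the result to the factory's `createXMLStreamReader(Reader)` overload. Note that feeding the parser a `Reader` means you pick the charset yourself, so the encoding named in the XML declaration is no longer auto-detected. The replacement is purely textual, so it also applies inside comments and CDATA sections.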

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow