Not able to parse a string that contains UTF-8 0xC2 0x85 characters using the JDOM parser

https://stackoverflow.com/questions/16462071

jdom

19-04-2022

Problem

I have a UTF-8 string that contains the bytes 0xC2 0x85 (the UTF-8 encoding of U+0085, NEXT LINE). Eclipse treats this as whitespace; a certain other application displays it as '...'.
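For reference, here is a small Java sketch of what those two bytes become under different decodings. In UTF-8, 0xC2 0x85 is a single character, U+0085 (NEXT LINE, or NEL). If an application instead reads the bytes as windows-1252 (an assumption; the question does not say which application or charset is involved), 0x85 maps to U+2026, the horizontal ellipsis, which would explain the '...' display. The class name `Nel85Decoding` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: how the bytes 0xC2 0x85 decode under two charsets.
final class Nel85Decoding {

    // UTF-8: the two-byte sequence is a single code point, U+0085 (NEXT LINE).
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // windows-1252 (assumed charset of the "other application"): one char per byte,
    // 0xC2 -> U+00C2 and 0x85 -> U+2026 HORIZONTAL ELLIPSIS.
    static String decodeWindows1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC2, (byte) 0x85};
        String utf8 = decodeUtf8(bytes);
        String cp1252 = decodeWindows1252(bytes);
        System.out.printf("UTF-8:        %d char(s), U+%04X%n", utf8.length(), (int) utf8.charAt(0));
        System.out.printf("windows-1252: %d char(s), U+%04X U+%04X%n",
                cp1252.length(), (int) cp1252.charAt(0), (int) cp1252.charAt(1));
    }
}
```

Either way the parser only ever sees the decoded character, so the rest of this thread is about where U+0085 sits in the document, not about the bytes themselves.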

Since the string is XML, I'm parsing it with JDOM, but the parser fails with the following exception:

org.jdom.input.JDOMParseException: Error on line 1: Content is not allowed in prolog.
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:764)

Any idea why the JDOM parser doesn't treat this as whitespace? What else can I do to get the parser to validate the XML successfully? All the other elements in the XML string seem fine.

Solution

Whitespace has a very specific meaning in XML. Outside the root element, the only character data allowed is whitespace as defined by the production S ::= (#x20 | #x9 | #xD | #xA)+, that is, space, tab, carriage return, and line feed.
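The whitespace production can be expressed as a tiny Java predicate. This is a minimal sketch of the XML 1.0 rule only (the class and method names are made up for illustration); it shows that U+0085, the character that 0xC2 0x85 decodes to, is not XML whitespace even though Eclipse treats it as whitespace.

```java
// Hypothetical helper class implementing the XML 1.0 production S ::= (#x20 | #x9 | #xD | #xA)+
final class XmlWhitespace {

    // One character of S: space, tab, carriage return or line feed, and nothing else.
    static boolean isXmlSpace(int codePoint) {
        return codePoint == 0x20 || codePoint == 0x9 || codePoint == 0xD || codePoint == 0xA;
    }

    // The whole string matches S: non-empty (because of "+") and every code point is XML whitespace.
    static boolean matchesS(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(XmlWhitespace::isXmlSpace);
    }

    public static void main(String[] args) {
        String nel = String.valueOf((char) 0x85); // what 0xC2 0x85 decodes to in UTF-8
        System.out.println(matchesS(" \t\r\n")); // true
        System.out.println(matchesS(nel));        // false
    }
}
```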

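The prolog (everything before the root element) may contain only a limited set of constructs: the XML declaration, a document type declaration, comments, processing instructions, and whitespace.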

The character you have shown (U+0085) is not allowed outside the root element in well-formed XML. Sorry.
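To see the difference between the two positions, the sketch below uses the JDK's built-in SAX parser (javax.xml.parsers) rather than JDOM, on the assumption that JDOM 1.x's SAXBuilder surfaces the same underlying SAX error. The same U+0085 character is accepted inside the root element and rejected in front of the XML declaration. `NelPlacementDemo` and `parseError` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: the same U+0085 character inside the root element vs. in the prolog,
// parsed with the JDK's built-in SAX parser (not JDOM).
final class NelPlacementDemo {
    static final String NEL = String.valueOf((char) 0x85);
    static final String DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    // Encodes the text as UTF-8 (so U+0085 becomes 0xC2 0x85) and parses it.
    // Returns null if parsing succeeds, otherwise the parser's error message.
    static String parseError(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage(); // DefaultHandler rethrows fatal errors as SAXParseException
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside element content, U+0085 is an ordinary legal XML 1.0 character.
        System.out.println("inside root:        " + parseError(DECL + "<root>a" + NEL + "b</root>"));
        // Before the XML declaration it is character data in the prolog, which is not allowed.
        System.out.println("before declaration: " + parseError(NEL + DECL + "<root/>"));
    }
}
```

The exact error text depends on the parser implementation and the default locale; with the JDK's bundled parser in an English locale it is typically the same "Content is not allowed in prolog." message seen in the question.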

Other tips

JDOM (or rather, the underlying SAX parser) has no problem parsing that character inside an element. The exception you get is typically caused by illegal characters before the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

You may have "invisible" characters before it, but they are still there.
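If you cannot fix whatever produces the string, one workaround (an assumption on my part, not something either answer spells out) is to drop everything before the first '<' and parse the remainder. Anything in that range is either whitespace, which is harmless to remove, or character data that is never allowed there. `StripBeforeProlog` and its methods are hypothetical helpers, and the well-formedness check uses the JDK's SAX parser instead of JDOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical workaround helpers (not part of JDOM).
final class StripBeforeProlog {

    // Drops everything before the first '<'. Such leading text is either whitespace
    // (harmless to drop) or character data that is never allowed in the prolog.
    static String stripLeadingJunk(String xml) {
        int start = xml.indexOf('<');
        return start <= 0 ? xml : xml.substring(start);
    }

    // Returns true if the JDK's SAX parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A NEL (U+0085) sneaked in before the XML declaration.
        String raw = String.valueOf((char) 0x85) + "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        String cleaned = stripLeadingJunk(raw);
        System.out.println("raw well-formed:     " + isWellFormed(raw));
        System.out.println("cleaned well-formed: " + isWellFormed(cleaned));
    }
}
```

With JDOM 1.x you would pass the cleaned string to `new SAXBuilder().build(new StringReader(cleaned))` instead. Fixing the producer so it stops emitting the stray bytes is still the cleaner option.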

License: CC-BY-SA with attribution
