org.jdom.xpath.XPath is not returning UTF-8

https://stackoverflow.com/questions/18206974

24-06-2022

Question

I have an org.jdom.Document. I get an element out of it and try to output a value selected with XPath. The problem is that the Norwegian letters come out as ? instead of æ ø å.

Element nameNode = (Element) XPath.selectSingleNode( element, "contentdata/name" );
System.out.print(nameNode.getText());
// Produces "S?rbyen"

When I use XMLOutputter instead, the letters come out correctly:

XMLOutputter outputter = new XMLOutputter( Format.getPrettyFormat());
outputter.output( nameNode, System.out );
// Produces "<name>Sørbyen</name>"

So how can I use XPath.selectSingleNode() or nameNode.getText() and return the proper UTF-8?

UPDATE: Turns out that the string is only altered in the console output and comparing nameNode.getText().equalsIgnoreCase("Sørbyen") returns true.

Solution

The problem is not with XPath but with the way you are verifying the value. The console does not use UTF-8 by default, so when you use

System.out.print(nameNode.getText());

any character the console encoding cannot represent is displayed as ? or some other strange character.

If you are using Eclipse, you can configure the console encoding by going to Run Configurations > Common > Encoding and selecting UTF-8 from the drop-down.
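For what it's worth, the ? can be reproduced without JDOM at all. The sketch below is only an illustration, and the names ConsoleEncodingDemo and roundTrip are made up for it. It encodes the string the way a print stream with a limited charset would and then decodes it again. US-ASCII stands in for whatever encoding the console really uses, which is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingDemo {
    // Encodes text to bytes in the given charset and decodes it back,
    // roughly what happens between System.out and a console using that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "S\u00f8rbyen"; // same text as in the question, written with an escape

        // US-ASCII has no mapping for the second letter, so it is replaced with '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII).equals("S?rbyen")); // true

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

The String itself is never damaged. Only the bytes produced for the console are, which matches the UPDATE in the question.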


Other tips

Well, the problem is not in fetching the character, it's in the display. Pass the following JVM argument on the command line when starting the application and it should work:

-Dfile.encoding=UTF-8

Hope it helps
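To check whether the flag was picked up, something like the following could be run with and without it. This is a minimal sketch and the class name DefaultCharsetCheck is just a placeholder. It only reports what the JVM thinks its default charset is, and it says nothing about what the terminal itself can render.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Builds a one-line report of the encoding-related settings the JVM started with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Note that on JDK 18 and later the default charset is UTF-8 unless configured otherwise, so this check matters mostly for older runtimes.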

What is the parent of XMLOutputter? If it is OutputStreamWriter, then set the encoding to "UTF-8"; see http://docs.oracle.com/javase/7/docs/api/java/io/OutputStreamWriter.html#OutputStreamWriter(java.io.OutputStream.

Like this:

OutputStreamWriter sw = new OutputStreamWriter(System.out, "UTF-8");
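A slightly fuller version of that idea might look like the sketch below. Utf8WriterDemo and writeUtf8 are hypothetical names, and the literal string stands in for nameNode.getText() so the example runs without JDOM. The writer has to be flushed, otherwise the encoded bytes can sit in its buffer and nothing shows up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8WriterDemo {
    // Writes text to the stream as UTF-8 bytes, independent of the platform default charset.
    static void writeUtf8(OutputStream out, String text) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(text);
            writer.flush(); // flush instead of close so the underlying stream (e.g. System.out) stays open
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for nameNode.getText() from the question.
        writeUtf8(System.out, "S\u00f8rbyen\n");
    }
}
```

This only helps if the console or terminal itself decodes its input as UTF-8. Otherwise the bytes are correct but still rendered wrongly.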

License: CC-BY-SA with attribution
