How do you preserve newlines in CDATA when generating XML?

https://stackoverflow.com/questions/1216875

06-07-2019

Question

I want to write some text that contains whitespace characters such as newlines and tabs to an XML file, so I use

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

But when I read this back in using

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that no longer has any newlines.
When I look directly at the XML on disk, the newlines appear to be preserved, so the problem occurs when reading the XML file back in.

How can I preserve the newlines?

Thanks!

Solution

I don't know how you parse and write your document, but here is an enhanced code example based on yours:

// imports needed by this fragment (place the code below inside a method that throws Exception)
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

DOMImplementationLS impl =
    (DOMImplementationLS)registry.getDOMImplementation("LS");

LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);

// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);

// de-serializing the xml from the string
// (UTF-16 matches the encoding that writeToString declares in the XML prolog)
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

Node vs =  xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();

// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);

Serialization using LSSerializer is the W3C way to do it (see here). The output is as expected, with the line separators intact:

--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text ---
first line
second line

Other tips

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA guards onto the result of node.getNodeValue().
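The node-type check described above might look like the following minimal sketch. The class name CdataNodeTypeDemo and the helpers textWithGuards and firstChildOfRoot are hypothetical names chosen for illustration; only getNodeType(), getNodeValue() and Node.CDATA_SECTION_NODE come from the answer. It assumes the JDK's default DOM parser, which is not coalescing by default and therefore reports CDATA sections as separate nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataNodeTypeDemo {

    // Returns the node's text; for a CDATA section the guards are added back.
    static String textWithGuards(Node node) {
        if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
            return "<![CDATA[" + node.getNodeValue() + "]]>";
        }
        return node.getNodeValue();
    }

    // Parses an XML string and returns the first child of the root element.
    static Node firstChildOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node child = firstChildOfRoot(
                "<TestElement><![CDATA[first line\nsecond line\n]]></TestElement>");
        // The newlines inside the CDATA section are part of the node value.
        System.out.println(textWithGuards(child));
    }
}
```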

You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification specifies how to encode these characters.

For example, if you have an element whose value contains a newline, you should encode it as

  &#xA;

Carriage return:

 &#xD;

And so forth.
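As a small illustration of the character references above, the sketch below parses an element whose content uses &#xD;, &#xA; and &#x9; and reads the text back. The class name CharRefDemo and the helper parseText are hypothetical. Character references are resolved after the parser's line-ending normalization, so the carriage return written as &#xD; should survive parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefDemo {

    // Parses an XML string and returns the text content of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // No CDATA here: whitespace is written as numeric character references.
        String xml = "<TestElement>first line&#xD;&#xA;second&#x9;line</TestElement>";
        String text = parseText(xml);
        System.out.println(text.equals("first line\r\nsecond\tline"));
    }
}
```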

EDIT: cut all the irrelevant stuff

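I'm curious which DOM implementation you're using, because what you describe doesn't match the default behaviour of the ones in the two JVMs I tried (they ship with a Xerces implementation). I'm also interested in which newline characters your document contains.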
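The kind of newline character matters because an XML parser normalizes line endings on input: a literal CR LF pair or a lone CR is passed to the application as a single LF, including inside CDATA sections. Whether that is what happened in the question is not known; the sketch below only shows the normalization itself. The class name LineEndingDemo and the helper parseText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineEndingDemo {

    // Parses an XML string and returns the text content of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal "\r\n" and "\r" in the input, inside a CDATA section.
        String xml = "<TestElement><![CDATA[one\r\ntwo\rthree\nfour]]></TestElement>";
        String text = parseText(xml);
        // Every line ending is expected to arrive as a single "\n".
        System.out.println(text.equals("one\ntwo\nthree\nfour"));
    }
}
```

So a reader that looks for "\r\n" or for the platform line separator in the parsed value may find nothing, even though the line breaks are still there as "\n".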

I'm not sure whether CDATA is supposed to preserve whitespace. I suspect many factors are involved. Don't DTDs/schemas affect how whitespace is processed?

You could try using the xml:space="preserve" attribute.

xml:space='preserve' is not it. That is only for "all whitespace" nodes. That is, it applies if you want the whitespace nodes in

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

But note that those whitespace nodes are ONLY whitespace.
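To make the "whitespace-only nodes" point concrete, the sketch below counts the text children of the root element that consist of whitespace only. The class name WhitespaceNodesDemo and the helper countWhitespaceOnlyTextChildren are hypothetical. It assumes the JDK's default non-validating DOM parser, which, as far as I know, reports such nodes whether or not xml:space is present, since without a DTD it cannot tell that the whitespace is ignorable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodesDemo {

    // Counts the root element's child text nodes that contain only whitespace.
    static int countWhitespaceOnlyTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<this xml:space='preserve'> <has/>\n<whitespace/>\n</this>";
        // One whitespace run before <has/>, one between the elements, one at the end.
        System.out.println(countWhitespaceOnlyTextChildren(xml));
    }
}
```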

I have also been struggling to get Xerces to generate events that allow the CDATA content to be isolated. I have no solution yet.
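One approach that may help here, offered as a sketch and not as something the poster confirmed, is the SAX LexicalHandler extension: it reports startCDATA() and endCDATA() events around the characters() calls that belong to a CDATA section. The class names CdataEventsDemo and CdataCollector and the helper extractCdata are hypothetical. The handler is registered through the standard SAX property http://xml.org/sax/properties/lexical-handler, which the JDK's bundled parser supports.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class CdataEventsDemo {

    // Collects only the characters reported between startCDATA() and endCDATA().
    static class CdataCollector extends DefaultHandler2 {
        private final StringBuilder cdata = new StringBuilder();
        private boolean inCdata = false;

        @Override
        public void startCDATA() {
            inCdata = true;
        }

        @Override
        public void endCDATA() {
            inCdata = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inCdata) {
                cdata.append(ch, start, length);
            }
        }

        String collected() {
            return cdata.toString();
        }
    }

    // Parses the XML with SAX and returns the concatenated CDATA content.
    static String extractCdata(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CdataCollector handler = new CdataCollector();
            // Without this property the parser does not deliver lexical events.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.collected();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TestElement>before<![CDATA[first line\nsecond line\n]]>after</TestElement>";
        System.out.println(extractCdata(xml).equals("first line\nsecond line\n"));
    }
}
```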

License: CC-BY-SA with attribution

Not affiliated with StackOverflow