How do I preserve newlines in CDATA when generating XML?

https://stackoverflow.com/questions/1216875

06-07-2019

Question

I want to write some text containing whitespace characters, such as newline and tab, into an XML file, so I use

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

But when I read it back using

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

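I get a string that no longer has any newlines.
When I look at the XML directly on disk, the newlines appear to be preserved, so the problem occurs when reading the XML file.

How can I preserve the newlines?

Thanks!

Solution

I don't know how you are parsing and writing your document, but here is an enhanced code sample based on yours: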

// imports needed by the snippet below
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

DOMImplementationLS impl =
    (DOMImplementationLS)registry.getDOMImplementation("LS");

LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);

// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);

// de-serializing the xml from the string
// (the serialized string declares encoding="UTF-16", so encode the bytes to match)
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

Node vs =  xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();

// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);

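Serializing with LSSerializer is the W3C way (it is part of DOM Level 3 Load and Save). The output is as expected, with the line separators intact: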

--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text ---
first line
second line

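Other tips

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA guards (<![CDATA[ and ]]>) to node.getNodeValue().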
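As a rough sketch of that node-type check, assuming the default JAXP parser with coalescing left off: the helper below walks an element's children and wraps only the CDATA children in their guards. The class and method names (CdataNodeTypes, contentWithCdataGuards, parse) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataNodeTypes {

    // Hypothetical helper: rebuilds an element's character content,
    // re-adding the CDATA guards around CDATA_SECTION_NODE children.
    static String contentWithCdataGuards(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(child.getNodeValue()).append("]]>");
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string with the default JAXP DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // A text node ("\n") precedes the CDATA section here, so
        // getFirstChild() alone would return the text node, not the CDATA.
        Document doc = parse("<TestElement>\n<![CDATA[first line\nsecond line\n]]></TestElement>");
        System.out.println(contentWithCdataGuards(doc.getDocumentElement()));
    }
}
```

This also hints at one way the question's code can go wrong: if the file was pretty-printed, getFirstChild() may be a whitespace-only text node rather than the CDATA section.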

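You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification defines how to encode these characters.

For example, if your element value contains a newline (line feed), you should encode it as

  &#xA;

Carriage return:

 &#xD;

And so on.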
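To illustrate the character-reference approach, here is a small sketch that parses an element whose content uses &#xD;, &#xA; and &#x9; and reads the text back. It assumes the default JAXP DOM parser; the class name CharRefNewlines and the helper textOf are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefNewlines {

    // Hypothetical helper: parses the XML string and returns the
    // text content of the document element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TestElement>first line&#xD;&#xA;second line&#x9;tabbed</TestElement>";
        String text = textOf(xml);
        // Character references are resolved to the literal characters.
        System.out.println("contains CR LF: " + text.contains("\r\n"));
        System.out.println("contains tab:   " + (text.indexOf('\t') >= 0));
    }
}
```

Because the carriage return arrives as a character reference rather than a literal character, it is not subject to the parser's end-of-line normalization.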

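EDIT: cut all the irrelevant stuff

I'm curious which DOM implementation you are using, because it doesn't mirror the default behaviour of the one in the couple of JVMs I've tried (they ship with a Xerces impl). I'm also interested in which newline characters your document has.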
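On the question of which newline characters are involved: the XML 1.0 specification requires a processor to normalize literal CR LF pairs and lone CRs to a single LF before parsing, and that applies inside CDATA sections too. The sketch below, which assumes the default JAXP DOM parser and uses invented names (LineEndNormalization, textOf), shows what comes back for mixed line endings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LineEndNormalization {

    // Hypothetical helper: parses the XML string and returns the
    // text content of the document element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF, lone CR and lone LF inside one CDATA section.
        String text = textOf("<TestElement><![CDATA[one\r\ntwo\rthree\nfour]]></TestElement>");
        // Make the line terminators visible when printing.
        System.out.println(text.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

So if the string that was written used lone carriage returns as line separators, they come back as line feeds; the line breaks are still there, just as a different character.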

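I'm not sure it's a given that CDATA should preserve whitespace. I suspect there are many factors involved. Don't DTDs/schemas affect how whitespace is processed?

You could try using the xml:space="preserve" attribute.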

xml:space='preserve' is not it. That only applies to "all whitespace" nodes. That is, if you want the whitespace nodes in

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

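But note that those whitespace nodes contain ONLY whitespace.

I have been struggling to get Xerces to generate events that allow isolating the CDATA content as well. I don't have a solution yet.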
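To make the "all whitespace" nodes concrete, here is a sketch that counts the whitespace-only text children of the snippet's root element. It assumes the default, non-validating JAXP DOM parser, which, as far as I can tell, reports these nodes whether or not xml:space is present; the names WhitespaceOnlyNodes and countWhitespaceOnlyTextChildren are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceOnlyNodes {

    // Hypothetical helper: counts the direct text children of the root
    // element that consist of nothing but whitespace.
    static int countWhitespaceOnlyTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            int count = 0;
            for (Node child = doc.getDocumentElement().getFirstChild();
                 child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withAttr = "<this xml:space='preserve'> <has/>\n<whitespace/>\n</this>";
        String withoutAttr = "<this> <has/>\n<whitespace/>\n</this>";
        System.out.println(countWhitespaceOnlyTextChildren(withAttr));
        System.out.println(countWhitespaceOnlyTextChildren(withoutAttr));
    }
}
```

The attribute is a signal of intent to the application that consumes the document; it is not what controls newlines inside a CDATA section.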

Licensed under: CC-BY-SA with attribution