The code below does not transform the input data to XML correctly. I say this because I don't expect Transformer to generate output containing invalid XML characters (I'm talking about the &).
Here is the code:
    package com.example.test.formatter;

    import java.io.StringWriter;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    import android.test.AndroidTestCase;
    import android.util.Log;

    public class XmlTest extends AndroidTestCase {

        public void testFormat() {
            try {
                String arbitraryInput = "Arbitrary input: \uD83D"; // we don't have control over this input

                DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
                Document document = documentBuilder.newDocument();

                TransformerFactory transformerFactory = TransformerFactory.newInstance();
                Transformer transformer = transformerFactory.newTransformer();
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                transformer.setOutputProperty(OutputKeys.INDENT, "true");

                StringWriter stringWriter = new StringWriter();
                StreamResult result = new StreamResult(stringWriter);
                DOMSource source = new DOMSource(document);

                Element root = document.createElement("root");
                Element subElement = document.createElement("key");
                subElement.setTextContent(arbitraryInput);
                root.appendChild(subElement);
                document.appendChild(root);

                stringWriter.getBuffer().setLength(0);
                transformer.transform(source, result);

                String parsed = stringWriter.toString(); // <root><key>Arbitrary input: &#55357;</key></root>
                Log.e("parsed", parsed);
            }
            catch (Throwable ex) {
                ex.printStackTrace();
            }
        }
    }
I was expecting to get something like:
    <root><key>Arbitrary input: &amp;#55357;</key></root>
But instead I get:
    <root><key>Arbitrary input: &#55357;</key></root>
So what should I do to get valid XML output from Transformer?
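One direction I could imagine, sketched here only as an assumption and not as a confirmed fix: filter the input before it reaches setTextContent, dropping every code point that the XML 1.0 "Char" production does not allow (which includes unpaired surrogates such as \uD83D). The class and method names below (XmlInputSanitizer, isValidXmlChar, stripInvalidXmlChars) are hypothetical, and the sketch uses plain Java with no Android dependencies.

```java
class XmlInputSanitizer {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point XML 1.0 does not allow, including unpaired surrogates. */
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt combines a valid surrogate pair into one code point
            // and returns a lone surrogate unchanged.
            int cp = input.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String arbitraryInput = "Arbitrary input: \uD83D";
        // prints "[Arbitrary input: ]"
        System.out.println("[" + stripInvalidXmlChars(arbitraryInput) + "]");
    }
}
```

With something like this, the test would call subElement.setTextContent(stripInvalidXmlChars(arbitraryInput)). Whether silently dropping such characters is acceptable (as opposed to replacing them with U+FFFD or rejecting the input) depends on the application.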
Thanks!
EDIT:
I think the output is invalid because of what happens when I try to process the produced XML with PHP like this:
    <?php
    $data = "<root><key>Arbitrary input: &#55357;</key></root>";
    $xmlDocument = new \DOMDocument();
    $xmlDocument->loadXML($data);
I get a warning (or an exception, if the environment is configured to throw exceptions on warnings):
    PHP Warning: DOMDocument::loadXML(): xmlParseCharRef: invalid xmlChar value 55357 in Entity, line: 1 in /tmp/test.php on line 6
    PHP Stack trace:
    PHP 1. {main}() /tmp/test.php:0
    PHP 2. DOMDocument->loadXML() /tmp/test.php:6
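To make sense of the value in that warning, here is a small check in plain Java. It shows that 55357 is 0xD83D, the lone high surrogate from my input string. As far as I can tell, XML 1.0 excludes the surrogate range U+D800 to U+DFFF from its allowed characters, so a character reference to that value cannot be well-formed. The class and method names (SurrogateCheck, isSurrogateCodePoint) are made up for this illustration.

```java
class SurrogateCheck {

    /** True if the code point lies in the UTF-16 surrogate range U+D800..U+DFFF. */
    static boolean isSurrogateCodePoint(int cp) {
        return cp >= 0xD800 && cp <= 0xDFFF;
    }

    public static void main(String[] args) {
        int reported = 55357; // the value named in the PHP warning

        System.out.println(Integer.toHexString(reported));              // d83d
        System.out.println(Character.isHighSurrogate((char) reported)); // true
        System.out.println(isSurrogateCodePoint(reported));             // true

        // When the high surrogate is followed by a matching low surrogate,
        // the pair forms one real code point outside the surrogate range.
        int paired = "\uD83D\uDE00".codePointAt(0);
        System.out.println(paired);                       // 128512
        System.out.println(isSurrogateCodePoint(paired)); // false
    }
}
```

So the input string ends with half of a surrogate pair, and the serializer appears to have written that half out as a numeric character reference.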
Please note that if I process the following with DOMDocument (PHP) instead, everything is fine:
    $data = " <root><key>Arbitrary input: &amp;#55357;</key></root>";
Either the Java Transformer or the PHP DOMDocument is doing something wrong. Can you tell me which one it is?
Thanks!