You can use the jsoup library to parse and repair your HTML, and then run the XSLT transformation on the result.
How to apply XSLT to XHTML that is not well-formed
Question
I have arbitrary XHTML documents that are often not well-formed, because websites can be written that way and browsers still render them. How can I run an XSLT transformation on XHTML that is not well-formed? Is there a way to skip the parts that are not well-formed?
This is my Java code, but as I said, it fails on XHTML that is not well-formed:
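import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

String outputFileName = "path3";
// try-with-resources closes the output file even if the transform fails
try (OutputStream htmlFile = new FileOutputStream(outputFileName)) {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Source xslDoc = new StreamSource("path1"); // the XSLT stylesheet
    Source xmlDoc = new StreamSource("path2"); // the input; parsed as XML, so it must be well-formed
    Transformer transformer = tFactory.newTransformer(xslDoc);
    transformer.transform(xmlDoc, new StreamResult(htmlFile));
}
catch (Exception e) { /* ... */ }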
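The reason the code above cannot "skip" the broken parts is that a `StreamSource` is read by an XML parser, and in XML a well-formedness error is fatal: the parser aborts at the first one and the whole transform fails with a `TransformerException`. A minimal, JDK-only sketch that shows this behaviour; the class name `WellFormednessDemo` and the tiny inline stylesheet are illustrative assumptions, not part of the original question:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WellFormednessDemo {
    // Hypothetical stylesheet for the demo: outputs the document's text content.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='.'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns the transform output, or null when the input is rejected. */
    static String tryTransform(String markup) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(markup)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // The XML parser stopped at a fatal well-formedness error; nothing was skipped.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryTransform("<html><body><p>ok</p></body></html>")); // prints "ok"
        System.out.println(tryTransform("<html><body><p>unclosed<br></body></html>"));
    }
}
```

So the fix has to happen before XSLT sees the input: a lenient HTML parser must first turn the markup into a well-formed tree.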
Solution
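The jsoup suggestion at the top of this page can be sketched as follows: parse the HTML with jsoup (which repairs it the way a browser would), convert the result to an `org.w3c.dom.Document` with `org.jsoup.helper.W3CDom`, and hand that to the transformer as a `DOMSource`. This is a minimal sketch assuming jsoup is on the classpath; the class `JsoupXslt` and the stylesheet `COUNT_P_XSLT` are hypothetical names for illustration. The stylesheet matches by `local-name()` so it does not depend on whether the converted DOM places elements in the XHTML namespace, which may differ between jsoup versions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class JsoupXslt {
    // Hypothetical stylesheet: counts <p> elements, ignoring namespaces.
    static final String COUNT_P_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"count(//*[local-name()='p'])\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Parses possibly broken HTML with jsoup, converts it to W3C DOM, and applies the XSLT. */
    static String transform(String html, String xslt) throws TransformerException {
        // jsoup closes unclosed tags and drops stray end tags, like a browser.
        org.jsoup.nodes.Document soup = Jsoup.parse(html);
        // Convert to a W3C DOM so JAXP can consume the repaired tree directly.
        org.w3c.dom.Document dom = new W3CDom().fromJsoup(soup);
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Two unclosed <p> tags: jsoup turns them into two sibling <p> elements.
        System.out.println(transform("<p>one<p>two", COUNT_P_XSLT));
    }
}
```

To read from a file as in the question, `Jsoup.parse(new File("path2"), "UTF-8")` can replace `Jsoup.parse(html)`, and the `StreamResult` can wrap the `FileOutputStream` instead of a `StringWriter`.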
Other suggestions
You can try an HTML parser such as the Validator.nu HTML Parser (http://about.validator.nu/htmlparser/) or TagSoup.
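Both parsers expose a SAX `XMLReader` (TagSoup's `org.ccil.cowan.tagsoup.Parser`, and, as far as I know, Validator.nu's `nu.validator.htmlparser.sax.HtmlParser`), so they can be plugged into JAXP through a `SAXSource`, which means no intermediate file is needed. A minimal sketch with TagSoup, assuming it is on the classpath; `TagSoupXslt` is a hypothetical class name. By default TagSoup puts elements in the XHTML namespace (`http://www.w3.org/1999/xhtml`), so a real stylesheet must declare that namespace; the sketch sidesteps this with `local-name()`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;

public class TagSoupXslt {
    // Hypothetical stylesheet: counts <p> elements regardless of namespace.
    static final String COUNT_P_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"count(//*[local-name()='p'])\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Streams messy HTML through TagSoup as well-formed SAX events into the XSLT. */
    static String transform(String html, String xslt) throws TransformerException {
        // TagSoup's Parser implements org.xml.sax.XMLReader, so JAXP can drive it directly.
        SAXSource source = new SAXSource(new Parser(), new InputSource(new StringReader(html)));
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<p>one<p>two", COUNT_P_XSLT));
    }
}
```

For the question's file-based setup, `new InputSource("path2")` can replace the `StringReader`-based input. The JDK may print a warning that TagSoup does not recognize some JAXP security properties; the transform still runs.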