I suggest that you start using a dedicated HTML parser for such operations. I personally use Jsoup. You can build your own HTML structure with it as well.
XML/HTML builder class using SAX and W3C Java libraries can't handle HTML input
Question
All,
I am developing a Java Web service that fetches archived data from an external DB.
The requirement is that, as well as returning the results as an XML message, the client can also request to have the results presented on an HTML page.
This diagram describes the high level design:
While implementing this I have realised that XML and HTML are not exactly the same, e.g.:
- HTML parsing can tolerate specific elements not being closed.
This causes my current class implementation to throw these errors (caused by the HTML input):
Error Messages
[03/Mar/2014:09:12:05] warning (19052): CORE3283: stderr: [Fatal Error] web.html:1:3: The markup in the document preceding the root element must be well-formed.
[03/Mar/2014:09:12:05] warning (19052): CORE3283: stderr: org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.
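The failure can be reproduced outside the web service. Below is a minimal sketch, assuming a hypothetical helper class WellFormedCheck that is not part of the code in this post: the JAXP DocumentBuilder accepts XHTML-style markup in which every element is closed, and rejects HTML in which a void element such as <br> is left open. The exact message depends on where the parser first trips, so it need not match the one logged above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints "[Fatal Error] ..." to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Covers SAXParseException, which the parser throws for malformed markup.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every element is closed, so this is both valid XHTML and well-formed XML.
        String xhtml = "<html><body><p>one<br/>two</p></body></html>";
        // Legal in HTML, but the unclosed <br> is not well-formed XML.
        String html = "<html><body><p>one<br>two</p></body></html>";

        System.out.println("xhtml well-formed: " + isWellFormedXml(xhtml));
        System.out.println("html well-formed:  " + isWellFormedXml(html));
    }
}
```

If the template can be kept as well-formed XHTML, the existing DocumentBuilder code can parse it as is. Otherwise an HTML-aware parser is needed for the HTML path.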
My Code
import java.io.File;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class OutputBuilder
{
    private DocumentBuilderFactory docBF;
    private DocumentBuilder docBuilder;
    private Document doc;
    private static float UUID;
    private String docType;

    public OutputBuilder(String template, String output) throws ParserConfigurationException, SAXException, IOException
    {
        docBF = DocumentBuilderFactory.newInstance();
        docBuilder = docBF.newDocumentBuilder();
        //set the base document to the specified template file
        //(this is an XML parser, so the template must be well-formed XML)
        doc = docBuilder.parse(new File(template));
        //remember the requested output type ("html" or XML otherwise)
        docType = output;
    }

    /*
     * Build the final document by adding values passed in from query results
     */
    public void fillTemplate(ResultSet qR) throws SQLException
    {
        if(docType.equals("html"))
        {
            //find the designated point of data insertion to the html document
            //note: getElementById only finds attributes declared as type ID (e.g. via a DTD); otherwise it returns null
            Element appendPoint = doc.getElementById("archive_table");
            //get meta data column names for table header row
            ResultSetMetaData rsmd = qR.getMetaData();
            //generate this first row which is the header
            Element headerRow = doc.createElement("tr");
            //create a column in the table header for each column in the query results
            //(JDBC column indexes start at 1, not 0)
            for (int i = 1; i <= rsmd.getColumnCount(); i++)
            {
                Element tableH = doc.createElement("th");
                //setNodeValue() has no effect on an Element, so set the text content instead
                tableH.setTextContent(rsmd.getColumnName(i));
                headerRow.appendChild(tableH);
            }
            //append header row to table
            appendPoint.appendChild(headerRow);
            //fill table body rows with query results
            while(qR.next())
            {
                //create a table row for each row in query results
                Element bodyRow = doc.createElement("tr");
                //fill that row with all column values in query results
                for(int i = 1; i <= rsmd.getColumnCount(); i++)
                {
                    Element tableB = doc.createElement("td");
                    tableB.setTextContent(qR.getString(i));
                    bodyRow.appendChild(tableB);
                }
                //add each constructed row to the table
                appendPoint.appendChild(bodyRow);
            }
        }
        else
        {
            //do XML construction
        }
    }
}
What specific libraries or new logic do I need so that my class can handle both XML and HTML construction?
Any other suggestions welcome!
P.S. Upvote for a fine specimen of a question.
Solution
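One possible way to serve both formats with the JDK alone, offered as a sketch rather than a definitive implementation: build the table through the DOM API and choose the serialization method only at the end, via the standard OutputKeys.METHOD property of javax.xml.transform. The class name TableRenderer, the method render, and the use of plain lists in place of a ResultSet are assumptions made for illustration. The sketch also assumes any template is kept well-formed, since DocumentBuilder cannot parse loose HTML.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class, for illustration only.
public class TableRenderer {

    /**
     * Builds a table element from the given headers and rows, then serializes it.
     * The method argument is passed to OutputKeys.METHOD ("html" or "xml").
     */
    static String render(List<String> headers, List<List<String>> rows, String method) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element table = doc.createElement("table");
            table.setAttribute("id", "archive_table");
            doc.appendChild(table);

            // Header row: one th per column name.
            Element headerRow = doc.createElement("tr");
            for (String header : headers) {
                Element th = doc.createElement("th");
                th.setTextContent(header); // text is escaped on serialization
                headerRow.appendChild(th);
            }
            table.appendChild(headerRow);

            // Body rows: one tr per row, one td per value.
            for (List<String> row : rows) {
                Element tr = doc.createElement("tr");
                for (String value : row) {
                    Element td = doc.createElement("td");
                    td.setTextContent(value);
                    tr.appendChild(td);
                }
                table.appendChild(tr);
            }

            // The identity transformer copies the DOM tree to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, method);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("id", "name");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "a < b"),
                Arrays.asList("2", "c"));
        System.out.println(render(headers, rows, "html"));
        System.out.println(render(headers, rows, "xml"));
    }
}
```

In the original class, the lists would be filled from the ResultSet (remembering that JDBC columns are numbered from 1). If the HTML template cannot be made well-formed, a dedicated HTML parser such as Jsoup is the more robust route.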