Question

Input (fullInput)

Imagine I have the following as an InputStream (or as a String in memory read from that stream):

<?xml version="1.0" ?>
<root>
    <element attr="val1"><x /><y /></element>
    <element attr="val2"><y /></element>
    <element attr="val3"><x /><x /></element>
    <element attr="val4"><z /><y /></element>
</root>

How I want to use the solution (bridgeXml)

IProprietaryUnmarshaller UNMARSHALLER = ...;
List<Element> parseFullXml(String fullInput) throws UnmarshallException {
    List<String> inputs = bridgeXml(fullInput);
    List<Element> outputs = new ArrayList<>();
    for(String input : inputs) {
         Element e = UNMARSHALLER.unmarshall(input);
         outputs.add(e);
    }
    return outputs;
}

What I'm looking for

I'm looking for an implementation of (or an idea for) bridgeXml in which the input String/InputStream is split into smaller String chunks, each of which is a well-formed XML document by itself (without an XML declaration).

The trivial implementation I want to avoid

The implementation below is error-prone and inflexible, and should not be used. I'm looking for a proper one that uses some kind of library or XML parser!

List<String> bridgeXml(String input) {
    // strip anything up to the opening root element, and LTrim the remainder
    input = input.replaceAll("(?s)^.*<root.*?>\\s*", "");
    // strip anything after the closing root element, and RTrim the remainder
    input = input.replaceAll("(?s)\\s*</root.*$", "");
    // split at </element> closing tags, not removing them (?<= does the magic)
    return Arrays.asList(input.split("(?<=</element>)"));
}

Restrictions

  • The XML input can't be changed and is fully valid XML.
  • The proprietary unmarshaller must be used and cannot be modified.
  • I'm looking for a solution that avoids the round trip of XML-unmarshalling the file, XML-marshalling it again, and only then running the proprietary unmarshaller.
  • (Don't pick at XML/Java code style, formatting, visibility modifiers, etc.!
    This is simplified code for easier communication.)

Solution (edit)

I ended up writing the code below. It double-parses the XML (see getOuterXml); assuming that this would be too slow turned out to be premature, because a huge DB query follows this step and is far slower.

protected <T> List<T> read(InputStream inputStream, String tagName) throws XMLStreamException,
    TransformerException, DecodingException
{
    List<T> result = new ArrayList<T>();
    XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
    XMLStreamReader xmlReader = xmlFactory.createXMLStreamReader(inputStream, "ISO-8859-1");
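    while (xmlReader.hasNext()) {
        if (xmlReader.isStartElement() && tagName.equals(xmlReader.getLocalName())) {
            // getOuterXml consumes the whole element. Depending on the Transformer
            // implementation, the reader may then already sit on the next sibling's
            // start tag, so re-check the current event instead of calling next() first.
            String output = getOuterXml(xmlReader);
            @SuppressWarnings("unchecked")
            T object = (T) UNMARSHALLER.unmarshall(output);
            result.add(object);
        } else {
            xmlReader.next();
        }
    }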
    return result;
}

protected String getOuterXml(XMLStreamReader xmlr) throws TransformerException
{
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StringWriter stringWriter = new StringWriter();
    transformer.transform(new StAXSource(xmlr), new StreamResult(stringWriter));
    return stringWriter.toString();
}

protected <T> List<T> getObjects(String urlString, String tagName)
{
    LOG.info("Downloading [{}] updates from [{}].", tagName, urlString);
    HttpURLConnection conn = null;
    InputStream inputStream = null;
    try {
        URL url = new URL(urlString);
        conn = (HttpURLConnection) url.openConnection();
        conn.connect();
        inputStream = conn.getInputStream();
        return read(inputStream, tagName);
    } catch (Exception ex) {
        String exceptionMessage = "Updating [" + tagName + "] from [" + urlString + "] failed.";
        LOG.error(exceptionMessage, ex);
        throw new MyFancyWrapperException(exceptionMessage, ex);
    } finally {
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException ex) {
                LOG.warn("Cannot close HTTP's input stream", ex);
            }
        }
        if (conn != null) {
            conn.disconnect();
        }
    }
}
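
The question asks for chunks without an XML declaration, while a Transformer writes one by default. If the unmarshaller should not see it, the declaration can be suppressed with an output property. The following is only a sketch of a getOuterXml variant; the class name OuterXmlSketch and the method name getOuterXmlWithoutDeclaration are made up for illustration, and it assumes the reader is positioned on a start element when the method is called.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class OuterXmlSketch {

    // Hypothetical variant of getOuterXml: serializes the element the reader
    // currently points at and asks the transformer to omit the XML declaration.
    static String getOuterXmlWithoutDeclaration(XMLStreamReader xmlr) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new StAXSource(xmlr), new StreamResult(stringWriter));
        return stringWriter.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element attr=\"val1\"><x /><y /></element></root>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // move to <root>
        reader.nextTag(); // move to the first <element>
        System.out.println(getOuterXmlWithoutDeclaration(reader));
    }
}
```

Creating the Transformer once and reusing it across elements would likely be cheaper than creating one per call, but that is left out here to stay close to the original method.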

Solution

Here is a little StAX parser for an example XML document:

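import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

String xml = "<root><element>test</element></root>";
XMLInputFactory xmlif = XMLInputFactory.newInstance();
XMLStreamReader xmlr = xmlif.createXMLStreamReader(new StringReader(xml));
while (xmlr.hasNext()) {
    xmlr.next();
    // Print the local name and event type of every start and end tag
    // (XMLStreamConstants.START_ELEMENT is 1, END_ELEMENT is 2).
    if (xmlr.isStartElement() || xmlr.isEndElement()) {
        System.out.println(xmlr.getLocalName() + " " + xmlr.getEventType());
    }
}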

The following post explains how StAX can be combined with JAXB:

http://blog.bdoughan.com/2012/08/handle-middle-of-xml-document-with-jaxb.html
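
To connect the StAX idea back to the question, here is a minimal sketch of how bridgeXml might be written with the StAX event API. It copies the events of each direct child of the root into its own string and never writes an XML declaration. The class name BridgeXmlSketch is made up for illustration, and the sketch assumes the input uses no namespaces declared on the root (those declarations would not be carried into the chunks). Parse errors are wrapped in an unchecked exception so the signature matches the one used in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class BridgeXmlSketch {

    // Hypothetical bridgeXml: returns one string per direct child of the root.
    static List<String> bridgeXml(String fullInput) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(fullInput));
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside a chunk
            int depth = 0;                // 1 = inside the root, 2 or more = inside a chunk

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 2) {
                        // A direct child of the root starts: open a new chunk.
                        buffer = new StringWriter();
                        writer = outputFactory.createXMLEventWriter(buffer);
                    }
                }
                if (writer != null) {
                    writer.add(event); // copy everything that belongs to the chunk
                }
                if (event.isEndElement()) {
                    if (depth == 2) {
                        // The direct child ends: finish and store the chunk.
                        writer.flush();
                        writer.close();
                        chunks.add(buffer.toString());
                        writer = null;
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Input is not well-formed XML", ex);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String fullInput = "<?xml version=\"1.0\" ?>\n<root>\n"
                + "    <element attr=\"val1\"><x /><y /></element>\n"
                + "    <element attr=\"val2\"><y /></element>\n"
                + "</root>";
        for (String chunk : bridgeXml(fullInput)) {
            System.out.println(chunk);
        }
    }
}
```

Because the chunks are re-serialized from parser events, they are equivalent to the input elements but not necessarily byte-identical (for example, an empty element may be written with separate start and end tags).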

Licensed under: CC-BY-SA with attribution