Question

I am using a StAX parser to process every text node in an XHTML document. The application is deployed on a Unix box. The parsing operation takes the longest the very first time it is executed. The second run takes noticeably less time, and subsequent calls are faster still, with almost consistent results thereafter. Below is the code I am using. I am not sure why the time taken to parse the same input is inconsistent. Please help.

One-time creation of the XMLInputFactory (static initializer block at class level):

    static {
        // A static initializer runs once per class load, so no null check is needed.
        xmlInputFactory = XMLInputFactory.newInstance();
        // Namespace processing is switched off for this parser.
        xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
    }

The parsing code, which gives different response times for the same input file:

    private static void parse(String xhtmlInput) throws XMLStreamException {
        // The input is already in memory, so no disk I/O happens while parsing.
        ByteArrayInputStream arrayInputStream =
                new ByteArrayInputStream(xhtmlInput.getBytes(StandardCharsets.UTF_8));
        XMLStreamReader parser = xmlInputFactory.createXMLStreamReader(arrayInputStream);
        try {
            // hasNext() turns false after END_DOCUMENT, which ends the loop.
            while (parser.hasNext()) {
                int currentEvent = parser.next();
                if (currentEvent == XMLStreamConstants.CHARACTERS) {
                    // Do operation
                }
            }
        } finally {
            // Release the reader even if parsing throws.
            parser.close();
        }
    }

Solution

Without knowing which StAX implementation is in use, this is a bit of speculation, but there are two common reasons why any Java library or application runs faster after a while:

  • The JVM itself does just-in-time (JIT) compilation of bytecode, optimizing it on the fly. The period before this kicks in is called JIT warm-up, and it usually passes quite quickly (over the first 10 seconds or so of execution).
  • When reading files, the underlying operating system usually caches the disk blocks being read; if you read the same content again, it is served from in-memory disk buffers rather than from disk.

These are also the most common reasons why naive Java benchmarks give useless results: if you do not account for both (i.e. warm up the tests for a while and discard the initial results, and read test data from memory rather than from disk), the results are meaningless.
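
The advice above (warm up, discard the initial results, read from memory) can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the class name `StaxWarmupSketch`, the helpers `countTextEvents` and `averageNanos`, the sample document, and the iteration counts are all made up for the example, and `IS_COALESCING` is switched on only so that each text node is reported as a single event.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxWarmupSketch {
    // Hypothetical shared factory, created once as in the question.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    static {
        // Assumption for this sketch: merge adjacent character data into one event.
        FACTORY.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    }

    // Parses the document from memory and returns the number of CHARACTERS events.
    static int countTextEvents(String xhtml) throws XMLStreamException {
        XMLStreamReader reader = FACTORY.createXMLStreamReader(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Runs untimed warm-up parses, then returns the average nanoseconds per timed parse.
    static long averageNanos(String xhtml, int warmupRuns, int timedRuns)
            throws XMLStreamException {
        long sink = 0; // accumulate results so the parsing work is actually used
        for (int i = 0; i < warmupRuns; i++) {
            sink += countTextEvents(xhtml); // results of these runs are discarded
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            sink += countTextEvents(xhtml);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative event count");
        }
        return elapsed / timedRuns;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<html><body><p>Hello</p><p>World</p></body></html>";

        // Time the very first parse on its own; it includes class loading and interpretation.
        long firstStart = System.nanoTime();
        countTextEvents(sample);
        long firstRun = System.nanoTime() - firstStart;

        long warmedAverage = averageNanos(sample, 2000, 2000);
        System.out.println("first run (ns):      " + firstRun);
        System.out.println("warmed average (ns): " + warmedAverage);
    }
}
```

On most JVMs the first-run figure should come out well above the warmed average, which matches the pattern described in the question; the exact numbers depend on the machine and JVM. For measurements that matter, a dedicated harness such as JMH is usually a safer choice than a hand-rolled loop like this one.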

Licensed under: CC-BY-SA with attribution