
While reading an XML file using StAX and XMLStreamReader, I encountered a weird problem. I'm not sure if it's a bug or if I am doing something wrong; I'm still learning StAX.

The problem is:

  1. In the XMLStreamConstants.CHARACTERS event, I collect the node text with XMLStreamReader.getText().
  2. If the node text contains an escaped &, < or > (or even a hidden character), getText() returns only the first part of the string, e.g. ABC & XYZ returns only ABC.

Simplified Java Source:

    // Start StAX reader
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    try {
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
        int event = xmlStreamReader.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    switch (xmlStreamReader.getLocalName()) {
                        case "group":
                            // Do something
                            break;
                        case "source":
                            isSource = true;
                            break;
                        case "target":
                            isTarget = true;
                            isSource = false;
                            isTrans = false;
                            break;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (srcData != null) {
                        String srcTrns = xmlStreamReader.getText();
                        if (srcTrns != null) {
                            if (isSource) {
                                // Set source text
                                isSource = false;
                            } else if (isTrans) {
                                // Set target text
                                isTrans = false;
                            }
                        }
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (xmlStreamReader.getLocalName().equals("group")) {
                        // Add to return list
                    }
                    break;
            }
            // Stop once the stream is exhausted, otherwise advance to the next event
            if (!xmlStreamReader.hasNext()) {
                break;
            }
            event = xmlStreamReader.next();
        }
    } catch (XMLStreamException ex) {
        LOG.log(Level.WARNING, ex.getMessage(), MessageFormat.format("{0} {1}", ex.getCause(), ex.getLocation()));
    }

I am not sure what exactly I am doing wrong or how to collect the complete text of the node.

Any suggestions or tips would be a great help as I keep learning StAX. :-)



I have solved the problem after struggling and researching a bit.

The problem was in reading text that contains escaped entity references. You need to set the XMLInputFactory property IS_COALESCING to true:

    // Set this on the factory instance before calling createXMLStreamReader(...)
    xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);

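This tells the parser to coalesce adjacent character data into a single CHARACTERS event. Without it, the parser is allowed to report an element's text in several pieces, and some implementations split at each entity reference, so a single getText() call sees only the first piece (ABC in the example above). With coalescing enabled, the replacement text of the entity references is merged with the surrounding characters and getText() returns the whole string.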
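
To see the effect in isolation, here is a minimal sketch rather than the code from the question. The class name CoalescingDemo, the helper readChunks and the sample XML are made up for illustration. It collects the text of every CHARACTERS event so you can compare what the reader reports with and without IS_COALESCING.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {

    // Hypothetical helper: returns the text of each CHARACTERS event in document order.
    static List<String> readChunks(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The property must be set before the reader is created.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<source>ABC &amp; XYZ</source>";
        // Without coalescing the number of chunks depends on the StAX implementation.
        System.out.println(readChunks(xml, false));
        // With coalescing the element text arrives as a single chunk.
        System.out.println(readChunks(xml, true));
    }
}
```

If you cannot change the factory settings, an alternative is to append every CHARACTERS chunk to a StringBuilder and use the result only at END_ELEMENT, or to call XMLStreamReader.getElementText() at the START_ELEMENT of a text-only element.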

Licensed under: CC-BY-SA with attribution