How do I format and read XML processing instructions using Java StAX?

https://stackoverflow.com/questions/404141

03-07-2019

Question

First, how do I format an XML processing instruction? Is it:

<?processingInstructionName attribute="value" attribute2="value2"?>

Using StAX, I then want to read it by handling the XMLStreamConstants.PROCESSING_INSTRUCTION event, but XMLStreamReader provides only two methods for retrieving information about the processing instruction:

getPITarget()
getPIData()

The javadoc for these two methods isn't very helpful.

Is the XML formatting correct?
Is this the proper way to go about parsing processing instructions using the StAX XMLStreamReader APIs?
How do I use getPITarget() and getPIData() to return multiple arguments?

Solution

1. Is the XML formatting correct?

Yes. Note, however, that a processing instruction does not have attributes, only data. What look like attributes are part of the data; some people call them "pseudo-attributes".

2. Is this the proper way to go about parsing processing instructions using the StAX XMLStreamReader APIs?

Yes.
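As a minimal sketch of that approach, the loop below walks the stream and collects the target and data of every processing instruction it meets. The class name PiReader and the method readProcessingInstructions are hypothetical names chosen for illustration; only the javax.xml.stream calls come from the StAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
public class PiReader {

  // Returns one {target, data} pair per processing instruction found in the XML.
  public static List<String[]> readProcessingInstructions(final String xml) {
    final List<String[]> result = new ArrayList<>();
    try {
      final XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.PROCESSING_INSTRUCTION) {
          // getPITarget() is the name right after "<?";
          // getPIData() is everything after the target, as one unparsed string.
          result.add(new String[] { reader.getPITarget(), reader.getPIData() });
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      // Wrapped as unchecked only to keep the sketch short.
      throw new IllegalArgumentException("Could not parse XML", e);
    }
    return result;
  }

  public static void main(final String[] args) {
    final String xml = "<?xml-stylesheet type=\"text/xsl\" href=\"markdown.xsl\"?><doc/>";
    for (final String[] pi : readProcessingInstructions(xml)) {
      System.out.println("target: " + pi[0] + ", data: " + pi[1]);
    }
  }
}
```

Note that getPIData() hands back the pseudo-attributes as a single string; StAX does not split them for you.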

3. How do I use getPITarget() and getPIData() to return multiple arguments?

If by "multiple arguments" you mean the one or more pseudo-attributes contained in the data, the answer is that your code must parse the data itself (using standard string methods such as Java's String.split(), or a regular expression) and retrieve the name-value pairs for all pseudo-attributes.
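One way to do that parsing by hand is sketched below with a regular expression rather than split(), since quoted values may contain spaces. The class name PseudoAttributes and its parse method are hypothetical, and the sketch assumes the data really does use name="value" or name='value' notation; it does not expand character or entity references inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of StAX or any other library.
public class PseudoAttributes {

  // Matches name="value" or name='value', allowing whitespace around '='.
  private static final Pattern PAIR =
      Pattern.compile("([^\\s=]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  // Parses the string returned by getPIData() into name-value pairs,
  // keeping the order in which the pseudo-attributes appear.
  public static Map<String, String> parse(final String data) {
    final Map<String, String> attributes = new LinkedHashMap<>();
    if (data == null) {
      return attributes; // a processing instruction may carry no data at all
    }
    final Matcher matcher = PAIR.matcher(data);
    while (matcher.find()) {
      // Group 2 holds a double-quoted value, group 3 a single-quoted one.
      final String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
      attributes.put(matcher.group(1), value);
    }
    return attributes;
  }

  public static void main(final String[] args) {
    final Map<String, String> attributes = parse("type=\"text/xsl\" href=\"markdown.xsl\"");
    System.out.println(attributes.get("href"));
  }
}
```

In a StAX loop you would call PseudoAttributes.parse(reader.getPIData()) when the current event is PROCESSING_INSTRUCTION. Data that does not follow the pseudo-attribute convention simply yields an empty map.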

OTHER TIPS

I think that this notion of processing instructions having attributes comes from some old XML manuals. At one point there was discussion of recommending that PIs honor or require such structuring. However, the official XML specification has never mandated or even recommended such usage.

So basically you do have to parse the contents yourself. They may be in any format, but if you know that the data uses attribute notation, you can parse it accordingly.

As far as I know, none of the Java XML parsers or processing packages supports such usage, unfortunately.

Although Dimitre's answer is technically correct, a few popular libraries now parse processing instruction pseudo-attributes as would be expected. The following examples parse this XML processing instruction to obtain the value of the href pseudo-attribute:

<?xml-stylesheet type="text/xsl" href="markdown.xsl"?>

JDOM2

Using JDOM2:

import org.jdom2.ProcessingInstruction;
import org.xml.sax.helpers.DefaultHandler;

public class ProcessingInstructionHandler extends DefaultHandler {

  @Override
  public void processingInstruction( final String target, final String data ) {
    final ProcessingInstruction pi = new ProcessingInstruction( target, data );
    System.out.println( pi.getPseudoAttributeValue( "href" ) );
  }
}

Saxon

Using Saxon:

import static net.sf.saxon.tree.util.ProcInstParser.getPseudoAttribute;
import org.xml.sax.helpers.DefaultHandler;

public class ProcessingInstructionHandler extends DefaultHandler {

  @Override
  public void processingInstruction( final String target, final String data ) {
    System.out.println( getPseudoAttribute( data, "href" ) );
  }
}

Licensed under: CC-BY-SA with attribution
