Question

I have been using SimpleXML for a while now to serialize my Java objects, but I am still learning and sometimes run into trouble. I have the following XML that I want to deserialize:

<messages>
<message>
    <text>
       A communications error has occurred. Please try again, or contact  <a href="someURL">administrator</a>. Alternatively, please <a href = "someURL' />">register</a>. 
    </text>       
</message>
</messages>

I would like to process it so that the contents of the text element are treated as a single string and the anchor tags are ignored. I have no control over how this XML is generated; as you can see, it is an error message from some server. How do I achieve this? Many thanks in advance.

Solution

You might want to try escaping the text. Import the helper from Apache Commons Lang:

import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;

And use it like this:

a.setWordCloudStringToDisplay(escapeHtml(wordcloud));
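
One way to read this suggestion, sketched here as an assumption rather than as the answerer's exact method: escape the opening '<' of each anchor tag in the raw XML before it reaches the parser, so that the parser sees the anchors as ordinary text. The names AnchorEscaper, escapeAnchors and readText are made up for illustration, and the JDK's built-in DOM parser stands in for Simple XML only to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: pre-processes the raw XML, then reads the <text> element.
class AnchorEscaper {

    // Escape only the '<' that opens an <a ...> or </a> tag.
    // The remaining '>' and quotes are legal in XML character data.
    static String escapeAnchors(String xml) {
        return xml.replaceAll("<(?=/?a\\b)", "&lt;");
    }

    // Parse the escaped document and return the first <text> element as one string.
    static String readText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(escapeAnchors(xml))));
            return doc.getElementsByTagName("text").item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

After this step the text element no longer has child elements, so a plain string field should be able to hold it. Note that the anchors survive as literal text in the result; they are not dropped.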

OTHER TIPS

Simple XML does not support reading text mixed with child elements out of the box; you have to use a Converter. The answer at https://stackoverflow.com/questions/17462970/simpleframwork-xml-element-with-inner-text-and-child-elements addresses much the same problem, except that it reads only a single piece of text.

Here is a solution that collects several pieces of text and the hrefs into a single string.

First, I create a class A for the 'a' tag, with a toString method that prints the tag as it appears in the XML:

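import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.Text; // the annotation, not the Text class defined below

@Root(name = "a")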
public class A {
    @Attribute(required = false)
    private String href;
    @Text
    private String value;

    @Override
    public String toString(){
        return "<a href = \"" + href + "\">" + value + "</a>";
    }
}

Then comes the Text class that reads the 'text' element, which is where the converter is needed:

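import org.simpleframework.xml.Element;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.convert.Convert;
import org.simpleframework.xml.convert.Converter;
import org.simpleframework.xml.core.Persister;
import org.simpleframework.xml.stream.InputNode;
import org.simpleframework.xml.stream.OutputNode;

@Root(name = "Text")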
@Convert(Text.Parsing.class)
public class Text {

    @Element
    public String value;

    private static class Parsing implements Converter<Text> {
        // to read <a href...>
        private final Serializer ser = new Persister();

        @Override
        public Text read(InputNode node) throws Exception {
            Text t = new Text();
            String s;
            InputNode aref;

            // read the beginning of the text (up to the first XML tag);
            // fall back to "" so later concatenation never prepends "null"
            s = node.getValue();
            t.value = (s != null) ? s : "";
            // read first tag (return null if no more tag in the Text)
            aref = node.getNext();
            while (aref != null) {
                // add to the value using toString() of A class
                t.value = t.value + ser.read(A.class, aref);
                // read the next part of text (after the xml tag, until the next tag)
                s = node.getValue();
                // add to the value
                if (s != null) { t.value = t.value + s; }
                // read the next tag and loop
                aref = node.getNext();
            }
            return t;
        }

        @Override
        public void write(OutputNode node, Text value) throws Exception {
            throw new UnsupportedOperationException("Not supported yet.");
        }
    }
}

Note that I read the 'a' tag with a standard serializer and added a toString method to the A class to turn it back into an XML string. I have not found a way to read the 'a' tag directly as text.

And the main class (don't forget the AnnotationStrategy, which applies the converter named in @Convert when the text element is deserialized):

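import java.io.InputStream;
import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.convert.AnnotationStrategy;
import org.simpleframework.xml.core.Persister;

public class ParseText {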
  public static void main(String[] args) throws Exception {
    Serializer serializer = new Persister(new AnnotationStrategy());
    InputStream in = ClassLoader.getSystemResourceAsStream("file.xml");
    Text t = serializer.read(Text.class, in, false);

    System.out.println("Texte : " + t.value);
  }
}

When I run it with the following XML file:

<text>
    A communications error has occurred. Please try again, or contact <a href="someURL">administrator</a>.
    Alternatively, please <a href = "someURL' />">register</a>. 
</text>

It gives the following result:

Texte : 
   A communications error has occurred. Please try again, or contact <a href = "someURL">administrator</a>.
   Alternatively, please <a href = "someURL' />">register</a>. 
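
For comparison, the same walk over text and child elements can be sketched with the JDK's built-in DOM parser instead of Simple XML. This is a swapped-in technique, not part of the converter above: MixedContentReader and its methods are hypothetical names, and the keepTags flag is an assumption added to show both outcomes, either rebuilding each tag (as A.toString does) or dropping the tags and keeping only their inner text (what the question asks for).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: flattens an element with mixed content into one string.
class MixedContentReader {

    // Text nodes are copied as-is; child elements are either rebuilt as markup
    // (keepTags = true) or reduced to their inner text (keepTags = false).
    static String flatten(Element parent, boolean keepTags) {
        StringBuilder out = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                out.append(keepTags ? render((Element) child) : child.getTextContent());
            }
        }
        return out.toString();
    }

    // Rebuild a tag from its name, attributes and inner text.
    // Like A.toString above, this does not re-escape special characters.
    static String render(Element e) {
        StringBuilder tag = new StringBuilder("<").append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            tag.append(' ').append(attr.getNodeName())
               .append("=\"").append(attr.getNodeValue()).append('"');
        }
        return tag.append('>').append(e.getTextContent())
                  .append("</").append(e.getTagName()).append('>').toString();
    }

    // Parse the XML string and flatten its root element.
    static String read(String xml, boolean keepTags) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return flatten(root, keepTags);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Calling read(xml, false) on the text element returns the message with the anchors removed and their link text kept in place, which may be closer to what the question asks for.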

I hope this will help you to solve your problem.

Licensed under: CC-BY-SA with attribution