Question

I am building a chat application with the Smack API. When I send a message that contains the ' character, the received text is split into several pieces:

message== ma'am

output==

ma

'

am

Here is the code that unescapes the text:

  StringEscapeUtils.unescapeHtml((new String(ch, start, length).replace("'", "`").replace("'", "'")));

And here is the SAX handler:

DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startDocument() throws SAXException {
                }

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {                        
                        for (int i = 0; i < attributes.getLength(); i++) {
                            if (attributes.getLocalName(i).equalsIgnoreCase("from")) {
                                from = attributes.getValue(i);
                                break;
                            }
                        }                        
                   ....
                }

                @Override
                public void characters(char ch[], int start, int length) throws SAXException {
                    String str = StringEscapeUtils.unescapeHtml((new String(ch, start, length)));                    
                    switch (elementType) {
                        case 1:
                            msg = str;
                            break;
                     ...
                        default:
                           ...
                            break;
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                }

                @Override
                public void endDocument() throws SAXException {
                }
};

Solution

XML parsers often deliver the text of a single element in several chunks, that is, as multiple character events. This is perfectly valid from an XML point of view, so your handler has to cope with it. The problem may therefore come from how you print each chunk, not from the unescaping.

For example, I can imagine the following XML

<n>A &amp; B</n>

producing the following events:

  1. begin node n
  2. text node "A "
  3. text node "&" (the resolved entity &amp;)
  4. text node " B"
  5. end node n

Now if you println every chunk of characters you see, you'll get three lines instead of one. Your parser may have an option to "normalize" the events by joining successive text nodes; otherwise, collect the chunks yourself and only use the joined text once the element ends.
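To make the idea of joining the chunks concrete, here is a minimal sketch using the JDK's built-in SAX parser. It is not your Smack code: the class name BodyTextCollector, the helper parseBody, and the <body> element name are assumptions chosen for illustration. The handler appends every characters() chunk to a StringBuilder and reads the complete text only in endElement().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of a <body> element,
// no matter how many characters() events the parser splits it into.
class BodyTextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inBody = false;
    private String body;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("body".equals(qName)) {
            inBody = true;
            buffer.setLength(0); // start with an empty buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inBody) {
            buffer.append(ch, start, length); // append the chunk, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(qName)) {
            body = buffer.toString(); // the text is complete only at this point
            inBody = false;
        }
    }

    // Hypothetical helper: parses an XML string and returns the <body> text.
    static String parseBody(String xml) {
        try {
            BodyTextCollector handler = new BodyTextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.body;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBody("<message><body>ma&apos;am</body></message>"));
    }
}
```

The parser resolves the predefined entity &apos; itself, so in this sketch no extra unescaping step is needed for it. Whether your input needs StringEscapeUtils on top of that depends on how the message was encoded by the sender.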

(Sorry if I'm not using the appropriate XML terminology. My XML terminology has become a bit rusty, so feel free to edit this answer and put in the correct XML terms. Thank you.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow