Question

I have implemented a SAX parser in Java by extending DefaultHandler. The XML has an ñ in its content, and when the parser reaches this character it breaks. I print out the char array in the characters method, and it simply ends with the character before the ñ. The parser seems to stop after this, because no other methods are called even though there is still much more content; i.e., the endElement method is never called again. Has anyone run into this problem before, or does anyone have a suggestion on how to deal with it?
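A minimal reproduction of this symptom, as a sketch: it assumes the file is saved as ISO-8859-1 (so ñ is the single byte 0xF1) and has no encoding declaration, in which case the parser assumes UTF-8 and hits a fatal error at the ñ. The class and method names are hypothetical, and the exact exception type and message depend on the parser implementation. If the calling code swallows that exception, the parse just appears to stop after the last characters() call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingFailureDemo {

    // Mirrors the question: print whatever text the parser hands over.
    static class PrintingHandler extends DefaultHandler {
        @Override
        public void characters(char[] ch, int start, int length) {
            System.out.println("characters: " + new String(ch, start, length));
        }
    }

    // Parses the bytes and returns the error description, or null if parsing succeeded.
    static String parseAndReport(byte[] xml) throws ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new PrintingHandler());
            return null;
        } catch (SAXException | IOException e) {
            // Without this catch (or a rethrow), the failure looks like the parser "just stopping".
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // No encoding declaration, so the parser assumes UTF-8,
        // but the bytes are ISO-8859-1 and the ñ (\u00f1) becomes the lone byte 0xF1.
        byte[] latin1 = "<word>ma\u00f1ana</word>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseAndReport(latin1));
    }
}
```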


Solution

What's the encoding of the file? Make sure the file's encoding declaration matches it; otherwise the parser may decode it with the wrong charset (for example, ASCII or ISO-8859-1 instead of UTF-8, or vice versa). You can declare the encoding like so:

<?xml version="1.0" encoding="UTF-8"?>

UTF-8 covers that character; just make sure the file is actually saved in UTF-8.
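A small sketch of the point above: as long as the declaration names the same charset that was used to write the bytes, the parser decodes the ñ correctly. It passes raw bytes (not a Reader) so the parser itself reads the declaration. The class and method names and the sample element are hypothetical; ISO-8859-1 is included only to show that any declared encoding works if it matches the bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class DeclaredEncodingDemo {

    // Builds a document whose declaration names the same charset used to encode its bytes.
    static byte[] encode(String body, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>" + body;
        return xml.getBytes(charset);
    }

    // Parses the bytes with SAX and returns all character data seen.
    static String parseText(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so accumulate it.
                text.append(ch, start, length);
            }
        };
        // Raw bytes: the parser decodes them according to the declaration.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String body = "<word>ma\u00f1ana</word>";
        System.out.println(parseText(encode(body, StandardCharsets.UTF_8)));
        System.out.println(parseText(encode(body, StandardCharsets.ISO_8859_1)));
    }
}
```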

OTHER TIPS

If you are saving your XML in ASCII, you can only use the lower half (the first 128 characters) of the 8-bit character table. To include accented or other non-English characters in your XML, you will have to either save it in UTF-8 or escape those characters as numeric character references, such as &#241; for ñ.
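The escaping option can be sketched as follows: every non-ASCII code point is replaced by a decimal character reference, so the file can be stored as pure ASCII, and the SAX parser expands the references back into the original characters. The helper names are hypothetical, and the escaper assumes its input is already well-formed XML text (it does not touch `<`, `&` or quotes).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefEscaper {

    // Replaces every non-ASCII code point with a decimal character reference, e.g. ñ -> &#241;.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    // Parses pure-ASCII bytes and returns the character data; references are expanded by the parser.
    static String parseText(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // The expanded reference may arrive as its own chunk; accumulate.
                        text.append(ch, start, length);
                    }
                });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String escaped = escapeNonAscii("<word>ma\u00f1ana</word>");
        System.out.println(escaped); // <word>ma&#241;ana</word>
        byte[] ascii = escaped.getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseText(ascii).equals("ma\u00f1ana")); // true
    }
}
```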

I faced this issue. The XML stream you are feeding is probably being read as ASCII. Either re-encode it to UTF-8 in code, or wrap it in a character stream that decodes it as UTF-8, and everything should be fine.

Something like this should help:

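import java.io.*;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Decode the bytes as UTF-8 explicitly; FileReader would use the platform default charset
Reader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File("C://Location")), StandardCharsets.UTF_8));
InputSource source = new InputSource(reader);
source.setEncoding("UTF-8"); // only a hint: ignored because a character stream is supplied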
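A complete, runnable version of the Reader-based approach, as a sketch: it writes a temporary UTF-8 file (standing in for the hypothetical `C://Location`), opens it through an `InputStreamReader` with an explicit UTF-8 charset, and hands the resulting `InputSource` to a SAX parser. The class and method names are hypothetical. Per the `InputSource` documentation, when a character stream is supplied the parser reads it directly and disregards any encoding declaration, so the charset passed to the Reader is what matters.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderSourceDemo {

    // Parses a file through a Reader that decodes it as UTF-8, collecting all character data.
    static String parseUtf8File(File file) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try (Reader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            // The parser receives already-decoded characters from the Reader.
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical temp file standing in for "C://Location".
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<word>ma\u00f1ana</word>".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(parseUtf8File(tmp.toFile()).equals("ma\u00f1ana"));
        } finally {
            Files.delete(tmp);
        }
    }
}
```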
Licensed under: CC-BY-SA with attribution