Question

While trying to use the Bing API to search, I am getting characters that are not printable and do not seem to hold any extra information. The goal is to save the XML (UTF-8) response as a text file to be parsed later.

My code currently looks something like this:

    URL url = new URL(queryURL);

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    BufferedWriter out = new BufferedWriter(new FileWriter(query+"-"+saveResultAs));
    String str = in.readLine();
    out.write(str);

    in.close();
    out.close();

When I send the contents of 'str' to the console it looks something like this:

[screenshot: console output]

and here's what the newly created local XML file looks like:

[screenshot: local XML file]

What should I be doing to convert the UTF-8 text so that str does not have the extra characters?

Solution

If you know the encoding up front, pass it to the InputStreamReader explicitly:

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

Do the same with the writer. In your example, FileWriter encodes the file in the platform default charset, while the XML inside it still declares itself to be UTF-8.
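As a minimal sketch of that advice, the following copies a stream while naming UTF-8 on both the reading and the writing side. The class name Utf8Copy and the method copyAsUtf8 are made up for illustration, and the sketch assumes the response really is UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question.
public class Utf8Copy {
    // Decodes the source as UTF-8 and encodes the target as UTF-8,
    // so the platform default charset is never involved.
    public static void copyAsUtf8(InputStream source, OutputStream target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(source, StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            // Copy the whole response, not just its first line.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

With the variables from the question, this could be called as Utf8Copy.copyAsUtf8(url.openStream(), new FileOutputStream(query + "-" + saveResultAs)).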

It may be wise to read the encoding from the XML declaration to avoid surprises.
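One way to read the declared encoding, sketched here with the StAX API that ships with the JDK, is to ask an XMLStreamReader for it. The class and method names are hypothetical, and the result is null when the document has no encoding declaration.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original answer.
public class DeclaredEncoding {
    // Returns the encoding named in the XML declaration, or null if none is declared.
    public static String declaredEncoding(InputStream xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            return reader.getCharacterEncodingScheme();
        } finally {
            // Releases the reader's resources; it does not close the underlying stream.
            reader.close();
        }
    }
}
```

Note that this consumes part of the stream, so in practice the bytes would need to be buffered or fetched again before the real read.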

If you only want to store the data for later use, there is no need to decode and re-encode it at all. Just read the bytes and write them out unchanged, and leave the task of detecting the encoding to the XML parser.
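A byte-for-byte save could look roughly like this, using Files.copy so that no charset is involved at any point. RawSave and saveBytes are names invented for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answer.
public class RawSave {
    // Writes the stream to the target file unchanged and returns the number of bytes written.
    public static long saveBytes(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            // REPLACE_EXISTING overwrites a file left over from an earlier run.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the variables from the question, a call might be RawSave.saveBytes(url.openStream(), Paths.get(query + "-" + saveResultAs)).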

OTHER TIPS

The XML parser will handle the encoding/decoding, and the appropriate characters will be fed back to you (e.g. a SAX parser does this via the characters() callback). All you then need to do is store them in a suitable file (perhaps with a suitable byte order mark?).
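To illustrate, here is a small sketch of a SAX handler that collects the decoded text delivered through characters(). TextCollector and textOf are hypothetical names, and a real handler would usually also track elements rather than only concatenating text.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of the original answer.
public class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // The parser has already decoded the bytes; ch holds ordinary Java chars.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the raw bytes and returns all character data in document order.
    public static String textOf(InputStream xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return handler.text.toString();
    }
}
```

Because the parser is given the raw bytes, it picks the decoder from the XML declaration itself, so the caller never has to name a charset.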

Licensed under: CC-BY-SA with attribution