Problem

I am using an HTML parser called Jsoup to load and parse HTML files. The problem is that the webpage I'm scraping is encoded in the ISO-8859-1 charset, while Android uses UTF-8 encoding(?). This results in some characters showing up as question marks.

So now I guess I should convert the string to UTF-8 format.

Now I have found a class called CharsetEncoder in the Android SDK, which I guess could help me. But I can't figure out how to implement it in practice, so I wonder if I could get some help in the form of a practical example.

UPDATE: Code to read data (Jsoup)

URL url = new URL("http://www.example.com");
Document doc = Jsoup.parse(url, 4000);

Solution

You can let Android do the work for you by reading the page into a byte[] and then using the Jsoup methods for parsing String objects.

Don't forget to specify the encoding when you create the String from the data read from the server, using the String constructor that takes a charset. Otherwise the bytes are decoded with the platform default encoding.
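The approach described above could look roughly like the following sketch. The class name PageDecoder and its methods are hypothetical and exist only for illustration. It assumes the page really is ISO-8859-1 and that java.nio.charset.StandardCharsets is available (on Android this means API level 19 or later). Jsoup itself is not called, so the snippet compiles without the library on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
final class PageDecoder {

    // Reads the whole stream into memory without interpreting the bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Decodes raw bytes with an explicitly named charset
    // instead of relying on the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Downloads a page and decodes it as ISO-8859-1
    // (assumed here to be the page's real charset).
    static String fetchLatin1(String address) throws IOException {
        try (InputStream in = new URL(address).openStream()) {
            return decode(readAll(in), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // 0xEA, 0xF1, 0xFC are e-circumflex, n-tilde, u-umlaut in ISO-8859-1.
        byte[] raw = {0x41, (byte) 0xEA, (byte) 0xF1, (byte) 0xFC, 0x43};
        String html = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(html.equals("A\u00ea\u00f1\u00fcC")); // prints "true"
        // With Jsoup on the classpath, the decoded text could then be
        // handed to Jsoup.parse(html).
    }
}
```

Once the text has been decoded correctly, the String returned by fetchLatin1 can be passed to Jsoup.parse(String). Jsoup also has an overload Jsoup.parse(InputStream, String charsetName, String baseUri), which may let you skip the manual byte[] step by naming the charset directly.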

Other tips

Byte encodings and Strings

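import java.io.UnsupportedEncodingException;

public class ByteEncodings {

    public static void main(String[] args) {
        // Platform default encoding; it differs between systems.
        System.out.println(System.getProperty("file.encoding"));
        String original = "A\u00ea\u00f1\u00fcC";

        System.out.println("original = " + original);
        System.out.println();

        try {
            // Encode once with an explicit charset and once with the default.
            byte[] utf8Bytes = original.getBytes("UTF-8");
            byte[] defaultBytes = original.getBytes();

            // Decoding with the same charset used for encoding restores the text.
            String roundTrip = new String(utf8Bytes, "UTF-8");
            System.out.println("roundTrip = " + roundTrip);

            System.out.println();
            printBytes(utf8Bytes, "utf8Bytes");
            System.out.println();
            printBytes(defaultBytes, "defaultBytes");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    } // main

    // Prints each byte as two hex digits. This helper was not part of the
    // original snippet; it is a minimal stand-in so the example compiles.
    static void printBytes(byte[] bytes, String name) {
        for (int i = 0; i < bytes.length; i++) {
            System.out.printf("%s[%d] = 0x%02x%n", name, i, bytes[i] & 0xff);
        }
    }
}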
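To connect this to the symptom in the question, here is a small sketch of what happens when ISO-8859-1 bytes are decoded as UTF-8. It uses CharsetDecoder, the decoding counterpart of the CharsetEncoder class mentioned in the question. The class name CharsetMismatchDemo and the method decodeStrict are hypothetical, and the sample bytes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class CharsetMismatchDemo {

    // Strict decode: returns null when the bytes are not valid in the
    // given charset, instead of silently substituting characters.
    static String decodeStrict(byte[] raw, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "A\u00ea\u00f1\u00fcC";
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // Lenient decoding with the wrong charset inserts U+FFFD
        // replacement characters, which many fonts draw as question marks.
        String wrong = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(wrong.contains("\uFFFD")); // prints "true"

        // Strict decoding rejects the bytes as UTF-8 ...
        System.out.println(decodeStrict(latin1, StandardCharsets.UTF_8) == null); // prints "true"

        // ... and accepts them as ISO-8859-1, restoring the original text.
        System.out.println(text.equals(decodeStrict(latin1, StandardCharsets.ISO_8859_1))); // prints "true"
    }
}
```

Note that no conversion "to UTF-8" is needed once the bytes have been decoded with the right charset: a Java String holds characters independently of the encoding the bytes arrived in.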
License: CC-BY-SA with attribution
Not affiliated with StackOverflow