Question

I am trying to read a UTF-8 text file and then compare the text with equals(), which should return true. It does not, because getBytes() returns different values.

This is a minimal example:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Main {
  public static void main(String[] args) throws Exception {
    System.out.println(Charset.defaultCharset()); // UTF-8
    InputStream is = new FileInputStream("./myUTF8File.txt");
    BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF8"));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.print(line); // mouseover
      byte[] bytes = line.getBytes(); // [-17, -69, -65, 109, 111, 117, 115, 101, 111, 118, 101, 114]
      String str = "mouseover";
      byte[] bytesStr = str.getBytes(); // [109, 111, 117, 115, 101, 111, 118, 101, 114]
      if (line.equals(str)) { // false
        System.out.println("equal");
      }
    }
  }
}

I would expect the String to be converted to UTF-16 by in.readLine() and equals() to return true. I cannot figure out why it does not.


Solution

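The first three bytes of the file:

-17, -69, -65

are the UTF-8 byte order mark (BOM): 0xEF 0xBB 0xBF, shown here as signed Java bytes. Java's UTF-8 decoder does not strip the BOM, so it is decoded to the character U+FEFF at the start of the first line, and equals() sees one extra leading character. Lining up your two byte arrays shows the difference: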

[-17, -69, -65, 109, 111, 117, 115, 101, 111, 118, 101, 114]
               [109, 111, 117, 115, 101, 111, 118, 101, 114]
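One way to get the comparison to succeed is to drop the leading U+FEFF after reading the line. The following is only a minimal sketch: the class name BomDemo and the helper stripBom are made up for illustration, and the file is simulated with an in-memory byte array rather than ./myUTF8File.txt.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BomDemo {
    // U+FEFF is the character that the UTF-8 bytes EF BB BF decode to.
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the line without a leading BOM, if there is one.
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file contents: BOM bytes followed by "mouseover".
        byte[] data = "\uFEFFmouseover".getBytes(StandardCharsets.UTF_8);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            System.out.println(line.equals("mouseover"));           // false
            System.out.println(stripBom(line).equals("mouseover")); // true
        }
    }
}
```

Only the first line of a file can carry the BOM, so in a read loop it is enough to apply the helper to the first line read.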

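Also, the canonical name of the charset is "UTF-8", with a dash. Java accepts "UTF8" as an alias, so this is not what breaks the comparison, but the canonical name is the safer choice: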

BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"));
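
To avoid charset name strings altogether, the StandardCharsets.UTF_8 constant can be passed to InputStreamReader instead. The sketch below (class name CharsetNameDemo and helper sameCharset are made up for illustration) also checks that both spellings resolve to the same charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetNameDemo {
    // Hypothetical helper: true if both names resolve to the same charset.
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // "UTF8" is an alias; name() reports the canonical name.
        System.out.println(Charset.forName("UTF8").name()); // UTF-8
        System.out.println(sameCharset("UTF8", "UTF-8"));   // true
        // The constant needs no lookup and cannot be misspelled.
        System.out.println(StandardCharsets.UTF_8.name());  // UTF-8
    }
}
```

With the constant, the reader would be created as new InputStreamReader(is, StandardCharsets.UTF_8), which also avoids the checked UnsupportedEncodingException of the String overload.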
Licensed under: CC-BY-SA with attribution