Internal character encoding of Java 7

https://stackoverflow.com/questions/13577049

02-12-2021

Question

As far as I know, when the JRE executes a Java application, strings are represented internally as UCS-2 byte arrays. Wikipedia says the following:

Java originally used UCS-2, and added UTF-16 supplementary character support in J2SE 5.0.

With the new release of Java (Java 7), what is its internal character encoding?
Is there any possibility that Java will start to use UCS-4 internally?

Solution

Java 7 still uses UTF-16 internally (see the last section of the Charset Javadoc), and it is very unlikely that this will change to UCS-4. I'll give you two reasons for that:

Changing from UTF-16 to UCS-4 would most likely mean changing the char primitive from a 16-bit type to a 32-bit type. Given how highly Sun/Oracle have valued backwards compatibility in the past, a change like this is very unlikely.
A UCS-4 encoded String takes a lot more memory than a UTF-16 encoded String for most use cases.
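
Both reasons can be checked with a small experiment. The following is only a sketch: the class name, method names, and sample text are made up for illustration, and it assumes the JDK in use ships the "UTF-32BE" charset (common JDKs do, but it is not one of the charsets every Java platform must support). It compares encoded sizes, which is not the same thing as the JVM's actual heap layout for a String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class EncodingSizeDemo {

    // Bytes needed to encode the text as UTF-16 (big-endian, no byte order mark).
    public static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed to encode the text as UTF-32 (equivalent to UCS-4).
    // Assumption: the running JDK provides the "UTF-32BE" charset.
    public static int utf32Bytes(String s) {
        return s.getBytes(Charset.forName("UTF-32BE")).length;
    }

    public static void main(String[] args) {
        String text = "Hello, world"; // 12 characters, all in the Basic Multilingual Plane

        System.out.println("char width in bits: " + Character.SIZE); // 16
        System.out.println("UTF-16 bytes: " + utf16Bytes(text));     // 24
        System.out.println("UTF-32 bytes: " + utf32Bytes(text));     // 48
    }
}
```

For text that stays inside the Basic Multilingual Plane, the UTF-32 form is twice the size of the UTF-16 form, which is the memory argument made above.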

OTHER TIPS

Q: As far as I know, when the JRE executes a Java application, strings are represented internally as (16-bit Unicode) byte arrays

A: Yes

Q: With the new release of Java (Java 7), what is its internal character encoding?

A: Same

Q: Is there any possibility that Java start to use UCS-4 internally?

A: I haven't heard anything of the kind

However, you can use "code points" (the int-based methods on String and Character) to work with full 32-bit Unicode characters in Java 5 and higher.
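
A minimal sketch of what working with code points can look like, using only String and Character methods that have existed since Java 5. The class name, helper names, and the sample character are illustrative assumptions, not code from the original answer.

```java
// Hypothetical demo class, not part of any library.
public class CodePointDemo {

    // Build a String from one Unicode code point; supplementary
    // characters (above U+FFFF) become a surrogate pair of two chars.
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Count Unicode code points rather than 16-bit char units.
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Expand a String into one int per code point (a UTF-32-like view).
    public static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int i = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            result[i++] = cp;
            offset += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range.
        String s = "A" + fromCodePoint(0x1D11E);

        System.out.println(s.length());         // 3 (char units)
        System.out.println(codePointLength(s)); // 2 (code points)
    }
}
```

The String itself is still stored as UTF-16; the int values only give a per-character view on top of it.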

Licensed under: CC-BY-SA with attribution
