Converting characters to bytes and vice versa is done using a character encoding.
The character encoding determines how characters are represented by bytes. For example, ASCII is a character encoding which uses 7 bits per character. It can therefore represent only 128 characters, far fewer than the 65,536 distinct values that a Java char can hold.
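To make the ASCII limit concrete, here is a minimal sketch. The class name AsciiLimitDemo and the helper toAscii are made up for illustration; the only library calls are String.getBytes(Charset) and StandardCharsets.US_ASCII. A character that ASCII cannot represent is replaced by '?' (byte value 63) when encoding this way.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLimitDemo {
    // Hypothetical helper: encodes a string as US-ASCII.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 'A' is inside the 128-character ASCII range.
        System.out.println(toAscii("A")[0]);      // 65

        // U+00E9 (e with acute accent) is outside ASCII, so
        // getBytes substitutes the replacement byte '?'.
        System.out.println(toAscii("\u00e9")[0]); // 63
    }
}
```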
Other character encodings are UTF-8 and UTF-16. In fact, a Java char is really a UTF-16 code unit: if you cast it directly to an int, you get the UTF-16 code for the character.
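As a small sketch of the cast described above (the class name CharCodeDemo and the helper codeUnit are hypothetical), the following prints the UTF-16 code of a few characters. It also shows that a character outside the 16-bit range occupies two chars in a String, which is why String.codePointAt exists.

```java
public class CharCodeDemo {
    // Hypothetical helper: returns the UTF-16 code unit value of a char.
    static int codeUnit(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println(codeUnit('A'));      // 65
        System.out.println(codeUnit('\u00e9')); // 233

        // U+1F600 does not fit in 16 bits, so UTF-16 stores it
        // as two code units (a surrogate pair).
        String s = "\uD83D\uDE00";
        System.out.println(s.length());       // 2
        System.out.println(s.codePointAt(0)); // 128512
    }
}
```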
Here's a longer tutorial on character encodings: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
If you call getBytes() on a String, it will use the default character encoding of the system to convert the characters in the string to bytes. It's better to use the version of getBytes() that takes a character set name as an argument, so that you know what character set is used. For example:
byte[] bytes = str.getBytes("UTF-8");
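Note that getBytes(String) declares the checked UnsupportedEncodingException. Since Java 7 there is also an overload that takes a Charset, and the StandardCharsets constants avoid both the exception and the risk of mistyping the name. The sketch below assumes Java 7 or later; the class name EncodingDemo and the helpers toUtf8 and fromUtf8 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encodes a string as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decodes UTF-8 bytes back into a string.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "h\u00e9llo"; // five characters, one of them non-ASCII

        byte[] utf8 = toUtf8(str);
        System.out.println(utf8.length); // 6: U+00E9 takes two bytes in UTF-8

        // The same text in UTF-16 (big-endian, no byte order mark).
        byte[] utf16 = str.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf16.length); // 10: two bytes per char

        // Decoding with the same charset restores the original string.
        System.out.println(fromUtf8(utf8).equals(str)); // true
    }
}
```

Decoding must use the same encoding that produced the bytes; otherwise the non-ASCII characters come back garbled.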