Question

We are trying to convert a String to a byte[] using the following Java code:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16");

We get a byte array of length 22, and we are not sure where the 2 extra bytes come from. How do I get an array of length 20?


Solution

Alexander's answer explains why it's there, but not how to get rid of it. You simply need to specify the endianness you want in the encoding name:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16LE"); // Or UTF-16BE
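As a rough sketch of the same idea, the constants in java.nio.charset.StandardCharsets can be used instead of the encoding name, which also avoids the checked UnsupportedEncodingException. The class name Utf16Lengths and the helper encodedLength are made up for this illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Lengths {
    // Hypothetical helper: number of bytes produced when encoding s with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String source = "0123456789";
        // Plain UTF-16 writes a 2-byte byte order mark before the 20 data bytes.
        System.out.println(encodedLength(source, StandardCharsets.UTF_16));   // 22
        // With an explicit byte order, no byte order mark is written.
        System.out.println(encodedLength(source, StandardCharsets.UTF_16LE)); // 20
        System.out.println(encodedLength(source, StandardCharsets.UTF_16BE)); // 20
    }
}
```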

OTHER TIPS

Maybe the first two bytes are the byte order mark (BOM). It specifies the order of the bytes in each 16-bit word used in the encoding.

Try printing out the bytes in hex to see where the extra 2 bytes are added - are they at the start or end?
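One possible way to do that is sketched below. The class name HexDump and the helper toHex are just names picked for this example, not anything from the question:

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // Hypothetical helper: formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so that negative bytes such as (byte) 0xFE print as FE.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] byteArray = "0123456789".getBytes(StandardCharsets.UTF_16);
        // The dump makes it easy to see whether the extra bytes sit at the start or the end.
        System.out.println(toHex(byteArray));
    }
}
```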

I'm guessing that you'll find a byte order mark at the start (0xFEFF). This allows anyone consuming (receiving) the byte array to recognise whether the encoding is little-endian or big-endian.
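To illustrate the consuming side, here is a small sketch that decodes two byte arrays with the plain UTF-16 charset, one starting with a big-endian mark and one with a little-endian mark. The class name BomDecode and the helper decodeUtf16 are invented for this example:

```java
import java.nio.charset.StandardCharsets;

class BomDecode {
    // Hypothetical helper: decodes with plain UTF-16, which reads the byte order mark if one is present.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // "0" as big-endian UTF-16, preceded by the mark FE FF.
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x30};
        // "0" as little-endian UTF-16, preceded by the mark FF FE.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x30, 0x00};

        // Both decode to the same one-character string; the mark itself is not part of the result.
        System.out.println(decodeUtf16(bigEndian));
        System.out.println(decodeUtf16(littleEndian));
    }
}
```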

UTF-16 has a byte order mark at the beginning that tells the reader which byte order the stream is encoded in. As the other users have pointed out, the
1st byte is 0xFE
2nd byte is 0xFF
the remaining bytes are (in decimal)
0 48 0 49 0 50 0 51 0 52 0 53 0 54 0 55 0 56 0 57

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow