Question

I need to be able to convert an int into a string which represents a series of bytes, and back. To do this, I came up with this code:

Int -> Byte[] -> String

new String(ByteBuffer.allocate(4).putInt(num).array())

String -> Byte[] -> Int

ByteBuffer.allocate(4).put(team.getBytes()).getInt(0)

One of my test cases is the number 4231. When viewed as a string, none of the characters are visible, but that's not completely unusual, and when I invoke its .length() method, it returns 4. But when I use .getBytes(), I get [0, 0, 16, -17, -65, -67], which causes a BufferOverflowException. Can someone explain this result to me?

Solution

Without knowing the platform default encoding of your machine, it's slightly hard to say - and you should avoid calling String.getBytes without specifying an encoding, IMO.

However, basically a String represents a sequence of characters, encoded as a sequence of UTF-16 code units. Not every character is representable in one byte, in many encodings - and you certainly shouldn't assume it is. (You shouldn't even assume there's one character per char, due to surrogate pairs used to represent non-BMP characters.)
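To make the difference between chars, characters and bytes concrete, here is a minimal sketch. The class name CharCounts and the helper counts are made up for this illustration; the String and StandardCharsets calls are standard JDK APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharCounts {
    // Hypothetical helper: returns { UTF-16 code units, code points, UTF-8 bytes }.
    static int[] counts(String s) {
        return new int[] {
            s.length(),                                // number of char values
            s.codePointCount(0, s.length()),           // number of Unicode code points
            s.getBytes(StandardCharsets.UTF_8).length  // number of bytes in UTF-8
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(counts("A")));            // plain ASCII letter
        System.out.println(Arrays.toString(counts("\u00E9")));       // e with acute accent
        System.out.println(Arrays.toString(counts("\uD83D\uDE00"))); // U+1F600, a non-BMP character
    }
}
```

The three counts agree only for the ASCII letter. The accented letter needs two bytes in UTF-8, and the non-BMP character needs two chars and four bytes.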

Fundamentally, you shouldn't treat a string like this - if you want to encode non-text data in a string, use hex or base64 to encode the binary data, and then decode it appropriately. Otherwise you can easily get invalid strings, and lose data - and more importantly, you're simply not using the type for the purpose it was designed for.
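As a rough sketch of the base64 route, assuming Java 8 or later for java.util.Base64: the class name IntBase64 and its two methods are invented for this example, and the big-endian byte order is simply what ByteBuffer uses by default.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

final class IntBase64 {
    // Encode the four big-endian bytes of the int as Base64 text.
    static String encode(int value) {
        byte[] raw = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode Base64 text back into the int it was produced from.
    static int decode(String text) {
        byte[] raw = Base64.getDecoder().decode(text);
        return ByteBuffer.wrap(raw).getInt();
    }

    public static void main(String[] args) {
        String text = encode(4231);
        System.out.println(text + " -> " + decode(text));
    }
}
```

The resulting string contains only printable ASCII characters, so it survives any charset that is a superset of ASCII, and it round-trips negative values as well.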

When you convert a byte[] into a String, you're saying "This is the binary representation of some text, in a particular encoding" (either explicitly or using the platform default). That's simply not the case here - there's no text to start with, just a number... the binary data isn't encoded text, it's an encoded integer.

OTHER TIPS

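First, the integer was converted to 4 bytes. Let's convert 4231 to hex. We get: 00001087, so the bytes are [ 0, 0, 16, -121 ]. Converting to decimal, the zeroes are obviously zero. The 10 has a 1 in the 16's place, so it's 16. The 87(hex) is 135 as an unsigned value, but a Java byte is signed: if you take the 8-bit representation of 87(hex) and add the 8-bit representation of 121(decimal) to it, you get zero (with a carry to nowhere). Therefore 87(hex) is the 8-bit representation of -121 decimal.

So the only real mystery is where the -17, -65, -67 came from. A lone 87(hex) byte is not valid UTF-8 (assuming that is your platform default encoding), so new String(byte[]) replaces it with the replacement character U+FFFD. That is why the string still has length 4. When getBytes() encodes the string again, U+FFFD becomes the three bytes EF BF BD(hex), which are -17, -65 and -67 as signed bytes, by the same reasoning: EF(hex) plus 17(decimal) is zero in 8 bits. The result is 6 bytes, which no longer fit into a 4-byte buffer.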
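The byte values can be checked directly. This is a small sketch that prints the bytes putInt produces for 4231, the unsigned value of the last byte, and the UTF-8 encoding of the replacement character U+FFFD. The class name ByteWalkthrough and the helper intBytes are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteWalkthrough {
    // Hypothetical helper: the four big-endian bytes of an int.
    static byte[] intBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] raw = intBytes(4231);
        System.out.println(Integer.toHexString(4231)); // hex digits of the int
        System.out.println(Arrays.toString(raw));      // each byte as a signed value

        // Masking with 0xFF recovers the unsigned value (0..255) of a signed byte.
        System.out.println(raw[3] & 0xFF);

        // The replacement character U+FFFD, encoded explicitly as UTF-8.
        byte[] replacement = "\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(replacement));
    }
}
```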

If this kind of stuff isn't obvious to you, you probably shouldn't mess with native encodings and should instead separate and combine the numbers yourself using things like multiplication and division. (Just use decimal numbers and strings.)

What you are trying to do is view bytes as characters. That concept became invalid with the introduction of multi-byte characters in operating systems and languages.

In Java, Strings are composed of characters, not bytes. A common mistake is to assume that a conversion from byte[] -> String -> byte[] using getBytes()/new String(byte[]) will yield the original bytes. That's simply not true. Depending on the encoding, byte[] -> String may already lose information (if the byte[] contains values invalid for that encoding). Likewise, not every encoding can encode every possible character.

So you are chaining two possibly lossy operations and wonder why information is lost.
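The effect of the encoding on such a round trip can be seen in a short sketch. The class name RoundTrip and the method roundTrip are made up for the illustration, and the charsets are passed explicitly rather than relying on the platform default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTrip {
    // byte[] -> String -> byte[], both steps through the same charset.
    static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] raw = ByteBuffer.allocate(Integer.BYTES).putInt(4231).array();

        // UTF-8 rejects the lone 0x87 byte and substitutes U+FFFD, so the bytes change.
        System.out.println(Arrays.toString(roundTrip(raw, StandardCharsets.UTF_8)));

        // ISO-8859-1 maps every byte value to exactly one char, so the bytes survive.
        System.out.println(Arrays.toString(roundTrip(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

That ISO-8859-1 happens to preserve the bytes does not make the string meaningful text. It only shows that the outcome depends entirely on which encoding is in play.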

The proper way to encode the information contained in the int is to select a specific representation for the int (e.g. decimal or hexadecimal) and encode/decode that.

Try this for encoding/decoding:

String hex = Integer.toString(i, 16);
int decoded = Integer.parseInt(hex, 16);
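
If a fixed-width string is preferred, one possible variant (a sketch only, not part of the answer above) pads the hex to 8 digits and treats the int as unsigned, so negative values come out without a minus sign. The class name HexCodec and its methods are invented for this example. Integer.parseUnsignedInt requires Java 8 or later.

```java
final class HexCodec {
    // Always 8 lowercase hex digits; negative ints appear in two's complement form.
    static String encode(int value) {
        return String.format("%08x", value);
    }

    // Parses up to 8 hex digits as an unsigned 32-bit value and returns it as an int.
    static int decode(String hex) {
        return Integer.parseUnsignedInt(hex, 16);
    }

    public static void main(String[] args) {
        for (int value : new int[] {4231, -1, Integer.MIN_VALUE}) {
            String hex = encode(value);
            System.out.println(value + " -> " + hex + " -> " + decode(hex));
        }
    }
}
```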
Licensed under: CC-BY-SA with attribution