Java's String.getBytes() vs C#

https://stackoverflow.com/questions/23610130

20-07-2023

Question

I have some Java code that uses String.getBytes() (without an encoding parameter) on a generated string to obtain a byte[], which I later use as a key for AES encryption.

I then take the encrypted message and need to decrypt it in C# (WP7/WP8 environment). I can easily generate the same string that I used in the Java application, but I need to convert it to a byte[] in such a way that it produces exactly the same byte array as in Java.

Question 1: Can I do this without altering Java code?

Question 2: If not, how should I implement both versions so that they always return the same byte[], no matter what?

Solution

Basically you should specify an encoding in the Java code. Currently, your code will produce different outputs on different systems as it uses the platform-default encoding (e.g. Windows-1252 or UTF-8).
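To see why the platform default matters, here is a minimal sketch that encodes the same string with two explicitly named charsets. The class name DefaultCharsetDemo and the sample text are made up for illustration, and ISO-8859-1 stands in for Windows-1252 because every Java runtime is guaranteed to ship it (the two agree on the byte for "é").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question's code.
class DefaultCharsetDemo {
    // Encodes text with an explicitly named charset instead of the platform default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": three ASCII characters plus one non-ASCII character

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // What a no-argument getBytes() would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8:      " + Arrays.toString(utf8));   // [99, 97, 102, -61, -87]
        System.out.println("ISO-8859-1: " + Arrays.toString(latin1)); // [99, 97, 102, -23]
        System.out.println("same bytes? " + Arrays.equals(utf8, latin1)); // false
    }
}
```

For pure ASCII input both encodings happen to give identical bytes, which is why this kind of mismatch often goes unnoticed until a non-ASCII character shows up in the key string.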

I would encourage you to use UTF-8 in both cases:

// Java 7 onwards
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

// Java pre-7
byte[] bytes = text.getBytes("UTF-8");

// .NET
byte[] bytes = Encoding.UTF8.GetBytes(text);

Using UTF-8 allows for all valid Unicode strings to be encoded into bytes. You could consider using UTF-16, but then you need to make sure you specify the same endianness in each case. That does have the benefit of having exactly two bytes per char regardless of content though (as a char is a UTF-16 code unit in both Java and .NET).
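If you do go the UTF-16 route, the endianness point can be illustrated with a short sketch on the Java side. The class and method names are hypothetical. On the .NET side, Encoding.Unicode is little-endian UTF-16 and GetBytes does not emit a byte-order mark, so StandardCharsets.UTF_16LE should be the matching choice in Java; treat that pairing as something to verify against your own data rather than a guarantee.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class showing the three UTF-16 flavours available in Java.
class Utf16EndiannessDemo {
    // Little-endian, no byte-order mark.
    static byte[] utf16le(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    // Big-endian, no byte-order mark.
    static byte[] utf16be(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Plain "UTF-16": Java encodes big-endian and prepends a byte-order mark (FE FF).
    static byte[] utf16WithBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String text = "A";
        System.out.println(Arrays.toString(utf16le(text)));      // [65, 0]
        System.out.println(Arrays.toString(utf16be(text)));      // [0, 65]
        System.out.println(Arrays.toString(utf16WithBom(text))); // [-2, -1, 0, 65]
    }
}
```

Note that plain UTF_16 adds two extra bytes for the byte-order mark, so the explicit LE or BE variants are the ones that give exactly two bytes per char.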

Licensed under: CC-BY-SA with attribution
