The problem is in this expression:
new String(letters.getBytes('IBM500'))
letters.getBytes creates a byte-array containing (in hexadecimal):
81 82 83 84 C1 C2 C3 C4
but then you're immediately converting that back to a Unicode String using your platform default encoding:
new String( <byte-array> );
If you want the ordinal values of the characters in your String to be equal to the byte values, you must specify an encoding that maps each byte to the character with the same value, for example ISO-8859-1:
new String(letters.getBytes('IBM500'), "ISO-8859-1")
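To see why ISO-8859-1 is the right choice here, the small Java sketch below decodes every byte value from 0x00 to 0xFF with it and checks that each char has the same numeric value as its byte, and that encoding the String again gives back the original bytes. The class name `Latin1Identity` is just an illustrative name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Identity
{
    // Returns true if ISO-8859-1 maps every byte value 0x00..0xFF to the char
    // with the same code, and encoding that String restores the original bytes.
    static boolean isIdentityMapping()
    {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++)
        {
            all[i] = (byte) i;
        }
        String s = new String(all, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < 256; i++)
        {
            // The char is promoted to int, so this compares numeric codes.
            if (s.charAt(i) != i)
            {
                return false;
            }
        }
        byte[] back = s.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args)
    {
        System.out.println("ISO-8859-1 is a byte/char identity: " + isIdentityMapping());
    }
}
```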
The encoding you're using does not define a character for byte 81, so it is replaced with ? (3F). You're most likely using Windows-1252.
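Here is a sketch of that loss, assuming the default encoding really is windows-1252 (it is named explicitly here so the result does not depend on the machine). The EBCDIC bytes are decoded to a String and then encoded back with the same charset. With windows-1252, byte 0x81 is undefined, so it decodes to the replacement character U+FFFD and comes back as 3F. Bytes 0x82 to 0x84 survive only because windows-1252 assigns them to other characters (such as U+201A), not to U+0082 to U+0084. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTrip
{
    // Decodes the bytes with the given charset, then encodes the String back
    // with the same charset, returning whatever bytes survive the trip.
    static byte[] roundTrip(byte[] bytes, Charset cs)
    {
        String s = new String(bytes, cs);
        return s.getBytes(cs);
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes)
        {
            sb.append(String.format(" %02X", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args)
    {
        byte[] ebcdic = "abcdABCD".getBytes(Charset.forName("IBM500"));
        // Prints: windows-1252: 3F 82 83 84 C1 C2 C3 C4
        System.out.println("windows-1252: " + hex(roundTrip(ebcdic, Charset.forName("windows-1252"))));
        // Prints: ISO-8859-1:   81 82 83 84 C1 C2 C3 C4
        System.out.println("ISO-8859-1:   " + hex(roundTrip(ebcdic, StandardCharsets.ISO_8859_1)));
    }
}
```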
Strings contain characters, not bytes. Java will always apply an encoding conversion when going from one to the other.
EDIT: responding to @mister270's comment:
Here's a program in Java to demonstrate:
public class Ebcdic
{
    public static void main(String[] args) throws Exception
    {
        String letters = "abcdABCD";
        // Encode the characters as EBCDIC (code page 500) bytes.
        byte[] ebcdic = letters.getBytes("IBM500");
        System.out.print("Ebcdic bytes:");
        for (byte b: ebcdic)
        {
            System.out.format(" %02X", b & 0xFF);
        }
        System.out.println();
        // ISO-8859-1 turns each byte into the char with the same value.
        String lettersEbcdic = new String(ebcdic, "ISO-8859-1");
        System.out.print("Ebcdic bytes stored in chars:");
        for (char c: lettersEbcdic.toCharArray())
        {
            System.out.format(" %04X", (int) c);
        }
        System.out.println();
        // System.out encodes these chars with the platform default encoding.
        System.out.println("Ebcdic bytes in chars printed in using my default platform encoding: " + lettersEbcdic);
    }
}
Output is:
Ebcdic bytes: 81 82 83 84 C1 C2 C3 C4
Ebcdic bytes stored in chars: 0081 0082 0083 0084 00C1 00C2 00C3 00C4
Ebcdic bytes in chars printed in using my default platform encoding: ????ÁÂÃÄ
What this shows is that:
- the Ebcdic conversion into the byte-array is occurring correctly using "IBM500"
- the "identity" conversion of bytes to chars using "ISO-8859-1" is occurring correctly
- my system has no mapping from Unicode characters U+0081 etc. to my default platform character encoding, so it displays them as ?
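The "identity" conversion is also lossless in the other direction. This sketch (class and method names are illustrative) holds the EBCDIC bytes in a String via ISO-8859-1, converts that String back to bytes with ISO-8859-1, and checks that the exact EBCDIC bytes come back. Only decoding those bytes with IBM500 turns them into readable text again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EbcdicRoundTrip
{
    static final Charset EBCDIC = Charset.forName("IBM500");

    // Encodes text as EBCDIC, stores the bytes in a String one char per byte,
    // then converts that String back to bytes with the same identity charset.
    static byte[] recoverBytes(String letters)
    {
        byte[] ebcdic = letters.getBytes(EBCDIC);
        String held = new String(ebcdic, StandardCharsets.ISO_8859_1);
        return held.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args)
    {
        String letters = "abcdABCD";
        byte[] recovered = recoverBytes(letters);
        // Prints: Bytes recovered intact: true
        System.out.println("Bytes recovered intact: " + Arrays.equals(letters.getBytes(EBCDIC), recovered));
        // Prints: Decoded as IBM500: abcdABCD
        System.out.println("Decoded as IBM500: " + new String(recovered, EBCDIC));
    }
}
```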
Java (and therefore Groovy) stores characters internally as Unicode, UTF-16 to be precise. If you want to encode them as Ebcdic, then they stop being characters and should no longer be held in Strings. Ebcdic is an 8-bit encoding, so each character can be stored in a byte. If you need to interface with a system that expects a particular encoding (in your case, Ebcdic), then that system really should accept bytes, not Strings; otherwise you end up with exactly this sort of confusion.
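One way to follow that advice, sketched under the assumption that the external system can be handed a `byte[]`: keep text as ordinary Strings inside the program and convert to or from Ebcdic only at the boundary. `toEbcdic` and `fromEbcdic` are hypothetical helper names, not an existing API. The sizes printed also show the point about encodings: the same 8 characters take 16 bytes as UTF-16 but 8 bytes as IBM500.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicBoundary
{
    static final Charset EBCDIC = Charset.forName("IBM500");

    // Hypothetical boundary helper: text becomes EBCDIC bytes only here.
    static byte[] toEbcdic(String text)
    {
        return text.getBytes(EBCDIC);
    }

    // Hypothetical boundary helper: EBCDIC bytes become text only here.
    static String fromEbcdic(byte[] bytes)
    {
        return new String(bytes, EBCDIC);
    }

    public static void main(String[] args)
    {
        String letters = "abcdABCD";
        // UTF-16 uses 2 bytes per char here (UTF_16BE adds no byte-order mark).
        System.out.println("UTF-16 bytes: " + letters.getBytes(StandardCharsets.UTF_16BE).length);
        // IBM500 is a single-byte encoding: 1 byte per character.
        byte[] wire = toEbcdic(letters);
        System.out.println("EBCDIC bytes: " + wire.length);
        System.out.println("Back to text: " + fromEbcdic(wire));
    }
}
```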
If you must use Strings to hold Ebcdic bytes, then you must use the ISO-8859-1 encoding whenever those Strings are converted to or from bytes, for example when an InputStream or OutputStream is wrapped in a Reader, Writer or PrintStream (including System.out), to ensure that your Ebcdic codes are not "translated" between bytes and characters.
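A minimal sketch of that rule, using in-memory streams so the result can be checked: a `PrintStream` created with the ISO-8859-1 encoding writes each char of the String as exactly one byte of the same value, and an `InputStreamReader` with ISO-8859-1 reads those bytes back as the same chars. The raw streams only ever carry bytes; the encoding matters at the `PrintStream`/`Reader` layer. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Streams
{
    // Writes the String through a PrintStream that encodes with ISO-8859-1,
    // so each char U+0000..U+00FF becomes one byte with the same value.
    static byte[] writeLatin1(String held) throws IOException
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, "ISO-8859-1");
        out.print(held);
        out.flush();
        return sink.toByteArray();
    }

    // Reads bytes through an InputStreamReader that decodes with ISO-8859-1,
    // so each byte becomes a char with the same ordinal value.
    static String readLatin1(byte[] bytes) throws IOException
    {
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader in = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1))
        {
            int c;
            while ((c = in.read()) != -1)
            {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] ebcdic = "abcdABCD".getBytes(Charset.forName("IBM500"));
        String held = new String(ebcdic, StandardCharsets.ISO_8859_1);
        byte[] written = writeLatin1(held);
        System.out.println("Bytes preserved on output: " + Arrays.equals(ebcdic, written));
        System.out.println("String preserved on input: " + held.equals(readLatin1(written)));
    }
}
```

For the console itself, the same constructor can wrap `new FileOutputStream(FileDescriptor.out)` and be installed with `System.setOut`, so that System.out writes the stored Ebcdic codes as raw bytes.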