Question

We have a process which communicates with an external system via MQ. The external system runs on a mainframe machine (IBM z/OS), while our process runs on a CentOS Linux platform. So far we have never had any issues.

Recently we started receiving messages from them with non-printable EBCDIC characters embedded in the message. They use the characters as a compressed ID, 8 bytes long. When we receive it, it arrives on our queue encoded in UTF-8 (CCSID 1208).

They need the original 8 bytes back in order to identify our response messages. I'm trying to find a solution in Java to convert the ID back from UTF-8 to EBCDIC before sending the response.

I've been playing around with the JTOpen library, using the AS400Text class to do the conversion. The counterparty has also sent us a snapshot of the ID in bytes. However, when I compare the bytes after conversion, they differ from the original message.

Has anyone ever encountered this issue? Maybe I'm using the wrong code page?

Thanks for any input you may have.

Bytes from counterparty (positions [5,14]):

00000   F0 40 D9 F0 F3 F0 CB 56--EF 80 04 C9 10 2E C4 D4  |0 R030.....I..DM|

Program output:

UTF String: [R030ôîÕ؜IDMDHP1027W 0510]
EBCDIC String: [R030ôîÃÃÂIDMDHP1027W 0510]
NATIVE CHARSET - HEX:     [52303330C3B4C3AEC395C398C29C491006444D44485031303237572030353130] 
CP500 CHARSET  - HEX:     [D9F0F3F066BE66AF663F663F623FC9102EC4D4C4C8D7F1F0F2F7E640F0F5F1F0] 

Here is some sample code:

private void readAndPrint(MQMessage mqMessage) throws IOException {
    mqMessage.seek(150);
    byte[] subStringBytes = new byte[32];
    mqMessage.readFully(subStringBytes);

    String msgId = toHexString(mqMessage.messageId).toUpperCase();

    System.out.println("----------------------------------------------------------------");
    System.out.println("MESSAGE_ID: " + msgId);

    String hexString = toHexString(subStringBytes).toUpperCase();
    String subStr = new String(subStringBytes);
    System.out.println("NATIVE CHARSET - HEX:     [" + hexString + "] [" + subStr + "]");

    // Transform to EBCDIC using CCSID 37 (note: the label printed below
    // says "CP500 CHARSET", but the conversion here actually uses CCSID 37)
    int codePageNumber = 37;
    String codePage = "CP037";

    // AS400Text(length, ccsid) converts between Java Strings and EBCDIC bytes
    AS400Text converter = new AS400Text(subStr.length(), codePageNumber);
    byte[] bytesData = converter.toBytes(subStr);
    String resultedEbcdicText = new String(bytesData, codePage);

    String hexStringEbcdic = toHexString(bytesData).toUpperCase();
    System.out.println("CP500 CHARSET  - HEX:     [" + hexStringEbcdic + "] [" + resultedEbcdicText + "]");

    System.out.println("----------------------------------------------------------------");
}

Solution

If an MQ message has sub-fields that require different encodings, then that is how those messages should be handled: as separate message pieces.

But as you describe this, the entire message needs to be received without conversion. The first eight bytes need to be extracted and held separately. The remainder of the message can then have its encoding converted (unless other sub-fields also need to be extracted as binary, unconverted bytes).
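The receive side of that approach can be sketched as follows. This is only an illustration: the class and method names are mine, and the assumption that the 8 ID bytes sit at the start of the body is a placeholder for wherever the ID actually lives in the real message.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MessageSplitter {
    // Split the unconverted message body into the opaque 8-byte ID and the
    // text remainder; only the remainder is safe to decode as UTF-8.
    static byte[] idOf(byte[] rawBody) {
        return Arrays.copyOfRange(rawBody, 0, 8);   // keep verbatim, never decode
    }

    static String textOf(byte[] rawBody) {
        return new String(rawBody, 8, rawBody.length - 8, StandardCharsets.UTF_8);
    }
}
```

The key point is that `idOf` returns bytes, not a `String`: as soon as the ID passes through a `String`, some charset has been applied and the original bytes may be unrecoverable.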

For any return message, the opposite conversion must be done. The text portion of the message can be converted, and then that sub-string can have the original eight bytes prepended to it. The newly reconstructed message then can be sent back through the queue, again without automatic conversion.
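The return path can be sketched like this. Again the names are mine, and `IBM037` is only a stand-in for whichever CCSID the mainframe actually expects:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class ResponseBuilder {
    // Prepend the untouched 8 ID bytes to the converted text portion.
    static byte[] build(byte[] originalIdBytes, String responseText) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(originalIdBytes);                                   // verbatim pass-through
        out.write(responseText.getBytes(Charset.forName("IBM037")));  // only text is converted
        return out.toByteArray();
    }
}
```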

Your partner on the other end is not using the messaging product correctly. (Of course, you probably shouldn't say that out loud.) No part of such a message should be unable to survive intact in both directions. Instead of an 8-byte binary field, the ID could, for example, be represented as a 16-character hex rendering of the 8-byte value. In hex, there would be no conversion problem in either direction across the route.
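A minimal sketch of that hex scheme (the helper names are mine):

```java
public class HexId {
    // Encode the 8 raw ID bytes as 16 hex digits; the digits and letters
    // A-F are invariant characters that survive EBCDIC/UTF-8 conversion.
    static String encode(byte[] id) {
        StringBuilder sb = new StringBuilder(id.length * 2);
        for (byte b : id) sb.append(String.format("%02X", b));
        return sb.toString();
    }

    // Recover the original bytes from the hex string.
    static byte[] decode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }
}
```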

OTHER TIPS

It seems to me that the special 8 bytes are not actually EBCDIC characters but simply 8 bytes of binary data. If that is the case, then I believe, as the other answer mentions, that you should handle those 8 bytes separately, without letting them be converted to UTF-8 and then back to EBCDIC.

Depending on the EBCDIC variant in use, it is quite possible that a byte in EBCDIC does not convert to a meaningful UTF-8 character, and hence you will fail to recover the original byte by converting the received UTF-8 character back to EBCDIC.

A brief search on Google turns up several EBCDIC tables (e.g. http://www.simotime.com/asc2ebc1.htm#AscEbcTables). You can see that many EBCDIC values have no character assigned. Hence, when they are converted to UTF-8, you cannot assume each of them maps to a distinct Unicode character. Your proposed way of processing is therefore very dangerous and error-prone.
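One way to see the danger (a sketch assuming the JDK's IBM037 and IBM500 charsets are available, which they are in standard JDK distributions): the same byte means different characters in different EBCDIC variants, so decoding with one code page and re-encoding with another silently changes the byte.

```java
import java.nio.charset.Charset;

public class CodePageMismatch {
    public static void main(String[] args) {
        byte[] original = { 0x4A };   // '[' in CCSID 500, but a different character in CCSID 37
        String decoded = new String(original, Charset.forName("IBM500"));
        byte[] back = decoded.getBytes(Charset.forName("IBM037"));
        // '[' lives at 0xBA in CP037, so the round trip does not return 0x4A.
        System.out.printf("original=%02X back=%02X%n", original[0], back[0]);
    }
}
```

This is why comparing hex dumps on both sides, as the question does, is the right diagnostic: it shows exactly which bytes were mangled by which conversion.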

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow