Question

I'm working on a project in which I need to encode and decode a string in Java. My string is a UTF-8 string consisting of Persian characters. I simply want to XOR every byte with a static character, and then XOR it again with the same static character to get the original string back.

I wrote the code below, but it doesn't work at all! I checked it with English characters and it works.

How can I fix this problem?

String str = "س";
char key = 'N';
byte bKey = (byte) key;

byte[] b = str.getBytes();

for (int i = 0; i < b.length; i++)
{
    b[i] = Byte.valueOf((byte) (b[i] ^ bKey));
}

String str1 = new String(b);
b = str1.getBytes();

for (int i = 0; i < b.length; i++)
{
    b[i] = (byte) (b[i] ^ bKey);
}

String str2 = new String(b);

Solution

The problem comes when you create str1 from the mutated bytes. Assuming your default encoding is UTF-8, String str1 = new String(b); says "here are some bytes in UTF-8, please build a string from them". But because you XOR'd the bytes, they are no longer valid UTF-8, and Java doesn't know what to do with them; in practice the decoder replaces each malformed sequence with the replacement character U+FFFD. If you look at the bytes retrieved from str1 with b = str1.getBytes();, you'll see they are different from the bytes you created the string with! This is also why English text works: XORing an ASCII byte (below 0x80) with 'N' (0x4E) gives another byte below 0x80, which is still valid single-byte UTF-8.
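To see this concretely, here is a minimal sketch (the class and method names are made up for illustration) that XORs the two UTF-8 bytes of "س" with 'N' and then pushes them through a UTF-8 String and back. It names the charset explicitly with StandardCharsets.UTF_8, standing in for the assumption that your default charset is UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTripDemo {
    // Returns a new array with every byte XOR'd with the key; the input is left unchanged.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Decodes as UTF-8 and re-encodes, which is what new String(b) followed by
    // getBytes() does when the default charset is UTF-8.
    static byte[] utf8RoundTrip(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = "\u0633".getBytes(StandardCharsets.UTF_8); // the letter seen: 0xD8 0xB3
        byte[] scrambled = xor(original, (byte) 'N');                // 0x96 0xFD, not valid UTF-8
        byte[] afterString = utf8RoundTrip(scrambled);

        System.out.println(Arrays.toString(scrambled));            // [-106, -3]
        System.out.println(Arrays.toString(afterString));          // bytes of the replacement characters
        System.out.println(Arrays.equals(scrambled, afterString)); // false
    }
}
```

The second line printed is whatever the replacement characters encode to, not the bytes you started with, which is exactly the mismatch described above.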

Really, you shouldn't be creating a string from "nonsense" bytes. Do you really need to store the XOR'd bytes back in a string?
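If you only need the scrambled data in text form (to store it or send it somewhere), one common alternative, not part of the original answer, is to Base64-encode the XOR'd bytes with java.util.Base64 (Java 8+). Base64 output is plain ASCII, so it survives any String round trip regardless of charset. A sketch, with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class XorBase64 {
    // XOR every byte with a single-byte key; applying it twice restores the input.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Encode: UTF-8 bytes -> XOR -> Base64 text (ASCII only, safe to keep in a String).
    static String encode(String plain, byte key) {
        byte[] scrambled = xor(plain.getBytes(StandardCharsets.UTF_8), key);
        return Base64.getEncoder().encodeToString(scrambled);
    }

    // Decode: Base64 text -> XOR with the same key -> UTF-8 string.
    static String decode(String encoded, byte key) {
        byte[] scrambled = Base64.getDecoder().decode(encoded);
        return new String(xor(scrambled, key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte key = (byte) 'N';
        String plain = "\u0633\u0644\u0627\u0645"; // Persian "salam"
        String encoded = encode(plain, key);
        System.out.println(encoded);
        System.out.println(decode(encoded, key).equals(plain)); // true
    }
}
```

The Base64 text is about a third longer than the raw bytes, but it never depends on which charset is used to turn it back into bytes.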

If you really want to store the raw XOR'd bytes in a string, you can trick the system by using a single-byte encoding in which every possible byte value is valid. Then you can be sure that the bytes you put into the string are the same ones you get out. Here's an example that works for me:

public class B {
    public static void main(String[] args) throws Exception {
        String str = "س";
        System.out.println(str);
        char key = 'N';
        byte bKey = (byte) key;

        byte[] b = str.getBytes("UTF8");

        System.out.println("Original bytes from str:");
        for (int i = 0; i < b.length; i++) {
            System.out.println(b[i]);
        }

        System.out.println("Bytes used to create str1:");
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (b[i] ^ bKey);
            System.out.println(b[i]);
        }

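        // Cp1256 (windows-1256) is a single-byte charset, so each byte becomes one char
        String str1 = new String(b, "Cp1256");

        // Encoding with the same charset gives back the same bytes
        b = str1.getBytes("Cp1256");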

        System.out.println("Bytes retrieved from str1:");
        for (int i = 0; i < b.length; i++) {
            System.out.println(b[i]);
            b[i] = (byte) (b[i] ^ bKey);
        }

        System.out.println("Bytes used to create str2:");
        for (int i = 0; i < b.length; i++) {
            System.out.println(b[i]);
        }

        String str2 = new String(b, "UTF8");
        System.out.println(str2);
    }
}

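The output I get is below. (Note that "س" (U+0633) encodes to only two bytes in UTF-8, 0xD8 0xB3, i.e. -40 -77; the five bytes shown here suggest the source file was saved as UTF-8 but compiled with a different encoding, likely MacRoman. Either way, the round trip restores the original bytes.)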

س
Original bytes from str:
-61
-65
-30
-119
-91
Bytes used to create str1:
-115
-15
-84
-57
-21
Bytes retrieved from str1:
-115
-15
-84
-57
-21
Bytes used to create str2:
-61
-65
-30
-119
-91
س
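A variant of the same trick, swapped in here rather than taken from the answer above, uses ISO-8859-1 instead of Cp1256. ISO-8859-1 maps every byte 0x00 to 0xFF to the char with the same value, U+0000 to U+00FF, so any byte array survives the String round trip, and StandardCharsets.ISO_8859_1 is guaranteed to be available on every Java platform. Class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class Latin1Carrier {
    // ISO-8859-1 maps each byte 0xNN to the char U+00NN, so any byte[] fits in a String.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Every char produced by bytesToString is in U+0000..U+00FF, so this recovers the exact bytes.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Same flow as the answer above, with ISO-8859-1 as the carrier charset.
    static String scramble(String plain, byte key) {
        return bytesToString(xor(plain.getBytes(StandardCharsets.UTF_8), key));
    }

    static String unscramble(String scrambled, byte key) {
        return new String(xor(stringToBytes(scrambled), key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte key = (byte) 'N';
        String str1 = scramble("\u0633", key);
        String str2 = unscramble(str1, key);
        System.out.println(str2.equals("\u0633")); // true
    }
}
```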

OTHER TIPS

The problem occurs when you try to create a new String with the XORed bytes:

String str1 = new String(b);
b = str1.getBytes();

Since the XORed bytes do not form valid UTF-8 sequences, decoding them into a String replaces the malformed input, so getBytes() does not return the bytes you put in.

If you skip translating back into a String, your code will work fine.
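Here is a sketch of the question's code with the intermediate String removed, keeping the XOR'd data as a byte[] throughout. It also passes StandardCharsets.UTF_8 explicitly, which is an addition to the original code; the helper name xorInPlace is made up:

```java
import java.nio.charset.StandardCharsets;

public class XorWithoutString {
    // XOR every byte in place with a single-byte key.
    static void xorInPlace(byte[] b, byte key) {
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) (b[i] ^ key);
        }
    }

    public static void main(String[] args) {
        String str = "\u0633";
        byte bKey = (byte) 'N';

        // Explicit UTF-8 instead of the platform default charset.
        byte[] b = str.getBytes(StandardCharsets.UTF_8);

        xorInPlace(b, bKey); // scramble: keep the result as a byte[], not a String
        xorInPlace(b, bKey); // unscramble with the same key

        String str2 = new String(b, StandardCharsets.UTF_8);
        System.out.println(str2.equals(str)); // true
    }
}
```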

First, str.getBytes() converts characters to bytes using the platform's default charset, and String str1 = new String(b); decodes with the default charset too. Nothing in this code explicitly asks for UTF-8.
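A quick way to check which charset the no-argument methods use is Charset.defaultCharset(). On JDK 18 and later the default is UTF-8 (JEP 400); on older JDKs it depends on the platform and locale. This sketch (hypothetical class and method names) prints the default and then encodes with an explicit charset, which behaves the same everywhere. It writes the letter as the escape \u0633 so the result does not depend on the encoding the source file is compiled with:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetCheck {
    // Encodes with an explicit charset, so the result does not depend on the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset used by the no-argument getBytes() and new String(byte[]).
        System.out.println("Default charset: " + Charset.defaultCharset());

        byte[] utf8 = utf8Bytes("\u0633"); // Arabic letter seen, U+0633
        System.out.println(Arrays.toString(utf8)); // [-40, -77], i.e. 0xD8 0xB3
    }
}
```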

Also, bit operations on bytes are a bit tricky in Java, because byte is signed and gets sign-extended when it is promoted to int. Try changing all b[i] to (b[i] & 0xff).
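To illustrate the signed-byte point: Java's byte is signed, so a byte above 0x7F prints as a negative number and is sign-extended when promoted to int, while b & 0xff gives its unsigned value from 0 to 255. For this particular XOR, though, casting the result back to byte keeps only the low 8 bits, so the masked and unmasked versions produce the same byte; the mask matters mostly when you print or compare the values. A sketch with made-up helper names:

```java
public class SignedBytes {
    // The value 0..255 that the byte's bits represent.
    static int unsigned(byte b) {
        return b & 0xff;
    }

    // XOR on the sign-extended int, then cast back to byte.
    static byte xorSigned(byte b, byte key) {
        return (byte) (b ^ key);
    }

    // XOR on the masked (unsigned) value, then cast back to byte.
    static byte xorMasked(byte b, byte key) {
        return (byte) ((b & 0xff) ^ key);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xD8;  // first UTF-8 byte of U+0633
        byte key = (byte) 'N'; // 0x4E

        System.out.println(b);                                      // -40: byte is signed
        System.out.println(unsigned(b));                            // 216
        System.out.println(xorSigned(b, key) == xorMasked(b, key)); // true: the cast keeps the low 8 bits
        System.out.println(unsigned(xorSigned(b, key)));            // 150 (0x96)
    }
}
```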

Licensed under: CC-BY-SA with attribution