Why is my implementation of Simplified DES working fine under Cp1252 encoding but not under UTF-8?

StackOverflow https://stackoverflow.com/questions/14173393

  •  13-01-2022

Question

I asked the following question yesterday, but it didn't get much attention because I didn't include any details about my actual problem.

Eclipse:Using UTF-8 encoding in the text editor make the Strings not work properly, how can I fix that?

I will try to analyze my problem as much as possible in order to give you a clear insight on what's going on.

I have a university project where I am supposed to implement the Simplified DES algorithm for educational purposes. It is an encryption algorithm that uses a 10-bit key to encrypt 8-bit blocks of data.

In the implementation I wanted to include encrypting any String.

So I wrote the code for the encryption of 8 bits and it worked perfectly fine for all kinds of inputs. In order to include String encryption support I used the function String.getBytes(), saved all the bytes of the String inside a variable byte[] data

and then I followed this logic:

int i;
for(i=0; i< data.length; i++)
    data[i] = encrypt(data[i]);

and for decryption I followed this logic:

int i;
for(i=0; i< data.length; i++)
    data[i] = decrypt(data[i]);

Here is the actual code in the main function

public static void main(String[] args){

    short K = (short) Integer.parseInt("1010010001",2);
    SDEncryption sdes = new SDEncryption(K); //K is the 10 bit key

    String test = "INFO BOB 57674";

    //let's encrypt the String test
    String enc = sdes.encrypt(test.getBytes());

    //let's decrypt the encrypted String of the initial String
    String dec = sdes.decrypt(enc.getBytes());
}

Using the default encoding, which is Cp1252, I encrypted the String and got the following results:

Initial Text: INFO BOB 57674
Encrypted Text: ÅO [áa[aá»j×jt
Decrypted Text: INFO BOB 57674

To see the actual bit representation each time I encrypt and decrypt the data, I wrote the following function, which prints every byte of the data in binary:

public void show(byte[] data){
    //εμφάνιση των 
    //note how the Greek letters aren't displayed at all under Cp1252

    int i;
    for(i=0;i<data.length;i++){

        short mask = (short) (1<<7); //10000000
        while(mask>0){
            if((data[i]&mask) == 0)
                System.out.print("0");
            else
                System.out.print("1");

            mask = (short) (mask >> 1);
        }
        if(i < data.length - 1){

            System.out.print(" ");
        }
    }
    System.out.println();
}

So I got the following results:

Initial Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100
Encrypted Text(binary): 11000101 01001111 00100000 01011011 11100001 01100001 01011011 01100001 11100001 10111011 01101010 11010111 01101010 01110100
Decrypted Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100

Seems like everything is working as expected. In order to support Greek letters in the code editor though, I had to change the encoding to be UTF-8.

After running everything again, I got the following results:

Initial Text: INFO BOB 57674
Encrypted Text: �O [�a[a�j�jt
Decrypted Text: ���NFO���BOB���7���74

Notice how some words of the decrypted text are displayed correctly, for example NFO and BOB. It seems to me as if there's some kind of problem with the bit manipulation, as if Eclipse doesn't recognize a sequence of bits that follows the rules of UTF-8.

Here are the results in binary form:

Initial Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100
Encrypted Text(binary): 11101111 10111111 10111101 01001111 00100000 01011011 11101111 10111111 10111101 01100001 01011011 01100001 11101111 10111111 10111101 01101010 11101111 10111111 10111101 01101010 01110100
Decrypted Text(binary): 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 01001110 01000110 01001111 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 01000010 01001111 01000010 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 00110111 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 00110111 00110100

Now I can see the problem more clearly. It seems like UTF-8 adds more bytes to the String, but I'm not sure why. The initial text has the same number of bytes under both encodings, so why are bytes added after the encryption, and even more after the decryption?

I would appreciate any help. Thank you in advance!


Solution

Every time you call String.getBytes(), you implicitly use your platform's default encoding to transform chars into bytes. If the String contains characters that can't be represented in that encoding, you lose information. So use an explicit encoding that supports every character, such as UTF-8: string.getBytes("UTF-8").
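
As a small illustration of the point above, the following sketch encodes the Greek word from the question's code comment with two explicit charsets. The class name ExplicitCharsetDemo and the helper countQuestionMarks are made up for this example, and ISO-8859-1 stands in for "an encoding that cannot represent Greek" because every JVM is guaranteed to provide it.

```java
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {

    // The Greek word from the question's comment, written with Unicode
    // escapes so the encoding of this source file cannot affect the demo.
    static final String GREEK = "\u03b5\u03bc\u03c6\u03ac\u03bd\u03b9\u03c3\u03b7";

    // Counts bytes equal to '?', the replacement byte that getBytes
    // substitutes for characters ISO-8859-1 cannot represent.
    static int countQuestionMarks(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            if (b == '?') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] utf8 = GREEK.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = GREEK.getBytes(StandardCharsets.ISO_8859_1);

        // UTF-8 needs two bytes per Greek letter and keeps everything.
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("UTF-8 round trip intact: "
                + new String(utf8, StandardCharsets.UTF_8).equals(GREEK));

        // ISO-8859-1 has no Greek letters, so each one becomes '?'.
        System.out.println("ISO-8859-1 bytes: " + latin1.length
                + ", replaced by '?': " + countQuestionMarks(latin1));
    }
}
```

Passing a Charset constant instead of a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.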

Similarly, when you do new String(bytes), you use your platform's default encoding to transform the bytes into chars. If the bytes actually are text encoded using another encoding, or aren't chars at all, but purely binary information, you'll also lose information.
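
The extra bytes in the question's UTF-8 output can be reproduced with a short sketch. It takes the first two encrypted bytes from the question (11000101 01001111), decodes them as UTF-8 and encodes them again. The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyDecodeDemo {

    // Decodes raw bytes as UTF-8 text and immediately re-encodes them,
    // which is what new String(cipherBytes) followed by getBytes() does
    // when the platform default encoding is UTF-8.
    static byte[] roundTripThroughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First two encrypted bytes from the question: 0xC5 0x4F.
        byte[] cipher = { (byte) 0xC5, (byte) 0x4F };
        byte[] after = roundTripThroughString(cipher);

        System.out.println("before: " + Arrays.toString(cipher));
        System.out.println("after:  " + Arrays.toString(after));
        System.out.println("identical: " + Arrays.equals(cipher, after));
    }
}
```

In UTF-8, 0xC5 announces a two-byte sequence, but 0x4F is not a valid continuation byte. The decoder therefore substitutes U+FFFD (the replacement character) for 0xC5, and U+FFFD re-encodes as EF BF BD, the 11101111 10111111 10111101 pattern that appears throughout the question's output. Single-byte encodings such as Cp1252 assign a character to almost every byte value, which is why the same code appeared to work there.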

Encryption is a binary operation. It takes bytes and returns other bytes. You can't blindly transform bytes into chars, whatever the encoding is, because not all bytes represent a valid character. If you want to transform binary information (like encrypted text) to a String, use Hex or Base64 encoding.
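
A minimal sketch of both options, assuming Java 8 or later for java.util.Base64. The hex helpers toHex and fromHex are hand-written for this example and are not part of any library; the sample bytes are the first five encrypted bytes from the question.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryToTextDemo {

    // Renders each byte as two lowercase hexadecimal digits.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Parses a string of hex digit pairs back into bytes.
    // Assumes well-formed input of even length.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] cipher = { (byte) 0xC5, 0x4F, 0x20, 0x5B, (byte) 0xE1 };

        String hex = toHex(cipher);
        String base64 = Base64.getEncoder().encodeToString(cipher);
        System.out.println("hex:    " + hex);
        System.out.println("base64: " + base64);

        // Both representations decode back to exactly the same bytes.
        System.out.println(Arrays.equals(cipher, fromHex(hex)));
        System.out.println(Arrays.equals(cipher, Base64.getDecoder().decode(base64)));
    }
}
```

Both forms use only ASCII characters, so the resulting String can be printed, stored or sent without depending on any platform encoding. Base64 is more compact (4 characters per 3 bytes), while hex is easier to read byte by byte.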

So the encryption process should be:

String clearText = ...;
byte[] clearTextAsBytes = clearText.getBytes("UTF8");
byte[] encryptedBinary = encrypt(clearTextAsBytes);
String encryptedBinaryAsPrintableChars = toBase64(encryptedBinary);

And the decryption process should be symmetric:

String encryptedBinaryAsPrintableChars = ...;
byte[] encryptedBinary  = fromBase64(encryptedBinaryAsPrintableChars);
byte[] decryptedTextAsBytes = decrypt(encryptedBinary);
String decryptedText = new String(decryptedTextAsBytes, "UTF8");
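
The two processes above can be put together in a runnable sketch. Everything here is an assumption made for illustration: the class name, the encryptText and decryptText wrappers, and above all the cipher itself, which is a trivial XOR stand-in and not Simplified DES. Only the byte and String handling around it is the point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class TextCipherPipeline {

    // Placeholder byte-level cipher (NOT Simplified DES): XOR with a
    // fixed byte, used only because it is trivially reversible.
    static byte[] encrypt(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0x5A);
        }
        return out;
    }

    // XOR is its own inverse, so decryption reuses the same operation.
    static byte[] decrypt(byte[] data) {
        return encrypt(data);
    }

    static String toBase64(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    static byte[] fromBase64(String printable) {
        return Base64.getDecoder().decode(printable);
    }

    // Text -> UTF-8 bytes -> encrypted bytes -> printable Base64.
    static String encryptText(String clearText) {
        byte[] clearTextAsBytes = clearText.getBytes(StandardCharsets.UTF_8);
        return toBase64(encrypt(clearTextAsBytes));
    }

    // Printable Base64 -> encrypted bytes -> UTF-8 bytes -> text.
    static String decryptText(String encryptedAsPrintableChars) {
        byte[] decrypted = decrypt(fromBase64(encryptedAsPrintableChars));
        return new String(decrypted, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String test = "INFO BOB 57674";
        String enc = encryptText(test);
        String dec = decryptText(enc);
        System.out.println("Encrypted (Base64): " + enc);
        System.out.println("Decrypted: " + dec);
        System.out.println("Round trip intact: " + test.equals(dec));
    }
}
```

Because the encrypted bytes never pass through new String(bytes), the result no longer depends on whether the platform default is Cp1252 or UTF-8.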

Licensed under: CC-BY-SA with attribution