Question

I have an assignment that asks us to implement S-DES (Simplified DES), and the algorithm includes a lot of bit permutations, shifts as well as XOR-ing.

Obviously, the fastest way to implement this would be using bit operations, such as:

char CLS(char key, int shift){
    char skey;
    skey = (key << shift) | (key >> (8 - shift)) 
    return skey;
}

/* Get 8-bit subkey from 10-bit key */
char permute(short int key){
    short int i;
    short int k1[] = { BIT_6, BIT_3, BIT_7, BIT_4, BIT_8, BIT_5, BIT_10, BIT_9 }; // SDES spec
    char sk1 = '\0';

    for(i = 0; i < 8; i++){
        sk1 = (sk1 << 1) | (key & k1[i]);
    }
}

...

Which is simple enough to do. However, how do I get those bits efficiently? Using something like fread(), I could read at least 1 byte at a time and supply those to the SDES algorithm, but that grossly under-utilizes the CPU, as I'd not only be reading and encrypting only 1 byte at a time, I would also be writing 1 byte of encrypted data to disk at a time! Surely there must be a better way.

The only alternative to this that I could think of was treat each byte as a char array and manipulate the bits that way, but not only does that add memory overhead: I'd still be stuck working with 1 byte at a time and I wouldn't be able to use shift operations and need to instead use temporary arrays and array indices.

I'm looking to improve this behaviour slightly, but all I could think of was bringing bigger blocks into memory. For example, I could use fread() to read 4K blocks of data into a char array[4096] and then work with that as opposed to reading 4096 bytes one-by-one from disk.

But, I'm not certain if this is all I could do given that it's a simple algorithm. Could further improvements be made, or is this about as good as it gets?

Should anyone want to take a look, here is the specification of the S-DES algorithm.

Was it helpful?

Solution

If you want to speed-up your algorithm you could parallelize the encryption on the byte array, for example with OpenMP. As DarkSquirrel42 pointed out in a comment, to gain speed you should also replace your permute function with a function using a look-up table:

#define LUT_SIZE 1024
static char lookup_table[LUT_SIZE];

/* Get 8-bit subkey from 10-bit key */
char permute(short int key)
{
    // SDES spec
    short int k1[] = { BIT_6, BIT_3, BIT_7, BIT_4, BIT_8, BIT_5, BIT_10, BIT_9 }; 
    char sk1 = '\0';

    for (short int i = 0; i < 8; i++) {
        sk1 = (sk1 << 1) | (key & k1[i]);
    }
    return sk1;
}

void init_lut()
{
    for (short int i = 0; i < LUT_SIZE; i++) {
        lookup_table[i] = permute(i);
    }
}

char permute_fast(short int key)
{
    if (key < 0 || key >= LUT_SIZE) {
        //error handling
        return 0;
    }
    return lookup_table[key];
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top