I am implementing a custom iostream (i.e., with read, write, seek and close) which uses the RC4 stream cipher for encryption and decryption. One of the contracts of this stream is that it is bidirectional and calling code needs to be able to arbitrarily seek to any position in the stream before doing any actual reading or writing.
Now because RC4 utilizes a key that relies on all previous swap operations up to a given 'tell' position, how can I incorporate an ability to arbitrarily seek to any position?
Obviously I could seek up to the position of the given seek offset (marked by THIS BIT in the following example), before doing the actual xor-ing transformation process, something like,:
/**
* @brief called from a stream's read or write function
* @param in the input buffer
* @param out the output buffer
* @param startPosition the current stream position (obtained via the streams
* tellg or tellp functions for read and write respectively)
* @param length the number of bytes to transform
*/
void transform(char *in, char *out,
std::ios_base::streamoff startPosition,
long length)
{
// need to reset sbox from member s_box each time this
// function is called
long sbox[256];
for (int i = 0; i<256; ++i) {
sbox[i]=m_sbox[i];
}
// ***THIS BIT***
// need to run the swap operation startPosition times
// to get sbox integer sequence in order
int i = 0, j = 0, k = 0;
for (int a=0; a < startPosition; ++a) {
i = (i + 1) % 256;
j = (j + sbox[i]) % 256;
swapints(sbox, i, j);
}
// now do the actual xoring process up to the length
// of how many bytes are being read or written
for (int a=0; a < length; ++a) {
i = (i + 1) % 256;
j = (j + sbox[i]) % 256;
swapints(sbox, i, j);
k = sbox[(sbox[i] + sbox[j]) % 256];
out[a] = in[a] ^ k;
}
}
and then the transform would be called from the read or write of the stream implementation, something like:
MyStream&
MyStream::read(char * const buf, std::streamsize const n)
{
std::ios_base::streamoff start = m_stream.tellg();
std::vector<char> in;
in.resize(n);
(void)m_stream.read(&in.front(), n);
m_byteTransformer->transform(&in.front(), buf, start, n);
return *this;
}
EDIT: the stream should have no knowledge of how the transformation function works. The transformation function is completely independent and I should be able to freely swap in different transformation implementations.
EDIT: the function swapints looks like this:
void swapints(long *array, long ndx1, long ndx2)
{
int temp = array[ndx1];
array[ndx1] = array[ndx2];
array[ndx2] = temp;
}
The real problem with the above transform function is in its slowness because it has to perform startPosition initial swap operations before the xor transformation-proper is performed. This is very problematic when many seek operations are performed. Now I've heard that RC4 is meant to be quick but my (probably bad implementation) suggests otherwise given the initial set of swap operations.
So my real question is: how can the above code be optimized to reduce the number of required operations? Ideally I would like to eliminate the initial ("THIS BIT") set of swap operations
EDIT: optimizing out the initial sbox setting is probably trivial (e.g. using memcpy as suggested by egur). The important optimization I think is how I can optimize out the loop marked by THIS BIT. Perhaps all those swap ints can be programmed more concisely without the need for a for-loop.
Thanks,
Ben