Writing RC4 for a 16-bit system

https://stackoverflow.com/questions/10147191

31-05-2021

Question

I am writing RC4 for the DCPU-16; however, I have some questions before I begin.

RC4 algorithm:

//KSA
for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

//PRGA
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K
endwhile
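For reference, the pseudocode above can be sketched in C roughly as follows. This is a minimal byte-oriented version, not DCPU-16 code: the names `rc4_ctx`, `rc4_init`, `rc4_next_byte` and `rc4_crypt` are hypothetical, and it assumes a non-empty key. Using `uint8_t` makes every addition wrap, which is exactly the `mod 256` in the pseudocode.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: a permutation S of 0..255 plus the two PRGA indices. */
typedef struct {
    uint8_t S[256];
    uint8_t i, j;
} rc4_ctx;

/* KSA: casting to uint8_t reduces every sum mod 256. keylen must be > 0. */
void rc4_init(rc4_ctx *c, const uint8_t *key, size_t keylen)
{
    for (int k = 0; k < 256; k++)
        c->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + c->S[k] + key[k % keylen]);
        uint8_t t = c->S[k];   /* swap S[k] and S[j] */
        c->S[k] = c->S[j];
        c->S[j] = t;
    }
    c->i = 0;
    c->j = 0;
}

/* PRGA: returns one keystream byte K per call. */
uint8_t rc4_next_byte(rc4_ctx *c)
{
    c->i = (uint8_t)(c->i + 1);
    c->j = (uint8_t)(c->j + c->S[c->i]);
    uint8_t t = c->S[c->i];    /* swap S[i] and S[j] */
    c->S[c->i] = c->S[c->j];
    c->S[c->j] = t;
    return c->S[(uint8_t)(c->S[c->i] + c->S[c->j])];
}

/* Encryption and decryption are the same: XOR the data with the keystream. */
void rc4_crypt(rc4_ctx *c, const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t n = 0; n < len; n++)
        out[n] = in[n] ^ rc4_next_byte(c);
}
```

With the commonly published test key "Key", the first keystream bytes should be EB 9F 77 81, which is a quick way to check a port against this reference.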

Because I am working with 16-bit words, each element of S[] can hold a value from 0 to 65535 instead of the expected 0 to 255. K also needs to be in the range 0 to 65535. What would be the best approach to deal with this?

The options I see (and their problems) are:

1. Keep using mod 256 everywhere and build each 16-bit output word by concatenating two rounds (this will take longer to run, and I want to keep my cycles per byte (CPB) as low as possible).
2. Tweak RC4 so that K is a 16-bit number while still using a 256-entry array for S[] (I want to do the crypto right, so I am concerned about making mistakes by tinkering with RC4).

What is my best option? I feel that I may have to do #1, but I am hoping people here can give me the confidence to do #2.

Solution

Option 2 will make the encryption weaker.

Instead, you can generate two bytes per loop iteration and combine them into one 16-bit word:

:loop add i,1 ;2 cycles
and i,0xff ;-- &0xff is the same as %256 ;2 cycles
add j,[i+arr] ;3 cycles
and j,0xff ;2 cycles
set o,[j+arr] ;-- using overflow reg as swap var ;2 cycles
set [j+arr],[i+arr] ;3 cycles
set [i+arr],o ;2 cycles
set a,[i+arr] ;-- calc index ;2 cycles
add a,[j+arr] ;3 cycles
and a,0xff ;2 cycles
set b,[a+arr] ;-- first octet of K ;2 cycles

;-- second octet
add i,1 ;2 cycles
and i,0xff ;2 cycles
add j,[i+arr] ;3 cycles
and j,0xff ;2 cycles
set o,[j+arr] ;2 cycles
set [j+arr],[i+arr] ;3 cycles
set [i+arr],o ;2 cycles
set a,[i+arr] ;2 cycles
add a,[j+arr] ;3 cycles
and a,0xff ;2 cycles
shl b,8 ;-- move first octet into the high byte ;2 cycles
bor b,[a+arr] ;2 cycles
;-- output b
set pc,loop ;2 cycles

This is about as fast as you can make it: by the DCPU-16 1.1 cycle costs it comes to 54 cycles per 16-bit word (unless I missed something), not counting whatever you do to output b. It assumes that S sits at a fixed address (arr in my code) and that i and j are kept in registers I and J. When you are outside this code, you can store i and j in memory just before or after S.
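As a cross-check, here is a C sketch of what the loop above computes. The names `rc4w_state`, `rc4w_init`, `rc4w_step` and `rc4w_next_word` are hypothetical; S is stored one value (0 to 255) per 16-bit word as in the assembly, i and j are kept next to S, and the key scheduling is assumed to be the standard KSA from the question, since the answer does not show it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: i and j stored next to S, one 0..255 value per 16-bit cell. */
typedef struct {
    uint16_t i, j;
    uint16_t S[256];
} rc4w_state;

/* Standard RC4 KSA, with "& 0xFF" standing in for "mod 256". keylen must be > 0. */
void rc4w_init(rc4w_state *st, const uint8_t *key, size_t keylen)
{
    for (uint16_t k = 0; k < 256; k++)
        st->S[k] = k;
    uint16_t j = 0;
    for (uint16_t k = 0; k < 256; k++) {
        j = (j + st->S[k] + key[k % keylen]) & 0xFF;
        uint16_t t = st->S[k];
        st->S[k] = st->S[j];
        st->S[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* One PRGA step: mirrors one half of the assembly loop. */
static uint16_t rc4w_step(rc4w_state *st)
{
    st->i = (st->i + 1) & 0xFF;              /* add i,1 / and i,0xff       */
    st->j = (st->j + st->S[st->i]) & 0xFF;   /* add j,[i+arr] / and j,0xff */
    uint16_t o = st->S[st->j];               /* swap through a temp (O)    */
    st->S[st->j] = st->S[st->i];
    st->S[st->i] = o;
    return st->S[(st->S[st->i] + st->S[st->j]) & 0xFF];
}

/* Two PRGA steps per 16-bit word: first byte high, second byte low. */
uint16_t rc4w_next_word(rc4w_state *st)
{
    uint16_t hi = rc4w_step(st);
    uint16_t lo = rc4w_step(st);
    return (uint16_t)((hi << 8) | lo);       /* shl b,8 / bor b,[a+arr] */
}
```

Because this is unmodified RC4 underneath, calling `rc4w_next_word` repeatedly yields the ordinary RC4 keystream two bytes at a time, high byte first, which is why this approach does not weaken the cipher.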

Trying to pack the array (two 8-bit entries of S per 16-bit word) would make everything slower, because you would need to unpack entries on every access.
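To illustrate the unpacking cost, here is a sketch of RC4 with S packed two cells per 16-bit word. The layout (cell 2n in the high byte of P[n], cell 2n+1 in the low byte) and all names are hypothetical. Every read of S needs a shift or mask, and every write becomes a read-modify-write of the containing word, on top of the normal RC4 work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: 256 cells of S in 128 words. */
typedef struct {
    uint16_t P[128];
    uint8_t i, j;
} rc4p_state;

/* Unpack one cell: an extra shift or mask on every read. */
static uint8_t get_s(const rc4p_state *st, uint8_t idx)
{
    uint16_t w = st->P[idx >> 1];
    return (idx & 1) ? (uint8_t)(w & 0xFF) : (uint8_t)(w >> 8);
}

/* Repack one cell: read-modify-write of the containing word. */
static void set_s(rc4p_state *st, uint8_t idx, uint8_t v)
{
    uint16_t *w = &st->P[idx >> 1];
    if (idx & 1)
        *w = (uint16_t)((*w & 0xFF00) | v);
    else
        *w = (uint16_t)((*w & 0x00FF) | ((uint16_t)v << 8));
}

static void swap_s(rc4p_state *st, uint8_t a, uint8_t b)
{
    uint8_t t = get_s(st, a);
    set_s(st, a, get_s(st, b));
    set_s(st, b, t);
}

/* Standard KSA over the packed array. keylen must be > 0. */
void rc4p_init(rc4p_state *st, const uint8_t *key, size_t keylen)
{
    for (int n = 0; n < 128; n++)   /* identity permutation, two cells per word */
        st->P[n] = (uint16_t)(((2 * n) << 8) | (2 * n + 1));
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + get_s(st, (uint8_t)k) + key[k % keylen]);
        swap_s(st, (uint8_t)k, j);
    }
    st->i = 0;
    st->j = 0;
}

/* Standard PRGA: one keystream byte per call. */
uint8_t rc4p_next_byte(rc4p_state *st)
{
    st->i = (uint8_t)(st->i + 1);
    st->j = (uint8_t)(st->j + get_s(st, st->i));
    swap_s(st, st->i, st->j);
    return get_s(st, (uint8_t)(get_s(st, st->i) + get_s(st, st->j)));
}
```

The output matches unpacked RC4 and halves the memory for S, but on the DCPU-16 each of those shifts and masks is an extra instruction per access, which is the slowdown described above.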

Other answers

I don't see the problem: the DCPU-16 has 16-bit words, which easily hold RC4's 8-bit values. RC4 works mod 256 in both the key scheduling and the PRGA, and its output is a stream of bytes, so again there are no issues. If your concern is saving space, you can store two adjacent cells of S in a single word, but that's about it.

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow