C Snippets for Algebra in Finite Prime Fields

Question 1

The reason for using powers 2^N - m, where is small, is due to the fact, that calculating the modulo of the word of format (HI * 2^N + LO) mod 2^N-m can be split to two (or more pieces) that reduce to

    (HI*2^N+LO) mod (2^N-m) ==
    ((HI*2^N) mod (2^N-m) + LO mod (2^N-m)) mod (2^N-m)
    (m * HI  + LO ) mod (2^N-m).

The value m*HI + LO has at most log2(m) bits more than fits the computer word -- that log2(m) bit value can be again folded back to the sum by repeatedly multiplied with m and accumulated. Typically one iteration is enough.

If m is small, m^2 or m^3 can be reasonably small too -- then one can apply the technique to calculate modulus of a big-num:

    [AAAAA | BBBBB | CCCCC | DDDDD | EEEEE ] mod (2^N-m) ==
     EEEEE * 1 mod (2^N-m) +
     DDDDD * m mod (2^N-m) +
     CCCCC * (m^2) mod (2^N-m) + ... etc.

It's the same in base 10, where

    1234 5678 9812 mod 9997 ==
              9812 mod 9997 +
            3*5678 mod 9997 +
            9*1234 mod 9997 ==
            3 7952 mod 9997 == ( 7952 + 3*3 ) mod 9997 = 7961

    Here 9997 doesn't have to prime, we are using 10^4 instead of 2^N and m = 3

For GF(2^n) calculation there typical speedups are look-up-tables for root^n and log(n); then multiplication is reduced to addition. If the target system wasn't some 16-bit system, I'd propose using SSE4.2 (or Neon) polynomial (carry free) multiplication. If I'm not hugely mistaken, polynomial calculation in GF should be doable with convolution:

for (i=0;i<N*2-1;i++) temp[i]=popcount(A & (bit_reverse(B)<<N)>>i);
//  A = 11010, B=01101, reverse B = 10110
//
//  11010     11010     11010    11010   11010  11010   11010    11010     11010
//      10110    10110    10110   10110  10110 10110  10110   10110    10110
// 0000000000 00010000  0000000  010100  10010 001000 0011000 00010000 000000000
//      0        1         0      0       0      1      0        1        0
// 010001010 to be divided by the generator polynomial using typical CRC approach

Further reading for comparison of GF(2^n) multiplication:

(paper by Serdar S. Erdem, Tuğrul Yanık, Çetin K. Koç,
Acta Applicandae Mathematica September 2006, Volume 93, Issue 1-3, pp 33-55 )

Question 2

Adding to Daniel's answer: Finite fields only can have prime power many elements. You want, however, to have prime p many elements since these are isomorphic to computation mod p (this is fast!). Whereas finite fields with p^r (r > 1) many elements are never isomorphic to Z/p^r Z (i. e. computation mod p^r).

Edit: If you want to implement computation in GF(p^r) you pick an irreducible polynomial p(x) in GF(p)[x] of degree r and do the calculation in GF(p)[x] / (p(x)) i.e. you calculate mod p(x) (so you have to implement polynomial division). You can play around with this stuff in computer algebra systems such as Singular or Macaulay 2