Question

I have this C-code to do multiplications over GF(8):

int32_t GaloisMultiply (int32_t a, int32_t b) 
{
    int32_t i;
    int32_t mask = 0x100;
    int32_t y = 0;

    for(i=0;i<8;i++) 
    {
        if(b & mask) 
        {
            y ^= a;
        }
        mask >>= 1;
        y <<= 1;
    }

    if(b & 0x1) 
    {
        y ^= a;
    }

    return(y);
}

That's more or less the text-book implementation.

I wonder if I there is a clever optimization for above algorithm if I can assert that a is always b, e.g. I do squaring instead of multiplication. I'm not after a cryptographic use btw. I just want to make use of the fact that x*x in GF(8) interleaves the bits of x with zero bits one by one.

There are already quite clever methods to do the bit interleaving, but since I've found out that x*x in GF(8) does the bit interleaving thing (by accident) I can't stop trying to use it for bit-interleaving optimizations.

Any ideas?

Was it helpful?

Solution

Table-based? link

And when you are limited to x*x, it's a sparse matrix.

Here's another good paper (and a library)

OTHER TIPS

int32_t GaloisMultiply( int32_t a ) 
{
  int32_t y = 0;
  int32_t b = a & 0x01ff;

  while ( b ) 
  {
    if ( b & 1 ) 
      y ^= a;

    a <<= 1;
    b >>= 1;
  }
  return y;
}

Or if you like:

int32_t GaloisMultiply( int32_t a ) 
{
  int32_t y = 0;
  for ( int32_t b = a & 0x01ff; b; b >>= 1 )
  {
    if ( b & 1 ) 
      y ^= a;

    a <<= 1;
  }
  return y;
}

The reason that this approach is more efficient than the original code above is primarily because the loop is only performed until all the 'interesting' bits in the argument are consumed as opposed to blindly checking all (9) bits.

A table based approach will be faster though.

Lookup table is definitely the fastest for polynomial basis galois squaring. It is also the fastest for multiplication when using GF(8), but the tables get too large for larger fields as used in ECC. For multiplication in larger fields, the best algorithm is the 'left to right combine' method...(see http://www.amazon.com/Elliptic-Cryptography-Springer-Professional-Computing/dp/038795273X algorithm 2.36, page 50).

You could probably write some assembly to do a slightly better job. However, I'd be pretty surprised if this was the bottleneck in your application; have you done any profiling? This function doesn't seem like it's worth optimizing.

This is probably not what you are looking for, but here's one minor speedup:

Pass only one argument, if they are guaranteed to be the same.

It might help the compiler a bit to mark "a" and "b" as const. Or unrolling the loop by hand. It would be sad if it helped, though...

Isn't it a patent minefield, by the way ?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top