Question

I'm reading through the following paper on how to implement CRC32 efficiently using the PCLMULQDQ instruction introduced in Intel Westmere and AMD Bulldozer:

V. Gopal et al. "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction." 2009. http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

I understand the algorithm, but one thing I'm not sure about is how to calculate the constants $k_i$. For example, they provide the constant values for the IEEE 802.3 polynomial:

  • k1 = x^(4*128+64) mod P(x) = 0x8833794C
  • k4 = x^128 mod P(x) = 0xE8A45605
  • mu = x^64 div P(x) = 0x104D101DF

and so on. I can just use these constants as I only need to support the one polynomial, but I'm interested: how did they calculate those numbers? I can't just use a typical bignum implementation (e.g. the one provided by Python) because the arithmetic must happen in GF(2).

Was it helpful?

Solution

It's just like regular division, except you exclusive-or instead of subtract. So start with the most significant 1 in the dividend. Exclusive-or the dividend by the polynomial, lining up the most significant 1 of the polynomial with that 1 in the dividend to turn it into a zero. Repeat until you have eliminated all of the 1's above the low n bits, where n is the order of the polynomial. The result is the remainder.

Make sure that your polynomial has the high term in the n+1th bit. I.e., use 0x104C11DB7, not 0x4C11DB7.

If you want the quotient (which you wrote as "div"), then keep track of the positions of the 1's you eliminated. That set, shifted down by n, is the quotient.

Here is how:

/* Placed in the public domain by Mark Adler, Jan 18, 2014. */

#include <stdio.h>
#include <inttypes.h>

/* Polynomial type -- must be an unsigned integer type. */
typedef uintmax_t poly_t;
#define PPOLY PRIxMAX

/* Return x^n mod p(x) over GF(2).  x^deg is the highest power of x in p(x).
   The positions of the bits set in poly represent the remaining powers of x in
   p(x).  In addition, returned in *div are as many of the least significant
   quotient bits as will fit in a poly_t. */
static poly_t xnmodp(unsigned n, poly_t poly, unsigned deg, poly_t *div)
{
    poly_t mod, mask, high;

    if (n < deg) {
        *div = 0;
        return poly;
    }
    mask = ((poly_t)1 << deg) - 1;
    poly &= mask;
    mod = poly;
    *div = 1;
    deg--;
    while (--n > deg) {
        high = (mod >> deg) & 1;
        *div = (*div << 1) | high;  /* quotient bits may be lost off the top */
        mod <<= 1;
        if (high)
            mod ^= poly;
    }
    return mod & mask;
}

/* Compute and show x^n modulo the IEEE 802.3 CRC-32 polynomial.  If d is true,
   also show the low bits of the quotient. */
static void show(unsigned n, int showdiv)
{
    poly_t div;

    printf("x^%u mod p(x) = %#" PPOLY "\n", n, xnmodp(n, 0x4C11DB7, 32, &div));
    if (showdiv)
        printf("x^%u div p(x) = %#" PPOLY "\n", n, div);
}

/* Compute the constants required to use PCLMULQDQ to compute the IEEE 802.3
   32-bit CRC.  These results appear on page 16 of the Intel paper "Fast CRC
   Computation Using PCLMULQDQ Instruction". */
int main(void)
{
    show(4*128+64, 0);
    show(4*128, 0);
    show(128+64, 0);
    show(128, 0);
    show(96, 0);
    show(64, 1);
    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top