PCLMULQDQ instruction in C inline asm

https://stackoverflow.com/questions/20984882

25-09-2022

Question

I want to use Intel's PCLMULQDQ instruction with inline assembly in my C code to multiply two polynomials that are elements of GF(2^n). The compiler is GCC 4.8.1. The polynomials are stored in arrays of uint32_t (6 elements each).

I searched the web for how to use the PCLMULQDQ instruction, or the CLMUL instruction set in general, but didn't find any good documentation.

I would really appreciate a simple example in C and asm of how to multiply two simple polynomials with the instruction. Does anybody know how to do it?

Also, are there any prerequisites (apart from a capable processor), such as libraries to include or compiler options?

Solution

I found a solution myself. For the record:

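#include <stdint.h>
#include <string.h>

// Multiplies two polynomials over GF(2), each stored as t 32-bit words.
// C must have room for 2*t words.
void f2m_intel_mult(
  uint32_t t, // length of arrays A and B
  uint32_t *A,
  uint32_t *B,
  uint32_t *C
)
{
    memset(C, 0, 2*t*sizeof(uint32_t));
    // Assumes little-endian layout (true on x86): low is bits 0..31 of val.
    union{ uint64_t val; struct{uint32_t low; uint32_t high;} halfs;} prod;

    uint32_t i;
    uint32_t j;
    for(i=0; i<t; i++){
        for(j=0; j<t; j++){

            prod.halfs.low = A[i];
            prod.halfs.high = 0;
            // AT&T operand order: immediate, source, destination.
            // The "i" constraint needs a compile-time constant; 0 selects
            // the low quadword of both XMM operands. B[j] is widened so
            // the upper half of that quadword is zero.
            asm ("pclmulqdq %2, %1, %0;"
            : "+x"(prod.val)
            : "x"((uint64_t)B[j]), "i"(0)
            );

            C[i+j] = C[i+j] ^ prod.halfs.low;
            C[i+j+1] = C[i+j+1] ^ prod.halfs.high;
        }
    }
}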
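
To check results from the function above, it can help to have a plain software version of the same schoolbook loop. The following is only a sketch: the names clmul32_ref and f2m_ref_mult are made up for this illustration and are not part of the original answer. It computes the carry-less product of two words by shift-and-XOR, which is what pclmulqdq does in hardware.

```c
#include <stdint.h>

/* Carry-less multiply of two 32-bit words: for every set bit k of b,
   XOR a shifted left by k into the accumulator (no carries). */
static uint64_t clmul32_ref(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int k = 0; k < 32; k++) {
        if ((b >> k) & 1u)
            acc ^= (uint64_t)a << k;
    }
    return acc;
}

/* Same schoolbook double loop as f2m_intel_mult, without the instruction.
   C must have room for 2*t words. */
static void f2m_ref_mult(uint32_t t, const uint32_t *A,
                         const uint32_t *B, uint32_t *C)
{
    for (uint32_t i = 0; i < 2 * t; i++)
        C[i] = 0;
    for (uint32_t i = 0; i < t; i++) {
        for (uint32_t j = 0; j < t; j++) {
            uint64_t p = clmul32_ref(A[i], B[j]);
            C[i + j]     ^= (uint32_t)p;          /* low 32 bits  */
            C[i + j + 1] ^= (uint32_t)(p >> 32);  /* high 32 bits */
        }
    }
}
```

Comparing the output of both functions on random inputs is a cheap way to catch a wrong operand order or immediate in the asm statement.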

I think pclmulqdq can also take full 64-bit operands, but I couldn't find out how to get this working with inline assembler. Does anybody know how?
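
One way this might look, as a sketch rather than a tested answer: load each 64-bit operand into the low half of an XMM value and read the full 128-bit product back from the destination register. The helper name clmul64_asm is hypothetical, and the code assumes an x86 CPU with PCLMULQDQ support and GCC-style inline asm.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_set_epi64x, _mm_storeu_si128 */

/* Hypothetical helper: 64x64 -> 128-bit carry-less product.
   out[0] receives the low 64 bits, out[1] the high 64 bits. */
static void clmul64_asm(uint64_t a, uint64_t b, uint64_t out[2])
{
    /* Put each operand in the low quadword; the high quadword is zero. */
    __m128i x = _mm_set_epi64x(0, (long long)a);
    __m128i y = _mm_set_epi64x(0, (long long)b);

    /* AT&T order: immediate, source, destination.
       Immediate 0 selects the low quadword of both operands. */
    __asm__ ("pclmulqdq $0, %1, %0" : "+x"(x) : "x"(y));

    /* The whole 128-bit register holds the product. */
    _mm_storeu_si128((__m128i *)out, x);
}
```

With 64-bit operands the outer loops would step over pairs of uint32_t words, which cuts the number of multiplications to roughly a quarter.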
It is also possible to do the same with intrinsics. (If you want the code, just ask.)
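
For reference, an intrinsics version could look roughly like this. It is a sketch and not the code the author had in mind; the function name f2m_intrin_mult is made up. It uses _mm_clmulepi64_si128 from <wmmintrin.h>. The target attribute is an assumption about a reasonably recent GCC or Clang; with an older compiler such as GCC 4.8, compiling the file with -mpclmul is likely needed instead.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_storel_epi64 */
#include <wmmintrin.h>  /* _mm_clmulepi64_si128 */

/* Same schoolbook multiplication as the inline-asm version.
   C must have room for 2*t words. */
__attribute__((target("pclmul")))
void f2m_intrin_mult(uint32_t t, const uint32_t *A,
                     const uint32_t *B, uint32_t *C)
{
    memset(C, 0, 2 * t * sizeof(uint32_t));
    for (uint32_t i = 0; i < t; i++) {
        for (uint32_t j = 0; j < t; j++) {
            /* Zero-extend each 32-bit word into an XMM value. */
            __m128i a = _mm_cvtsi32_si128((int)A[i]);
            __m128i b = _mm_cvtsi32_si128((int)B[j]);

            /* 0x00: multiply the low quadwords of a and b. */
            __m128i p = _mm_clmulepi64_si128(a, b, 0x00);

            /* The product of two 32-bit words fits in the low 64 bits. */
            uint64_t prod;
            _mm_storel_epi64((__m128i *)&prod, p);

            C[i + j]     ^= (uint32_t)prod;
            C[i + j + 1] ^= (uint32_t)(prod >> 32);
        }
    }
}
```

The intrinsic leaves register allocation and operand ordering to the compiler, so the constraint details of the asm statement disappear.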
The calculation can also be optimized further with Karatsuba if you know the size t of the arrays.
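
To illustrate the Karatsuba idea, here is a sketch of a single level for t = 2. It needs three word multiplications instead of four, because over GF(2) the middle term a0*b1 + a1*b0 equals (a0+a1)*(b0+b1) + a0*b0 + a1*b1, with + meaning XOR. The names clmul32 and f2m_karatsuba2 are hypothetical; clmul32 is a portable stand-in here and could be replaced by a pclmulqdq-based multiply.

```c
#include <stdint.h>

/* Portable stand-in for one 32x32 carry-less multiplication. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int k = 0; k < 32; k++) {
        if ((b >> k) & 1u)
            acc ^= (uint64_t)a << k;
    }
    return acc;
}

/* One Karatsuba level for two-word polynomials a = a[1]*x^32 + a[0]
   and b = b[1]*x^32 + b[0]. r[0..3] receives the product,
   least significant word first. */
static void f2m_karatsuba2(const uint32_t a[2], const uint32_t b[2],
                           uint32_t r[4])
{
    uint64_t lo  = clmul32(a[0], b[0]);                 /* a0*b0 */
    uint64_t hi  = clmul32(a[1], b[1]);                 /* a1*b1 */
    /* (a0^a1)*(b0^b1) ^ lo ^ hi == a0*b1 ^ a1*b0 */
    uint64_t mid = clmul32(a[0] ^ a[1], b[0] ^ b[1]) ^ lo ^ hi;

    r[0] = (uint32_t)lo;
    r[1] = (uint32_t)(lo >> 32) ^ (uint32_t)mid;
    r[2] = (uint32_t)hi ^ (uint32_t)(mid >> 32);
    r[3] = (uint32_t)(hi >> 32);
}
```

For larger fixed t the same split can be applied recursively; whether it pays off depends on how cheap the multiply is compared with the extra XORs.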

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow