Question

A naive implementation of memcmp() goes something like this (from this answer):

int memcmp_test(const char *cs_in, const char *ct_in, size_t n)
{
    size_t i;
    const unsigned char *cs = (const unsigned char *) cs_in;
    const unsigned char *ct = (const unsigned char *) ct_in;

    for (i = 0; i < n; i++, cs++, ct++)
    {
        if (*cs < *ct)
        {
            return -1;
        }
        else if (*cs > *ct)
        {
            return 1;
        }
    }
    return 0;
}

Here the traversal stops as soon as the first mismatching byte is found. This is no good for cryptographic applications, because it makes the execution time dependent on the contents of the blocks, which can enable timing attacks. So OpenSSL uses this (taken from here):

int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len)
{
    size_t i;
    const unsigned char *a = in_a;
    const unsigned char *b = in_b;
    unsigned char x = 0;

    for (i = 0; i < len; i++)
         x |= a[i] ^ b[i];

    return x;
}

There are no breaks or early returns in the middle, so this code has to traverse both blocks in full. At least, that is the intent.

Now here's one usage example (from here):

 static int des_ede3_unwrap(EVP_CIPHER_CTX *ctx,
     unsigned char *out, const unsigned char *in, size_t inl)
 {
      unsigned char icv[8], iv[8], sha1tmp[SHA_DIGEST_LENGTH];
      int rv;
      //whatever, unrelated then...
      if (!CRYPTO_memcmp(sha1tmp, icv, 8))
         rv = inl - 16;
      //whatever, unrelated
 }

Now with link-time code generation (Visual C++ LTCG) or link-time optimization (gcc LTO), the compiler is able to see both the CRYPTO_memcmp() implementation and the invocation site (even if they are in different translation units). It can see that the invocation site does not use the actual value; it merely compares it to zero. So it is free to transform CRYPTO_memcmp() so that it returns immediately once it finds the first mismatching pair of bytes, and the "secure" version of memcmp() is no longer secure.
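To make the danger concrete, here is a hedged sketch of what such an "as-if" transformation could look like. The function name is hypothetical; this is not code any particular compiler is known to emit, just a version that is observably equivalent for a caller that only tests the result against zero:

```c
#include <stddef.h>

/* Hypothetical sketch: once x becomes nonzero in CRYPTO_memcmp it can never
 * return to zero, so a compiler that knows the caller only checks "result
 * != 0" may legally rewrite the loop to exit early - reintroducing the
 * data-dependent timing the original code tried to avoid. */
int crypto_memcmp_as_if_optimized(const void *in_a, const void *in_b, size_t len)
{
    const unsigned char *a = (const unsigned char *) in_a;
    const unsigned char *b = (const unsigned char *) in_b;
    size_t i;

    for (i = 0; i < len; i++)
        if (a[i] ^ b[i])
            return 1;   /* early exit: leaks where the first mismatch is */
    return 0;
}
```

For the caller's `!CRYPTO_memcmp(...)` test this behaves identically to the original, which is exactly why the transformation is permitted.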

How can memcmp() be implemented so that a standards-compliant compiler will not transform it into a version that facilitates timing attacks?

Solution

There are two possible solutions.

The first one is fully portable and Standard-compliant: declare x volatile, which tells the compiler that it must preserve every update of x, and so it must read both data arrays in full. This doesn't prevent the compiler from performing those reads in larger chunks (such as reading several bytes at a time and then using them in sequence), but that is no big deal here - the compiler still has to emit a number of reads proportional to the number of bytes in the arrays. The drawback of this approach is that it can make the code slower - some benchmarks I ran showed about a 50 percent slowdown on a specific processor with a specific toolset. YMMV.
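A minimal sketch of this first approach (the function name is illustrative, not OpenSSL's):

```c
#include <stddef.h>

/* Constant-time comparison via a volatile accumulator: because x is
 * volatile, every "x |= ..." is an observable side effect the compiler
 * must perform, so all len bytes of both arrays are read. */
int memcmp_volatile_acc(const void *in_a, const void *in_b, size_t len)
{
    const unsigned char *a = (const unsigned char *) in_a;
    const unsigned char *b = (const unsigned char *) in_b;
    volatile unsigned char x = 0;   /* the only change vs CRYPTO_memcmp */
    size_t i;

    for (i = 0; i < len; i++)
        x |= a[i] ^ b[i];

    return x;   /* 0 if the blocks are equal, nonzero otherwise */
}
```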

The second possible solution is to cast the pointers to volatile unsigned char * and access the data through them:

const volatile unsigned char * cs = (const volatile unsigned char*) cs_in;
const volatile unsigned char * ct = (const volatile unsigned char*) ct_in;
// the rest of the code is the same

which is just as fast but not fully Standard-compliant (see this). Many compilers treat such a cast as a hint that these reads must not be optimized away, but the Standard makes no guarantee of this, so a compiler could legitimately break this code. This solution is therefore not portable.
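For completeness, a full sketch of this second variant (again with an illustrative name), assuming the compiler honors the volatile-qualified accesses:

```c
#include <stddef.h>

/* Constant-time comparison via volatile-qualified data pointers. Many
 * compilers will not elide or reorder reads made through a volatile
 * lvalue, but the Standard does not guarantee this for objects that were
 * not defined volatile - so this is a common idiom, not a guarantee. */
int memcmp_volatile_ptr(const void *in_a, const void *in_b, size_t len)
{
    const volatile unsigned char *a = (const volatile unsigned char *) in_a;
    const volatile unsigned char *b = (const volatile unsigned char *) in_b;
    unsigned char x = 0;
    size_t i;

    for (i = 0; i < len; i++)
        x |= a[i] ^ b[i];

    return x;   /* 0 if the blocks are equal, nonzero otherwise */
}
```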

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow