Question

I would like to know if I'm breaking strict aliasing rules with this snippet. (I think so since it's dereferencing a punned-pointer, however it's done in a single expression and /Wall doesn't cry.)

inline double plop() const // member function
{
    __m128d x = _mm_load_pd(v);
    ... // some stuff
    return *(reinterpret_cast<double*>(&x)); // return the lower double in xmm reg referred to by x.
}

If yes, what's the workaround? Using different representations simultaneously is becoming hardcore once you want to respect the spec.

Thanks for your answers, I'm losing my good mood trying to find a solution.

Answers that won't be accepted and why:

"use mm_store" -> The optimizer fails to remove it if the following instructions require an xmm register so it generates a load just after it. Store + load for nothing.

"use a union" -> Aliasing rule violation if the two types are used for the same object, if I understood Thiago Macieira's article correctly.


Solution

There is only one intrinsic that "extracts" the low-order double value from an xmm register:

double _mm_cvtsd_f64 (__m128d a)

You could use it this way:

return _mm_cvtsd_f64(x);

The references disagree on what this compiles to. MSDN says: "This intrinsic does not map to any specific machine instruction", while the Intel Intrinsics Guide mentions the movsd instruction. In the latter case the extra instruction is easily eliminated by the optimizer; at least gcc 4.8.1 with the -O2 flag generates code with no additional instruction.
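For completeness, here is a minimal self-contained sketch of the function from the question using this intrinsic. The free function `lower_double` and the `_mm_add_pd` step are hypothetical stand-ins for the member function and the elided "some stuff"; the pointer is assumed to be 16-byte aligned, as `_mm_load_pd` requires.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical stand-in for plop(); v must point to two doubles
// aligned to 16 bytes, because _mm_load_pd is an aligned load.
inline double lower_double(const double* v)
{
    __m128d x = _mm_load_pd(v); // x = { v[0], v[1] }
    x = _mm_add_pd(x, x);       // placeholder for "some stuff"
    return _mm_cvtsd_f64(x);    // low lane as a double, no pointer cast
}
```

Called with `alignas(16) double v[2] = {1.5, 2.5};`, this should return the doubled low element.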

OTHER TIPS

The aggregate-or-union bullet point in the list below (bold in the original) should, I think, allow your cast here, as we may consider __m128d an aggregate of two doubles in a union with the full register. As for strict aliasing, compilers have always been very lenient about unions, whereas originally only a cast to (char*) was supposed to be valid.

§3.10: If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined (The intent of this list is to specify those circumstances in which an object may or may not be aliased):

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
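The last bullet is the one case in this list that is uncontroversial for any object: its bytes may be read through an unsigned char glvalue. A small sketch of that, assuming an x86 target where the low lane of a `__m128d` sits at the lowest address; `low_bytes_match` is a hypothetical helper, not something from the question.

```cpp
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: compares the first sizeof(double) bytes of x
// with the bytes of `expected`, reading both through unsigned char.
inline bool low_bytes_match(const __m128d& x, double expected)
{
    const unsigned char* px = reinterpret_cast<const unsigned char*>(&x);
    const unsigned char* pe = reinterpret_cast<const unsigned char*>(&expected);
    for (std::size_t i = 0; i < sizeof(double); ++i) {
        if (px[i] != pe[i]) {
            return false; // byte representations differ
        }
    }
    return true;
}
```

This only inspects the object representation; it says nothing about whether reading the same storage through a `double*` is allowed.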

Yes, I think this breaks strict aliasing. However, in practice this is usually fine.
(I'm mostly writing this as an answer because it's difficult to describe well in a comment.)

But, you could instead do something like this:

inline double plop() const // member function
{
    __m128d x = _mm_load_pd(v);
    ... // some stuff

    union {
        unsigned long long i; // 64-bit int
        double             d; // 64-bit double
    };

    i = _mm_cvtsi128_si64(_mm_castpd_si128(x)); // _mm_castpd_si128 to interpret the register as an int vector, _mm_cvtsi128_si64 to extract the lowest 64-bits

    return d; // use the union to return the value as a double without breaking strict aliasing
}
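Since the question already objects to reading a union member other than the one last written, a variant of the same idea can copy the bits with std::memcpy instead, which compilers typically reduce to a register move. This is a sketch under assumptions: `lower_double_memcpy` is a hypothetical name, and `_mm_cvtsi128_si64` is only available when targeting x86-64.

```cpp
#include <cstring>     // std::memcpy
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: extracts the low 64 bits as an integer,
// then copies those bits into a double without any pointer punning.
inline double lower_double_memcpy(__m128d x)
{
    // Reinterpret the register as an integer vector and take the low 64 bits.
    long long i = _mm_cvtsi128_si64(_mm_castpd_si128(x));

    double d;
    static_assert(sizeof d == sizeof i, "double and long long must both be 64 bits");
    std::memcpy(&d, &i, sizeof d); // bit copy
    return d;
}
```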

What about return x.m128d_f64[0]; ?
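As far as I know, the m128d_f64 member exists only in MSVC's definition of __m128d, so this suggestion is not portable as written. A hedged sketch that uses it where available and falls back to the intrinsic elsewhere (`lower_lane` is a hypothetical name, and the preprocessor check is an assumption about how to detect MSVC proper):

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: returns the low double of x.
inline double lower_lane(__m128d x)
{
#if defined(_MSC_VER) && !defined(__clang__)
    return x.m128d_f64[0];   // MSVC: __m128d is a union exposing its lanes
#else
    return _mm_cvtsd_f64(x); // other compilers: use the intrinsic instead
#endif
}
```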

Licensed under: CC-BY-SA with attribution