Question

The code below performs a fast inverse square root operation by some bit hacks. The algorithm was probably developed by Silicon Graphics in early 1990's and it's appeared in Quake 3 too. more info

However I get the following warning from GCC C++ compiler: dereferencing type-punned pointer will break strict-aliasing rules

Should I use static_cast, reinterpret_cast or dynamic_cast instead in such situations?

float InverseSquareRoot(float x)
{
    float xhalf = 0.5f*x;
    int32_t i = *(int32_t*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}
Was it helpful?

Solution

Forget casts. Use memcpy.

float xhalf = 0.5f*x;
uint32_t i;
assert(sizeof(x) == sizeof(i));
std::memcpy(&i, &x, sizeof(i));
i = 0x5f375a86 - (i>>1);
std::memcpy(&x, &i, sizeof(i));
x = x*(1.5f - xhalf*x*x);
return x;

The original code tries to initialize the int32_t by first accessing the float object through an int32_t pointer, which is where the rules are broken. The C-style cast is equivalent to a reinterpret_cast, so changing it to reinterpret_cast would not make much difference.

The important difference when using memcpy is that the bytes are copied from the float into the int32_t, but the float object is never accessed through an int32_t lvalue, because memcpy takes pointers to void and its insides are "magical" and don't break the aliasing rules.

OTHER TIPS

There are a few good answers here that address the type-punning issue.

I want to address the "fast inverse square-root" part. Don't use this "trick" on modern processors. Every mainstream vector ISA has a dedicated hardware instruction to give you a fast inverse square-root. Every one of them is both faster and more accurate than this oft-copied little hack.

These instructions are all available via intrinsics, so they are relatively easy to use. In SSE, you want to use rsqrtss (intrinsic: _mm_rsqrt_ss( )); in NEON you want to use vrsqrte (intrinsic: vrsqrte_f32( )); and in AltiVec you want to use frsqrte. Most GPU ISAs have similar instructions. These estimates can be refined using the same Newton iteration, and NEON even has the vrsqrts instruction to do part of the refinement in a single instruction without needing to load constants.

Update

I no longer believe this answer is correct, due to feedback I've gotten from the committee. But I want to leave it up for informational purposes. And I am purposefully hopeful that this answer can be made correct by the committee (if it chooses to do so). I.e. there's nothing about the underlying hardware that makes this answer incorrect, it is just the judgement of a committee that makes it so, or not so.


I'm adding an answer not to refute the accepted answer, but to augment it. I believe the accepted answer is both correct and efficient (and I've just upvoted it). However I wanted to demonstrate another technique that is just as correct and efficient:

float InverseSquareRoot(float x)
{
    union
    {
        float as_float;
        int32_t as_int;
    };
    float xhalf = 0.5f*x;
    as_float = x;
    as_int = 0x5f3759df - (as_int>>1);
    as_float = as_float*(1.5f - xhalf*as_float*as_float);
    return as_float;
}

Using clang++ with optimization at -O3, I compiled plasmacel's code, R. Martinho Fernandes code, and this code, and compared the assembly line by line. All three were identical. This is due to the compiler's choice to compile it like this. It had been equally valid for the compiler to produce different, broken code.

If you have access to C++20 or later then you can use std::bit_cast

float InverseSquareRoot(float x)
{
    float xhalf = 0.5f*x;
    int32_t i = std::bit_cast<int32_t>(x);
    i = 0x5f3759df - (i>>1);
    x = std::bit_cast<float>(i);
    x = x*(1.5f - xhalf*x*x);
    return x;
}

At the moment std::bit_cast is only supported by MSVC. See demo on Godbolt

While waiting for the implementation, if you're using Clang you can try __builtin_bit_cast. Just change the casts like this

int32_t i = __builtin_bit_cast(std::int32_t, x);
x = __builtin_bit_cast(float, i);

Demo

Take a look at this for more information on type punning and strict aliasing.

The only safe cast of a type into an array is into a char array. If you want one data address to be switchable to different types you will need to use a union

The cast invokes undefined behaviour. No matter what form of cast you use, it will still be undefined behaviour. It is undefined no matter what type of cast you use.

Most compilers will do what you expect, but gcc likes being mean and is probably going to assume you didn't assign the pointers despite all indication you did and reorder the operation so they give some strange result.

Casting a pointer to incompatible type and dereferencing it is an undefined behaviour. The only exception is casting it to or from char, so the only workaround is using std::memcpy (as per R. Martinho Fernandes' answer). (I am not sure how much it is defined using unions; It does stand a better chance of working though).

That said, you should not use C-style cast in C++. In this case, static_cast would not compile, nor would dynamic_cast, forcing you to use reinterpret_cast and reinterpret_cast is a strong suggestion you might be violating strict aliasing rules.

Based on the answers here I made a modern "pseudo-cast" function for ease of application.

C99 version (while most compilers support it, theoretically could be undefined behavior in some)

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");

    union { U from; T to; } __x = {x};
    return __x.to;
}

Universal versions (based on the accepted answer)

Cast types with the same size:

#include <cstring>

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");
    static_assert(sizeof(T) == sizeof(U), "pseudo_cast can't handle types with different size");
    
    T to;
    std::memcpy(&to, &x, sizeof(T));
    return to;
}

Cast types with any sizes:

#include <cstring>

template <typename T, typename U>
inline T pseudo_cast(const U &x)
{
    static_assert(std::is_trivially_copyable<T>::value && std::is_trivially_copyable<U>::value, "pseudo_cast can't handle types which are not trivially copyable");

    T to = T(0);
    std::memcpy(&to, &x, (sizeof(T) < sizeof(U)) ? sizeof(T) : sizeof(U));
    return to;
}

Use it like:

float f = 3.14f;
uint32_t u = pseudo_cast<uint32_t>(f);

Update for C++20

C++20 introduces constexpr std::bit_cast in header <bit> which is functionally equivalent for types with the same size. Nevertheless, the above versions are still useful if you want to implement this functionality yourself (supposed that constexpr is not required), or if you want to support types with different sizes.

The only cast that will work here is reinterpret_cast. (And even then, at least one compiler will go out of its way to ensure that it won't work.)

But what are you actually trying to do? There's certainly a better solution, that doesn't involve type punning. There are very, very few cases where type punning is appropriate, and they all are in very, very low level code, things like serialization, or implementing the C standard library (e.g. functions like modf). Otherwise (and maybe even in serialization), functions like ldexp and modf will probably work better, and certainly be more readable.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top