Question

Can you propose at least 1 scenario where there is a substantial difference between

union {
T var_1;
U var_2;
}

and

var_2 = reinterpret_cast<U> (var_1)

?

The more I think about this, the more they look like the same thing to me, at least from a practical viewpoint.

One difference I found is that a union is at least as large as its largest member, whereas the reinterpret_cast described in this post can lead to truncation, so the plain old C-style union is even safer than the newer C++ cast.

Can you outline the differences between these two?


Solution

Contrary to what the other answers state, from a practical point of view there is a huge difference, although there might not be such a difference in the standard.

From the standard's point of view, reinterpret_cast is only guaranteed to work for round-trip conversions, and only if the alignment requirements of the intermediate pointer type are not stronger than those of the source type. You are not allowed (*) to write through one pointer type and read through another pointer type.

At the same time, the standard imposes a similar restriction on unions: it is undefined behavior to read from a union member other than the active one (the member that was last written to) (+).

Yet compilers often provide additional guarantees for the union case. All compilers I know of (VS, g++, clang++, xlC_r, Intel, Solaris CC) guarantee that you can read from a union through an inactive member and that doing so produces a value with exactly the same bits set as those written through the active member.

This is particularly important at high optimization levels when reading data from the network:

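#include <stdint.h>   // int64_t
#include <string.h>   // memcpy
// ntohll is not standard C++; it comes from a platform header where available.

double ntohdouble(const char *buffer) {          // [1] union version
   union {
      int64_t   i;
      double    f;
   } data;
   memcpy(&data.i, buffer, sizeof(int64_t));     // copy the raw network bytes
   data.i = ntohll(data.i);                      // network to host byte order
   return data.f;                                // read the inactive member
}

double ntohdouble(const char *buffer) {          // [2] reinterpret_cast version
   int64_t data;
   double  dbl;
   memcpy(&data, buffer, sizeof(int64_t));
   data = ntohll(data);
   dbl = *reinterpret_cast<double*>(&data);      // reads an int64_t object as a double
   return dbl;
}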

The implementation in [1] is sanctioned by all compilers I know (gcc, clang, VS, sun, ibm, hp), while the implementation in [2] is not and will fail horribly in some of them when aggressive optimizations are used. In particular, I have seen gcc reorder the instructions and read into the dbl variable before evaluating ntohll, thus producing the wrong results.
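For comparison, here is a third variant that is not part of the original answer: copying the bytes with memcpy, which avoids reading through either an inactive union member or a mismatched pointer type. This is only a sketch. The names load_be64 and ntohdouble_memcpy are made up for illustration, the byte order is handled with shifts instead of ntohll, and it assumes that double is 64 bits wide and uses the same byte order as 64-bit integers on the host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble a 64-bit value from 8 big-endian (network order) bytes.
inline std::uint64_t load_be64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | p[i];   // most significant byte comes first
    }
    return v;
}

// Hypothetical name. Assumption: double is a 64-bit type whose byte order
// matches that of std::uint64_t on this platform.
inline double ntohdouble_memcpy(const char* buffer) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    // Inspecting the buffer as unsigned char is always allowed.
    const std::uint64_t bits =
        load_be64(reinterpret_cast<const unsigned char*>(buffer));
    double result;
    std::memcpy(&result, &bits, sizeof result);   // copy the bits, no aliasing involved
    return result;
}
```

Mainstream compilers typically turn a fixed-size memcpy like this into a plain register move, so the copy usually costs nothing at run time.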


(*) With the exception that you are always allowed to read through a [signed|unsigned] char* regardless of what the real object (original pointer type) was.

(+) Again with some exceptions: if the active member shares a common prefix with another member, you can read that prefix through the compatible member.
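The common-prefix exception can be sketched as follows. The struct and field names are invented for illustration; the point is that both structs are standard-layout and start with the same member types in the same order, so the shared prefix can be read through either member.

```cpp
#include <cstdint>

// Hypothetical message layouts that begin with the same two fields.
struct Ping {
    std::uint8_t  type;
    std::uint32_t seq;
};

struct Data {
    std::uint8_t  type;
    std::uint32_t seq;
    std::uint16_t length;   // extra field after the common prefix
};

union Message {
    Ping ping;
    Data data;
};

// Reads the shared prefix through `ping`, whichever member is active.
inline std::uint8_t message_type(const Message& m) {
    return m.ping.type;
}
```

Only the shared prefix is covered by this rule: reading `m.data.length` while `ping` is the active member falls back under the general restriction.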

OTHER TIPS

There are some technical differences between a proper union and (let's assume) a proper and safe reinterpret_cast. However, I can't think of any of these differences that cannot be overcome.

The real reason to prefer a union over reinterpret_cast in my opinion isn't a technical one. It's for documentation.

Suppose you are designing a bunch of classes to represent a wire protocol (which I guess is the most common reason to use type punning in the first place), and that wire protocol consists of many messages, submessages and fields. If some of those fields are common, such as message type, sequence number, etc., using a union simplifies tying these elements together and helps to document exactly how the protocol appears on the wire.

Using reinterpret_cast does the same thing, obviously, but in order to really know what's going on you have to examine the code that advances from one packet to the next. Using a union you can just take a look at the header and get an idea what's going on.

In C++11, a union is a class type and can hold a member with non-trivial member functions. You can't simply cast from one member to another.

§ 9.5.3

[ Example: Consider the following union:

union U {
    int i;
    float f;
    std::string s;
};

Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor. To use U, some or all of these member functions must be user-provided. — end example ]
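As a rough sketch of what "user-provided" can look like for this union, the version below adds a constructor and a destructor and manages the string member by hand. The function demo_string_member and the alias String are hypothetical names used only for this illustration, and the copy and move operations are left deleted.

```cpp
#include <cstddef>
#include <new>
#include <string>

union U {
    int i;
    float f;
    std::string s;

    U() : i(0) {}   // user-provided: starts with i as the active member
    ~U() {}         // user-provided: does nothing, so the owner must destroy s if it is active
};

// Hypothetical demo: switch the active member to s, use it, then clean up.
inline std::size_t demo_string_member() {
    using String = std::string;
    U u;                               // i is active
    new (&u.s) String("hello");        // placement new makes s the active member
    const std::size_t n = u.s.size();
    u.s.~String();                     // destroy s explicitly before u goes away
    return n;
}
```

The union itself does not track which member is active, so real code usually wraps it in a class that stores a tag next to it.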

From a practical point of view, they're most probably 100% identical, at least on real, non-fictional computers. You take the binary representation of one type and stuff it into another type.

From a language lawyer point of view, using reinterpret_cast is well-defined for some occasions (e.g. pointer to integer conversions) and implementation-specific otherwise.
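One of the well-defined cases can be sketched like this: a pointer converted to a sufficiently large integer and back compares equal to the original. The function name round_trips is hypothetical, and std::uintptr_t is an optional type, so this assumes the platform provides it.

```cpp
#include <cstdint>

// Hypothetical check: pointer -> integer -> pointer yields the original pointer.
inline bool round_trips(int* p) {
    const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);  // mapping is implementation-defined
    int* back = reinterpret_cast<int*>(raw);                         // converting back restores p
    return back == p;
}
```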

Union type punning, on the other hand, is very clearly undefined behaviour, always (though undefined does not necessarily mean "doesn't work"). The standard says that the value of at most one of the non-static data members can be stored in a union at any time. This means that if you set var_1, then var_1 is valid but var_2 is not.
However, since var_1 and var_2 are stored at the same memory location, you can of course still read and write either of them as you like, and assuming they have the same storage size, no bits are "lost".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow