Question

I was working on something and was considering using a union. I decided against it, because the design really called for a struct/class, but it eventually led to the following hypothetical question:

Suppose you have a union like this contrived example:

typedef union {
    char* array_c;
    float* array_f;
    int* array_i;
} my_array;

. . . and then you allocate one of the arrays and try freeing it through a different member:

my_array arr;
arr.array_f = (float*)(malloc(10*sizeof(float)));
free(arr.array_i);

I assume that this would work, although it is technically not defined, because of the way malloc is implemented. I also assume it would work when allocating array_c, even though, unlike int vs. float, the arrays are unlikely to be the same size.

The test could be repeated with new and delete, which are similar. I conjecture these would also work.

I'm guessing that the language specifications would hate me for doing this, but I would expect it to work. It reminds me of the "don't delete a new-ed pointer cast to void* even when it's an array not an object" business.

So, my questions: what does the specification say about doing this? I checked briefly, but couldn't find anything that addresses this case in particular. And how ill-advised is this, from a functional perspective? (I realize that this is terrible from a clarity perspective.)

This is purely a curiosity question for pedantic purposes.


Solution

You're precisely correct. It breaks the rules:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.
    - ISO/IEC standard 9899, section 6.2.6.1

However, given the way implementations are typically done, it will "accidentally" work properly. Since free takes a void *, the argument will be converted to a void * when it is passed to free. Since all the pointers are located at the same address and all the conversions to a void * involve no change to their value, the ultimate value passed to free will be the same as if the correct member were passed.
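A minimal sketch of that argument, assuming a typical implementation where float * and int * have the same size and representation (the static_assert makes that assumption explicit; the standard does not guarantee it). The helper name wrong_member_gives_same_pointer is hypothetical. It stores through array_f, reads back through array_i (the step the quoted rule objects to), converts to void * the way the call to free would, and compares against what malloc returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef union {
    char *array_c;
    float *array_f;
    int *array_i;
} my_array;

/* Assumption of a "typical implementation": not guaranteed by the standard. */
static_assert(sizeof(float *) == sizeof(int *), "float * and int * differ in size");

/* Hypothetical helper: true if reading the member that was NOT written and
   converting it to void * yields the pointer malloc returned, i.e. free()
   would receive the same value either way. */
bool wrong_member_gives_same_pointer(void)
{
    void *from_malloc = malloc(10 * sizeof(float));
    if (from_malloc == NULL)
        return false;

    my_array arr;
    arr.array_f = from_malloc;      /* store through array_f */
    void *as_void = arr.array_i;    /* read array_i, convert as free() would */

    bool same = (as_void == from_malloc);
    free(from_malloc);              /* release the block exactly once */
    return same;
}
```

On common desktop and server toolchains this returns true, which is exactly why the code in the question appears to work.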

Theoretically, an implementation could track which member of a union was accessed last and corrupt the value (or crash the program, or do anything else) if you read a different member from the one you last wrote. But to my knowledge, no actual implementation does anything like that.
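You can opt into that kind of bookkeeping yourself with a tagged union. This is a sketch, not something any compiler does for you: the names tagged_array, array_kind, set_floats, get_ints and tagged_array_free are all hypothetical. The tag records which member was last written, the accessor asserts on a mismatched read (much as the hypothetical tracking implementation might trap), and freeing always goes through the active member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag recording which member was last stored. */
typedef enum { ARRAY_NONE, ARRAY_CHAR, ARRAY_FLOAT, ARRAY_INT } array_kind;

typedef struct {
    array_kind kind;
    union {
        char *array_c;
        float *array_f;
        int *array_i;
    } u;
} tagged_array;

/* Store a float array and remember that array_f is the active member. */
void set_floats(tagged_array *a, float *p)
{
    a->u.array_f = p;
    a->kind = ARRAY_FLOAT;
}

/* Read back as int *: aborts (unless NDEBUG is defined) if a different
   member was last written. */
int *get_ints(const tagged_array *a)
{
    assert(a->kind == ARRAY_INT);
    return a->u.array_i;
}

/* Free through whichever member is active, so the read matches the last write. */
void tagged_array_free(tagged_array *a)
{
    switch (a->kind) {
    case ARRAY_CHAR:  free(a->u.array_c); break;
    case ARRAY_FLOAT: free(a->u.array_f); break;
    case ARRAY_INT:   free(a->u.array_i); break;
    case ARRAY_NONE:  break;
    }
    a->kind = ARRAY_NONE;
}
```

With this layout the question's free(arr.array_i) becomes tagged_array_free(&arr), which never reads an inactive member.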

OTHER TIPS

This is undefined behavior because you are accessing a different member than the one you set. It can do literally anything.

In practice, this will usually work, but you can't rely on it. Compilers and toolchains are not deliberately evil, but there have been cases where optimizations interacted with undefined behavior to produce completely unexpected results. And of course if you're ever on a system with a different malloc implementation, it will probably blow up.

It has nothing to do with the malloc() implementation. The union in your example uses the same memory location to store one of three "different" pointers. On mainstream platforms, all object pointers, no matter what they point to, are the same size: the word size of the architecture you're on, 32 bits on 32-bit systems and 64 bits on 64-bit systems. (The C standard does not strictly require this, but it holds on the systems you are likely to use.) This is because a pointer is an address in memory, which can be represented by an integer.
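A quick way to check the size claim on your own machine. The static_assert encodes the one relationship here that the C standard does guarantee (char * and void * share a representation); the runtime functions check the rest, which is only expected to hold on mainstream platforms. Both function names are hypothetical, and uintptr_t is an optional type that common platforms provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guaranteed by the C standard: char * and void * have the same representation. */
static_assert(sizeof(char *) == sizeof(void *), "char * and void * must match");

/* Not guaranteed, but true on mainstream 32- and 64-bit platforms:
   every object pointer has the same size as void *. */
bool object_pointers_same_size(void)
{
    return sizeof(float *) == sizeof(void *)
        && sizeof(int *) == sizeof(void *)
        && sizeof(char *) == sizeof(void *);
}

/* The "address as integer" view: a void * converted to uintptr_t and back
   compares equal to the original pointer. */
bool address_round_trips(void *p)
{
    uintptr_t as_int = (uintptr_t)p;
    return (void *)as_int == p;
}
```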

Let's say your arr is located at address 0x10000 (the pointer to your pointer, if you will). Let's say malloc() finds you a memory location at 0x666660. You assign this value to arr.array_f, which means you store the pointer 0x666660 at the location 0x10000. Then you write your floats into 0x666660 up to (but not including) 0x666688.
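The end address follows from the size of the block. Assuming 4-byte floats, the arithmetic for ten floats is:

```latex
\[
\texttt{0x666660} + 10 \times \texttt{sizeof(float)}
  = \texttt{0x666660} + 10 \times 4
  = \texttt{0x666660} + \texttt{0x28}
  = \texttt{0x666688}
\]
```

The last float occupies 0x666684 through 0x666687, so 0x666688 is one byte past the end of the block.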

Now, you attempt to access arr.array_i. Because you're using a union, the address of arr.array_i is the same as the address of arr.array_f and arr.array_c. So you are reading from the address 0x10000 again, and you read out the pointer 0x666660. Since this is the same pointer malloc returned earlier, you can go ahead and free it.
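The "same address" step can be checked directly. A sketch assuming the question's my_array type; members_share_address is a hypothetical name. It compares the address of the union itself (the 0x10000 in the walkthrough) with the address of each member. The C standard guarantees that a pointer to a union, suitably converted, points to each of its members, so this holds on every conforming implementation, not just typical ones.

```c
#include <assert.h>
#include <stdbool.h>

typedef union {
    char *array_c;
    float *array_f;
    int *array_i;
} my_array;

/* Guaranteed: a union is large enough for its largest member, and all
   members overlap in that same storage. */
static_assert(sizeof(my_array) >= sizeof(float *), "union too small");

/* Hypothetical helper: true if every member starts at the union's own address. */
bool members_share_address(my_array *arr)
{
    void *base = arr;
    return (void *)&arr->array_c == base
        && (void *)&arr->array_f == base
        && (void *)&arr->array_i == base;
}
```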

That said, attempting to interpret integers as text, or floating-point numbers as integers, etc., will clearly lead to ruin. If arr.array_i[0] == 1, then arr.array_f[0] will definitely not equal 1, and arr.array_c[0] will have no bearing on the character '1'. You can try "viewing" memory this way as an exercise (loop and printf()), but you won't achieve anything useful.
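To see concretely why the reinterpretation goes wrong, here is a sketch that copies the bytes of an integer holding 1 into a float and inspects them as characters. It uses memcpy rather than reading through a differently typed pointer, because the latter would also break C's effective-type (strict aliasing) rules. It assumes 4-byte floats in IEEE 754 format, as on mainstream platforms; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float and int32_t are both 4 bytes. */
static_assert(sizeof(float) == sizeof(int32_t), "float and int32_t differ in size");

/* Reinterpret the bytes of an int32_t as a float (well defined via memcpy). */
float int_bits_as_float(int32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* Reinterpret the bytes of a float as an int32_t. */
int32_t float_bits_as_int(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Returns 1 if any byte of the integer equals the character c, else 0. */
int int_has_byte(int32_t i, char c)
{
    unsigned char bytes[sizeof i];
    memcpy(bytes, &i, sizeof i);
    for (size_t k = 0; k < sizeof i; k++)
        if (bytes[k] == (unsigned char)c)
            return 1;
    return 0;
}
```

The integer 1 viewed as a float is a tiny denormal, not 1.0f; the float 1.0f viewed as an integer is 0x3F800000; and none of the bytes of the integer 1 is the character '1'.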

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow