Question

I have a situation in a legacy code with a large field of a structure being split into two sub-fields. For example, a uint32 is split into two uint16's:

typedef struct
{
    uint16 myVar_H;
    uint16 myVar_L;
} MyStruct;

Until now these fields were read from a uint32 variable via shifting to read each uint16 portion. Now we want to optimize this so that we read the entire uint32 by casting the first field's address to a uint32:

MyStruct s;
*((uint32*)(&s.myVar_H)) = myValue;

This works for my compiler. The question is if this is legal and defined in C89 (or other C standards)?

Note: changing the structure is another option, but since this is an ancient legacy code based on a proprietary protocol I'd rather not touch the structures definition.

Edit: I am using a big-endian MCU (a ColdFire), so this works correctly in the endianess regard.

Was it helpful?

Solution

There's a few problems with your "optimized" solution:

  1. It violates the strict aliasing rule.
  2. It will not work on little-endian machines (the immense majority of computers nowadays), because you are not reordering correctly the high and low fields.
  3. There is no guarantee that myVar_H and myVar_L are consecutive in memory: the compiler is allowed to add padding between the two fields.

The legal and platform-independant way of achieving what you want is to keep your previous solution with shifting. It shouldn't bring any performance issues.

MyStruct s;
// Reading
uint32 u = (uint32)s.myVar_H << 16 | s.myVar_L; // The cast is important
// Writing
s.myVar_H = myValue >> 16;
s.myVar_L = myValue % 0xFFFF;

Edit following comment: you say you work on big endian machines so memory order is not a problem. In that case, the "most legal" thing you may do is make your struct a union with an anonymous struct (not actual C89, it's a gcc extension). This way you don't break strict aliasing and the new definition doesn't break any existing code.

typedef union // Your type is now a union
{
    struct
    {
         uint16 myVar_H;
         uint16 myVar_L;
    }; // Anonymous struct, legal C11 but gcc extension in C89

    uint32 myVar;
} MyStruct;

s.myVar = myValue; // Safe

OTHER TIPS

Regarding your specific question:

The question is if this is legal and defined in C89?

the answer is no: it violates the strict aliasing rule. See What is the strict aliasing rule?. The compiler is free to do whatever it wants, including completely ignoring the statement:

   *((uint32*)(&s.myVar_H)) = myValue;

A safer approach is to use memcpy; your code snippet becomes:

   MyStruct s;
   memcpy(&s.myVar_H, &myValue, sizeof myValue);

Of course, in this particular instance, &s.myVar_H is the same as &s.

This assumes myValue has to have the right size and format (e.g. it can't be a char or a float). It this is not the case, and if you have C99's compound literals, you can write:

   memcpy(&s.myVar_H, (uint32[]){myValue}, sizeof(uint32));

This is ugly, but you can hide its ugliness in a macro:

   #define LEGACY_CPY32(dst, src) memcpy(&(dst), (uint32[]){src}, sizeof(uint32))

But these suggestion will avoid only the strict-aliasing issue. Litte-endian problems and structure padding problems will still be present, and your code could break if you switch machines and/or compilers, or even just change compiler switches.


Update #1: Of course no one would ever write new code like this. But I am assuming the OP's goal is to update working, legacy code with the minimal number of changes, rather than doing a complete rewrite.

See also:


Update #2: Maybe I can present a more convincing case? For any construct, one can argue in terms of:

  • legality
  • efficiency
  • elegance

For the third criteria, elegance, my suggestion loses hands down, compared with e.g. union punning.

For the second criteria, efficiency, John Regehr in Embedded in Academia, "Type Punning, Strict Aliasing, and Optimization" (June 2013) states regarding a similar issue:

Compilers such as GCC 4.8.1 and Clang ~3.3 (both for x86-64 Linux) fail to generate good code for c2 [an example of the type punning feature of unions] ... [and give] crappy object code

whereas in his example c5 which uses memcpy:

Both compilers understand memcpy deeply enough that we get the desired [optimal] object code.

In my opinion c5 is the easiest code to understand out of this little batch of functions because it doesn't do the messy shifting and also it is totally, completely, obviously free of complications that might arise from the confusing rules for unions and strict aliasing. It became my preferred idiom for type punning a few years ago when I discovered that compilers could see through the memcpy and generate the right code.

Finally, let us look at the first criteria, legality. Regehr states regarding union punning (emphasis added):

Unfortunately this code [c2, using union punning] is also undefined by modern C/C++ dialects (or maybe just unspecified, I'm not totally sure).

I am reluctant to disagree with Regehr, but the consensus on Stack Overflow seems to be that union punning has been valid since C99; see:

But it was not defined behavior in C89, which the OP specifically asks about.

Finally note the OP requests that:

Since this is an ancient legacy code based on a proprietary protocol I'd rather not touch the structures definition

The memcpy solution leaves the header files untouched, which is advantageous if you are working with a team of touchy programmers. And you can easily back out all of your changes by redefining the macro LEGACY_CPY32 to be the previous "incorrect"-but-working functionality:

 #define LEGACY_CPY32(dst, src) (*((uint32*)(&(dst))) = (src))
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top