How to write a compile-time initialisation of a 4 byte character constant that is fully portable

StackOverflow https://stackoverflow.com/questions/22239629

10-06-2023

Question

The (legacy) code looks roughly like this.

#define MAKEID(a,b,c,d) (((UInt32)a)<<24 | ((UInt32)b)<<16 \
                        | ((UInt32)c)<<8 | ((UInt32)d) )
#define ID_FORM MAKEID('F','O','R','M')
//...
struct S { 
  int id;
  int x;
  double d;
  // other stuff
};
//...
S s;
socket.read(&s, sizeof(s));  // read network data over struct
if (s.id == ID_FORM) { }

The code reads a stream of characters into the struct. The S.id is known to be a 4-character constant such as 'FORM' or 'DATA' in stream (network) order, which determines the layout of the rest of the struct. All comparisons are integer comparisons against predefined constants.

The MAKEID macro is big-endian because it places the first (character) argument at the most significant byte, which is also the lowest memory address. A little-endian version of the macro would look like this, placing the first (character) argument at the least significant byte, which is now the lowest memory address.

#define MAKEID(a,b,c,d) (((UInt32)d)<<24 | ((UInt32)c)<<16 \
                        | ((UInt32)b)<<8 | ((UInt32)a) )

The question is how to rewrite it so that it works equally well on big-endian and little-endian architectures.

No, I do not want to write both macros and choose which one with an #ifdef. There is no other endian dependency anywhere in the code and I'm not keen to introduce one here. Portable is the way to go.

No, I do not want to write a function. This constant is used in places where a function cannot go. I wrote a portable function that initialised a union, and the code won't compile.

Any kind of portable macro or template to do compile-time initialisation is what I'm looking for.

In answer to a comment, this is the real code. It's part of a network protocol and the other end takes care of the endianness in most cases. This happens to be an exception where the other end generates in network byte order and this end was historically written big-endian as a 4-byte character constant like 'FORM'. I need a point solution and not one that propagates the idea of endianness to elsewhere in the code.


Solution 2

I set this same question to the programmers who work for me. Between us we came up with the following 4 solutions. We shall go with the macro, and perhaps convert the code over to use one of the functions as time permits.

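#include <cstdint>
#include <cstring>

// Four alternatives; use one (1 and 2 have the same signature).

// 1. Copy the bytes into an integer. memcpy avoids the aliasing and
//    alignment problems of casting x to int*.
std::uint32_t MakeId(char a, char b, char c, char d) {
  const char x[4] = { a, b, c, d };
  std::uint32_t id;
  std::memcpy(&id, x, sizeof id);
  return id;
}
// 2. Union type punning: defined in C; in C++ only as a compiler extension.
std::uint32_t MakeId(char a, char b, char c, char d) {
  union {
    char x[4];
    std::uint32_t i;
  } u = { { a, b, c, d } };
  return u.i;
}
// 3. Take the first four bytes of a string.
std::uint32_t MakeId(const char* s) {
  std::uint32_t id;
  std::memcpy(&id, s, sizeof id);
  return id;
}
// 4. No trailing ';', so it can be used inside expressions. It is still a
//    run-time load, not a constant expression, and assumes 4-byte alignment.
#define MAKEID(s) (*(const std::uint32_t*)(s))

#define FORM_ID MAKEID("FORM")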
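A minimal sketch of why the function-based solutions work, assuming the memcpy form of MakeId(const char*); the IsForm helper and the packet contents are hypothetical and for illustration only. The ID read off the wire and the ID built from the string are both raw bytes in host memory, so they compare equal whatever the host byte order. The price is that the value is computed at run time, so it cannot appear where a constant expression is required, such as a case label.

```cpp
#include <cstdint>
#include <cstring>

// Builds the ID exactly as the four characters sit in memory.
// memcpy copies bytes; no byte-order conversion takes place.
std::uint32_t MakeId(const char* s) {
  std::uint32_t id;
  std::memcpy(&id, s, sizeof id);
  return id;
}

// Hypothetical check on a received packet whose first four bytes are the ID.
bool IsForm(const unsigned char* packet) {
  std::uint32_t id;
  std::memcpy(&id, packet, sizeof id);  // same effect as overlaying the struct
  return id == MakeId("FORM");          // both sides use host memory layout
}
```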

On this occasion the formidable minds of Stack Overflow did not deliver.

OTHER TIPS

Your MAKEID macro is endianness-independent. It works identically on both big- and little-endian systems.

The macro may appear to be big-endian specific, but the shifts and bitwise OR operations in C++ are all defined in terms of the result they have on the values that are operated on, not the underlying storage of those values.
Doing 42 << 24 is guaranteed to put the value 42 in the most significant 8 bits of a 32-bit result, regardless of the byte order. The same holds for the bitwise OR operation. This means that the result of MAKEID(0x12, 0x34, 0x56, 0x78) is always 0x12345678, regardless of the byte ordering of the underlying storage.
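This can be checked at compile time. The sketch assumes UInt32 is std::uint32_t (the question does not show its definition) and uses a copy of the macro with parenthesised arguments. static_assert confirms the value, and the hypothetical Describe function shows that MAKEID is a constant expression usable as a case label. It assumes plain ASCII characters, since a negative char value would be sign-extended by the cast.

```cpp
#include <cstdint>

typedef std::uint32_t UInt32;  // assumption: UInt32 is a 32-bit unsigned type

#define MAKEID(a,b,c,d) (((UInt32)(a))<<24 | ((UInt32)(b))<<16 \
                        | ((UInt32)(c))<<8 | ((UInt32)(d)))

// Shifts act on values, so this holds on every host byte order.
static_assert(MAKEID(0x12, 0x34, 0x56, 0x78) == 0x12345678u,
              "MAKEID does not depend on host byte order");

// Hypothetical dispatcher: MAKEID results are valid case labels.
const char* Describe(UInt32 id) {
  switch (id) {
    case MAKEID('F','O','R','M'): return "FORM";
    case MAKEID('D','A','T','A'): return "DATA";
    default: return "unknown";
  }
}
```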

If you want to produce an integer whose underlying storage always has the same bit pattern (for example, 0x12, 0x34, 0x56, 0x78), then you really have to rethink your approach. Such an integer will have the value 0x12345678 on a big-endian system, 0x78563412 on a little-endian system and perhaps 0x56781234 on a middle-endian system.
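To see the difference between storage and value directly, the sketch below copies the bytes 0x12, 0x34, 0x56, 0x78 into a std::uint32_t without any conversion. RawLoad and HostIsLittleEndian are hypothetical helpers, and the sketch assumes the host is either big- or little-endian.

```cpp
#include <cstdint>
#include <cstring>

// Copies four bytes, as stored, into an integer without conversion.
// The numeric result depends on the host's byte order.
std::uint32_t RawLoad(const unsigned char bytes[4]) {
  std::uint32_t v;
  std::memcpy(&v, bytes, sizeof v);
  return v;
}

// Detects the host byte order at run time by inspecting the lowest byte.
bool HostIsLittleEndian() {
  const std::uint32_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}
```

On a little-endian host RawLoad returns 0x78563412; on a big-endian host it returns 0x12345678.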
However, if that bit pattern was received over a communication interface that was defined with a particular byte order (for example big-endian/network byte order), then you must convert any multi-byte values that you receive into the system's native byte order if you want those values to be interpreted correctly by the receiving system, and that includes the four-byte ID value.
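Such a conversion does not need to know the host byte order if it is written with shifts, for the same reason MAKEID is portable. A minimal sketch; ReadBE32 is a hypothetical helper, not a library function:

```cpp
#include <cstdint>

// Decodes a big-endian (network order) 32-bit value from a byte buffer.
// Built from shifts on values, so the result is the same on any host.
std::uint32_t ReadBE32(const unsigned char* p) {
  return (std::uint32_t)p[0] << 24 | (std::uint32_t)p[1] << 16 |
         (std::uint32_t)p[2] << 8  | (std::uint32_t)p[3];
}
```

With this, the bytes 'F', 'O', 'R', 'M' decode to the same value as MAKEID('F','O','R','M') on every host.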

That is why I said in an earlier version of the answer that if you find that on some systems (in particular those where the system's byte order doesn't match the communication's byte order) the ID read from the stream doesn't match the result of MAKEID, then the likely culprit is the deserialization code. The (de-)serialization code is the most important place to take endianness into account. For example, overlaying the struct you expect over the bytes you received is easy, but is the wrong solution if there can be a byte-order mismatch or a difference in padding.
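A sketch of deserializing field by field instead of overlaying the struct. The wire layout (a 4-byte ID followed by a 4-byte signed integer, both big-endian) is an assumption for illustration, since the question does not specify it; the double field is omitted because its wire encoding is not given. Header, ReadBE32 and ParseHeader are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed wire layout: bytes 0-3 ID, bytes 4-7 x, both in network order.
struct Header {
  std::uint32_t id;
  std::int32_t x;
};

static std::uint32_t ReadBE32(const unsigned char* p) {
  return (std::uint32_t)p[0] << 24 | (std::uint32_t)p[1] << 16 |
         (std::uint32_t)p[2] << 8  | (std::uint32_t)p[3];
}

// Decodes each field explicitly, so neither host byte order nor struct
// padding affects the result. Returns false if the buffer is too short.
bool ParseHeader(const unsigned char* buf, std::size_t len, Header& out) {
  if (len < 8) return false;
  out.id = ReadBE32(buf);
  // uint32 -> int32 wraps modulo 2^32 since C++20; before that it is
  // implementation-defined (two's complement on mainstream compilers).
  out.x = (std::int32_t)ReadBE32(buf + 4);
  return true;
}
```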

Instead of having a constant defined differently on different machines you should process the data you received:

if (ntohl(s.id) == ID_FORM) { }
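A minimal sketch of that comparison, assuming a POSIX system where ntohl is declared in <arpa/inet.h> (on Windows it comes from <winsock2.h>). UInt32 is assumed to be std::uint32_t, and IsForm is a hypothetical helper standing in for the check on s.id.

```cpp
#include <arpa/inet.h>  // POSIX: ntohl
#include <cstdint>
#include <cstring>

typedef std::uint32_t UInt32;  // assumption: matches the question's UInt32
#define MAKEID(a,b,c,d) (((UInt32)(a))<<24 | ((UInt32)(b))<<16 \
                        | ((UInt32)(c))<<8 | ((UInt32)(d)))
#define ID_FORM MAKEID('F','O','R','M')

// The ID arrives in network order; convert it to host order, then compare
// against the host-order constant.
bool IsForm(const unsigned char* wire) {
  std::uint32_t id;
  std::memcpy(&id, wire, sizeof id);
  return ntohl(id) == ID_FORM;
}
```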

Edit: To avoid editing the code, you can use htonl to initialize ID_FORM instead:

#define ID_FORM htonl(MAKEID('F','O','R','M'))

This relies on htonl being a macro, which it usually is. When it is, it is usually defined with the same conditional you are trying to avoid: http://www.jbox.dk/sanos/source/include/net/inet.h.html (as an example).

So if on your system htonl is not a macro, the only sane choice I see is to actually stick to #ifdef.
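A sketch of the #ifdef route that keeps ID_FORM a compile-time constant in network byte order. It assumes GCC or Clang, which predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__; other compilers need a different test. MAKEID_WIRE is a hypothetical name, and UInt32 is assumed to be std::uint32_t.

```cpp
#include <cstdint>
#include <cstring>

typedef std::uint32_t UInt32;  // assumption: matches the question's UInt32
#define MAKEID(a,b,c,d) (((UInt32)(a))<<24 | ((UInt32)(b))<<16 \
                        | ((UInt32)(c))<<8 | ((UInt32)(d)))

// Assumption: GCC/Clang predefined byte-order macros are available.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  // Reverse the arguments so the bytes in memory read a, b, c, d.
  #define MAKEID_WIRE(a,b,c,d) MAKEID(d,c,b,a)
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  #define MAKEID_WIRE(a,b,c,d) MAKEID(a,b,c,d)
#else
  #error "Unknown host byte order"
#endif

// Network-order ID that is still a constant expression.
#define ID_FORM MAKEID_WIRE('F','O','R','M')
```

Because this ID_FORM describes the bytes as they sit in memory, it can be compared directly with an id overlaid on received data, as with the htonl version.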

Keep in mind that your ID_FORM is now in "network endianness", not "host endianness".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow