Question

Which of these items can safely be assumed to be defined in any practically-usable platform ABI?

  1. Value of CHAR_BIT

  2. Size, alignment requirements and object representation of:

    1. void*, size_t, ptrdiff_t
    2. unsigned char and signed char
    3. intptr_t and uintptr_t
    4. float, double and long double
    5. short and long long
    6. int and long (but here I expect a "no")
    7. Pointer to an object type for which the platform ABI specifies these properties
    8. Pointer to function whose type only involves types for which the platform ABI specifies these properties
  3. Object representation of a null object pointer

  4. Object representation of a null function pointer

For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:

void* foo(void *bar, size_t baz, void* (*qux)());

can I assume that I will be able to safely call it from my program regardless of the compiler I use?

Or, taken the other way round, if I am writing a library, is there a set of types such that if I limit the library's public interface to this set, it will be guaranteed to be usable on all platforms where it builds?


Solution 2

The C standard contains an entire section in the appendix summarizing just that:

J.3 Implementation-defined behavior

A completely random subset:

  • The number of bits in a byte

  • Which of signed char and unsigned char has the same range, representation, and behavior as plain char

  • The text encodings for multibyte and wide strings

  • Signed integer representation

  • The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.
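
For instance, a minimal sketch of that pointer/integer round trip (intptr_t is itself an optional type, and the mapping it uses is implementation-defined):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 42;
        /* Converting a pointer to an integer is implementation-defined (6.3.2.3).
           intptr_t, where it exists, only guarantees that an object pointer
           converted to it and back compares equal to the original pointer. */
        intptr_t n = (intptr_t)(void *)&x;
        int *p = (int *)(void *)n;
        printf("%d\n", *p);   /* prints 42 on implementations that provide intptr_t */
        return 0;
    }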


Update: To address your question about ABIs: An ABI (application binary interface) is not a standardized concept, and it isn't said anywhere that an implementation must even specify an ABI. The ingredients of an ABI are partly the implementation-defined behaviour of the language (though not all of it; e.g. out-of-range unsigned-to-signed integer conversion is implementation-defined, but it is not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating-point representation, size of pointers).
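
To make that conversion example concrete, here is a small sketch; what it prints is up to the implementation, which is exactly the kind of implementation-defined detail that is not part of an ABI:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = UINT_MAX;
        /* Out-of-range unsigned-to-signed conversion: the result (or signal)
           is implementation-defined (6.3.1.3). Most two's-complement
           implementations yield -1, but the standard does not require it. */
        int s = (int)u;
        printf("%d\n", s);
        return 0;
    }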

However, the more important aspects of an ABI are things like how function calls work, i.e. where the arguments are stored, who is responsible for cleaning up the stack, and so on. It is crucial for two compilers to agree on those conventions in order for their code to be binary-compatible.

In practice, an ABI is usually the result of an implementation. Once the compiler is complete, it determines -- by virtue of its implementation -- an ABI. It may document this ABI, and other compilers, as well as future versions of the same compiler, may choose to stick to those conventions. For C implementations on x86 this has worked rather well, and there are only a few, usually well-documented, free parameters that need to be communicated for code to be interoperable. But for other languages, most notably C++, the picture is completely different: there is nothing coming near a standard ABI for C++ at all. Microsoft's compiler breaks the C++ ABI with every release. GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium ABI (ironically, for a now-dead architecture). Other compilers may do their own, completely different thing. (And then you have, of course, issues with C++ standard library implementations, e.g. does your string contain one, two, or three pointers, and in which order?)

To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.

OTHER TIPS

I don't see how you can expect any library to be universally compatible. If that were possible, there would not be so many compiled variations of libraries.

For example, you could call a 64-bit library from a 16-bit program as long as you set up the call correctly. But you would have to know that you are calling a 64-bit library.

Portability is a much-talked-about goal, but few truly achieve it. After 30+ years of system-level, firmware and application programming, I think of it as more of a fantasy than a goal. Unfortunately, hardware forces us to optimize for the hardware. Therefore, when I write a library, I use the following:

  1. Compile for the target ABI
  2. Use a pointer to a structure for input and output for all function calls:

    int lib_func(struct lib_input *input, struct lib_output *output);
    

Where the returned int indicates errors only. I make all error codes unique. I require the user to call an init function prior to any use of the library. The user calls it as:

    lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long));

So that I can decide whether there will be any trouble, or modify any assumptions if needed. I also add a function that lets the user learn my data sizes and alignment, in addition to version numbers.

This is not to say that the user or I am expected to modify code on the fly or spend lots of CPU power reworking structures. But it allows the application to make absolutely sure it is compatible with my library, and vice versa.
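
A rough sketch of what the library side of that check might look like (the names lib_init, LIB_OK and LIB_ERR_ABI are illustrative only):

    #include <stddef.h>

    #define LIB_OK       0
    #define LIB_ERR_ABI  1   /* unique error code: caller's type sizes don't match ours */

    int lib_init(size_t int_size, size_t ptr_size,
                 size_t long_size, size_t llong_size)
    {
        /* Compare the caller's compile-time sizes against the sizes this
           library was built with, and refuse to proceed on a mismatch. */
        if (int_size   != sizeof(int)    || ptr_size   != sizeof(char *) ||
            long_size  != sizeof(long)   || llong_size != sizeof(long long))
            return LIB_ERR_ABI;

        return LIB_OK;
    }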

The other option, which I have employed in the past, is to simply include several entry-point functions in my library. For example:

   int lib_func32();
   int lib_func16();
   int lib_func64();

It makes a bit of a mess for you, but you can then fix it up using the preprocessor:

   #ifdef LIB_USE32
      #define  lib_function  lib_func32
   #endif

You can do the same with data structures, but I'd recommend using the same-size data structures regardless of CPU word size -- unless performance is a top priority. Again, back to the hardware!
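
For example, a structure built only from the fixed-width types keeps the same size and field layout on 32-bit and 64-bit targets (endianness and any ABI-specific padding rules aside); the field names here are purely illustrative:

    #include <stdint.h>

    struct lib_record {
        uint32_t id;
        uint32_t flags;      /* two 32-bit fields keep the 64-bit field aligned */
        uint64_t offset;
    };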

The final option I explore is having entry functions of all sizes and styles that convert the input to my library's expectations, as well as converting my library's output back.

For example, your lib_func32(&input, &output) can be compiled to expect a 32-bit-aligned, 32-bit pointer, but it converts the 32-bit struct into your internal 64-bit struct and then calls your 64-bit function. When that returns, it reformats the 64-bit struct into its 32-bit equivalent pointed to by the caller.

   int lib_func32(struct lib_io32 *input32, struct lib_io32 *output32)
   {
       struct lib_io64 input64;
       struct lib_io64 output64;
       int    retval;

       /* widen the caller's 32-bit layout into the library's internal 64-bit layout */
       lib_convert32_to_64(input32, &input64);

       retval = lib_func64(&input64, &output64);

       /* narrow the 64-bit result back into the caller's 32-bit layout */
       lib_convert64_to_32(&output64, output32);

       return retval;
   }

In summary, a totally portable solution is not viable. Even if you begin with total portability, eventually you will have to deviate. This is when things truly get messy. You break your style to accommodate deviations, which then breaks your documentation and confuses users. I think it's better to just plan for it from the start.

Hardware will always cause you to have deviations. Just consider how much trouble 'endianness' causes -- not to mention the number of CPU cycles which are used each day swapping byte orders.

If I understand your needs correctly, the fixed-width types (uint32_t and friends) are the only ones that give you a binary-compatibility guarantee; char does as well, and int happens to on most mainstream platforms, but the other built-in types tend to differ. For example, long is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux. If you really depend on the ABI, you have to plan for the platforms you are going to deliver on, and perhaps use typedefs to keep things standardized and readable.
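
As an illustration, a public header that exposes only fixed-width types sidesteps the long discrepancy entirely; the names below are made up for the example:

    #include <stdint.h>

    /* 'long' is 4 bytes on 64-bit Windows and 8 bytes on 64-bit Linux, so it
       never appears in the interface; the fixed-width types are the same
       size on both. */
    typedef struct {
        int32_t  id;
        uint64_t offset;
    } api_record;

    int32_t api_process(const api_record *rec);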

Licensed under: CC-BY-SA with attribution