What are the real ELF TLS ABI requirements for each cpu arch?

https://stackoverflow.com/questions/12878698

07-07-2021

Question

Ulrich Drepper's paper on thread-local storage ("ELF Handling For Thread-Local Storage") outlines the TLS ABI for several CPU architectures, but I find it insufficient as a basis for implementing TLS, for two reasons:

It omits a number of important archs like ARM and MIPS (while including completely irrelevant ones like Itanium).
More importantly, it mixes implementation details with the ABI, so it's hard to tell which properties are required for interoperability and which are just aspects of his implementation.

As an example, the only actual ABI requirements for i386 are:

The word at %gs:0 holds a pointer to itself (i.e. its own linear address).
The main executable's TLS segment, if any, must be located at a fixed negative offset from this address, chosen by the linker.
All other TLS segments of initially-loaded libraries must be at runtime-constant offsets relative to this address (i.e. the same for every thread, but not necessarily the same across program runs), and the dynamic linker must be able to fill in relocations with these offsets.
The ___tls_get_addr and __tls_get_addr functions must exist, with the correct semantics for looking up arbitrary TLS segments.
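To make the first two i386 requirements concrete, here is a toy C model, not real libc code: a plain struct stands in for one thread's static TLS area, and the address of its hypothetical `tcb` member plays the role of the %gs base. The self-pointer matters because a %gs-relative load can read a variable directly, but computing its address (`&x`) needs the linear address of the thread pointer, which a single load from %gs:0 provides. The 8-byte, 4-byte-aligned main-executable segment is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a struct stands in for one thread's static TLS area and the
 * address of its `tcb` member plays the role of the %gs base. */

/* Only the first word is required by the i386 ABI: it holds its own address. */
struct toy_tcb {
    void *self;  /* what a load from %gs:0 would return */
    /* implementation-private fields could follow here */
};

/* Assumed main-executable PT_TLS segment: 8 bytes, 4-byte aligned. */
#define MAIN_TLS_MEMSZ 8u
#define MAIN_TLS_ALIGN 4u
/* memsz rounded up to the alignment: the block ends exactly at tp. */
#define MAIN_TLS_SPAN \
    ((MAIN_TLS_MEMSZ + MAIN_TLS_ALIGN - 1) / MAIN_TLS_ALIGN * MAIN_TLS_ALIGN)
/* Link-time constant: main block start relative to tp (negative). */
#define MAIN_TLS_TPOFF (-(ptrdiff_t)MAIN_TLS_SPAN)

struct toy_thread_area {
    _Alignas(8) unsigned char main_tls[MAIN_TLS_SPAN]; /* below tp */
    struct toy_tcb tcb;                                 /* tp points here */
};

_Static_assert(offsetof(struct toy_thread_area, tcb) == MAIN_TLS_SPAN,
               "main TLS block must end exactly at the thread pointer");

/* Initialize one thread's area and return its thread pointer. */
static unsigned char *toy_thread_init(struct toy_thread_area *area)
{
    memset(area->main_tls, 0, sizeof area->main_tls);
    area->tcb.self = &area->tcb;  /* the self-pointer requirement */
    return (unsigned char *)area + offsetof(struct toy_thread_area, tcb);
}

/* Address of a main-executable TLS variable at byte `off` of its block:
 * one load of the self-pointer (the %gs:0 read) plus link-time constants. */
static void *toy_main_tls_addr(unsigned char *tp, size_t off)
{
    unsigned char *linear_tp = *(void **)tp;
    return linear_tp + MAIN_TLS_TPOFF + off;
}
```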

In particular, the existence and layout of a DTV (dynamic thread vector) are not part of the ABI, and neither is the ordering or layout of TLS segments other than the main program's.
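As a sketch of why no DTV is needed for initially-loaded modules: since their offsets from the thread pointer are runtime constants, a __tls_get_addr-like function could use one process-wide table indexed by module ID. Everything named `toy_*` below is hypothetical, the `{ti_module, ti_offset}` pair is modeled on the argument described in Drepper's paper, and module IDs are assumed to start at 1. Modules loaded later with dlopen are not handled; a real implementation needs some per-thread lookup for those (a DTV is one option, not a requirement).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the {module ID, offset} pair that general-dynamic
 * code passes to __tls_get_addr. */
typedef struct {
    unsigned long ti_module;  /* module ID; 1 = main executable (assumed) */
    unsigned long ti_offset;  /* offset of the variable in that module's block */
} toy_tls_index;

#define TOY_MAX_MODULES 8

/* Hypothetical table the dynamic linker fills in once at startup.  Because
 * initially-loaded modules sit at runtime-constant offsets from tp, one
 * global table serves every thread; no per-thread DTV is consulted. */
static ptrdiff_t toy_module_tpoff[TOY_MAX_MODULES + 1];

/* Hypothetical stand-in for __tls_get_addr, restricted to initially-loaded
 * modules.  `tp` is passed explicitly; real code would read the thread
 * pointer instead.  dlopen'd modules are not handled. */
static void *toy_tls_get_addr(unsigned char *tp, const toy_tls_index *ti)
{
    assert(ti->ti_module >= 1 && ti->ti_module <= TOY_MAX_MODULES);
    return tp + toy_module_tpoff[ti->ti_module] + ti->ti_offset;
}
```

The same `toy_tls_index` gives the right address in every thread because only `tp` differs between threads.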

It seems that any arch using "TLS variant II" has roughly the above ABI requirements. But I don't understand the requirements of "TLS variant I" very well at all, and from reading the uClibc and glibc sources it looks like there may even be several flavors of "variant I".

Are there any better documents I should be looking at, or can somebody familiar with the workings of TLS explain the ABI requirements to me?

Solution

The best I can gather so far is:

For either TLS variant, __tls_get_addr (or other arch-specific functions) must exist and have the correct semantics for looking up any TLS object, and the offset between the TLS segments of any two initially-loaded modules must be a runtime constant (the same for every thread).

For TLS variant II (i386, x86_64, etc.), the "thread pointer register" (which may not actually be a register but some other mechanism, like %gs:0 or even a trap into kernel space; for simplicity, let's just call it a register) points just past the end of the main executable's TLS segment, where "just past the end" means after rounding the segment's size up to the next multiple of its alignment.
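The variant II placement can be sketched as an offset computation modeled on the formula in Drepper's paper: each initially-loaded module's block sits at `tp - tpoff[i]`, and the main executable comes first, so `tpoff[0]` is its size rounded up to its alignment. This assumes `tp` itself is aligned to the largest module alignment and ignores segments whose start address is not a multiple of their alignment. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one module's PT_TLS segment. */
struct toy_tls_module {
    size_t memsz;  /* p_memsz */
    size_t align;  /* p_align, a power of two */
};

static size_t round_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Variant II: module i's block starts at tp - tpoff[i].  mods[0] is the
 * main executable, so tpoff[0] = round_up(memsz, align): tp sits just past
 * its rounded-up end.  Later modules are stacked further below.  Returns the
 * total static TLS size below tp. */
static size_t toy_layout_variant2(const struct toy_tls_module *mods, size_t n,
                                  size_t *tpoff)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        /* Leave room for this block, then align its start (tp - off). */
        off = round_up(off + mods[i].memsz, mods[i].align);
        tpoff[i] = off;
    }
    return off;
}
```

With modules of (size, align) = (10, 4), (3, 8) and (20, 16), the offsets come out as 12, 16 and 48. Only the first one is baked in by the linker; any other non-overlapping, per-process-constant assignment for the rest would satisfy the ABI equally well.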

For TLS variant I, the "thread pointer register" points to a fixed offset from the beginning of the main executable's TLS segment. This offset varies by arch. (On some ugly RISC archs it was chosen to maximize the amount of TLS reachable via signed 16-bit offsets, which strikes me as pointless: the compiler has no way of knowing whether the relocated offset will fit in 16 bits, so it must always generate the slower, larger code that builds a 32-bit offset with load-upper/add instructions.)
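A small calculation shows two variant I flavors and the 16-bit trade-off. The parameters reflect my understanding of common choices rather than anything stated above: ARM and AArch64 put an 8- or 16-byte TCB at the thread pointer with the main executable's block right after it, while MIPS and PowerPC bias the thread pointer 0x7000 bytes past the start of that block. The hypothetical `toy_reach16` counts how many leading bytes of the block a signed 16-bit displacement from the thread pointer can reach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static size_t round_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Flavor A (ARM/AArch64-like, assumed): a tcb_size-byte TCB sits at tp and
 * the main executable's block follows it, aligned.  Returns start - tp. */
static ptrdiff_t toy_v1_tcb_at_tp(size_t tcb_size, size_t align)
{
    return (ptrdiff_t)round_up(tcb_size, align);
}

/* Flavor B (MIPS/PowerPC-like, assumed): tp is `bias` bytes past the start
 * of the main executable's block; the TCB lives below the block. */
static ptrdiff_t toy_v1_biased(size_t bias)
{
    return -(ptrdiff_t)bias;
}

/* Number of leading bytes of the main block reachable with a signed 16-bit
 * displacement from tp, given start - tp. */
static long toy_reach16(ptrdiff_t start_minus_tp)
{
    /* Require block offset 0 not to fall below the displacement window. */
    assert(start_minus_tp >= INT16_MIN);
    long last = (long)INT16_MAX - (long)start_minus_tp; /* highest reachable offset */
    return last < 0 ? 0 : last + 1;
}
```

With an 8-byte TCB at the thread pointer, 32760 bytes of the block are reachable; with the 0x7000 bias the negative half of the displacement range is put to use and the figure rises to 61440 bytes (60 KiB).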

As far as I can tell, nothing about TCBs, DTVs, etc. is part of the ABI, in the sense that applications are not permitted to access these structures; nor is the location of any TLS segment other than the main executable's. In both variants I and II, it makes sense to store implementation-internal per-thread information at a fixed offset from the "thread pointer register", in whichever way safely avoids overlapping the TLS segment.
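For variant II, that placement could look like the following sketch: one allocation per thread holding the main executable's TLS block followed by a TCB, with the thread pointer at the TCB. TLS lives at negative offsets from the thread pointer and private data at non-negative ones, so the two cannot overlap. The `toy_*` types and fields are hypothetical, only the main executable's segment is set up, and installing the pointer into the hardware (e.g. the %gs base) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical libc-private thread control block. */
struct toy_tcb {
    void *self;       /* i386-style self-pointer at tp + 0 */
    int   toy_errno;  /* example of implementation-internal data */
};

/* Hypothetical description of the main executable's PT_TLS segment. */
struct toy_tls_image {
    const unsigned char *init; /* .tdata initialization image */
    size_t filesz;             /* bytes of .tdata */
    size_t memsz;              /* .tdata + .tbss */
    size_t align;              /* power of two */
};

static size_t round_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Layout (variant II):  base ... [ main TLS block ][ TCB ]
 *                                                  ^ tp
 * Returns the TCB (== tp) or NULL; *base_out receives the pointer to free(). */
static struct toy_tcb *toy_alloc_thread_v2(const struct toy_tls_image *img,
                                           void **base_out)
{
    assert(img->filesz <= img->memsz);
    size_t align = img->align > _Alignof(struct toy_tcb)
                       ? img->align : _Alignof(struct toy_tcb);
    size_t below = round_up(img->memsz, align);   /* space below tp */
    size_t total = round_up(below + sizeof(struct toy_tcb), align);
    unsigned char *base = aligned_alloc(align, total);
    if (base == NULL)
        return NULL;

    unsigned char *tp = base + below;
    /* Same constant the linker would use: size rounded to the segment's
     * own alignment (never more than `below`, so it stays in bounds). */
    unsigned char *block = tp - round_up(img->memsz, img->align);
    memcpy(block, img->init, img->filesz);                     /* .tdata */
    memset(block + img->filesz, 0, img->memsz - img->filesz);  /* .tbss */

    struct toy_tcb *tcb = (struct toy_tcb *)tp;
    tcb->self = tcb;
    tcb->toy_errno = 0;
    *base_out = base;
    return tcb;
}
```

For variant I the same idea applies with the sides swapped: the TCB goes below the block (the biased flavor) or at the thread pointer just before it (the ARM-like flavor).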

Licensed under: CC-BY-SA with attribution
