Can I wrap gcc's atomic built-ins?

https://stackoverflow.com/questions/21621430

08-10-2022

Question

If threads are synchronized with Pthread mutexes/spinlocks, one can easily wrap the calls to pthread_mutex_lock() and pthread_mutex_unlock(), for example by using LD_PRELOAD. That can be very useful for logging/debugging.

Is it possible to do something similar with the atomic built-ins of gcc, for example __sync_fetch_and_add?

I guess that I would not be able to use LD_PRELOAD, but perhaps there exists some other mechanism.
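
For reference, a minimal sketch of the kind of call in question (the helper name next_ticket is made up for illustration). GCC expands the built-in inline, typically into a single locked instruction on x86, so there is usually no dynamically resolved symbol for LD_PRELOAD to interpose on:

```c
/* Hypothetical helper, for illustration only: hands out ticket numbers. */
static long next_ticket(long *counter)
{
    /* Atomically adds 1 to *counter and returns the value it held before.
     * The compiler emits this inline rather than as a library call on
     * targets with native atomic instructions. */
    return __sync_fetch_and_add(counter, 1);
}
```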

Solution

I think this is possible with a binary instrumentation API like Intel's PIN (User Guide). For instance, you can start by instrumenting every instruction for which INS_IsAtomicUpdate returns true, and then add further exclusion criteria to heuristically pick out the instructions generated by __sync_fetch_and_add.

Alternatively, you can insert a recognizable series of NOPs with an asm volatile block immediately before each __sync_fetch_and_add, search for that specific instruction sequence, and instrument the machine code that follows it (which will be the code generated for __sync_fetch_and_add, possibly preceded by a few setup instructions).
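
The NOP-marker idea could be sketched roughly as follows. This is only an illustration under some assumptions: the macro name MARKED_FETCH_AND_ADD, the function bump, and the choice of three NOPs are all made up here, and the code relies on GCC's statement-expression extension.

```c
/*
 * Hypothetical marker macro (not part of GCC or PIN): emits three NOPs
 * right before the atomic built-in so that an instrumentation tool can
 * search for the NOP run and instrument the code that follows it.
 * Uses a GCC statement expression so the macro still yields the old value.
 */
#define MARKED_FETCH_AND_ADD(ptr, val)                          \
    ({                                                          \
        __asm__ __volatile__("nop\n\tnop\n\tnop" ::: "memory"); \
        __sync_fetch_and_add((ptr), (val));                     \
    })

/* Hypothetical example function that uses the marked variant. */
static int bump(int *counter)
{
    /* Returns the value *counter held before the increment. */
    return MARKED_FETCH_AND_ADD(counter, 1);
}
```

The compiler may still place an instruction or two (for example, loading the pointer into a register) between the NOPs and the atomic instruction, so the tool should scan forward from the marker rather than assume the very next instruction is the atomic one. A longer or more unusual marker sequence would make accidental matches less likely.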

Licensed under: CC-BY-SA with attribution
