Question

Reading through my book Expert C Programming, I came across the chapter on function interpositioning and how it can lead to some serious, hard-to-find bugs if done unintentionally.

The example given in the book is the following:

my_source.c

mktemp() { ... }

main() {
  mktemp();
  getwd();
}

libc

mktemp(){ ... }
getwd(){ ...; mktemp(); ... }

According to the book, what happens in main() is that mktemp() (a standard C library function) is interposed by the implementation in my_source.c. Although having main() call my implementation of mktemp() is intended behavior, having getwd() (another C library function) also call my implementation of mktemp() is not.

Apparently, this example was a real life bug that existed in SunOS 4.0.3's version of lpr. The book goes on to explain the fix was to add the keyword static to the definition of mktemp() in my_source.c; although changing the name altogether should have fixed this problem as well.

This chapter leaves me with some unresolved questions that I hope you guys could answer:

  1. Does GCC have a way to warn about function interposition? We certainly don't ever intend on this happening and I'd like to know about it if it does.
  2. Should our software group adopt the practice of putting the keyword static in front of all functions that we don't want to be exposed?
  3. Can interposition happen with functions introduced by static libraries?

Thanks for the help.

EDIT

I should note that my question is not just aimed at interposing over standard C library functions, but also functions contained in other libraries, perhaps 3rd party, perhaps ones created in-house. Essentially, I want to catch any instance of interpositioning regardless of where the interposed function resides.

Solution

It sounds like what you want is for the tools to detect name conflicts between functions, i.e., you don't want your externally accessible functions to accidentally share a name with, and therefore 'override' or hide, functions in a library.

There was a recent SO question related to this problem: Linking Libraries with Duplicate Class Names using GCC

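Using the linker's --whole-archive option (passed through GCC as -Wl,--whole-archive) on all the static libraries you link against may help: it pulls in every object file from each archive, so a name defined both in your code and in a library should show up as a duplicate-definition error. As I mentioned in the answer over there, though, I really don't know how well this works or how easy it is to convince builds to apply the option to all libraries.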

OTHER TIPS

This is really a linker issue.

When you compile a bunch of C source files, the compiler creates an object file for each one. Each .o file contains a list of the public functions defined in that module, plus a list of functions that are called by code in the module but are not actually defined there, i.e. functions that the module expects some library to provide.

When you link a bunch of .o files together to make an executable, the linker must resolve all of these missing references. This is the point where interposing can happen. If there are unresolved references to a function called "mktemp" and several libraries provide a public function with that name, which version should it use? There's no easy answer to this, and yes, odd things can happen if the wrong one is chosen.

So yes, it's a good idea in C to "static" everything unless you really do need to use it from other source files. In fact in many other languages this is the default behavior and you have to mark things "public" if you want them accessible from outside.
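A minimal sketch of that advice, assuming a hypothetical spooler source file. The names make_spool_name and spool_template are invented for illustration and are not from the book or from lpr. Only the function without static gets an externally visible symbol, so only that one can clash with a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal linkage: this symbol is not visible to the linker outside this
   translation unit, so it can never interpose on a library function. */
static int make_spool_name(char *buf, size_t size, const char *prefix)
{
    static const char suffix[] = "-XXXXXX";
    size_t need = strlen(prefix) + sizeof suffix; /* sizeof counts the '\0' */

    if (need > size)
        return -1;
    strcpy(buf, prefix);
    strcat(buf, suffix);
    return 0;
}

/* External linkage: the only name this file exports (hypothetical). */
int spool_template(char *buf, size_t size)
{
    assert(buf != NULL);
    return make_spool_name(buf, size, "lpr");
}
```

Running nm on the resulting object file would typically list spool_template as a global symbol, while make_spool_name shows up as local or disappears entirely if the compiler inlines it.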

Purely formally, the interpositioning you describe is a straightforward violation of the C language definition rules (the ODR, in C++ parlance). It is simply illegal to define more than one function with external linkage under the same name in a C program, regardless of where these functions are defined (standard library, other user library, etc.). The standard does not require a diagnostic for this, but any decent compiler or linker ought to either detect these situations or provide options for detecting them.

I understand that many platforms provide means to customize the [standard] library behavior by defining some standard functions as weak symbols. While this is indeed a useful feature, I believe the compilers must still provide the user with means to enforce the standard diagnostics (on per-function or per-library basis preferably).

So, again, you should not worry about interpositioning if you have no weak symbols in your libraries. If you do (or if you suspect that you do), you have to consult your compiler documentation to find out whether it offers a means to inspect weak symbol resolution.

In GCC, for example, the -fno-weak option disables the use of weak symbols, but it is a C++ (G++) option that the GCC manual describes as existing only for testing, and it basically kills everything related to weak symbols, which is not always desirable.
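For illustration, this is roughly what a weak definition looks like on the library side, using the __attribute__((weak)) extension supported by GCC and Clang (it is not standard C). The names progress_hook and library_step are hypothetical. If a program linked against this file supplies its own non-weak progress_hook, the linker picks the program's version without complaint, which is interposition by design.

```c
#include <assert.h>

/* Library side: a default that callers are allowed to replace.
   __attribute__((weak)) is a GCC/Clang extension, not ISO C. */
__attribute__((weak)) int progress_hook(int percent)
{
    return percent; /* default behavior: pass the value through unchanged */
}

/* Library code calls the hook by name; which definition actually runs
   is decided at link time. */
int library_step(int percent)
{
    assert(percent >= 0 && percent <= 100);
    return progress_hook(percent);
}
```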

If the function does not need to be accessed outside of the C file it lives in then yes, I would recommend making the function static.

One thing you can do to help catch this is to use an editor that has configurable syntax highlighting. I personally use SciTE, and I have configured it to display all standard library function names in red. That way, it's easy to spot if I am re-using a name I shouldn't be using (nothing is enforced by the compiler, though).

It's relatively easy to write a script that runs nm -o on all your .o files and your libraries (with GNU nm, -o prefixes each symbol with the name of the file it came from) and checks whether an external name is defined both in your program and in a library. Just one of the many sane, sensible services that the Unix linker doesn't provide because it's stuck in 1974, looking at one file at a time. (Try putting libraries in the wrong order and see if you get a useful error message!)
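The core of such a check can be sketched in C. This assumes GNU nm style lines of the form "file:address type name" and, as a simplification, only looks at global functions (type letter T). The helper names defined_function and names_clash are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If `line` describes a global function definition (type letter 'T'),
   copy the symbol name into `name` and return 1; otherwise return 0. */
int defined_function(const char *line, char *name, size_t name_size)
{
    const char *last_space;
    const char *sym;
    size_t len;

    assert(line != NULL && name != NULL);

    /* The last two space-separated fields are the type letter and the name. */
    last_space = strrchr(line, ' ');
    if (last_space == NULL || last_space - line < 2)
        return 0;
    if (last_space[-1] != 'T' || last_space[-2] != ' ')
        return 0;

    sym = last_space + 1;
    len = strcspn(sym, "\n");       /* ignore a trailing newline */
    if (len == 0 || len >= name_size)
        return 0;
    memcpy(name, sym, len);
    name[len] = '\0';
    return 1;
}

/* Return 1 if both lines define a global function with the same name. */
int names_clash(const char *app_line, const char *lib_line)
{
    char a[256], b[256];

    return defined_function(app_line, a, sizeof a)
        && defined_function(lib_line, b, sizeof b)
        && strcmp(a, b) == 0;
}
```

A real checker would read every line of the nm output for the program and for each library and compare all pairs (or sort and look for repeats). Undefined references, which nm marks with U, are skipped because they are not definitions.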

Interpositioning occurs when the linker is trying to link separate modules. It cannot occur within a module. If there are duplicate symbols in a module, the linker will report this as an error.

For *nix linkers, unintended interpositioning is a problem, and it is difficult for the linker to guard against it. For the purposes of this answer, consider the two linking stages:

  1. The linker links translation units into modules (basically applications or libraries).
  2. The linker links any remaining unfound symbols by searching in modules.

Consider the scenario described in 'Expert C Programming' and in SiegeX's question. The linker first tries to build the application module. It sees that the symbol mktemp() is external and tries to find a function definition for it. The linker finds the definition in the object code of the application module and marks the symbol as found. At this stage the symbol mktemp() is completely resolved. It is not considered tentative in any way, so there is no allowance for the possibility that another module might define the same symbol. In many ways this makes sense, since the linker should first try to resolve external symbols within the module it is currently linking. It only searches other modules for symbols that are still unfound. Furthermore, since the symbol has been marked as resolved, the linker will use the application's mktemp() in any other case where it needs to resolve this symbol. Thus the application's version of mktemp() will be used by the library.
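The "first definition found wins" behavior can be pictured with a toy model. This is only a sketch of the idea, not how any real linker is implemented; the struct and function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One definition of a symbol, tagged with the module that provides it. */
struct definition {
    const char *symbol;
    const char *module;
};

/* Toy model of the resolution described above: scan the definitions in link
   order and bind the reference to the first match. Later definitions of the
   same symbol are never consulted. Returns NULL if nothing defines it. */
const char *resolve(const struct definition *defs, size_t count,
                    const char *symbol)
{
    size_t i;

    assert(defs != NULL && symbol != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(defs[i].symbol, symbol) == 0)
            return defs[i].module;
    }
    return NULL;
}

/* The situation from the question: the application comes first in link order. */
static const struct definition link_order[] = {
    { "mktemp", "my_source.o" },
    { "mktemp", "libc" },
    { "getwd",  "libc" },
};
```

With this table, every reference to mktemp binds to my_source.o, including the one made from inside libc's getwd, while getwd itself still binds to libc.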

A simple way to guard against the problem is to try to make all external symbols in your application or library unique. For modules that are only going to be shared on a limited basis, this can fairly easily be done by appending a unique identifier to every external symbol in the module.
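One way to apply such an identifier consistently is a small naming macro. The sketch below attaches the tag as a prefix rather than a suffix, which is a common C convention. The tag acme_lpr_ and the function body are placeholders, not anything taken from the original lpr source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical project tag; every exported name goes through this macro. */
#define ACME_LPR(name) acme_lpr_##name

/* Expands to acme_lpr_mktemp, which cannot collide with libc's mktemp. */
int ACME_LPR(mktemp)(char *template_buf)
{
    assert(template_buf != NULL);
    /* Placeholder body: report the template length instead of creating a name. */
    return (int)strlen(template_buf);
}
```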

For modules that are widely shared, making up unique names is a problem.

Licensed under: CC-BY-SA with attribution