What does compiling WITH_PIC (-DWITH_PIC, --with-pic) actually do?

Question 1

There are two concepts one should not confuse:

Relocatable binaries
Position independent code

They both deal with similar problems, but on a different level.

The problem

Most processor architectures have two kinds of addressing: absolute and relative. Addressing is used for two types of access: accessing data (read, write, etc.) and transferring control to a different part of the code (jump, call, etc.). Both can be done absolutely (call the code located at a fixed address, read data at a fixed address) or relatively (jump five instructions back, read relative to a pointer).

Relative addressing can cost both speed and memory. Speed, because the processor must calculate the absolute address from the pointer and the relative value before it can access the real memory location or the real instruction. Memory, because an additional pointer must be stored (usually in a register, which is very fast but also very scarce memory).

Absolute addressing is not always feasible because, when implemented naively, one must know all addresses at compile time. In many cases, this is impossible. When calling code from an external library, one might not know at which memory location the operating system will load the library. When addressing data on the heap, one will not know in advance which heap block the operating system will reserve for this operation.
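
To make the difference concrete, here is a toy sketch in Python. It is not real machine code, and all names and addresses in it are made up for illustration. It computes where an absolute jump and a relative jump land when the same code is loaded at two different base addresses:

```python
# Toy model only: real CPUs measure a relative displacement from the *next*
# instruction, which is ignored here to keep the arithmetic obvious.

LINK_BASE = 0x1000      # address the code was built for (made-up value)
JUMP_OFFSET = 0x20      # where the jump instruction sits inside the code
TARGET_OFFSET = 0x80    # where the jump is supposed to land inside the code


def absolute_target(encoded_address):
    """An absolute jump lands on the address stored in the instruction."""
    return encoded_address


def relative_target(instruction_address, displacement):
    """A relative jump lands `displacement` bytes away from the instruction."""
    return instruction_address + displacement


def landing_offsets(load_base):
    """Return (absolute, relative) landing points as offsets from load_base."""
    absolute = absolute_target(LINK_BASE + TARGET_OFFSET)
    relative = relative_target(load_base + JUMP_OFFSET,
                               TARGET_OFFSET - JUMP_OFFSET)
    return absolute - load_base, relative - load_base


if __name__ == "__main__":
    for base in (0x1000, 0x5000):
        print(hex(base), [hex(o) for o in landing_offsets(base)])
```

When the code is loaded at the address it was built for, both jumps land at offset 0x80. At any other base, only the relative jump still does; the absolute one lands outside the code.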

Then there are many technical details. For example, a processor architecture may only allow relative jumps up to a certain distance; all longer jumps must then be absolute. Or, on architectures with a very wide address range (e.g. 64 bit or even 128 bit), relative addressing can lead to more compact code (because one can use 8 or 16 bits for relative addresses, while absolute addresses must always be the full 64 or 128 bits).

Relocatable binaries

When programs use absolute addresses, they make very strong assumptions about the layout of the address space. The operating system might not be able to fulfill all these assumptions. To ease this problem, most operating systems can use a trick: the binaries are enriched with additional metadata. The operating system then uses this metadata to alter the binary at load time, so that the adjusted assumptions fit the current situation. Usually the metadata describes the positions of the instructions in the binary that use absolute addressing. When the operating system loads the binary, it changes the absolute addresses stored in these instructions where necessary.

An example of such metadata is the relocation tables in the ELF file format.
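
The idea can be sketched with a toy loader in Python. This is only an illustration: the 12-byte "binary", its one-entry relocation table and the function name apply_relocations are invented here, and real ELF relocation entries carry more information (a type, a symbol, an addend).

```python
import struct


def apply_relocations(image, relocation_offsets, link_base, load_base):
    """Return a copy of `image` with every listed 32-bit address rebased."""
    patched = bytearray(image)
    delta = load_base - link_base
    for offset in relocation_offsets:
        # Read the little-endian address, shift it, and write it back.
        (address,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (address + delta) & 0xFFFFFFFF)
    return bytes(patched)


# A made-up 12-byte "binary": filler, one absolute address, filler.
LINK_BASE = 0x1000
IMAGE = b"\x90" * 4 + struct.pack("<I", LINK_BASE + 0x80) + b"\x90" * 4
RELOCATION_TABLE = [4]   # byte offsets of the addresses that need patching

if __name__ == "__main__":
    loaded = apply_relocations(IMAGE, RELOCATION_TABLE, LINK_BASE, 0x5000)
    print(hex(struct.unpack_from("<I", loaded, 4)[0]))
```

Only the bytes listed in the table are touched; if the binary happens to be loaded at the address it was linked for, nothing changes at all.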

Some operating systems use a trick so that they need not process every file each time it is run: they preprocess the files and change the data so that their assumptions will very likely fit the situation at runtime (and hence no modification is needed). This process is called "prebinding" on Mac OS X and "prelink" on Linux.

Relocatable binaries are produced at linker level.

Position independent code (PIC)

The compiler can produce code that does not depend on the address it is loaded at. Jumps and calls inside the code are made relative, and data or functions whose address is only known at load time are reached indirectly, through a table of addresses. The option "-fPIC" on gcc, for example, generates such code: according to the GCC documentation, it accesses all constant addresses through a global offset table (GOT). The code can then run at any memory address without being modified. On some processor architectures such code is not always possible, e.g. when relative jumps are limited in their range (e.g. relative jumps of at most 128 instructions).

Position independent code is handled at the compiler level. Binaries consisting only of PIC need no relocations applied to their code; at most, a table of addresses kept with the data has to be filled in at load time.

When is PIC code needed

In some special cases, one absolutely needs PIC, because relocation during loading is not feasible. Some examples:

Some embedded systems can run binaries directly from the file system, without first loading them into memory. This is usually the case when the file system is already in memory, e.g. in ROM or flash memory. The executables then start much faster and need no extra part of the (usually scarce) RAM. This feature is called "execute in place".
You are using some special plugin system. An extreme case would be so-called "shellcode", i.e. code injected through a security hole. You will then usually not know where your code will be located at runtime, and the executable in question will not provide a relocation service for your code.
The operating system does not support relocatable binaries (usually due to scarce resources, e.g. on an embedded platform).
The operating system can share common memory pages between running programs. When binaries are changed during relocation, this sharing no longer works (because each process has its own version of the relocated code).

When PIC should be avoided

In some cases it might be impossible for the compiler to make everything position independent (e.g. because the compiler is not "clever" enough or because the processor architecture is too restricted).
The position independent code might be too slow or too big because of the many pointer operations.
The optimizer might have problems with the many pointer operations, so it will not apply necessary optimizations and the executable will run like molasses.

Advice / Conclusion

PIC code might be needed because of some special constraints. In all other cases, stick with the defaults. If you do not know about such constraints, you don't need "-fPIC".

Question 2

There are really two reasons you would want to compile this way.

One, if you want to make a shared library. Generally, shared libraries must be PIC on Linux.

Two, you may want to compile the main executable "PIE", which is basically PIC for executables. PIE is a security feature that allows address space randomization to be applied to the main executable.

Question 3

Shared libraries and executables can be built with PIC either enabled or disabled, i.e. if you build them without PIC they can still be used by other apps. However, non-PIC libraries are not supported everywhere; on Linux they are, with some limitations.

=== This is a brief explanation that you don't need ;-) ===

What PIC does is make code position independent. Each shared library is loaded at some position in memory (for security reasons this position is often randomized), and thus "absolute" memory references in the code can't really be "absolute": in fact they are relative to the start address of the library's memory segment. After the library is loaded, they have to be adjusted.

This can be done by walking over all of them (their addresses are stored in the file header) and correcting each one. But this is slow, and the "corrected" image can't be shared between processes if the base address is different.

Thus a different method is usually used. Each memory reference is made via a special register (usually ebx on 32-bit x86). When a function is called, at its start it calls a special code block that sets ebx to the address of the library's memory segment. Then the function accesses its data using [ebx + known offset].

So for each program only this code block has to be adjusted, not every function and memory reference.

Notice that if a function is known to be called only from another function of the same shared library, the compiler/linker can omit the PIC register (ebx) adjustment, because the register is known to already hold the correct value. On some architectures (most notably x86_64) programs can access data relative to the IP (the current instruction pointer), whose value already reflects where the code was loaded, and this eliminates the need for a special register like ebx and its adjustment.

=== Here is the end of the section that can be skipped without reading ===

So why would you want to build something without PIC?

Well, first of all it slows down your program by a few percent, because at the start of each function additional code is run to adjust the register, and a precious register is not available to the optimizer (x86 only). Often a function can't know whether it's called from the same library or from another, and thus even internal calls suffer from the penalty. So if you want to optimize for speed, try compiling without PIC.

Then, the code size is a bit bigger, as you noticed, because each function will contain a few more instructions to set up the PIC register.

This can be avoided to some degree if we use link-time optimization (the -flto switch in gcc) and protected function visibility, so that the compiler knows which functions are not called externally at all and thus do not need PIC code. But I haven't tried that (yet).

And why would you want to use PIC? Because it's more secure (it is required for address space randomization); because not all systems support non-PIC libs; because startup load time may be slower for non-PIC libs (the whole code segment has to be adjusted to absolute addresses instead of just table stubs); and because loaded library segments can't be shared if they are loaded at different addresses (i.e. it may cause more memory to be used). Then, not all compiler/linker flags are compatible with non-PIC libraries (from what I remember there's something about thread-local storage support), so sometimes you won't be able to build non-PIC code at all.

So non-PIC code is a bit riskier (less secure) and you can't get it always, but if you need it (e.g. for speed) - why not.

Question 4

The main reason I have seen PIC being used under Linux is when you create an object that will be used by another system or by many programs (i.e. a system library or a library which is part of a software suite such as MySQL).

For example, you can write modules for PHP, Apache, and probably MySQL, and those modules need to be loaded by those tools and that will happen at some "random" address and they will be able to execute their code with minimal work on the code. Actually, in most cases these systems check to see whether your module is a PIC (Position Independent Code, as queen3 underlined) module and if not they refuse to load your module.

This allows most of your code to run without having to do what is called relocations. A relocation adds the base address where the code was loaded to an address stored in the code, and that modifies the code of the library (it is perfectly safe though). This is important for dynamic libraries since each time they are loaded by a different process, they may be given a different address (note that this has nothing to do with security, only with the address space that's available to your process). However, relocations mean that each copy is different since, as I just said, you modify the code that was loaded for each process, and thus each process has a different version in memory (which means that the fact that the library is dynamically loaded does not do as much as it otherwise could!).

The PIC mechanism creates a table, as mentioned by others, that is specific to your process, as is the read/write memory (.data) used by those libraries. The rest of the library (the .text and .rodata sections) remains intact, meaning that it can be used by many processes from that one location. The address of that library may still be different from the point of view of each process; that is a side effect of the MMU (Memory Management Unit), which can assign a virtual address to any physical address.
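
Here is a toy comparison in Python of the two loading strategies for two processes that map the same library at different addresses. Everything in it is made up for illustration (the byte strings standing in for .text, the offsets, the function names); it only shows which part ends up differing between processes.

```python
import struct

LINK_BASE = 0x1000
# Made-up non-PIC code: filler bytes around one absolute 32-bit address.
NON_PIC_TEXT = b"\x90" * 4 + struct.pack("<I", LINK_BASE + 0x40) + b"\x90" * 4
ADDRESS_SLOTS = [4]      # where NON_PIC_TEXT holds an absolute address
# Made-up PIC code: it only stores the index of a table entry (here 0).
PIC_TEXT = b"\x90" * 4 + struct.pack("<I", 0) + b"\x90" * 4
TABLE_TARGETS = [0x40]   # library offsets the PIC code reaches via the table


def load_non_pic(load_base):
    """Patch the code itself, as a loader applying relocations would."""
    text = bytearray(NON_PIC_TEXT)
    for slot in ADDRESS_SLOTS:
        (address,) = struct.unpack_from("<I", text, slot)
        struct.pack_into("<I", text, slot, address - LINK_BASE + load_base)
    return bytes(text)


def load_pic(load_base):
    """Leave the code alone; only a small per-process table gets addresses."""
    table = [load_base + target for target in TABLE_TARGETS]
    return PIC_TEXT, table


if __name__ == "__main__":
    print(load_non_pic(0x20000) == load_non_pic(0x50000))
    print(load_pic(0x20000)[0] == load_pic(0x50000)[0])
```

In the non-PIC case the two processes end up with different bytes in their code, so the pages cannot be shared. In the PIC case the code bytes are identical and only the small table differs.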

In the old days, under systems such as the famous IRIX system from SGI, the mechanism was to pre-assign a base address to each dynamic library. That was a pre-relocation, so that each process would find that dynamic library at that one specific location, making it truly shareable. But when you have hundreds of shared libraries, pre-allocating a virtual address for each one of them would make it nearly impossible to run large systems such as we have today. And I won't even talk about the fact that one library may get upgraded and now bump into the one that was assigned the address right after it... But the MMUs of the time were less versatile than those of today, and PIC was not yet viewed as a good solution.

To answer your question with regard to MySQL, -DWITH_PIC is probably a good idea because many tools run all the time and all those libraries will be loaded once and reused by all the tools. So at run time, it will be faster. Without PIC, each process may end up with its own relocated copy of that same library, wasting time and memory. So a few more MB can save you a lot of cycles, and when you run a process 24/7, that's quite a bit of time!

I'm thinking that maybe a small example in assembly would better explain what we're talking about here...

When your code needs to jump to some place, the simplest is to use a jump instruction:

jmp $someplace

In this case, $someplace is called an absolute address. This is a problem since if you load your code at a different location (a different base address) then $someplace changes too. To remedy this, we have relocations: a table that tells the system to add the base address to $someplace so that the jmp actually works as expected.

When using PIC, that jump instruction with an absolute address is transformed in one of two ways: jump through a table or jump using relative addresses.

jmp $function_offset[%ebx] ; jump to the table where function is defined at function_offset
bra $someplace ; this is relative to IP so no need to change anything

As you can see here, I use the special instruction bra (branch) instead of a jump to get the relative jump. This is possible if you are jumping to another place within the same section of code. On some processors such jumps are very limited (i.e. -128 to +127 bytes!), but with newer processors the limit is generally +/-2 GB.
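
Those limits come straight from the width of the signed displacement field. A small Python sketch (the helper name fits_signed is made up) that checks whether a displacement fits in a given number of bits:

```python
def fits_signed(displacement, bits):
    """True if `displacement` fits in a signed two's-complement field."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= displacement <= high


if __name__ == "__main__":
    # 8-bit field: -128 .. +127; 32-bit field: roughly +/-2 GB.
    for displacement, bits in ((127, 8), (128, 8), (-128, 8), (2**31 - 1, 32)):
        print(displacement, bits, fits_signed(displacement, bits))
```

A target that does not fit has to be reached some other way, for instance through a table entry or an absolute jump.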

The jmp (or jsr for jump to sub-routine, on INTEL it's the call instruction), however, will generally be used when jumping to a different function or outside the same section code. That is just a lot cleaner to handle inter-function calls.

In many ways, most of your code is already in PIC except:

when you call another function (other than inline or intrinsic functions)
when you access data

For data we have a similar problem, we want to load a value from an address with a mov:

mov %eax, [$my_data]

Here $my_data would be an absolute address which would require a relocation (i.e. the compiler would save the offset of $my_data relative to the start of the section, and on load the base address where the library gets loaded would be added to the address stored in the mov instruction).

This is where our table comes into play with the %ebx register. The start address of the data is found at some specific offset in the table and it can be retrieved to access the data. This requires two instructions:

mov %eax, $data_pointer[%ebx]
mov %eax, $my_data_offset[%eax]

We first load the pointer to the start of the data buffer, then we load the data itself from that pointer. It's a bit slower, but the first load will usually be cached by the processor, so re-accessing it over and over again will be nearly free anyway (no actual memory access).
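
The same two-step load can be mimicked in Python, with a dictionary standing in for memory. This is only a sketch: the offsets and the helper names are invented, and the constants merely play the roles of $data_pointer and $my_data_offset from the pseudo-assembly above.

```python
TABLE_OFFSET = 0x100       # where the table sits inside the library (made up)
DATA_OFFSET = 0x200        # where the data buffer sits inside the library
DATA_POINTER_SLOT = 0x8    # plays the role of $data_pointer
MY_DATA_OFFSET = 0x10      # plays the role of $my_data_offset


def build_process_memory(load_base, value):
    """Return (memory, ebx) for a library loaded at `load_base`."""
    memory = {}
    ebx = load_base + TABLE_OFFSET
    # The loader fills the table slot with the real start of the data buffer.
    memory[ebx + DATA_POINTER_SLOT] = load_base + DATA_OFFSET
    memory[load_base + DATA_OFFSET + MY_DATA_OFFSET] = value
    return memory, ebx


def load_my_data(memory, ebx):
    """The two loads: first the pointer from the table, then the data."""
    eax = memory[ebx + DATA_POINTER_SLOT]   # mov %eax, $data_pointer[%ebx]
    eax = memory[eax + MY_DATA_OFFSET]      # mov %eax, $my_data_offset[%eax]
    return eax


if __name__ == "__main__":
    for base in (0x20000, 0x50000):
        memory, ebx = build_process_memory(base, 42)
        print(hex(base), load_my_data(memory, ebx))
```

The body of load_my_data contains no address at all, only offsets, which is why the same code works wherever the library ends up.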