Question

I am trying to understand the ELF format, and there are some things that I don't get about the segments defined in the program header. I have this small program that I compile to an ELF file with g++ (x86_64 on Linux):

#include <stdlib.h>
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc == 1)
    {
        cout << "Hello world!" << endl;
    }
    return 0;
}

I build it with g++ -c -m64 -D ACIS64 main.cpp -o main.o and g++ -s -O1 -o Main main.o. Now, with readelf I get this list of segments:

Program Headers:
Type           Offset             VirtAddr           PhysAddr
               FileSiz            MemSiz             Flags      Align
PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
               0x00000000000001f8 0x00000000000001f8 R E        8
INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
               0x000000000000001c 0x000000000000001c R          1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
               0x0000000000000afc 0x0000000000000afc R E        200000
LOAD           0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
               0x0000000000000270 0x00000000000003a0 RW         200000
DYNAMIC        0x0000000000000e18 0x0000000000600e18 0x0000000000600e18
               0x00000000000001e0 0x00000000000001e0 RW         8
NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
               0x0000000000000044 0x0000000000000044 R          4
GNU_EH_FRAME   0x00000000000009a4 0x00000000004009a4 0x00000000004009a4
               0x0000000000000044 0x0000000000000044 R          4
GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
               0x0000000000000000 0x0000000000000000 RW         10
GNU_RELRO      0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
               0x0000000000000208 0x0000000000000208 R          1

Using the Bless hex editor, I am looking at the file and trying to find each of these segments.

  • I find the PHDR segment just after the ELF header, and it has the size of the entire program header table. It has an alignment of 8 bytes and is readable/executable. I don't understand why it is executable.

  • I find the segment where the interpreter is declared, just after the PHDR. It has the size of the interpreter's path and an alignment of 1 byte. Correct?

  • Now I have a segment that is readable and executable, which I suppose is the code segment. I don't understand why it starts at 0x0000000000000000. Shouldn't it start where the entry point is located? Why does it have a size of 0xafc bytes? Isn't the size only the size of the code? How much of the file is executable? Also, I don't understand why the alignment is 0x200000 bytes. Is that how much space is reserved for a LOAD segment in memory? This segment ends at 0xafc, and 764 bytes of 0x0 follow it.

  • The next one (readable and writable) I suppose is a segment where variables are stored. It ends just where something like the section headers might be starting.

  • Now the next one is a DYNAMIC header. It starts at 0xe18, which is inside the one above. I thought this was a segment where references to external functions and variables are stored, but I am not sure. It is readable and writable. I just don't know what this segment is and why it is "inside" the LOAD segment above.

  • A NOTE segment, containing some info that I suppose is not important right now.

  • GNU-specific segments, one of them having all offsets and sizes equal to 0x0000000000000000, others overlapping other segments, which I don't get, either.

I come from the PE world, where each thing has its own well defined offset and size and here I see these weird addresses and sizes and I am confused.

Solution

The readelf output displays the program header table. It contains the list of segments (which may be loadable or non-loadable) in the ELF file. It is common for a segment to contain other segments, as seen here.

I find the PHDR segment just after the ELF header, and it has the size of the entire program header table. It has an alignment of 8 bytes and is readable/executable. I don't understand why it is executable.

If you read the readelf output carefully, you will notice that PHDR is actually a part of the code segment: its VirtAddr and MemSiz place it at 0x400040 to 0x400238, inside the first LOAD segment, which covers 0x400000 to 0x400afc. That explains why it shares the same permissions as the code segment (R E).
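The containment can be checked with a little arithmetic on the VirtAddr and MemSiz columns. The following is a minimal sketch, not part of any ELF tooling: the Segment struct and the contains helper are hypothetical names made up for illustration, and the numbers are copied from the readelf listing in the question.

```cpp
#include <cstdint>

// Hypothetical helper type: only the two fields needed for the check.
struct Segment {
    std::uint64_t vaddr;  // VirtAddr column
    std::uint64_t memsz;  // MemSiz column
};

// True when `inner` lies entirely inside the address range of `outer`.
constexpr bool contains(Segment outer, Segment inner) {
    return inner.vaddr >= outer.vaddr &&
           inner.vaddr + inner.memsz <= outer.vaddr + outer.memsz;
}

// Values taken from the readelf listing in the question.
constexpr Segment code_load{0x400000, 0xafc};  // first LOAD, R E
constexpr Segment data_load{0x600df8, 0x3a0};  // second LOAD, RW
constexpr Segment phdr{0x400040, 0x1f8};
constexpr Segment interp{0x400238, 0x01c};

// Checked at compile time, so the file only builds if the claims hold.
static_assert(contains(code_load, phdr), "PHDR is inside the R E LOAD segment");
static_assert(contains(code_load, interp), "INTERP is inside it as well");
static_assert(!contains(data_load, phdr), "PHDR is not in the RW LOAD segment");
```

The same helper can be applied to the other non-loadable entries in the listing to see which LOAD segment each one falls into.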

Now I have a segment that is readable and executable, which I suppose is the code segment. I don't understand why it starts at 0x0000000000000000. Shouldn't it start where the entry point is located? Why does it have a size of 0xafc bytes? Isn't the size only the size of the code? How much of the file is executable? Also, I don't understand why the alignment is 0x200000 bytes. Is that how much space is reserved for a LOAD segment in memory? This segment ends at 0xafc, and 764 bytes of 0x0 follow it.

Yes, this is the code segment. It begins at the beginning of the file (i.e. offset 0) and extends for 0xafc bytes, so it also covers the ELF header and the program header table. The header specifies that this part of the file is mapped to 0x0000000000400000 in memory when the ELF is loaded; the entry point is simply an address somewhere inside that mapping. The segment consists of more than main() from the C++ file: other code and read-only data are added by the compiler and linker. Align does not specify the size of the segment or how much memory is reserved for it; it constrains where the segment is placed. Loadable segments should have congruent values of the VirtAddr and Offset fields modulo the page size (or modulo the Align field, if Align != 0 && Align != 1). That explains why VirtAddr for the data segment is 0x0000000000600df8: (0x0000000000600df8 - 0x0000000000000df8) % 0x200000 == 0. The region in the file between the text segment and the data segment (i.e. between 0xafc and 0xdf8, which is 0x2fc = 764 bytes) is filled with zeroes.
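The alignment rule and the size of the zero-filled gap can both be verified from the numbers in the readelf listing. This is a small sketch under the assumption that the rule compares a segment's virtual address with its file offset; the congruent helper and the gap constant are hypothetical names used only here.

```cpp
#include <cstdint>

// Hypothetical helper: a loadable segment's virtual address and file offset
// must leave the same remainder when divided by the alignment.
// An alignment of 0 or 1 means no constraint.
constexpr bool congruent(std::uint64_t vaddr, std::uint64_t offset,
                         std::uint64_t align) {
    return align <= 1 || vaddr % align == offset % align;
}

// Both LOAD segments from the question satisfy the rule with Align = 0x200000.
static_assert(congruent(0x400000, 0x000, 0x200000), "code segment");
static_assert(congruent(0x600df8, 0xdf8, 0x200000), "data segment");

// The code segment ends at file offset 0xafc; the data segment starts at 0xdf8.
constexpr std::uint64_t gap = 0xdf8 - 0xafc;
static_assert(gap == 764, "the 764 zero bytes seen in the hex editor");
```

Because the two remainders must match, the data segment cannot be mapped at a round address such as 0x600000 while sitting at file offset 0xdf8; the low bits of the address follow the file offset.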

The next one (readable and writable) I suppose is a segment where variables are stored. It ends just where something like the section headers might be starting.

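Correct, this is the data segment that stores the global and static variables (among other things). In the file, it ends just before the section headers. Its MemSiz (0x3a0) is larger than its FileSiz (0x270) because zero-initialized data takes up space in memory but not in the file.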

Now the next one is a DYNAMIC header. It starts at 0xe18, which is inside the one above. I thought this was a segment where references to external functions and variables are stored, but I am not sure. It is readable and writable. I just don't know what this segment is and why it is "inside" the LOAD segment above.

Just as the PHDR segment is a part of the code segment, the DYNAMIC segment is a part of the data segment, which is why it has the same permissions (RW). It holds the .dynamic section, an array of tag/value entries read by the dynamic linker, such as the addresses of the dynamic symbol and string tables.
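To make the "array of structures" concrete, here is a rough sketch of how such a table can be walked. On Linux, <elf.h> declares the real Elf64_Dyn type and the DT_* constants; the DynEntry struct and the *_TAG names below are simplified stand-ins invented for this example so that it does not depend on that header. The tag values are the ones defined in the ELF specification.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for Elf64_Dyn: a tag saying what the entry describes,
// and a value that is an integer or an address depending on the tag.
struct DynEntry {
    std::int64_t tag;
    std::uint64_t value;
};

// Tag values from the ELF specification (names are local to this sketch).
constexpr std::int64_t DT_NULL_TAG = 0;    // marks the end of the array
constexpr std::int64_t DT_NEEDED_TAG = 1;  // string-table offset of a library name
constexpr std::int64_t DT_STRTAB_TAG = 5;  // address of the dynamic string table
constexpr std::int64_t DT_SYMTAB_TAG = 6;  // address of the dynamic symbol table

// Each 64-bit entry is 16 bytes, so the 0x1e0-byte DYNAMIC segment in the
// question has room for 30 entries.
static_assert(sizeof(DynEntry) == 16, "two 8-byte fields");
static_assert(0x1e0 / sizeof(DynEntry) == 30, "30 entries fit");

// Returns the value of the first entry with the given tag, or `fallback`
// if the terminating DT_NULL entry is reached first.
inline std::uint64_t find_dyn(const DynEntry* table, std::int64_t tag,
                              std::uint64_t fallback) {
    for (const DynEntry* e = table; e->tag != DT_NULL_TAG; ++e) {
        if (e->tag == tag) return e->value;
    }
    return fallback;
}

// Counts the entries that precede the DT_NULL terminator.
inline std::size_t count_dyn(const DynEntry* table) {
    std::size_t n = 0;
    while (table[n].tag != DT_NULL_TAG) ++n;
    return n;
}
```

The dynamic linker performs a lookup of this kind when it starts the program: it finds the DYNAMIC segment through the program header table and then reads the entries it needs until it hits the terminator.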

GNU-specific segments, one of them having all offsets and sizes equal to 0x0000000000000000, others overlapping other segments, which I don't get, either.

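GNU_EH_FRAME is a part of the code segment and GNU_RELRO is a part of the data segment (see the VirtAddr and MemSiz fields). GNU_STACK is just a program header that tells the system how to set up the stack when the ELF is loaded into memory: its flags say whether the stack should be executable (here RW, so it is not). It describes no bytes in the file or in memory, which is why its FileSiz and MemSiz are 0.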
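The Flags column is a small bit mask, and for GNU_STACK it is the only field that carries information. The sketch below decodes it the way the listing prints it. The bit values (1 for execute, 2 for write, 4 for read) come from the ELF specification; the constant and function names are made up for this example.

```cpp
#include <cstdint>
#include <string>

// Segment permission bits as defined by the ELF specification.
constexpr std::uint32_t kExec = 0x1;
constexpr std::uint32_t kWrite = 0x2;
constexpr std::uint32_t kRead = 0x4;

// Renders the flags in the three-column style of the readelf listing,
// for example "R E" for the code segment and "RW " for the data segment.
inline std::string flags_to_string(std::uint32_t flags) {
    std::string s;
    s += (flags & kRead) ? 'R' : ' ';
    s += (flags & kWrite) ? 'W' : ' ';
    s += (flags & kExec) ? 'E' : ' ';
    return s;
}

// For a GNU_STACK entry, the execute bit is what matters:
// when it is clear, the program asks for a non-executable stack.
inline bool wants_executable_stack(std::uint32_t gnu_stack_flags) {
    return (gnu_stack_flags & kExec) != 0;
}
```

With the values from the question, the GNU_STACK flags are kRead | kWrite, so wants_executable_stack returns false.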

References:

  1. ELF File format specification
  2. Linkers and Loaders, by John R. Levine
Licensed under: CC-BY-SA with attribution