Question

For some reason I wrote a simple C program that prints the binary representation of its input:

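#include <unistd.h>

int main(void)
{
  char c;
  /* Read standard input one byte at a time until EOF or error. */
  while(read(0,&c,1) > 0)
    {
      /* Mask that walks from the most significant bit down to bit 0. */
      unsigned char cmp = 128;
      while(cmp)
        {
          if(c & cmp)
            write(1,"1",1);
          else
            write(1,"0",1);
          cmp >>= 1;
        }
    }

  return 0;
}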

After compilation:

$ gcc bindump.c -o bindump

I ran a simple test to check whether the program can dump a binary file:

$ cat bindump | ./bindump | fold -b100 | nl

Output is following: http://pastebin.com/u7SasKDJ

I expected the output to look like a random series of ones and zeros. However, parts of it look more interesting than that. For example, take a look at the output between lines 171 and 357. Why are there so many zeros there compared to other sections of the executable?

My architecture is:

$ lscpu

Architecture:          i686
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 28
Stepping:              10
CPU MHz:               1000.000
BogoMIPS:              3325.21
Virtualization:        VT-x
L1d cache:             24K
L1i cache:             32K
L2 cache:              512K

Solution

When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:

readelf -a bindump | less

For example, the .text section contains CPU instructions, .data holds initialized global variables, .bss holds uninitialized global variables (it is actually empty in the ELF file itself, but is created in main memory when the program is executed), .plt and .got are jump tables, and further sections carry debugging information and so on.

By the way, it is much more convenient to examine the binary content of files with hexdump:

hexdump -C bindump | less

There you can see that starting at offset 0x850 (approx. line 171 in your dump) there are a lot of zeros, and you can also see the ASCII representation on the right.

Let us look at which sections correspond to the block you are interested in, between 0x850 and 0x1160 (the Off field, the offset in the file, is what matters here):

> readelf -a bindump
...
Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
...
[28] .shstrtab         STRTAB          00000000 00074c 000106 00      0   0  1
[29] .symtab           SYMTAB          00000000 000d2c 000440 10     30  45  4
...

You can examine the content of an individual section with -x:

> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................

You can see that there are many zeros. The section is composed of 16-byte entries (= one line in the -x output), each defining a symbol. From readelf -a you can see that it has 68 entries, and the first 27 of them (excl. the very first one) are of type SECTION:

Symbol table '.symtab' contains 68 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 08048134     0 SECTION LOCAL  DEFAULT    1 
     2: 08048148     0 SECTION LOCAL  DEFAULT    2 
     3: 08048168     0 SECTION LOCAL  DEFAULT    3 
     4: 0804818c     0 SECTION LOCAL  DEFAULT    4 
     ...

According to the specification (page 1-18), each entry has the following format:

typedef struct {
    Elf32_Word st_name;
    Elf32_Addr st_value;
    Elf32_Word st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half st_shndx;
} Elf32_Sym;

Without going into too much detail, what matters here is that st_name and st_size are both zero for these SECTION entries. Both are 32-bit numbers, so each such entry contributes at least 64 zero bits, which means lots of zeros in this particular section.

OTHER TIPS

This is not really a programming question, but anyway...

A binary normally consists of different sections: code, data, debugging info, etc. Since the contents of these sections differ by type, I would very much expect them to look different.

For example, the symbol table consists of address offsets in your binary. If I read your lscpu output correctly, you are on a 32-bit system. That means each offset has four bytes, and given the size of your program, in most cases two of those bytes will be zero. And there are more effects like this.

You didn't strip your program, which means there is still lots of information (symbol table etc.) present in the binary. Try stripping the binary and have a look at it again.

Licensed under: CC-BY-SA with attribution