Question

I compile the following program with gcc and get an executable output file, a.out:

#include <stdio.h>
int main () {
  printf("hello, world\n");
}

When I execute cat a.out, why is the file in "gibberish" (what is this called?) and not machine language of 0s and 1s:

??????? H__PAGEZERO(__TEXT__text__TEXT?`??__stubs__TEXT 
P__unwind_info__TEXT]P]__eh_frame__TEXT?H??__DATA__program_vars [continued]
Solution

The file does consist of 0s and 1s, but when you open it in a text editor (or print it with cat), those bits are grouped into bytes and each byte is interpreted as a character ;) On Linux you can disassemble the output file to confirm that it contains machine instructions (x86 architecture):

objdump -D -mi386 a.out

Example output:

1:  83 ec 08                sub    $0x8,%esp
4:  be 01 00 00 00          mov    $0x1,%esi
9:  bf 00 00 00 00          mov    $0x0,%edi 

The first column is the offset, the second column contains those 0s and 1s in hexadecimal notation, and the third column contains the corresponding assembly mnemonics.

If you want to display those 0's and 1's simply type:

xxd -b a.out

Example output:

 0000000: 01111111 01000101 01001100 01000110 00000010 00000001  .ELF..
 0000006: 00000001 00000000 00000000 00000000 00000000 00000000  ......

OTHER TIPS

It's in some kind of executable file format. On Linux, it's probably ELF, on Mac OS X it's probably Mach-O, and so on. There's even an a.out format, but it's not that common anymore.

It can't just be bare machine instructions - the operating system needs some information about how to load it, what dynamic libraries to attach to it, etc.

Characters are also made of 0's and 1's, and the computer has no way of knowing the difference. You asked it to show the file and it did.

In addition to the machine instructions, the binary file also contains layout information and, optionally, debug information, both of which can include readable strings.

The a.out is in a format that the loader of the OS you are using can understand. The pieces of text you see, such as __TEXT, __DATA, and __text, are names that mark the different parts (segments and sections) of the 0s and 1s you expect.

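The ? characters mark spots where the binary data has no printable representation. A character like ` is an ordinary printable one: the byte at that spot just happens to have the same value as the code for a backtick.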

The typical format on Linux systems these days is ELF. The ELF file may contain machine code, which you can examine with the objdump utility.

$ gcc main.c
$ objdump -d -j .text a.out

a.out:     file format elf64-x86-64


Disassembly of section .text:
(code omitted for brevity)
00000000004005ac :
  4005ac:       55                      push   %rbp
  4005ad:       48 89 e5                mov    %rsp,%rbp
  4005b0:       bf 6c 06 40 00          mov    $0x40066c,%edi
  4005b5:       e8 d6 fe ff ff          callq  400490 
  4005ba:       5d                      pop    %rbp
  4005bb:       c3                      retq   
  4005bc:       0f 1f 40 00             nopl   0x0(%rax)

See? Machine code. The objdump utility helpfully prints it in hexadecimal, with the addresses on the left and the corresponding disassembled code on the right.

Licensed under: CC-BY-SA with attribution