Question

There are a couple of related questions here. Consider a program consisting only of the following two instructions

movq 1, %rax
cpuid

If I throw this into a file called Foo.asm, and run as Foo.asm, where as is the portable GNU assembler, I will get a file called a.out, of size 665 bytes on my system.

If I then chmod 700 a.out and try ./a.out, I will get an error saying cannot execute binary file.

  1. Why is the file so large, if I am merely trying to translate two asm instructions into binary?
  2. Why can the binary not be executed? I am providing valid instructions, so I would expect the CPU to be able to execute them.
  3. How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
  4. Once I have the answer to 3, how can I get my processor to execute them? (Assuming that I am not running privileged instructions.)

Solution

  1. Why is the file so large, if I am merely trying to translate two asm instructions into binary?

    Because the assembler creates a relocatable object file, which includes additional information such as sections and symbol tables.

  2. Why can the binary not be executed?

    Because it is a relocatable object file, not a loadable file. You need to link it to make it executable, so that the operating system can load it:

    $ ld  -o Foo a.out
    

    You also need to give the linker a hint about where your program starts, by specifying the _start symbol.

    Even then, the Foo executable is larger than you might expect, since it still contains additional information (e.g. the ELF header) that the operating system requires to actually launch the program.

    Also, if you launch the executable now, it will crash with a segmentation fault: movq 1, %rax (without a $ prefix) loads the contents of memory address 1, which is not mapped into your address space, into rax. To load the constant 1, write $1 instead. Even with this fixed, execution runs past the last instruction into undefined bytes, so you need to exit the program explicitly through a system call.

    A minimal running example (assuming the x86_64 architecture) would look like

    .globl  _start
    _start:
        movq $1, %rax
        cpuid
    
        mov     $60, %rax       # System-call "sys_exit"
        mov     $0, %rdi        # exit code 0
        syscall
    
  3. How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?

    • You can use objcopy to generate a raw binary image from an object file:

      $ objcopy -O binary a.out Foo.bin
      

      Foo.bin will then contain only the raw machine code, i.e. the instruction opcodes and their operands.

    • nasm has a -f bin option which creates a binary-only representation of your assembly code. I used this to implement a bare boot loader for VirtualBox (warning: undocumented, prototype only!) that directly launches binary code inside a VirtualBox image without an operating system.

  4. Once I have the answer to 3, how can I get my processor to execute them?

    You cannot directly execute the raw binary file under Linux. You either need to write your own loader for it or run it without an operating system at all. For an example of the latter, see my bare boot loader linked above: it writes the opcodes into the boot loader of a VirtualBox disk image, so that the instructions are executed when the VirtualBox machine starts.
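
To make the "write your own loader" option more concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes Linux on x86_64 and a kernel that allows a mapping that is writable and executable at the same time. The names run_raw_code and CPUID_LEAF1 are made up for this illustration. Two details differ from the standalone program above: the raw code has to end in ret (instead of the exit system call) so that control returns to the loader, and it saves and restores rbx because cpuid overwrites that register.

```python
import ctypes
import mmap
import platform

# Hypothetical example payload (x86_64 machine code):
#   push %rbx ; movq $1, %rax ; cpuid ; pop %rbx ; ret
# cpuid overwrites rbx, which the calling code expects to be preserved.
CPUID_LEAF1 = bytes.fromhex("53" "48c7c001000000" "0fa2" "5b" "c3")


def run_raw_code(code: bytes) -> int:
    """Copy raw machine code into executable memory and call it like `int f(void)`."""
    # Anonymous private mapping that is readable, writable and executable.
    buf = mmap.mmap(-1, len(code),
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)
    # Address of the first byte of the mapping.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    # Treat that address as a C function returning int (the value left in eax).
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return func()


if __name__ == "__main__" and platform.system() == "Linux" and platform.machine() == "x86_64":
    # Prints the value that CPUID leaf 1 leaves in eax (the processor signature).
    print(hex(run_raw_code(CPUID_LEAF1) & 0xFFFFFFFF))
```

To run the contents of a file instead, pass open("Foo.bin", "rb").read() to run_raw_code, keeping in mind that code ending in the exit system call will also terminate the Python process.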

OTHER TIPS

The old MS-DOS COM file format does not include a header; it contains only the binary executable code. The code size cannot, however, exceed 64 KB. I don't know whether Linux can execute these.

You can write the opcodes into a file using a hex editor. Then you just need to wrap it in an ELF header so that Linux knows how to execute it.

Here's an example:

hexedit myfile.bin

Now just write your opcodes inside the file using the hexeditor.

After that you need to add the ELF header. You could do this by hand and write the ELF header into your .bin file, but that is a bit tricky. The easiest method is to use a few commands (in this example for 64-bit).

  1. objcopy --input-target=binary --output-target=elf64-x86-64 myfile.bin myfile.o
  2. ld -o myfile myfile.o -T binary.ld

You will also need a linker script, which I called binary.ld in this example.

These are the contents of binary.ld:

ENTRY(_start);

SECTIONS
{
       _start = 0x0;  
}

Now you can execute your program: ./myfile
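
For comparison, the "by hand" route mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 64-bit little-endian x86_64 Linux target, a single read/execute segment, and a load address of 0x400000 chosen for the example. The names wrap_in_elf64, BASE and EXAMPLE_CODE are hypothetical. The payload bytes are one valid encoding of the minimal program from the accepted answer (movq $1, %rax; cpuid; exit system call).

```python
import struct

BASE = 0x400000                  # assumed load address for this sketch
EHDR_SIZE, PHDR_SIZE = 64, 56    # sizes of the ELF64 file and program headers

# movq $1,%rax ; cpuid ; mov $60,%rax ; mov $0,%rdi ; syscall
EXAMPLE_CODE = bytes.fromhex(
    "48c7c001000000" "0fa2" "48c7c03c000000" "48c7c700000000" "0f05")


def wrap_in_elf64(code: bytes) -> bytes:
    """Prefix raw x86_64 code with a minimal ELF64 header and one PT_LOAD segment."""
    entry = BASE + EHDR_SIZE + PHDR_SIZE      # code starts right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)
    # Magic, 64-bit class, little-endian, ELF version 1, then padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,            # e_type: ET_EXEC
        62,           # e_machine: EM_X86_64
        1,            # e_version
        entry,        # e_entry
        EHDR_SIZE,    # e_phoff: program header follows the file header
        0, 0,         # e_shoff, e_flags: no section headers, no flags
        EHDR_SIZE, PHDR_SIZE, 1,   # e_ehsize, e_phentsize, e_phnum
        0, 0, 0)      # e_shentsize, e_shnum, e_shstrndx
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: readable + executable
        0,            # p_offset: map the file from its first byte
        BASE, BASE,   # p_vaddr, p_paddr
        total, total, # p_filesz, p_memsz
        0x1000)       # p_align
    return ehdr + phdr + code


if __name__ == "__main__":
    import os
    with open("myfile", "wb") as f:
        f.write(wrap_in_elf64(EXAMPLE_CODE))
    os.chmod("myfile", 0o755)    # afterwards: ./myfile ; echo $?
```

With this payload the resulting file is 145 bytes: 120 bytes of headers plus 25 bytes of code.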

Perhaps there's something like exe2bin utility for the GNU tool set. I've used various versions of exe2bin with Microsoft tools, and the ARM toolkit has the ability to produce binaries, but I don't recall if it was directly from the linked output or something like exe2bin.
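
In the GNU tool set, objcopy -O binary (shown in the accepted answer) plays that role. As a rough illustration of what such a tool has to do, here is a hedged Python sketch that pulls the bytes of a single named section out of a little-endian ELF64 file. It is much simpler than objcopy, which handles all loadable sections and many formats. The function name extract_section is made up for this example.

```python
import struct


def extract_section(elf: bytes, wanted: bytes = b".text") -> bytes:
    """Return the raw contents of one named section of a little-endian ELF64 file."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # e_shoff is at offset 40; e_shentsize, e_shnum, e_shstrndx start at offset 58.
    (shoff,) = struct.unpack_from("<Q", elf, 40)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", elf, 58)

    def header(i):
        # Fields: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
        #         sh_link, sh_info, sh_addralign, sh_entsize
        return struct.unpack_from("<IIQQQQIIQQ", elf, shoff + i * shentsize)

    names_off = header(shstrndx)[4]      # file offset of the section name table
    for i in range(shnum):
        h = header(i)
        start = names_off + h[0]
        name = elf[start:elf.index(b"\0", start)]
        if name == wanted:
            return elf[h[4]:h[4] + h[5]]  # sh_offset .. sh_offset + sh_size
    raise KeyError(wanted)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:               # usage: script.py a.out Foo.bin
        with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
            dst.write(extract_section(src.read()))
```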

Licensed under: CC-BY-SA with attribution