I'm starting to learn x86(-64) machine code because at some point I want to write a compiler or a JIT compiler (probably the latter first). I've written assembly for a while, so I'm not going into this blind; I'm just trying to learn the x86 instruction encoding, since it seems to be quite complex. I've looked at tables and read articles, as well as parts of the Intel manuals (the most inhuman documents I've ever read).
I'm starting to understand it, so I decided to break down a basic set of instructions and the machine code they produce. The code makes a Linux syscall (sys_exit, which is syscall number 60 on x86-64) because I thought it would be easy to test. Here is the x86-64 code:
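bits 64      ; nasm's -f bin format defaults to 16-bit mode, where rax/rdi are rejected
mov rax, 60  ; syscall number for sys_exit
mov rdi, 0   ; exit status
syscall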
I assembled this with nasm:
nasm test.asm -fbin
I used -fbin so it would output a raw binary that I could easily examine.
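To print such a raw binary in the same 0xNN style used below, a short script is enough. This is a minimal sketch; the default file name test is an assumption (nasm names -f bin output after the source file minus its extension when no -o is given), and hex_bytes is a made-up helper name.

```python
import sys
from pathlib import Path


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated 0xNN tokens."""
    return " ".join(f"0x{b:02X}" for b in data)


if __name__ == "__main__":
    # Assumed output name: test.asm assembled with -fbin -> "test".
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "test")
    if path.is_file():
        print(hex_bytes(path.read_bytes()))
    else:
        print(f"{path} not found; assemble first with: nasm test.asm -fbin")
```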
It output the following bytes:
0xB8 0x3C 0x00 0x00 0x00 0xBF 0x00 0x00 0x00 0x00 0x0F 0x05
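One way to sanity-check the byte listing is a tiny decoder that only understands the two opcode forms that appear here: B8..BF (mov r32, imm32, a one-byte opcode followed by a 4-byte immediate) and the two-byte opcode 0F 05 (syscall). This is a sketch under that assumption, not a general x86-64 decoder (it ignores prefixes, REX, ModRM, and so on). Note that it decodes the first instruction as mov eax, 60: there is no REX.W byte, so nasm appears to have used the shorter 32-bit form, which in 64-bit mode zero-extends into rax and so has the same effect.

```python
# Minimal sketch: decodes ONLY the opcode forms present in this byte string.

# In the B8..BF form, the low 3 bits of the opcode select the register.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes) -> list[str]:
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:
            # mov r32, imm32: opcode byte + 4-byte little-endian immediate
            if i + 5 > len(code):
                raise ValueError(f"truncated imm32 at offset {i}")
            reg = REG32[op - 0xB8]
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {reg}, {imm}")
            i += 5
        elif code[i:i + 2] == b"\x0f\x05":
            # 0x0F is an escape byte; 0F 05 together encode syscall
            out.append("syscall")
            i += 2
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i}")
    return out


machine_code = bytes([0xB8, 0x3C, 0x00, 0x00, 0x00,
                      0xBF, 0x00, 0x00, 0x00, 0x00,
                      0x0F, 0x05])
for line in decode(machine_code):
    print(line)
```

Running it prints the three instructions one per line: mov eax, 60, then mov edi, 0, then syscall.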
A few friends and I have tried to dissect this to find out what each byte means. We think the following:
- 0xB8 is the first mov instruction.
- 0x3C (60) is the first argument, moved into rax.
- 0x00 signifies rax? (This is where we get fuzzy.)
- The next two 0x00 are just excess nasm output? xD
- 0xBF is the next mov.
- 0x00 is the first argument, 0.
- Clueless on the remaining 0x00 bytes.
- 0x0F is syscall?
- 0x05: clueless.
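The "excess nasm output" guess can be tested directly. Assuming the four bytes after 0xB8 are a single 32-bit immediate stored least-significant byte first (x86 is little-endian), the 0x00 bytes are simply the high bytes of 60 rather than padding or a register code. A minimal check:

```python
# Assumption under test: the 4 bytes after the 0xB8 opcode are ONE 32-bit
# immediate, stored least-significant byte first (little-endian).
first_mov = bytes([0xB8, 0x3C, 0x00, 0x00, 0x00])
opcode, imm_bytes = first_mov[0], first_mov[1:]

value = int.from_bytes(imm_bytes, "little")
print(f"opcode {opcode:#04x}, immediate {value}")  # opcode 0xb8, immediate 60

# Round trip: 60 packed as a 4-byte little-endian integer gives the same bytes.
assert (60).to_bytes(4, "little") == imm_bytes
```

The same reading applied to 0xBF gives an immediate of 0 from its four 0x00 bytes.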
I'm a beginner (obviously) and would appreciate help dissecting this machine code. Understanding it would help a lot in understanding x86 instruction formatting. Thanks in advance!
Edit: Is it possible that registers are specified by the instruction opcode itself?
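For this particular mov form, the Intel manual's opcode table writes the encoding as B8+rd id: the register number (eax=0, ecx=1, ..., edi=7) is added to 0xB8, and id is a 4-byte immediate. That would make 0xB8 mean "mov eax" and 0xBF (0xB8 + 7) mean "mov edi". The helper below, encode_mov_r32_imm32, is a hypothetical name of my own, not an existing API; it sketches that rule assuming only the eight legacy 32-bit registers and an unsigned 32-bit immediate.

```python
# Hypothetical helper: encodes `mov r32, imm32` using the B8+rd opcode form.
REG_NUM = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
           "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    if not 0 <= imm <= 0xFFFFFFFF:
        raise ValueError("immediate must fit in an unsigned 32-bit value")
    opcode = 0xB8 + REG_NUM[reg]  # register number lives in the low 3 bits
    return bytes([opcode]) + imm.to_bytes(4, "little")


print(encode_mov_r32_imm32("eax", 60).hex(" "))  # b8 3c 00 00 00
print(encode_mov_r32_imm32("edi", 0).hex(" "))   # bf 00 00 00 00
```

Three bits only cover eight registers, so r8d to r15d would additionally need a REX prefix (its B bit extends this field), and the 64-bit mov r64, imm64 form adds REX.W (0x48) in front and takes an 8-byte immediate. The sketch leaves both out.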