Can someone annotate this machine code?

https://stackoverflow.com/questions/22852707

x86
x86-64

27-06-2023
|

Question

I'm starting to learn x86(-64) machine code because at some point I want to write a compiler or JIT compiler (probably the latter first). I've written assembly for a while, so I'm not going into this blind; I'm just trying to learn the x86 instruction encoding, since it seems to be quite complex. I've looked at tables and read articles, as well as parts of the Intel manuals (the most inhuman documents I've ever read).

I'm starting to understand it, so I decided to break down a basic set of instructions and their resulting machine code. The code makes a Linux syscall (sys_exit) because I thought it would be easy to test. Here is the x86-64 code:

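bits 64         ; -fbin assembles 16-bit code by default, so request 64-bit mode
mov rax, 60     ; 60 = sys_exit on Linux x86-64
mov rdi, 0      ; exit status
syscall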

I assembled this with nasm:

nasm test.asm -fbin

I used -fbin so it would output raw binary I could easily examine.

It produced the following bytes:

0xB8 0x3C 0x00 0x00 0x00 0xBF 0x00 0x00 0x00 0x00 0x0F 0x05
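A few lines of Python are enough to print a raw `-f bin` file in the format above. This is a minimal sketch; `format_bytes` is a hypothetical helper, and the file to pass is whatever NASM wrote (for `test.asm` with no `-o`, NASM names the flat binary `test`).

```python
import sys
from pathlib import Path


def format_bytes(data: bytes) -> str:
    """Render raw bytes as space-separated upper-case 0xNN tokens."""
    return " ".join(f"0x{b:02X}" for b in data)


if __name__ == "__main__":
    # Usage: python dump.py test   (every file named on the command line is dumped)
    for name in sys.argv[1:]:
        print(format_bytes(Path(name).read_bytes()))
```

The `0x{b:02X}` format matches the upper-case style of the listing; with no arguments the script does nothing.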

A few friends and I tried to dissect this to find out what each byte means. We think the following:

0xB8 is the first mov instruction.
0x3C (60) is the first argument, moved into rax.
0x00 signifies rax? (this is where we get fuzzy)
The next two 0x00 are just excess nasm output? xD
0xBF is the next mov
0x00 is the first argument, 0
Clueless on the remaining 0x00
0x0F syscall?
0x05 clueless.

I'm a beginner (obviously) and would appreciate help dissecting this machine code. Understanding it would help me a lot with x86 instruction encoding. Thanks in advance!

Edit: Is it possible that the register is encoded in the instruction opcode itself?

Solution

I'm not an expert, but taking http://ref.x86asm.net/coder64.html#x05 as a reference, I put together the following explanation:

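0xB8 - first mov, encoded as mov eax, imm32 (opcode 0xB8 + register number 0)
0x3C 0x00 0x00 0x00 - 32-bit immediate (60), little-endian byte order
0xBF - second mov, mov edi, imm32 (0xB8 + register number 7)
0x00 0x00 0x00 0x00 - 32-bit immediate (0)
0x0F - escape byte that selects the two-byte opcode map (not a true prefix)
0x05 - syscall in the "0x0F space"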
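The byte-by-byte reading can be checked with a toy decoder. This is a sketch, not a real disassembler: it understands only the B8+r move (optionally preceded by the REX.W byte 0x48) and `0F 05`, and it ignores ModRM, SIB and the other REX bits. `decode`, `REG32` and `REG64` are made-up names. Running it on the question's bytes shows that NASM emitted the 32-bit forms of both moves.

```python
# Register names indexed by the 3-bit register number (REX.B extension ignored).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code: bytes) -> list:
    """Decode only the encodings used here: [REX.W] B8+r imm, and 0F 05."""
    out = []
    i = 0
    while i < len(code):
        rex_w = code[i] == 0x48              # REX prefix with only the W bit set
        if rex_w:
            i += 1
        op = code[i]
        if 0xB8 <= op <= 0xBF:                # MOV r, imm: register in the low 3 bits
            reg = op - 0xB8
            size = 8 if rex_w else 4          # REX.W: 64-bit immediate, else 32-bit
            raw = code[i + 1:i + 1 + size]
            if len(raw) != size:
                raise ValueError(f"truncated immediate at offset {i}")
            imm = int.from_bytes(raw, "little")  # immediates are stored little-endian
            out.append(f"mov {(REG64 if rex_w else REG32)[reg]}, {imm}")
            i += 1 + size
        elif op == 0x0F and code[i + 1:i + 2] == b"\x05":  # 0F escape, 05 = SYSCALL
            out.append("syscall")
            i += 2
        else:
            raise ValueError(f"unsupported opcode 0x{op:02X} at offset {i}")
    return out


print(decode(bytes.fromhex("B83C000000BF000000000F05")))
# ['mov eax, 60', 'mov edi, 0', 'syscall']
```

Feeding it `48 B8` followed by an eight-byte immediate yields `mov rax, ...`, the 10-byte form used when a constant does not fit in a 32-bit immediate.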

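And yes, for this form of mov the destination register is encoded in the low three bits of the opcode byte. NASM picked the shorter 32-bit forms because writing a 32-bit register zero-extends into the full 64-bit register, so mov eax, 60 has the same effect as mov rax, 60. Other instructions encode registers in a separate ModRM byte, and opcodes also change with operand types. Some instructions, such as movzx and the near (rel32) conditional jumps, also sit behind the 0x0F escape byte.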
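Going the other way, a small encoder shows where the register lands in the opcode byte and which encodings use the 0x0F escape. It is a sketch under the same simplifications (only the eight legacy registers, no ModRM forms). The function names are hypothetical, and `je_rel32` stands in for the near conditional jumps mentioned above.

```python
import struct

# Register numbers 0-7 used by the "+r" opcode forms.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def mov_r32_imm32(reg: str, imm: int) -> bytes:
    # B8+r id: the register number is added to the opcode byte itself.
    return bytes([0xB8 + REG32.index(reg)]) + struct.pack("<I", imm)


def mov_r64_imm64(reg: str, imm: int) -> bytes:
    # REX.W (0x48), then the same B8+r opcode with an 8-byte immediate.
    return bytes([0x48, 0xB8 + REG64.index(reg)]) + struct.pack("<Q", imm)


def syscall() -> bytes:
    # 0F escapes to the two-byte opcode map; 05 there is SYSCALL.
    return b"\x0f\x05"


def je_rel32(disp: int) -> bytes:
    # 0F 84 cd: near JE; the signed displacement is relative to the end
    # of this 6-byte instruction.
    return b"\x0f\x84" + struct.pack("<i", disp)


program = mov_r32_imm32("eax", 60) + mov_r32_imm32("edi", 0) + syscall()
print(program.hex(" "))
# b8 3c 00 00 00 bf 00 00 00 00 0f 05
```

For example, `je_rel32(-6)` produces `0f 84 fa ff ff ff`, a jump back to its own first byte.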

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow