Question

Why does NASM use the 0x89 opcode (137) when it assembles a MOV instruction between two registers?

Here is an example of code assembled using NASM:

55      push ebp
89E5    mov ebp, esp
83EC04  sub esp, byte +0x4
31C0    xor eax, eax
C9      leave
C3      ret

I wanted something like this:

55      push ebp
8BEC    mov ebp, esp
83EC04  sub esp, byte +0x4
33C0    xor eax, eax
C9      leave
C3      ret

The reason I wanted 0x8B was: if you view the binary representation of the MOV instruction, it looks like this in NASM:

Opcode     Mod   Reg   R/M
10001001   11    100   101 (89 E5)

The confusing part is that the reg field holds the second (source) operand.

The encoding NASM emits is: 0x89 11 source_reg destination_reg
whereas the MOV instruction is written as: mov destination_reg, source_reg

Solution

The two encodings are equivalent: both produce exactly the same instruction. This is one of x86's redundancies, and the assembler is free to choose either one.

A typical two-operand x86 instruction has two opcodes. The first takes a register as the first operand and a register or a memory location as the second (abbreviated "reg, reg/mem32" in the opcode reference or "Gv, Ev" in the opcode table). The second opcode takes the operands in the reverse order (abbreviated "reg/mem32, reg" or "Ev, Gv"). This makes sense: the processor must know whether it is copying to memory or from memory. But when both operands are registers, the encoding becomes redundant:

                  ; mod reg r/m
03C3 add eax, ebx ;  11 000 011
01D8 add eax, ebx ;  11 011 000
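
To make the redundancy concrete, here is a minimal Python sketch that decodes two-byte reg/reg instructions. It is not a real disassembler: the function name decode_reg_reg and the small opcode table are made up for illustration, and only register-direct ModRM bytes (mod = 11) are handled. It relies on bit 1 of the opcode acting as the direction bit for these opcode pairs.

```python
# 32-bit register names indexed by their 3-bit encoding.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Illustrative subset of opcodes: each mnemonic has an "Ev, Gv" form
# (direction bit clear) and a "Gv, Ev" form (direction bit set).
MNEMONICS = {0x01: "add", 0x03: "add",
             0x31: "xor", 0x33: "xor",
             0x89: "mov", 0x8B: "mov"}


def decode_reg_reg(code: bytes) -> str:
    """Decode a two-byte reg/reg instruction (hypothetical helper)."""
    opcode, modrm = code[0], code[1]
    mod = modrm >> 6          # top two bits
    reg = (modrm >> 3) & 7    # middle three bits
    rm = modrm & 7            # low three bits
    if mod != 0b11:
        raise ValueError("only register-direct ModRM (mod = 11) is handled")
    if opcode & 0x02:
        # Direction bit set: "Gv, Ev", the reg field is the destination.
        dst, src = reg, rm
    else:
        # Direction bit clear: "Ev, Gv", the r/m field is the destination.
        dst, src = rm, reg
    return f"{MNEMONICS[opcode]} {REGS32[dst]}, {REGS32[src]}"


for encoding in ("89E5", "8BEC", "03C3", "01D8"):
    print(encoding, decode_reg_reg(bytes.fromhex(encoding)))
```

Both members of each pair decode to the same text, which is why the assembler's choice is invisible at the source level.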

There are many more redundant encodings than just this reg/reg case.

Different assemblers choose different encodings, so these choices can be used to identify the assembler that produced a given binary.
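
As a rough sketch of that idea, the Python snippet below counts which form the reg/reg instructions in a listing use. Everything here is an assumption made for illustration: the function name preferred_form is hypothetical, the opcode sets cover only a handful of instructions, and the input is assumed to be already split into individual instructions (a real tool would need a disassembler to find instruction boundaries). The two sample listings are the byte sequences from the question.

```python
from collections import Counter

# Small illustrative subset of opcodes, split by the direction bit (bit 1).
STORE_FORM = {0x01, 0x29, 0x31, 0x89}  # "Ev, Gv": add, sub, xor, mov
LOAD_FORM = {0x03, 0x2B, 0x33, 0x8B}   # "Gv, Ev": add, sub, xor, mov


def preferred_form(instructions):
    """Count the encoding form used by reg/reg instructions (hypothetical helper)."""
    counts = Counter()
    for code in instructions:
        # Skip one-byte instructions and anything whose ModRM is not mod = 11.
        if len(code) < 2 or code[1] >> 6 != 0b11:
            continue
        if code[0] in STORE_FORM:
            counts["store"] += 1
        elif code[0] in LOAD_FORM:
            counts["load"] += 1
    return counts


# The two listings from the question, one instruction per entry.
nasm_listing = [bytes.fromhex(h) for h in ("55", "89E5", "83EC04", "31C0", "C9", "C3")]
other_listing = [bytes.fromhex(h) for h in ("55", "8BEC", "83EC04", "33C0", "C9", "C3")]

print(preferred_form(nasm_listing))
print(preferred_form(other_listing))
```

A consistent preference for one form is only a hint, since some assemblers let the programmer override the default encoding.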

Some assemblers allow you to choose the encoding. For example, GAS can emit the other encoding if you append the .s suffix to the mnemonic:

10 de   adcb   %bl,%dh
12 f3   adcb.s %bl,%dh

Related question: What is the ".s" suffix in x86 instructions?

Licensed under: CC-BY-SA with attribution