Instruction Lengths

https://stackoverflow.com/questions/4567903

14-10-2019
|

Question

I was looking at the different instructions in assembly and I am confused on how the lengths of different operands and opcodes are decided upon.

Is it something you ought to know from experience, or is there a way to find out which operand/operator combination takes up how many bytes?

For eg:

push %ebp ; takes up one byte
mov %esp, %ebp ; takes up two bytes

So the question is:

Upon seeing a given instruction, how can I deduce how many bytes its opcode will require?

Solution

There's no hard and fast rule for x86 without a database as the instruction encoding is pretty complex (and the opcode itself can vary from 1 to 3 bytes). You can consult the Intel® 64 and IA-32 Architectures Software Developer’s Manual 2A document (Chapter 2: Instruction Format) to see how instructions and their operands are encoded:

OTHER TIPS

The length of the op-code is built with (at least) two criteria in mind

the frequency of the op-code (put it on 1 byte if frequently used in programs, and if possible)
the information necessary for the op-code to function (if you need an absolute address, the code cannot be encoded on a unique byte)

Also,

between the initial 8088 to the latest Intel processors (3 decades) a lot of new instructions have been created, and some, while frequently appearing in programs, could not be coded on one single byte, because the whole 256 values were reserved.

Besides the link provided in another answer (that lists specifically the size of a code), see also the processors history.

Terminology: "opcode" is the part of the instruction that selects the operation, not including operands, or non-mandatory prefixes that modify the operation (e.g. operand-size). Using "opcode" to refer to the whole instruction is incorrect, although done fairly often by some people talking about shellcode.

Is it something you ought to know from experience

With experience looking at machine code or especially experience optimizing for code-size, then yes, you'll start to remember things you've looked up repeatedly, and learn how to look at an asm line and know how long the instruction will be, without memorizing what the bytes will be.

The operand-encoding rules don't depend on the opcode, so you just have to remember the opcode lengths, and the special-case short forms that don't use a ModR/M byte to encode operands. And then separately remember the operand-encoding rules.

For me personally, I like to answer code-golf questions like this one with x86 machine code. (See also Tips for golfing in x86/x64 machine code). I write in NASM, planning / knowing how long each instruction will be, and let the assembler generate the hexdump of the actual machine code as a listing. For the short instructions that are useful for code-golf, I don't remember having been wrong about instruction length any time recently, but I'm lucky to have a good memory for details (like the x86 instruction set) that I find interesting or use a lot. (I did have to try rorx to see how long it was.)

I don't type out the machine-code bytes myself; to do that by hand I'd have to look each instruction up in the manual. x86 doesn't have short encodings for PC-relative addressing, so finding/creating useful constants inside the machine code (that could double as data) is not a thing, so it's not generally useful for code-golf to memorize any of the numeric details of instruction encoding.

When optimizing for performance, smaller is usually better when everything else is equal, so caring about code-size and especially alignment is definitely part of performance.

or is there a way to find out which operand/operator combination takes up how many bytes?

This is well documented in manuals. Other than a few special-case 1-byte instructions, operand encoding is the same for (almost) everything.

The machine-code encoding of most x86 instructions follow this pattern (better diagram version of this from Intel in @Mehrdad's answer):

[prefixes] opcode ModR/M [extra addressing-mode bytes] [immediate]

(Instructions with no explicit operands don't have a ModR/M byte, just the opcode byte(s)).

x86 opcodes are 1 byte for most common instructions, especially instructions which have existed since 8086. Instructions added later (e.g. like bsf and movsx in 386) often use 2-byte opcodes with a 0f escape byte. If you hang around on SO, you'll see a lot of questions asking about 8086 specifically (especially emu8086); that's the main reason I know something about which instructions weren't available on 8086. If you'd rather just remember directly which instructions have 2-byte opcodes without the historical details, that's totally fine. Or just look it up every time in the manual :P

e.g. 0f b6 c0 movzx eax,al, so 0F B6 is the opcode for mov r32, r/m8, and C0 is the ModR/M byte that encodes eax as the destination (/r field = 0), register direct mode for the source (top 2 bits = 11), and al as the source register (/m field = 0).

I'm using Intel syntax for all my examples (mnemonic dst, src1 [,src2, ...]) because that matches what you'll find in Intel and AMD's manuals. AFAIK, there aren't any detailed instruction-encoding manuals that use AT&T syntax. I'm also using 32 or 64-bit examples even when talking about what 8086 had. Of course 8086 only had 16-bit real mode, but the same opcode and encoding is used in 64-bit mode (which is what we care about these days).

Intel's instruction set ref. manual (SDM vol.2) has opcode maps for 1, 2, 3 byte opcodes (appendix A.3), so you can see some patterns in the choice of opcode encoding. Or for any given instruction, look at the encoding listed along with the full description in that manual. (Also see some nice online extracts with one page per instruction, like https://github.com/HJLebbink/asm-dude/wiki and http://felixcloutier.com/x86/. HJ Lebbink's page tags each instruction with when it was introduced, so you can see 8086 for add, or 386 for new forms of shifts, and for movzx).

Note that some one-operand instructions, like shl or not, use the /r field of the ModR/M byte as extra opcode bits. Also most instructions with an immediate are still destructive because they use the /r field as opcode bits. imul r32, r/m32, imm32 (386) is the exception to this rule, having an immediate and using the full ModR/M byte for both operands. (Note that the ModR/M can only signal register or memory operands; the encoding for add r/m32, imm8 uses the opcode to indicate that there's an immediate. But the main opcode byte is shared by multiple instructions, so the /r field is used as part of the opcode, and that's why we don't have add r/m32, r32, imm8. But for ADD / SUB we can use lea ecx, [rax + 1] as a copy-and-add.)

Operand encoding:

Most instructions with an immediate operand are the same length as the register/memory source version, plus the bytes to encode the immediate. Immediates are either imm8 or imm32, so values from -128..127 are more compact. (In 16-bit mode it's either imm8 or imm16).

The ModR/M byte is all that's needed for register direct, or the simplest one-register addressing mode with no displacement. (Except with [esp]). So add eax, ecx is 2 bytes long, just like add eax, [ecx]. Indexed addressing modes (and modes with esp / rsp as the base register) need a SIB (Scale/Index/Base) byte.

Constant displacements in addressing modes need an extra 1 or 4 bytes (sign-extended disp8 or disp32) on top of the ModR/M + optional SIB.

AVX512 EVEX with disp8 scales the disp8 by the vector width, so vaddps zmm31, zmm30, [rsi + 256] is only 7 bytes (4-byte EVX + opcode=0x58 + modrm + disp8), but vaddps zmm31, zmm30, [rsi + 16] is 11 bytes: it has to use a disp32 to encode +16, because it's not a multiple of 64. But the same instruction with xmm registers could use a disp8.

See Intel's manuals for the full details.

Special short-forms of the most common instructions

To save code-size, 8086 (and later x86) provides special encodings with no ModR/M byte for some very common instructions. If the instruction isn't one of these, it uses a ModR/M byte

add/adc/sub/cmp/test/and/or/xor/etc. AL/AX/EAX with an immediate of the same size as the register. e.g. and eax, imm32 (5 bytes) or and al,imm8 (2 bytes). But there's no special encoding for and eax, imm8; that still has to use the 3-byte and r/m32, imm8 encoding. Using al can be very good for code-size when working with 8-bit data, especially if you've avoided or aren't worried about partial-register stalls or false dependencies causing performance problems.
shift/rotate with a count of 1: 8086 didn't have imm8 rotates, only by cl or by an implicit 1, so there are opcodes like shl r/m32,1 where the 1 is implicit.

Using the imm8 encoding has performance implications: potential stalls on P6-family because it doesn't check if the imm8 is zero until execution. But the rol r32,1 short form is 2 uops, vs. 1 for rol r32, imm8 (even if the imm8 is 1) on Sandybridge-family including Skylake. The rcl r32,1 short form is much faster than with an imm8. (3 uops vs. 8 on Skylake).

And several where the register is encoded in the low 3 bits of the instruction byte, effectively dedicating 8 bytes of opcode coding space to making the register-operand form of these instructions 1 byte shorter.

mov r8, imm8: 2 bytes instead of 3 for the general mov r/m8, imm8 encoding.
mov r32, imm32: 5 bytes instead of 6 bytes for mov r/m32, imm32. Fun fact: in x86-64, the REX.W=1 version of the short-form opcode is the only instruction that can use a 64-bit immediate. 10-byte mov r64, imm64. The REX.W=1 version of the r/m32 opcode still uses a 32-bit immediate (sign-extended like usual), so mov rax, -1 is best encoded that way, taking 7 bytes vs. 5-byte mov eax,-1. (Or if optimizing for code-size, see also Set all bits in CPU register to 1 efficiently.)
push/pop register, 1 byte vs. 2 bytes for the pop r/m32 encoding.
push/pop segment registers (other than FS/GS). Although there isn't a r/m16 encoding for these.
inc r32 / dec r32 (16/32-bit mode only: the 0x4X bytes are REX prefixes in x86-64, so inc eax has to use the 2-byte inc r/m32 encoding).
xchg eax, reg: This is where 0x90 nop comes from: the short-form of xchg eax,eax (or in 16-bit mode, xchg ax,ax). In x86-64, 90 nop is not also xchg eax,eax, because that would zero-extend EAX into RAX. Instead, it has its own instruction-set manual entry.

xchg reg,reg is never used by compilers, and is usually not faster than 3 mov instructions, so it would be nice if we had those 7 opcode bytes back for more useful future extensions. (Or 8 if nop was moved to a different opcode...). It was more useful in 8086 when the accumulator was "more special", e.g. cbw to sign-extend AL into AX was the only (good) way because movsx didn't exist. And only 1-operand mul / imul was available.

xchg eax, r32 is still great for code-golf, e.g. GCD in 8 bytes of x86 32-bit machine code. See also my other code-golf answers for various code-size tricks (mostly at the expense of performance; that's the point of code-golf).

I think this covers all the single-byte special cases of instructions that also have r/m32 encodings.

This answer isn't meant to be exhaustive. I haven't talked about more recent instructions much, and there are lots of special cases for rare instructions. The rules for when a REX prefix or operand-size prefix are required are pretty straightforward. Here's are some more general rules:

SSE1/SSE3 ABCps instructions have 2-byte opcodes (0F xx)
SSE2 integer / double-precision instructions generally have 3-byte opcodes (66 0F xx or similar)
SSSE3/SSE4.x instructions have 4-byte opcodes (3 mandatory prefixes)

VEX-coded instructions can use a 2-byte VEX prefix if the SSE version is SSE3 or earlier, and the 2nd source register isn't a "high" register (xmm/ymm8-15). The XMM and YMM versions of the same instruction are always the same size. (But prefer xmm with implicit zero-extension instead of explicit ymm when you don't care or want the high half zeroed.)

vpxor  ymm8,ymm8,ymm5    ; 2-byte VEX
vpxor  ymm7,ymm7,ymm8    ; 3-byte VEX
vpxor  ymm7,ymm8,ymm7    ; 2-byte VEX

So we can use "high" registers as the destination or first source without needing a 3-byte VEX, but not as the 2nd source (3rd operand overall). For commutative operations, you can save size by putting the low8 one as the 2nd source.

Note that for 4-operand instructions like vblendvps, the 4th operand is encoded in an imm8. So it's still the 3rd operand (2nd source), not the last operand, which affects what size VEX prefix is needed. But blendvps is SSE4.1, so it always needs a 3-byte VEX prefix anyway to represent the 66.0F3A encoding of the prefix field.

Usually, this isn't something you need to know from one instruction to the next when programming in assembly language. If it ever matters (such as if you're trying to fit some particular code into a constrained space), you can look at the listing output from the assembler, or a disassembly listing.

From my 6510 assembly days, the answer usually pertained to operand addresses and offsets. Opcodes were always 1 byte for the 6510. Addresses were always two bytes. If the Opcode required one address, then I knew the total size was three bytes. If two addresses were specified, then I knew the total size was 5 bytes.

As for offsets, the space they occupied was contingent on the length of the branch. So consider this:

bne FooBar

If the "Foobar" offset was pointing to an address that was less than 128 bytes away, then the operand was a single byte. If the offset pointed to an address beyond that, then a full address was needed. A full address was no longer an offset, and of course addresses occupied two bytes.

So in this latter case, it might not be easy to tell if the opcode + operand required two or three bytes.

So I guess, sometimes you can tell and other times it isn't so obvious.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow